Applications in Visual Object Tracking Yuanwei Wu 10-21-2016 1 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Week 42: Siamese Network: Architecture and Applications in Visual Object Tracking

Yuanwei Wu 10-21-2016

1

slide-2
SLIDE 2

Outline

  • Siamese Architecture
  • Siamese Applications in Computer Vision
  • Paper review

      • Visual Object Tracking using Siamese CNN

  • Future Work

2

slide-3
SLIDE 3

What does “Siamese” mean?

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 3

slide-4
SLIDE 4

Siamese Architecture

Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf 4

slide-5
SLIDE 5

Siamese Architecture and loss function

Source: Learning Hierarchies of Invariant Features. Yann LeCun. helper.ipam.ucla.edu/publications/gss2012/gss2012_10739.pdf 5
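
As a rough illustration of the idea named on this slide, the sketch below passes two inputs through the same embedding function (shared weights) and scores the pair with a contrastive loss. The one-layer `embed` function, the weight matrix `W`, and the `margin` value are assumptions made for this example; the figure on the slide may use a different network and loss.

```python
import numpy as np

def embed(x, W):
    """Toy embedding: one linear layer plus ReLU (hypothetical stand-in for a CNN)."""
    return np.maximum(W @ x, 0.0)

def contrastive_loss(x1, x2, same, W, margin=1.0):
    """Contrastive loss for one pair; `same` is 1 for a matching pair, 0 otherwise."""
    # Both branches use the same weights W: this sharing is what makes the network Siamese.
    d = np.linalg.norm(embed(x1, W) - embed(x2, W))
    pull = 0.5 * d ** 2                      # matching pairs are pulled together
    push = 0.5 * max(margin - d, 0.0) ** 2   # non-matching pairs are pushed past the margin
    return same * pull + (1 - same) * push
```

With this form, a matching pair costs nothing once its embeddings coincide, and a non-matching pair costs nothing once its embeddings are at least `margin` apart.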

slide-6
SLIDE 6

Siamese Applications in Computer Vision:

  • 1. Signature Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 6

slide-7
SLIDE 7

Siamese Applications in Computer Vision:

  • 2. Dimensionality Reduction

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 7

slide-8
SLIDE 8

Siamese Applications in Computer Vision: 3.1 Learning Image Descriptors

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 8

CNN Model

slide-9
SLIDE 9

Siamese Applications in Computer Vision: 3.2 Learning Image Descriptors

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 9

slide-10
SLIDE 10

Siamese Applications in Computer Vision: 4.1 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 10

slide-11
SLIDE 11

Siamese Applications in Computer Vision: 4.2 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 11

slide-12
SLIDE 12

Siamese Applications in Computer Vision: 4.3 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 12

slide-13
SLIDE 13

Siamese Applications in Computer Vision: 4.4 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 13

slide-14
SLIDE 14

Siamese Applications in Computer Vision: 4.5 Face Verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 14

slide-15
SLIDE 15

@article{bertinetto2016fully, title={Fully-Convolutional Siamese Networks for Object Tracking}, author={Bertinetto, Luca and Valmadre, Jack and Henriques, Jo{\~a}o F and Vedaldi, Andrea and Torr, Philip HS}, journal={arXiv preprint arXiv:1606.09549}, year={2016} }

Paper Review: Fully-Convolutional Siamese Networks for Object Tracking

15

slide-16
SLIDE 16

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

Architecture of Siamese CNN

16

slide-17
SLIDE 17

Details of the Architecture of Siamese CNN

Source: 1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

17

slide-18
SLIDE 18

Details of the Architecture of Siamese CNN

Source: 1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012. 2: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

18

Cross-correlation layer
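
A minimal sketch of what a cross-correlation layer computes, assuming the exemplar and search embeddings are plain arrays of shape (height, width, channels). The function name `cross_correlate` and the loop-based implementation are illustrative only; a real tracker would use a convolution routine from a deep learning framework.

```python
import numpy as np

def cross_correlate(z_feat, x_feat):
    """Slide the exemplar features over the search features and sum over channels."""
    h, w, _ = z_feat.shape
    H, W, _ = x_feat.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            # Inner product between the exemplar and one same-sized search window.
            scores[i, j] = np.sum(x_feat[i:i + h, j:j + w, :] * z_feat)
    return scores
```

Each entry of the returned score map measures how similar the exemplar is to one candidate window of the search image, so the location of the maximum indicates the most likely target position.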

slide-19
SLIDE 19

Training: dataset

  • ImageNet Video dataset of 2015:
      • contains ~4000 videos
      • with ~1 million annotated frames

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 19

slide-20
SLIDE 20

Training: preprocessing on the images

  • Preprocessing: 2820 videos, exemplar image: 127 x 127, search image: 255 x 255
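
To see how these two input sizes relate to the size of the score map, the arithmetic can be sketched as below. The (kernel, stride) values in `LAYERS` are an assumption: they follow the padding-free, AlexNet-like configuration reported in the paper and are not stated in the slide text.

```python
def out_size(n, layers):
    """Spatial size after a chain of unpadded (kernel, stride) layers."""
    for kernel, stride in layers:
        n = (n - kernel) // stride + 1
    return n

# Assumed (kernel, stride) for conv1, pool1, conv2, pool2, conv3, conv4, conv5.
LAYERS = [(11, 2), (3, 2), (5, 1), (3, 2), (3, 1), (3, 1), (3, 1)]

exemplar_feat = out_size(127, LAYERS)          # side of the exemplar embedding
search_feat = out_size(255, LAYERS)            # side of the search embedding
score_map = search_feat - exemplar_feat + 1    # side of the cross-correlation output

print(exemplar_feat, search_feat, score_map)   # 6 22 17
```

Under these assumptions the network has a total stride of 8 (2 x 2 x 2), so neighbouring score-map cells correspond to positions 8 pixels apart in the search image.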

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 20

slide-21
SLIDE 21

Training: recap the steps

  • ImageNet Video dataset of 2015:
      • contains ~4000 videos
      • with ~1 million annotated frames

  • Preprocessing:
      • 2820 videos
      • exemplar image: 127 x 127
      • search image: 255 x 255

  • Training with a standard Stochastic Gradient Descent (SGD) solver using MatConvNet

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 21

slide-22
SLIDE 22

Training: loss function

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  • Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss:

22

slide-23
SLIDE 23

Training: loss function

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  • Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss:

  • The loss of a score map is the mean of the individual losses:

23

slide-24
SLIDE 24

Training: loss function

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016.

  • Employing a discriminative training approach using positive and negative pairs and adopting the logistic loss:

  • The loss of a score map is the mean of the individual losses:

  • Applying SGD to find the conv-net parameters θ using:
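
A minimal sketch of the loss described in the bullets above, assuming the logistic loss takes its usual form log(1 + exp(-y·v)), where v is the real-valued score at one position of the score map and y in {+1, -1} is its label. The function names are illustrative, and the SGD update itself is not shown.

```python
import numpy as np

def logistic_loss(y, v):
    """Element-wise logistic loss log(1 + exp(-y * v)) for labels y in {+1, -1}."""
    # logaddexp(0, t) evaluates log(1 + exp(t)) without overflow for large t.
    return np.logaddexp(0.0, -y * v)

def score_map_loss(y_map, v_map):
    """Loss of a whole score map: the mean of the individual per-position losses."""
    return float(np.mean(logistic_loss(y_map, v_map)))
```

Training then amounts to minimizing the average of `score_map_loss` over many (exemplar, search image, label map) triples with respect to the network parameters θ.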

24

slide-25
SLIDE 25

Tracking algorithm

  • Use a search image centered at the previous position of the target.

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 25

slide-26
SLIDE 26

Tracking algorithm

  • Use a search image centered at the previous position of the target.

  • Only search for the object within a region of approximately four times its previous size.

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 26

slide-27
SLIDE 27

Tracking algorithm

  • Use a search image centered at the previous position of the target.

  • Only search for the object within a region of approximately four times its previous size.

  • A cosine window is added to the score map to penalize large displacements.

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 27

slide-28
SLIDE 28

Tracking algorithm

  • Use a search image centered at the previous position of the target.

  • Only search for the object within a region of approximately four times its previous size.

  • A cosine window is added to the score map to penalize large displacements.

  • The position of the maximum score relative to the center of the score map, multiplied by the stride of the network, gives the displacement of the target from frame to frame.
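
The last two steps above can be sketched as follows. This is only an illustration: the stride of 8, the blending weight `window_weight`, and the function name `estimate_displacement` are assumptions, and the score map is taken to be a 2-D NumPy array.

```python
import numpy as np

def estimate_displacement(score_map, stride=8, window_weight=0.3):
    """Return the (row, col) displacement in pixels implied by a score map."""
    n_rows, n_cols = score_map.shape
    # Cosine (Hann) window: largest at the center, zero at the borders.
    window = np.outer(np.hanning(n_rows), np.hanning(n_cols))
    # Adding the window favours peaks near the center, i.e. small displacements.
    penalized = (1.0 - window_weight) * score_map + window_weight * window
    row, col = np.unravel_index(np.argmax(penalized), penalized.shape)
    center = np.array([(n_rows - 1) / 2.0, (n_cols - 1) / 2.0])
    # Offset from the center in score-map cells, scaled to pixels by the stride.
    return (np.array([row, col]) - center) * stride
```

For example, a 17 x 17 score map whose peak lies two cells below and two cells to the left of the center yields a displacement of (16, -16) pixels when the stride is 8.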

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 28

slide-29
SLIDE 29

Experiments: training dataset size

  • Accuracy: calculated as the average Intersection-over-Union (IoU)

  • Robustness: measured as the total number of failures

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 29

slide-30
SLIDE 30

Experiments: training dataset size

  • Accuracy: calculated as the average Intersection-over-Union (IoU)
  • Robustness: measured as the total number of failures
  • Using a larger video dataset could increase the performance even further.
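
For reference, the accuracy measure can be sketched as below, assuming axis-aligned bounding boxes given as (x, y, width, height). The box format and the function names are choices made for this example, not taken from the benchmark toolkit.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def average_iou(predicted, ground_truth):
    """Mean IoU over a sequence of per-frame predicted and ground-truth boxes."""
    pairs = list(zip(predicted, ground_truth))
    return sum(iou(p, g) for p, g in pairs) / len(pairs)
```

Two 2 x 2 boxes offset by one pixel in each direction overlap in a 1 x 1 region, giving an IoU of 1 / (4 + 4 - 1) = 1/7.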

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 30

slide-31
SLIDE 31

Experiments: OTB13 benchmark results

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 31

slide-32
SLIDE 32

Experiments: VOT15 benchmark results

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 32

slide-33
SLIDE 33

Experiments: VOT15 benchmark results

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 33

slide-34
SLIDE 34

Experiments: VOT15 benchmark results

  • Estimates the new position of the target object by merely cross-correlating the embeddings of two patches over three scales.

  • Achieves real-time performance and state-of-the-art results.

Source: Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr, Fully-Convolutional Siamese Networks for Object Tracking, arXiv preprint, 2016. 34

slide-35
SLIDE 35

Future work: How to improve the performance?

  • By augmenting the online tracking pipeline:

      • online model updating (e.g. tracking-by-detection)
      • bounding-box regression (e.g. YOLO, Faster R-CNN)
      • fine-tuning (e.g. correlation filters + CNN features)
      • memory (e.g. adding an RNN or LSTM)

35

slide-36
SLIDE 36

Source: Guanghan Ning, Zhi Zhang, Chen Huang, Zhihai He, Xiaobo Ren, Haohong Wang, Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking, arXiv preprint, 2016. 36

slide-37
SLIDE 37

Future work: How to improve the performance?

  • By augmenting the online tracking pipeline:

      • online model updating (e.g. tracking-by-detection)
      • bounding-box regression (e.g. YOLO, Faster R-CNN)
      • fine-tuning (e.g. correlation filters + CNN features)
      • memory (e.g. adding an RNN or LSTM)

  • By introducing new architectures within the Siamese CNN framework, which requires a closer look at the structure of the networks (e.g. regression network, triplet network).

37

slide-38
SLIDE 38

Triplet Network

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf 38

slide-39
SLIDE 39

Future work: How to improve the performance?

  • By augmenting the online tracking pipeline:

      • online model updating (e.g. tracking-by-detection)
      • bounding-box regression (e.g. YOLO, Faster R-CNN)
      • fine-tuning (e.g. correlation filters + CNN features)
      • memory (e.g. adding an RNN or LSTM)

  • By introducing new architectures within the Siamese CNN framework, which requires a closer look at the structure of the networks (e.g. regression network, triplet network).

  • By introducing a new loss function in the Siamese network.

39

slide-40
SLIDE 40

40

Loss function used in face verification

Source: http://vision.ia.ac.cn/zh/senimar/reports/Siamese-Network-Architecture-and-Applications-in-Computer-Vision.pdf

slide-41
SLIDE 41

Thank you!

41