Pictorial structures for object recognition (Josef Sivic)


SLIDE 1

Pictorial structures for object recognition

Josef Sivic

http://www.di.ens.fr/~josef
Equipe-projet WILLOW, ENS/INRIA/CNRS UMR 8548
Laboratoire d’Informatique, Ecole Normale Supérieure, Paris

With slides from: A. Zisserman, M. Everingham and P. Felzenszwalb
SLIDE 2

Pictorial Structure

  • Intuitive model of an object
  • Model has two components
  • 1. parts (2D image fragments)
  • 2. structure (configuration of parts)
  • Dates back to Fischler & Elschlager 1973
SLIDE 3
Recall: Generative part-based models (Lecture 7)

R. Fergus, P. Perona and A. Zisserman, Object Class Recognition by Unsupervised Scale-Invariant Learning, CVPR 2003

SLIDE 4

[Felzenszwalb et al. 2009]

Recall: Discriminative part-based model (Lecture 9)

SLIDE 5

Localize multi-part objects at arbitrary locations in an image

  • Generic object models such as person or car
  • Allow for articulated objects
  • Simultaneous use of appearance and spatial information
  • Provide efficient and practical algorithms

To fit model to image: minimize an energy (or cost) function that reflects both

  • Appearance: how well each part matches at given location
  • Configuration: degree to which parts match 2D spatial layout
SLIDE 6

Example: cow layout

SLIDE 7

Example: cow layout

Graph G = (V, E); edges define a TREE
Each vertex corresponds to a part: ‘Head’ (H), ‘Torso’ (T), ‘Legs’ (L1, L2, L3, L4)
Assign a label to each vertex from H = {positions}

SLIDE 8

Example: cow layout (animation step 2; same content as the previous slide)

SLIDE 9

Example: cow layout (animation step 3; same content as the previous slide)

SLIDE 10

Example: cow layout

Cost of a labelling L : V → H
Unary cost: how well does the part match the image patch?
Pairwise cost: encourages valid configurations

Find the best labelling L*

Graph G = (V, E) with vertices H, T, L1, L2, L3, L4

SLIDE 11

Example: cow layout

Find the best labelling L* by minimizing the energy.

Graph G = (V, E) with vertices H, T, L1, L2, L3, L4
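The energy itself is an image on the slide and is not captured in this transcript; in the standard pictorial-structures formulation of Felzenszwalb & Huttenlocher it has the form below, with m_i the unary matching cost of part i at location l_i and d_ij the pairwise deformation cost:

```latex
E(L) = \sum_{i \in V} m_i(l_i) \, + \sum_{(i,j) \in E} d_{ij}(l_i, l_j)
```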

SLIDE 12

The General Problem

Graph G = (V, E) with vertices {a, b, c, d, e, f}
Discrete label set H = {1, 2, …, h}

Assign a label to each vertex: L : V → H (in the figure, the labels 1, 1, 2, 2, 2, 3)

Cost of a labelling E(L): unary cost + n-ary cost (depends on the size of the maximal cliques of the graph)

Find L* = arg min E(L)

[Bishop, 2006]

SLIDE 13

Computational Complexity

n parts, h positions; e.g. h = number of pixels (512×300 = 153,600)

Brute-force fitting considers |H|^|V| = h^n labellings

SLIDE 14

Different graph structures

n parts, h positions (e.g. every pixel, for translation)

  • Fully connected: O(h^n)
  • Star structure: O(nh^2), using dynamic programming
  • Tree structure: O(nh^2), using dynamic programming

SLIDE 15

Brute force solutions intractable

  • With n parts and h possible discrete locations per part: O(h^n)
  • For a tree, using dynamic programming this reduces to O(nh^2)

If the model is a tree and has quadratic edge costs, then the complexity reduces to O(nh) (using a distance transform)

Felzenszwalb & Huttenlocher, IJCV, 2004
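The O(nh^2) tree DP can be sketched for the simplest tree, a chain (a minimal illustrative implementation with hypothetical names, not the authors' code):

```python
def fit_chain(unary, pairwise):
    """Minimize sum_i unary[i][l_i] + sum_{i>=1} pairwise(i, l_{i-1}, l_i)
    over labellings of a chain of n parts with h positions each,
    in O(n h^2) time by dynamic programming."""
    n, h = len(unary), len(unary[0])
    cost = list(unary[0])          # best cost of a prefix ending at each position
    back = []                      # back-pointers for recovering the argmin
    for i in range(1, n):
        new_cost, bp = [], []
        for b in range(h):
            # best predecessor position for part i at position b
            best, arg = min((cost[a] + pairwise(i, a, b), a) for a in range(h))
            new_cost.append(best + unary[i][b])
            bp.append(arg)
        cost, back = new_cost, back + [bp]
    b = min(range(h), key=lambda x: cost[x])
    labels = [b]
    for bp in reversed(back):      # backtrack from the last part to the first
        b = bp[b]
        labels.append(b)
    return list(reversed(labels)), min(cost)
```

For example, `fit_chain([[0, 10], [10, 0]], lambda i, a, b: abs(a - b))` places the two parts at positions 0 and 1 with total cost 1, trading one unit of pairwise cost for the low unary costs.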

SLIDE 16

Distance transforms for DP

SLIDE 17

Distance transforms

Special case of DP cost function

  • O(nh^2) → O(nh) for DP cost functions
  • Assume the pairwise model is quadratic, i.e. a quadratic function of the relative part positions
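The quadratic cost referred to (the equation is an image on the slide) is, in the standard formulation, of the form below, where \mu_{ij} is the learned ideal offset between parts i and j:

```latex
d_{ij}(l_i, l_j) = \lVert l_i - l_j - \mu_{ij} \rVert^2
```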
SLIDE 18


SLIDE 19

Felzenszwalb and Huttenlocher ’05

For each x2:

  • Finding the min over x1 is equivalent to finding the minimum over a set of offset parabolas
  • The lower envelope is computed in O(h) rather than O(h^2) via a distance transform

SLIDE 20

(Same content as SLIDE 19; animation step.)

SLIDE 21

(Same content as SLIDE 19; animation step.)
SLIDE 22

SLIDE 23

1D examples: a function f(p) and its distance transform D_f(q), plotted over positions p, q

SLIDE 24

(A second 1D example: f(p) and its distance transform D_f(q).)

SLIDE 25

Algorithm is non-examinable

SLIDE 26

“Lower Envelope” Algorithm

(Animation: add the first; add the second; try adding the third; remove the second; try again and add.)

SLIDE 27

Algorithm for Lower Envelope

  • Quadratics ordered left to right
  • At step j, consider adding the j-th quadratic to the LE of the first j-1 quadratics
  • Maintain two ordered lists:

> Quadratics currently visible on the LE
> Intersections currently visible on the LE

  • Compute the intersection of the j-th quadratic and the rightmost quadratic visible on the LE:

> If it is to the right of the rightmost visible intersection, add the quadratic and the intersection to the lists
> If not, this quadratic hides at least the rightmost quadratic; remove it and try again

Code available online: http://people.cs.uchicago.edu/~pff/dt/

SLIDE 28

Running Time of LE Algorithm

Considers adding each of h quadratics just once

  • Intersection and comparison constant time
  • Adding to lists constant time
  • Removing from lists constant time

> But then need to try again

Simple amortized analysis

  • Total number of removals O(h)

> Each quadratic, once removed, is never considered for removal again

Thus overall running time O(h)
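The full O(h) one-dimensional distance transform built on this lower-envelope computation can be sketched as follows (a minimal re-implementation after Felzenszwalb & Huttenlocher; variable names are hypothetical):

```python
import math

def distance_transform_1d(f):
    """Compute D(q) = min_p ((q - p)^2 + f(p)) for all q in O(h) time
    by maintaining the lower envelope of the h offset parabolas."""
    h = len(f)
    v = [0] * h              # positions of the parabolas on the lower envelope
    z = [0.0] * (h + 1)      # boundaries between visible parabolas
    k = 0                    # index of the rightmost visible parabola
    z[0], z[1] = -math.inf, math.inf
    for q in range(1, h):    # try adding the q-th parabola
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:     # new parabola hides the rightmost one: remove it
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k], z[k], z[k + 1] = q, s, math.inf
    d, k = [0] * h, 0
    for q in range(h):       # read the envelope off left to right
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d
```

For example, `distance_transform_1d([0, 100, 100, 0])` returns `[0, 1, 1, 0]`: each position pays the squared distance to the nearest cheap position plus its cost.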

SLIDE 29

Example: facial feature detection in images

  • Parts V = {v1, … vn}
  • Connected by springs in a star configuration to the nose
  • Quadratic cost for each spring

Unary cost: 1 - NCC with the appearance template
Pairwise cost: spring extension from v1 to vj (a large extension gives a high spring cost)

(Figure: model with parts v1, v2, v3, v4.)

SLIDE 30

Appearance templates and springs

Each l_i = (x_i, y_i) ranges over h (x, y) positions in the image. Pairwise terms are required for correct detection.
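A toy sketch of this star model (hypothetical names; 1-D positions for brevity): for each candidate root position, every leaf can be placed independently because of the star structure, and the root with the lowest total cost wins.

```python
def fit_star(root_unary, leaf_unaries, offsets, weight=1.0):
    """Fit a 1-D star model: one root part plus several leaf parts.

    root_unary: list of h appearance costs for the root at each position.
    leaf_unaries: one list of h costs per leaf part.
    offsets: ideal displacement of each leaf relative to the root.
    Pairwise cost is the quadratic spring weight * (x_leaf - x_root - offset)^2.
    """
    h = len(root_unary)
    best_cost, best_conf = float("inf"), None
    for r in range(h):  # candidate root positions
        total, conf = root_unary[r], [r]
        for u, off in zip(leaf_unaries, offsets):
            # each leaf is minimized independently given the root
            c, x = min((u[x] + weight * (x - r - off) ** 2, x) for x in range(h))
            total += c
            conf.append(x)
        if total < best_cost:
            best_cost, best_conf = total, conf
    return best_conf, best_cost
```

The naive inner minimization makes this O(nh^2) overall; the distance transform of the preceding slides reduces the per-root leaf placement to amortized constant work, giving O(nh).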

SLIDE 31

Fitting the model to an image

Find the configuration with the lowest energy

(Figure: model with parts v1, v2, v3, v4 placed on a face image.)

SLIDE 32

(Same content as SLIDE 31; animation step.)

SLIDE 33

(Same content as SLIDE 31; animation step.)

SLIDE 34

Notation

SLIDE 35

(Figure only: parts v1, v2, v4.)

SLIDE 36

(Equation and definitions are shown as an image on the slide.)

SLIDE 37

Visualization: Compute part matching cost (dense)

Input image

Compute the matching cost for each pixel

(Panels: Nose, Left eye, Right eye, Mouth.)

SLIDE 38

Visualization: Combine appearance with relative shape

Part matching cost

  • 1. Nose
  • 2. Left eye
  • 3. Right eye
  • 4. Mouth

Combined matching cost = part matching cost + (shifted) distance transform

SLIDE 39

Visualization: Combine appearance with relative shape

Part matching cost

  • 1. Nose
  • 2. Left eye
  • 3. Right eye
  • 4. Mouth

Combined matching cost = part matching cost + (shifted) distance transform

The best part configuration

SLIDE 40

Combine appearance with relative shape

The distance transform can be computed separately for the rows and columns of the image (i.e. it is “separable”), which results in the O(nh) running time. Given the best location of the reference part (the root), the locations of the leaves can be found by “back-tracking” (here only one level).

Simple part-based face model demo code [Fei-Fei, Fergus, Torralba]: http://people.csail.mit.edu/torralba/shortCourseRLOC/
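The separability claim can be checked numerically with a small sketch (hypothetical helper names; a naive O(h^2) 1-D transform is used for clarity, where the lower-envelope algorithm of the earlier slides would make each pass O(h)):

```python
def dt1d(f):
    # naive 1-D squared-distance transform: D(q) = min_p ((q - p)^2 + f(p))
    return [min((q - p) ** 2 + fp for p, fp in enumerate(f)) for q in range(len(f))]

def dt2d_separable(grid):
    # transform every row, then every column of the result
    h, w = len(grid), len(grid[0])
    rows = [dt1d(row) for row in grid]
    cols = [dt1d([rows[y][x] for y in range(h)]) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]

def dt2d_bruteforce(grid):
    # direct 2-D definition, for checking the separable version
    h, w = len(grid), len(grid[0])
    return [[min((y - py) ** 2 + (x - px) ** 2 + grid[py][px]
                 for py in range(h) for px in range(w))
             for x in range(w)] for y in range(h)]
```

Both functions agree on any cost grid, e.g. `dt2d_separable([[0, 9], [9, 9]])` and `dt2d_bruteforce([[0, 9], [9, 9]])` both give `[[0, 1], [1, 2]]`.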

SLIDE 41

Example

SLIDE 42

Example of a model with 9 parts

The goal:

  • Localize facial features in faces output by a face detector
  • Support parts-based face descriptors
  • Provide initialization for global face descriptors

Code available online: http://www.robots.ox.ac.uk/~vgg/research/nface/index.html

SLIDE 43

Classifier for each facial feature

  • Linear combination of thresholded simple image filters (Viola/Jones), trained discriminatively using AdaBoost
  • Applied in “sliding window” fashion to a patch around every pixel
  • Similar to the Viola & Jones face detector (see Lecture 6)

Ambiguity, e.g. due to facial symmetry: resolved using the spatial model.

Example of a model with 9 parts

SLIDE 44

Results

Nine facial features; ~90% of predicted positions within 2 pixels in a 100×100 face image

SLIDE 45

Results

SLIDE 46

Example II: Generic Person Model

Each part represented as a rectangle

  • Fixed width, varying length, uniform colour
  • Learn average and variation

Connections approximate revolute joints

  • Joint location, relative part position, orientation, foreshortening: Gaussian
  • Estimate average and variation

Learned 10-part model

  • All parameters learned, including “joint locations”
  • Shown at ideal configuration (mean locations)

SLIDE 47

Learning

Manual identification of rectangular parts in a set of training images (part hypotheses)

Learn

  • relative position (x & y),
  • relative angle,
  • relative foreshortening
SLIDE 48

Example: Recognizing People

NB: requires background subtraction

SLIDE 49

Variety of Poses

SLIDE 50

Variety of Poses

SLIDE 51

Example III: Hand tracking for sign language interpretation

Buehler et al. BMVC’2008

SLIDE 52

Example results

SLIDE 53

Example IV: Part based models for object detection (Recall from Lecture 9)

[Felzenszwalb et al. 2009]

Code available online: http://people.cs.uchicago.edu/~pff/latent/

SLIDE 54

Bicycle model

SLIDE 55

SLIDE 56

SLIDE 57