Semantic Image Indexing and Retrieval
Source: H. Jegou
Outline
- State of the nation
- Early description methods
- Local features detection and description
- Matching local features
- Large scale image retrieval
- Image classification
Context
- Available online video content is ever increasing
– 35 hours of video uploaded every minute
– 2 billion videos watched per day
– 20 million videos uploaded each month
– 2+ billion videos watched per month
– Archived TV content is growing
- 1.5 million hours = 120 km of shelves
- 300000 hours | 1 petabyte/year
Content is difficult to search and reuse, and barely visible to search engines
Context
- Query by example is reaching the general public
Context
- Other fields are equally popular
- Visual data-mining
- Security applications
- Advertising
- Augmented reality
- Assistive technologies
- Autonomous vehicles
- Landmark recognition
Why is it so difficult to find appropriate multimedia content, to reuse it and to present this content in interfaces that vary with user needs?
[Figure: the semantic gap — human vs. machine labeling of "tree", "dog", "car"]
The semantic gap is the lack of coincidence between the information that one can extract from the sensory data and the interpretation that the same data has for a user in a given situation [Arnold Smeulders, PAMI, 2000]
Automatic semantics extraction
- Generic principle : workflow
pixels → Low-level visual description → Image- & mid-level feature aggregation → Learning & Classification → knowledge
(e.g. sky, Greek guard, my ex-boss, Greek Parliament)
Introduction
Properties of ideal features
- Local: features are local, so robust to occlusion and clutter (no prior segmentation)
- Invariant (or covariant)
- Robust: noise, blur, discretization, compression, etc. do not have a big impact on the feature
- Distinctive: individual features can be matched to a large database of objects
- Quantity: many features can be generated even for small objects
- Accurate: precise localization
- Efficient: close to real-time performance
Any other philosophical aspects? Let’s get to business.
Color: color space, color quantization, scalable color (Haar transform representation), color structure (histogram of structuring elements), GoF/GoP color histograms, dominant colors, color layout (DCT-based)
Texture: gradient orientation histogram, homogeneous texture (multi-resolution Gabor filters), texture browsing (Tamura features)
Shape: region shape 2D (Angular Radial Transform), contour shape 2D (curvature scale space), 3D shape (3D shape spectrum)
Motion: parametric motion, motion trajectory, camera motion (complete 3D camera model), motion activity
Others: face recognition (eigenfaces)
- More visual descriptors …
MPEG-7 visual descriptors
Any other global appearance descriptors?
- We need to establish a correspondence between the
target image and other images in the model database
[Figure: model image vs. target image (Lowe, 1999)]
How can we recognize specific objects?
- Object class recognition
How can we recognize specific objects?
Find these landmarks ... in these images and 1M others
How can we recognize specific objects?
- We need to establish a correspondence between the
query image and all images from the database depicting the same object/scene
Query image Database image
Solution?
Source: J.Sivic
- Finding the object despite possibly large changes in
scale, viewpoint, lighting and partial occlusion
- Viewpoint
- Scale
- Occlusion
- Lighting
Main challenges?
Application
https://www.youtube.com/watch?v=Hhgfz0zPmH4
Application
https://www.youtube.com/watch?v=w95kwXy_MOY&feature=youtu.be&t=26m30s
- Global representations have major limitations
- Instead, describe and match only local regions
- Increased robustness to
– Occlusions
– Articulation
– Intra-category variations
[Figure: matching geometry — query angles/distances (θq, φ, dq) vs. database angles/distances (θ, φ, d)]
Local features
Motivation
1) Detection: identify the interest points
2) Description: extract vector feature descriptor around each point of interest
3) Matching: determine correspondence between descriptors in two views
Source: K.Grauman
Local features
Main components
- Interest operator repeatability
– We want to detect at least the same points in both images, while running the detection procedure independently per image
Cannot find true matches here
Source: K.Grauman
We need a repeatable detector
Local features
Desired features
- Descriptor distinctiveness
– We want to be able to reliably determine which point goes with which
– We must provide some invariance to geometric and photometric differences between the two views
Source: K.Grauman
We need a reliable and distinctive detector
Local features
Desired features
1) Detection: identify the interest points
2) Description: extract vector feature descriptor around each point of interest
3) Matching: determine correspondence between descriptors in two views
Source: K.Grauman
Local features
Main components
What parts would you choose?
Let’s try the corners
Finding corners
- Key property: in the region around a corner, image
gradient has two or more dominant directions
- Corners are repeatable and distinctive
Source: L. Lazebnik
Many existing detectors available
- Hessian & Harris [Beaudet '78], [Harris '88]
- Laplacian, DoG [Lindeberg '98], [Lowe '99]
- Harris-/Hessian-Laplace [Mikolajczyk & Schmid '01]
- Harris-/Hessian-Affine [Mikolajczyk & Schmid '04]
- EBR and IBR [Tuytelaars & Van Gool '04]
- MSER [Matas '02]
- Salient Regions [Kadir & Brady '01]
- Others…
- We should easily recognize the point by looking through a
small window
- Shifting a window in any direction should give a large
change in intensity
Harris detector – basic idea
How to find corners?
Source: K. Grauman
“edge”: no change along the edge direction
“corner”: significant change in all directions
“flat” region: no change in any direction
Harris detector – basic idea
How to find corners?
Source: R. Szeliski
Window-averaged change of intensity induced by shifting the image data by [u, v]:

E(u,v) = \sum_{x,y} w(x,y) \, [\, I(x+u,\, y+v) - I(x,y) \,]^2

where I(x+u, y+v) is the shifted intensity, I(x,y) the original intensity, and the window function w(x,y) is either 1 inside the window and 0 outside, or a Gaussian.
Harris detector
Maths
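The windowed SSD above can be evaluated directly on a toy image. The sketch below (with a hypothetical helper `E` and a uniform window, w = 1 inside the box) illustrates the slide's point: at a corner E is large for every shift direction, at an edge only for shifts across the edge, and in a flat region nowhere.

```python
import numpy as np

def E(I, r, c, u, v, k=3):
    """Windowed SSD E(u, v) around (r, c): w(x, y) = 1 in a (2k+1)^2 box, 0 outside."""
    win = I[r - k:r + k + 1, c - k:c + k + 1]
    shifted = I[r - k + u:r + k + 1 + u, c - k + v:c + k + 1 + v]
    return float(np.sum((shifted - win) ** 2))

# Synthetic image: a bright quadrant, so (10, 10) is a corner
I = np.zeros((20, 20))
I[10:, 10:] = 1.0

for name, (r, c) in [("corner", (10, 10)), ("edge", (15, 10)), ("flat", (5, 5))]:
    print(name, [E(I, r, c, u, v) for u, v in [(1, 0), (0, 1)]])
```

Running this shows nonzero E for both shift directions at the corner, for only one direction on the edge, and zero in the flat region.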
Taylor series approximation to shifted image
Harris detector
Maths
Expanding I(x,y) in a Taylor series, we get a bilinear approximation for small shifts [u, v]:

E(u,v) \approx [\,u \;\; v\,] \; M \; \begin{bmatrix} u \\ v \end{bmatrix}
where M is a 2x2 matrix computed from image derivates:
Source: R. Szeliski
M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}

The sum runs over the image region (the area we are checking for a corner); I_x I_y is the gradient with respect to x times the gradient with respect to y.
M is also called “structure tensor”
Harris detector
Maths
Let’s consider an axis-aligned corner
Harris detector
What does this matrix reveal?
In this case:

M = \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}

- The dominant gradient directions align with the x or y axis
- If either λ is close to 0, then this is not a corner, so look for locations where both are large
Harris detector
What does this matrix reveal?
Intensity change in a shifting window: eigenvalue analysis of the ellipse

E(u,v) \approx [\,u \;\; v\,] \; M \; \begin{bmatrix} u \\ v \end{bmatrix}

λ1, λ2 — eigenvalues of M. The ellipse axes point along the directions of slowest and fastest change, with lengths (λmax)^{-1/2} and (λmin)^{-1/2}.
Harris detector
General case
\mathrm{Hessian}(I) = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}

\det(\mathrm{Hessian}(I)) = I_{xx} I_{yy} - I_{xy}^2
Harris detector
Hessian determinant
- Second moment matrix / autocorrelation matrix:

\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}

- 1. Image derivatives: I_x = g_x(\sigma_D) * I, I_y = g_y(\sigma_D) * I
- 2. Squares of derivatives: I_x^2, I_y^2, I_x I_y
- 3. Gaussian filter g(\sigma_I): g(I_x^2), g(I_y^2), g(I_x I_y)
- 4. Cornerness function – both eigenvalues are strong:

har = \det[\mu(\sigma_I, \sigma_D)] - \alpha \, [\mathrm{trace}\,\mu(\sigma_I, \sigma_D)]^2 = g(I_x^2)\, g(I_y^2) - [g(I_x I_y)]^2 - \alpha \, [g(I_x^2) + g(I_y^2)]^2

- 5. Non-maxima suppression

Harris detector
Second moment matrix
Classification of image points using the eigenvalues of M:
- “Corner”: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions
- “Edge”: λ1 >> λ2 (or λ2 >> λ1); E changes strongly in only one direction
- “Flat” region: λ1 and λ2 are small; E is almost constant in all directions
Source: K. Grauman
Harris detector
Interpretation of eigenvalues
Measure of corner response:

R = \det M - \alpha\,(\mathrm{trace}\, M)^2 = \lambda_1 \lambda_2 - \alpha\,(\lambda_1 + \lambda_2)^2

- Does not require computing the eigenvalues explicitly
- α: empirical constant, α = 0.04–0.06
Harris detector
Corner response function
- “Corner”: R > 0
- “Edge”: R < 0
- “Flat” region: |R| small
Source: K. Grauman
Harris detector
Corner response function
- Compute the M matrix within all image windows to get their R scores
- Find points with large corner response (R > threshold)
- Take the points of local maxima of R
Harris detector
Algorithm
Source: D. Frolova

- Compute corner response R
- Find points with large corner response: R > threshold
- Take only the points of local maxima of R
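The five computation steps above can be strung together in a few lines of NumPy/SciPy. This is a minimal sketch, not the original implementation: the function name `harris` and the parameter values (σ = 1, α = 0.05, threshold 1e-3, 5×5 non-maximum suppression) are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris(I, sigma=1.0, alpha=0.05, thresh=1e-3):
    """Harris corners: structure tensor -> R = det(M) - alpha * trace(M)^2,
    threshold, then non-maximum suppression. Returns (row, col) points."""
    # 1. image derivatives
    Iy, Ix = np.gradient(I.astype(float))
    # 2. products of derivatives, 3. Gaussian-weighted sums (entries of M)
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    # 4. cornerness function
    R = Sxx * Syy - Sxy ** 2 - alpha * (Sxx + Syy) ** 2
    # 5. threshold + non-maximum suppression
    mask = (R == maximum_filter(R, size=5)) & (R > thresh)
    return np.argwhere(mask)

# A white square on a black background has four corners
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
print(harris(img))
```

The detected points cluster at the four corners of the square; edge pixels get negative R and flat regions R ≈ 0, so neither survives the threshold.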
Perceptual and Sensory Augmented Computing Visual Object Recognition Tutorial
Harris detector
Typical response
Properties of ideal features
- Local: features are local, so robust to occlusion and
clutter (no prior segmentation)
- Invariant (or covariant)
- Robust: noise, blur, discretization, compression, etc. do
not have a big impact on the feature
- Distinctive: individual features can be matched to a
large database of objects
- Quantity: many features can be generated even for small objects
- Accurate: precise localization
- Efficient: close to real-time performance
Remember this?
Is the Harris corner detector rotation invariant?
Ellipse rotates but its shape remains the same Corner response R is invariant to image rotation
Is the Harris corner detector scale invariant?
Is the Harris corner detector scale invariant?
Not invariant to image scale! At a fine scale, all points along the curve are classified as edges; only at the right (coarser) scale is this a corner.
How can we detect scale invariant interest points?
Source: T. Tuytelaars
Exhaustive search
A multi-scale approach
Source: T. Tuytelaars

- Extract a patch from each image individually, at multiple scales
- We want to extract the patches from each image independently
Scale invariant detection
Lindeberg et al., 1996
Design a function on the region, which is “scale invariant” (the same for corresponding regions, even if they are at different scales)
Example: average intensity. For corresponding regions (even of different sizes) it will be the same.
- For a point in one image, we can consider it as a function of
region size (patch width)
[Plot: response f vs. region size for Image 1 and Image 2 (scale = 1/2)]
Exhaustive search
Solution
Lindeberg et al., 1996

Take a local maximum of this function. Observation: the region size at which the maximum is achieved should be invariant to image scale. This scale-invariant region size is found in each image independently!

[Plot: response f vs. region size for Image 1 and Image 2 (scale = 1/2); maxima at region sizes s1 and s2]
Scale invariant detection
Common approach
f(I_{i_1 \ldots i_m}(x, \sigma)) = f(I_{i_1 \ldots i_m}(x', \sigma'))

Same operator responses if the patch contains the same image up to a scale factor. How to find corresponding patch sizes?
Scale invariant detection
Function responses for increasing scale (scale signature): compute f(I_{i_1 \ldots i_m}(x, \sigma)) and f(I_{i_1 \ldots i_m}(x', \sigma')) at the corresponding points while increasing the scale, and compare the two response curves.

Scale invariant detection
Function responses for increasing scale (scale signature)
[Plot: scale signatures f vs. region size for Image 1 and Image 2 (scale = 1/2); maxima at region sizes s1 and s2]
Scale invariant detection
Common approach
- A good function for scale detection has one stable sharp peak
- For usual images, a good function is one that responds to contrast (sharp local intensity change)
Scale invariant detection
Common approach
Source: L. Lazebnik
We define the characteristic scale as the one that produces a peak of the (scale-normalized) Laplacian response.
Scale invariant detection
Common approach
Source: K. Grauman
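The peak selection above is easy to reproduce numerically. The sketch below (hypothetical helper `characteristic_scale`) evaluates the scale-normalized Laplacian |σ²·LoG(σ)| at the center of a binary disk; for a disk of radius r the response is known to peak near σ = r/√2.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(I, r, c, sigmas):
    """Characteristic scale at (r, c): the sigma maximizing |sigma^2 * LoG(I, sigma)|."""
    responses = [s ** 2 * gaussian_laplace(I, s)[r, c] for s in sigmas]
    return sigmas[int(np.argmax(np.abs(responses)))]

# Binary disk of radius R, centered in the image
R = 8
yy, xx = np.mgrid[-30:31, -30:31]
disk = (xx ** 2 + yy ** 2 <= R ** 2).astype(float)

sigmas = np.arange(1.0, 15.0, 0.25)
print(characteristic_scale(disk, 30, 30, sigmas))  # expected near R / sqrt(2)
```

Because the characteristic scale tracks the structure's size, the same disk rendered at half resolution would peak at roughly half the σ — which is exactly the property the slides exploit.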
Interest points (blobs) are local maxima in both position and scale → a list of (x, y, σ)
Source: K. Grauman
Scale invariant detection
Harris-Laplace
Scale-space blob detector example
Source: T. Lindeberg
Scale invariant detection
Harris-Laplace
Scale-space blob detector example
Source: L. Lazebnik
Scale invariant detection
Harris-Laplace
Harris points vs. Harris Laplace points
Source: C. Schmid
Harris points Harris-Laplace points
Scale invariant detection
Harris-Laplace
- Functions for determining scale
Kernels:

L = \sigma^2 \,( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) )   (Laplacian)

DoG = G(x, y, k\sigma) - G(x, y, \sigma)   (Difference of Gaussians)

where G(x, y, \sigma) is the Gaussian kernel, and f = Kernel * Image
Lowe, 1999
Scale invariant detection
Technical detail
Source: T. Tuytelaars
- Local maxima in the scale space of the Laplacian of Gaussian (LoG),
  \sigma^2 \,( L_{xx}(\sigma) + L_{yy}(\sigma) ),
  computed over a stack of scales → a list of (x, y, scale)
Scale invariant detection
Technical detail
L = \sigma^2 \,( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) )   (Laplacian)

DoG = G(x, y, k\sigma) - G(x, y, \sigma)   (Difference of Gaussians)

Source: T. Tuytelaars

Subtracting two Gaussians of nearby scales approximates the scale-normalized Laplacian: DoG ≈ (k - 1)\,\sigma^2 \nabla^2 G
Scale invariant detection
Difference of Gaussians approximation
Source: T. Tuytelaars
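The DoG-approximates-LoG claim can be checked numerically. From the heat-equation property ∂G/∂σ = σ∇²G, for nearby scales G(kσ) − G(σ) ≈ (k − 1)·σ²·∇²G_σ. A small sketch (the image and the parameter values are arbitrary test choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

# Smooth random test image
rng = np.random.default_rng(0)
img = gaussian_filter(rng.random((64, 64)), 2.0)

sigma, k = 3.0, 1.1
dog = gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
log = (k - 1) * sigma ** 2 * gaussian_laplace(img, sigma)

# Relative error between the DoG and the scaled, scale-normalized LoG
err = np.abs(dog - log).max() / np.abs(log).max()
print(err)
```

The relative error is small for k close to 1 and grows with (k − 1), which is why DoG pyramids use modest scale ratios per level.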
- LoG result
Scale invariant detection
Technical detail
[Figure: pyramid construction — original image at σ = 1, then subsampling with step 4 at σ = 2]
Source: T. Tuytelaars
Scale invariant detection
Difference of Gaussians approximation
→ a list of (x, y, σ)
- Detect maxima of the difference-of-Gaussians (DoG) in scale space
- Then reject points with low contrast (threshold)
- Eliminate edge responses
Scale invariant detection
Difference of Gaussians approximation
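The first two steps above (scale-space DoG extrema plus a contrast threshold) can be sketched as follows. This is an illustrative simplification: `dog_keypoints` is a made-up helper, the parameter values are arbitrary, and Lowe's edge-response test on the ratio of principal curvatures is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_keypoints(I, sigma0=1.6, k=2 ** 0.5, n=6, contrast=0.02):
    """Gaussian stack -> differences -> 3x3x3 scale-space extrema
    above a contrast threshold. Returns (y, x, sigma) triples."""
    sigmas = [sigma0 * k ** i for i in range(n)]
    gauss = np.stack([gaussian_filter(I.astype(float), s) for s in sigmas])
    dog = gauss[1:] - gauss[:-1]                      # difference of Gaussians
    # extrema over the (scale, y, x) neighborhood, with contrast threshold
    is_ext = ((dog == maximum_filter(dog, size=3)) & (dog > contrast)) | \
             ((dog == minimum_filter(dog, size=3)) & (dog < -contrast))
    return [(y, x, sigmas[s + 1]) for s, y, x in np.argwhere(is_ext)]

# Bright blob of radius 4 -> a keypoint near its center
yy, xx = np.mgrid[:40, :40]
img = ((yy - 20) ** 2 + (xx - 20) ** 2 <= 16).astype(float)
print(dog_keypoints(img))
```

A bright blob produces a scale-space minimum of the DoG at its center (more blur lowers the center value), which is why both maxima and minima are kept.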
(a) 233x189 image (b) 832 DoG extrema (c) 729 left after peak value threshold (d) 536 left after testing the ratio of principal curvatures (removing edge responses)
Source: D. Lowe
Scale invariant detection
Difference of Gaussians approximation
Harris corner detector Harris Laplace detector DoG detector
Scale invariant detection
Detector comparison
- Simple, efficient scheme
- The Laplacian fires more on edges than the determinant of the Hessian
Scale invariant detection
Evaluation
- Given: two images of the same scene with a large scale difference between them
- Goal: find the same interest points independently in each image
- Solution: search for maxima of suitable functions in scale and in space (over the image)
Scale Invariant local features
Summary
Local features
Main components
1) Detection: identify the interest points
2) Description: extract vector feature descriptor around each point of interest
3) Matching: determine correspondence between descriptors in two views
- We know how to detect points
- Next question:
How to describe them for matching?
The point descriptor should be invariant and distinctive
Local features
- Geometry:
– Rotation
– Similarity (rotation + uniform scale)
– Affine (scale dependent on direction)
- Photometry:
– Affine intensity change (I -> aI + b)
Source: C. Snoek
Models of image change
- The easiest way to describe the neighborhood around an interest point is to write down the list of intensities to form a feature vector
- However, this is very sensitive to even small shifts and rotations
Local descriptors
- Patches
– Disadvantage of patches as descriptors: small shifts can strongly affect the matching
- Histograms
Source: L. Lazebnik
Local descriptors
SIFT descriptor
- Scale Invariant Feature Transform
- How does it work?
– Divide the patch into 4x4 sub-patches: 16 cells
– Compute a histogram of gradient orientations (8 reference angles) for all pixels inside each sub-patch
– Resulting descriptor: 4x4x8 = 128 dimensions
Lowe, 2004
SIFT descriptor
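The 4x4x8 construction can be sketched in NumPy. This is a simplified, hypothetical version (`sift_like_descriptor` is not the official implementation): it keeps only the cell/orientation histogram layout, skipping SIFT's Gaussian weighting, trilinear interpolation, and rotation normalization.

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D SIFT-style descriptor for a 16x16 patch: 4x4 cells, each an
    8-bin histogram of gradient orientations weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)         # orientation in [0, 2*pi)
    bins = np.minimum((ori / (2 * np.pi) * 8).astype(int), 7)
    desc = np.zeros((4, 4, 8))
    for i in range(16):
        for j in range(16):
            desc[i // 4, j // 4, bins[i, j]] += mag[i, j]
    desc = desc.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)        # unit norm

patch = np.random.default_rng(0).random((16, 16))
d = sift_like_descriptor(patch)
print(d.shape)  # (128,)
```

Each of the 16 cells contributes 8 histogram entries, giving the 4x4x8 = 128 dimensions stated on the slide.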
- Compute orientation histogram
- Select dominant orientation
- Normalize: rotate to fixed orientation
Rotation invariance
Lowe, 2004
SIFT descriptor
- Multiple dominant orientations
Rotation invariance
Lowe, 2004
SIFT descriptor
Rotation invariance
SIFT descriptor
- Normalize the descriptor to norm 1
- A change in image contrast can change the magnitude but not the orientation
- Reduce the influence of large gradient magnitudes: threshold the values in the unit feature vector to be no larger than 0.2
- Renormalize the feature vector after thresholding
Illumination invariance
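The normalization steps above translate directly to code (a sketch; the helper name is illustrative):

```python
import numpy as np

def normalize_descriptor(d, clip=0.2):
    """SIFT-style illumination normalization: unit-normalize, clip large
    components at 0.2 to damp big gradient magnitudes, then renormalize."""
    d = d / (np.linalg.norm(d) + 1e-12)   # contrast invariance
    d = np.minimum(d, clip)               # damp large gradient magnitudes
    return d / (np.linalg.norm(d) + 1e-12)

v = np.array([10.0, 1.0, 1.0, 1.0])
print(normalize_descriptor(v))
```

Note that scaling the input (a global contrast change) leaves the output unchanged, which is exactly the invariance the slide claims.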
Evaluation
- Very robust
- 80% repeatability at:
  - 10% image noise
  - 45° viewing angle
  - 1k–100k keypoints in the database
- Best descriptor in [Mikolajczyk & Schmid 2005]'s extensive survey
- 11,200+ citations on Google Scholar for the 2004 paper
- Source code available for download
SIFT descriptor
Performance
- One image yields:
  - n 128-dimensional descriptors, each a histogram of the gradient orientations within a patch → [n x 128 matrix]
  - n scale parameters specifying the size of each patch → [n x 1 vector]
  - n orientation parameters specifying the angle of each patch → [n x 1 vector]
  - n 2D points giving the positions of the patches → [n x 2 matrix]
SIFT descriptor
Performance
Panorama stitching
SIFT descriptor
Applications
Panorama stitching – iPhone app
http://www.cloudburstresearch.com/
SIFT descriptor
Applications
- B. Leibe
Mobile tourist guide
- Self-localization
- Object/building recognition
- Photo/video augmentation
Quack, 2008
SIFT descriptor
Applications
- B. Leibe
- Augmented reality