WST665/CS770A: Web-Scale Image Retrieval
Recent Image Retrieval Techniques
Sung-Eui Yoon (윤성의)
Course URL: http://sglab.kaist.ac.kr/~sungeui/IR


SLIDE 1

WST665/CS770A: Web-Scale Image Retrieval

Recent Image Retrieval Techniques

Sung-Eui Yoon (윤성의)

Course URL: http://sglab.kaist.ac.kr/~sungeui/IR

SLIDE 2

Today

  • Go over some recent image retrieval techniques

SLIDE 3

Video Google: A Text Retrieval Approach to Object Matching in Videos

Josef Sivic and Andrew Zisserman
Robotics Research Group, Department of Engineering Science, University of Oxford, United Kingdom
ICCV 2003. Citations: over 1,300 as of 2011.

SLIDE 4

Motivations

  • Retrieve key frames and shots of a video containing a particular object
  • Investigate whether a text retrieval approach can be successful for object recognition

SLIDE 5

Viewpoint Invariant Description

  • Find viewpoint covariant regions
  • Produce elliptical affine-invariant regions, e.g., Shape Adapted (SA) and Maximally Stable (MS)
  • SA regions are centered on corner-like features
  • MS regions correspond to high contrast with respect to their surroundings (dark window, gray wall, …)
  • Compute a SIFT descriptor for each region

SLIDE 6

MSER (Maximally Stable Extremal Regions)

  • Affinely-invariant stable regions in the image
  • Can be used to localize regions around keypoints
  • We will use only the SIFT descriptors that are inside MSER regions

SLIDE 7

SLIDE 8

Visual Vocabulary

  • Quantize descriptor vectors into clusters, which are the visual 'words' for text retrieval
  • Performed with k-means clustering
  • Produce about 6K and 10K clusters for Shape Adapted and Maximally Stable regions, respectively
  • Chosen empirically to maximize retrieval results

SLIDE 9

K-Means Clustering

  • Minimize the within-cluster sum of squares (WCSS)
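The clustering step can be sketched with plain Lloyd's k-means, which alternates assignment and update steps, each of which can only decrease the WCSS. This is a minimal NumPy sketch, not the papers' implementation; function and variable names are illustrative:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Lloyd's k-means: assign each point to its nearest center, then
    move each center to the mean of its points; both steps decrease
    the within-cluster sum of squares (WCSS)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest center per point (squared Euclidean).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Update step: each center becomes the mean of its cluster.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    wcss = ((X - centers[labels]) ** 2).sum()
    return centers, labels, wcss
```

In the Video Google setting, X would hold the 128-dimensional SIFT descriptors and k the vocabulary size (about 6K-10K per region type).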

SLIDE 10

Distance Function

  • Use the Mahalanobis distance as the distance function for clustering: d(x1, x2) = sqrt((x1 − x2)^T S^(−1) (x1 − x2)), where S is the covariance matrix
  • If S is the identity matrix, it reduces to the Euclidean distance
  • Decorrelates the components of SIFT
  • Instead, the Euclidean distance may be used
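The distance above is a direct translation of the formula; this sketch (illustrative names, not the paper's code) also shows the reduction to the Euclidean case:

```python
import numpy as np

def mahalanobis(x1, x2, S):
    """Mahalanobis distance sqrt((x1-x2)^T S^{-1} (x1-x2)), where S is
    the covariance matrix of the descriptor components."""
    diff = x1 - x2
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))
```

With S equal to the identity this is exactly `np.linalg.norm(x1 - x2)`. In practice one would whiten the descriptors once with S^(−1/2) and then cluster with the plain Euclidean distance, rather than invert S per comparison.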

SLIDE 11

Visual Indexing

  • Each document is represented by a k-vector of word frequencies
  • Weighting by tf-idf: t_i = (n_id / n_d) · log(N / n_i), i.e., term frequency × log(inverse document frequency)
  • n_id: # of occurrences of word i in document d
  • n_d: total # of words in document d
  • n_i: # of occurrences of term i in the whole database
  • N: # of documents in the whole database
  • At the retrieval stage, documents are ranked by the normalized scalar product between the query vector v_q and each database vector v_d
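The weighting and ranking on this slide can be sketched as follows (a minimal NumPy sketch using the slide's definitions; names are illustrative):

```python
import numpy as np

def tfidf_vectors(docs, n_words):
    """docs: list of visual-word id lists, one list per image.
    Returns L2-normalized tf-idf vectors t_i = (n_id/n_d) * log(N/n_i)."""
    N = len(docs)
    counts = np.zeros((N, n_words))
    for d, words in enumerate(docs):
        for w in words:
            counts[d, w] += 1
    tf = counts / np.maximum(counts.sum(1, keepdims=True), 1)  # n_id / n_d
    n_i = counts.sum(0)              # occurrences of word i in the database
    idf = np.log(N / np.maximum(n_i, 1))
    v = tf * idf
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.maximum(norms, 1e-12)

def rank(query_vec, db_vecs):
    """Rank documents by normalized scalar product with the query."""
    return np.argsort(-(db_vecs @ query_vec))
```

Since all vectors are L2-normalized, the scalar product is the cosine similarity.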

SLIDE 12

Video Google [Sivic et al. ICCV 2003]

  • mAP: mean average precision

SLIDE 13

Video Google [Sivic et al. ICCV 2003]

  • Performance depends highly on the number of visual words k: not scalable

SLIDE 14

Scalable Recognition with a Vocabulary Tree

David Nistér et al. CVPR 2006. Citations: over 1,000 as of 2011.

SLIDE 15

Vocabulary Tree [Nister et al. CVPR 06]

  • Hierarchical k-means clustering
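Hierarchical k-means first clusters all descriptors into `branch` groups, then recurses on each group, so a descriptor is quantized by descending the tree. A toy NumPy sketch, with an inlined minimal k-means; the dict-based tree layout is illustrative, not the paper's data structure:

```python
import numpy as np

def _kmeans(X, k, iters=10, rng=None):
    # Minimal Lloyd's k-means used at every node of the tree.
    rng = rng or np.random.default_rng(0)
    C = X[rng.choice(len(X), size=min(k, len(X)), replace=False)]
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(len(C)):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C, lab

def build_tree(X, branch=10, depth=3, rng=None):
    """Hierarchical k-means: split descriptors into `branch` clusters,
    then recurse on each cluster until `depth` levels are built."""
    rng = rng or np.random.default_rng(0)
    if depth == 0 or len(X) < branch:
        return {"centers": None, "children": []}
    C, lab = _kmeans(X, branch, rng=rng)
    children = [build_tree(X[lab == j], branch, depth - 1, rng)
                for j in range(len(C))]
    return {"centers": C, "children": children}

def quantize(tree, x):
    """Descend the tree; the path of branch indices is the visual word."""
    path = []
    while tree["centers"] is not None:
        j = int(((tree["centers"] - x) ** 2).sum(-1).argmin())
        path.append(j)
        tree = tree["children"][j]
    return tuple(path)
```

With branch factor 10 and 6 levels (slide 53), quantizing a descriptor costs only 10 x 6 distance computations instead of a flat search over a million words.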

SLIDE 16

Vocabulary tree with branch factor 10

SLIDE 17

Inverted File

SLIDE 18

Retrieval Algorithm

  • Compute a histogram of visual words from the SIFT descriptors
  • Identify images that contain words of the input query image
  • Can be done with the inverted file
  • Sort images based on a similarity function
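The three steps above can be sketched as follows. This is a toy sketch: real systems score candidates with tf-idf-weighted vectors, whereas here the similarity is simply the number of shared distinct words:

```python
from collections import defaultdict

def build_inverted_file(db_words):
    """db_words: {image_id: list of visual-word ids}.
    The inverted file maps each word to the images containing it."""
    inv = defaultdict(set)
    for img, words in db_words.items():
        for w in words:
            inv[w].add(img)
    return inv

def retrieve(query_words, db_words, inv):
    """Score only images sharing at least one word with the query
    (found via the inverted file), then sort by similarity."""
    candidates = set()
    for w in set(query_words):
        candidates |= inv.get(w, set())
    scores = {img: len(set(query_words) & set(db_words[img]))
              for img in candidates}
    return sorted(scores, key=scores.get, reverse=True)
```

The inverted file is what makes this scale: images sharing no word with the query are never touched.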

SLIDE 19

Vocabulary Tree [Nister et al. CVPR 06]

  • On an 8 GB RAM machine (40,000 images), queries took 1 s; database creation took 2.5 days

SLIDE 20

Vocabulary Tree

  • Benefits:
  • Allows faster image retrieval (and pre-computation)
  • Scales efficiently to a large number of images
  • Problems:
  • Too much memory required
  • Quantization effects

SLIDE 21

Object Retrieval with Large Vocabularies and Fast Spatial Matching

Philbin et al. CVPR 2007. Citations: over 350 as of 2011.

SLIDE 22

Approximating K-Means

  • Use a forest of 8 randomized k-d trees
  • Randomize the splitting dimension among a set of the dimensions with the highest variance
  • Randomly choose a point close to the median as the split value
  • Helps to mitigate quantization effects
  • As each tree is descended to a leaf, distances from the splitting boundaries are recorded in a priority queue
  • Similar to best-bin-first search
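The randomized node split described above can be sketched as one function (a NumPy sketch under the slide's description; parameter names and the amount of median jitter are illustrative assumptions, not the paper's exact choices):

```python
import numpy as np

def randomized_split(X, n_top=5, rng=None):
    """One node split of a randomized k-d tree: pick the split dimension
    at random among the n_top highest-variance dimensions, and pick the
    split value as a point close to the median along that dimension."""
    rng = rng or np.random.default_rng(0)
    variances = X.var(axis=0)
    top_dims = np.argsort(-variances)[:n_top]
    dim = int(rng.choice(top_dims))
    vals = np.sort(X[:, dim])
    mid = len(vals) // 2
    # A data point within a few positions of the median.
    jitter = int(rng.integers(-2, 3))
    split_val = float(vals[np.clip(mid + jitter, 0, len(vals) - 1)])
    left = X[X[:, dim] <= split_val]
    right = X[X[:, dim] > split_val]
    return dim, split_val, left, right
```

Because each of the 8 trees makes different random choices, a query descriptor that falls on the wrong side of one tree's boundary is usually recovered by another tree; a shared priority queue over boundary distances then drives the best-bin-first style search.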

SLIDE 23

Approximate K-Means

  • Algorithmic complexity of a single k-means iteration
  • Reduces from O(NK) to O(N log K), where N is the # of features
  • Achieved by multiple random k-d trees
  • Find images with k-d trees too
  • Even using approximate k-means, retrieval performance is superior
  • Due to the reduction of quantization effects

SLIDE 24

Spatial Re-Ranking with RANSAC

  • Generate hypotheses with pairs of corresponding features
  • Assume a restricted transformation, since many images on the web are captured in particular (axis-aligned) ways
  • Evaluate other pairs and measure errors
  • Re-rank images by scoring the # of inliers
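The re-ranking loop can be sketched as follows. This is a simplified sketch: the restricted transform here is an isotropic scale plus translation fitted from two correspondences, whereas the paper generates hypotheses from single correspondences using the regions' shapes; names are illustrative:

```python
import numpy as np

def ransac_inliers(p, q, n_iter=200, tol=3.0, rng=None):
    """p, q: (N, 2) corresponding keypoint locations (query, database).
    Hypothesize a restricted transform q ~ s*p + t (scale plus
    translation, no rotation) from two random correspondences, then
    count how many other correspondences agree within tol pixels."""
    rng = rng or np.random.default_rng(0)
    best = 0
    for _ in range(n_iter):
        i, j = rng.choice(len(p), size=2, replace=False)
        dp, dq = p[j] - p[i], q[j] - q[i]
        denom = float(np.dot(dp, dp))
        if denom < 1e-9:
            continue
        s = float(np.dot(dp, dq)) / denom      # least-squares scale
        t = q[i] - s * p[i]
        err = np.linalg.norm(q - (s * p + t), axis=1)
        best = max(best, int((err < tol).sum()))
    return best

def rerank(results):
    """results: list of (image_id, p, q). Re-rank by inlier count."""
    return sorted(results, key=lambda r: ransac_inliers(r[1], r[2]),
                  reverse=True)
```

Images whose matches are geometrically consistent with one transform rise to the top; images that matched only on scattered, inconsistent words fall.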

SLIDE 25

Results

SLIDE 26

Results

SLIDE 27

Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval

Chum et al. ICCV 2007. Citations: over 150 as of 2011.

SLIDE 28

Query Expansion

  • Improve recall by re-querying with a combination of the original query and spatially verified results

(figure: query input and database results)

SLIDE 29

Query Expansion

  • Spatial verification
  • Similar to the technique used in [Philbin et al. 07]; uses a RANSAC-like algorithm
  • Identifies a set of images that are very similar to the original query image

SLIDE 30

BoW Interpreted Probabilistically

  • Extract a generative model of an object from the query region
  • Compute a response set that is likely to have been generated from the model
  • The generative model:
  • Spatial configuration of visual words with background clutter

SLIDE 31

Generative Models

  • Query expansion baseline
  • Average the term frequency vectors of the top 5 results, without verification
  • Transitive closure expansion
  • A priority queue of verified images is keyed by # of inliers
  • Take the top image and issue it as a new query
  • Average query expansion
  • A new query is constructed by averaging the top 50 verified results (d_i is the term frequency vector of the i-th verified image)
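Average query expansion is a one-liner once the verified tf vectors are available. A minimal NumPy sketch, assuming (as in the averaging formula of Chum et al.) that the original query vector is included in the average:

```python
import numpy as np

def average_query_expansion(q0, verified, m=50):
    """Build a new query by averaging the original tf vector q0 with
    the tf vectors d_i of the top (at most m) spatially verified
    results, then L2-normalize it for cosine ranking."""
    top = verified[:m]
    q_avg = (q0 + np.sum(top, axis=0)) / (len(top) + 1)
    return q_avg / max(np.linalg.norm(q_avg), 1e-12)
```

The expanded query now carries visual words that appear on the object but were missing from the original view, which is what recovers the extra true positives on slide 34.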

SLIDE 32

Generative Models

  • Multiple image resolution expansion
  • Consider images at different resolutions; higher resolutions give more detailed information
  • Use resolution bands of (0, 4/5), (2/3, 3/2), and (5/4, ∞)
  • Use averaged queries for each resolution band
  • Shows the best results

SLIDE 33

Results

(figure: mAP comparison)

SLIDE 34

Results

(figure: original query, top 4 images, and expanded results that were not identified by the original query)

SLIDE 35

Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Philbin et al. CVPR 2008. Citations: over 175 as of 2011.

SLIDE 36

Soft Quantization [Philbin et al. CVPR 08]

  • 3 and 4 will never be matched under hard assignment
  • No way of distinguishing that 2 and 3 are closer than 1 and 2
  • Soft assignment: use a weight vector
  • A weight to each nearby cluster is assigned as a decreasing function of the distance between the descriptor and the cluster center
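Soft assignment can be sketched as follows, assuming a Gaussian weighting exp(−d²/2σ²) over the r nearest centers as in Philbin et al.; the σ parameter is dataset-dependent, so the value here is illustrative:

```python
import numpy as np

def soft_assign(desc, centers, r=3, sigma=1.0):
    """Soft quantization: instead of one hard visual word, spread unit
    weight over the r nearest cluster centers, with weights
    proportional to exp(-d^2 / (2 sigma^2)), normalized to sum to 1."""
    d2 = ((centers - desc) ** 2).sum(axis=1)
    nearest = np.argsort(d2)[:r]
    w = np.exp(-d2[nearest] / (2.0 * sigma ** 2))
    w /= w.sum()
    return dict(zip(nearest.tolist(), w.tolist()))
```

A descriptor halfway between words 3 and 4 now contributes to both, so the two images on this slide can match; the cost is the larger index shown on slide 38.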

SLIDE 37

Results

SLIDE 38

Effect of Vocabulary Size and Number of Images

  • For the Oxford dataset with a 1M vocabulary, the hard-assignment index costs 36 MB and the soft-assignment index costs 108 MB with compression

SLIDE 39

City-Scale Location Recognition

Schindler et al. CVPR 2007. Citations: over 135 as of 2011.

SLIDE 40

City-Scale Location Recognition

SLIDE 41

Example Image Database

SLIDE 42

Challenges and Main Ideas

  • Too many images
  • Storage-space and search-time problems
  • Main approaches:
  • Use a vocabulary tree to organize millions of feature descriptors
  • Choose more informative image sets for identifying locations, instead of organizing all the images

SLIDE 43

Informative Features

  • Want to find features that:
  • Occur in all images of a specific location
  • But rarely or never occur anywhere outside of that single location
  • Can be captured formally by information gain
  • How much uncertainty is removed by additional knowledge

SLIDE 44

Information Gain

  • How much uncertainty is removed by additional knowledge
  • Information gain = entropy − conditional entropy, where the location L is a binary value (are we at a particular location?) and the conditioning knowledge W is the presence of a visual word; we want to minimize the conditional entropy
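For binary L (at the location or not) and binary W (word observed or not), the gain IG(L; W) = H(L) − H(L | W) works out directly from Bayes' rule. A minimal sketch (function names are illustrative); maximizing this gain is the same as minimizing the conditional entropy on the slide:

```python
import math

def entropy(p):
    """Entropy of a binary variable with P(true) = p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(p_l, p_w_given_l, p_w_given_not_l):
    """IG(L; W) = H(L) - H(L | W) for binary location L and binary
    word observation W."""
    p_w = p_l * p_w_given_l + (1 - p_l) * p_w_given_not_l
    if p_w in (0.0, 1.0):
        return 0.0  # W is constant, so it carries no information
    # P(L | W) and P(L | not W) via Bayes' rule.
    p_l_w = p_l * p_w_given_l / p_w
    p_l_nw = p_l * (1 - p_w_given_l) / (1 - p_w)
    h_cond = p_w * entropy(p_l_w) + (1 - p_w) * entropy(p_l_nw)
    return entropy(p_l) - h_cond
```

A word seen in every image of one location and nowhere else gives the maximal gain H(L); a word seen equally often everywhere gives zero, matching the criteria on slide 43.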

SLIDE 45

SLIDE 46

SLIDE 47

SLIDE 48

SLIDE 49

SLIDE 50

SLIDE 51

SLIDE 52

Informative Features

  • = a, = b
  • : # of images in the database
  • : # of images in each location
SLIDE 53

Results

  • 1M vocabulary tree, k = 10, L = 6, 7.5 million feature points

SLIDE 54

Results

  • 278 query images, 324 VT, 30K-image subset database associated with GPS coordinates, 0.2 s query time

SLIDE 55

Packing Bag-of-Features

Jégou et al. ICCV 2009. Citations: over 27 as of 2011.

SLIDE 56

Binary BOF

  • A binary BOF is good for large vocabulary sizes
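Binarization simply keeps word presence/absence; with a large vocabulary most counts are 0 or 1 anyway, so little is lost and each image shrinks to one bit per visual word. A minimal sketch (the similarity here, shared-word count, is one simple choice, not necessarily the paper's scoring):

```python
import numpy as np

def binary_bof(bof):
    """Binary BOF: keep only the presence/absence of each visual word."""
    return (np.asarray(bof) > 0).astype(np.uint8)

def binary_similarity(b1, b2):
    """Number of visual words present in both images."""
    return int(np.sum(b1 & b2))
```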

SLIDE 57

Memory Usage

  • 10 KB per image for the raw binary BOF; 1-2 KB for the compressed inverted file

SLIDE 58

MiniBOFs

  • Split the BOF vector and project each piece (aggregation: dimension reduction from k to d)
  • Quantize each with k-means: use 4 bytes
  • For better results, use Hamming Embedding [ECCV 2008] for each descriptor: a few more bits to encode the location within each cluster
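The split-project-quantize pipeline can be sketched as below. This is a structural sketch only: the projection matrices and codebooks here are random stand-ins for the learned aggregation and k-means quantizers, and Hamming Embedding is omitted:

```python
import numpy as np

def minibof_encode(bof, projections, codebooks):
    """MiniBOF-style encoding: split the BOF vector into equal pieces,
    project each piece down (dimension reduction from k/m to d), and
    quantize each projection to its nearest codebook centroid, so an
    image is stored as a handful of small integer codes."""
    m = len(projections)
    pieces = np.array_split(bof, m)
    codes = []
    for piece, P, C in zip(pieces, projections, codebooks):
        y = P @ piece                                  # k/m -> d projection
        codes.append(int(((C - y) ** 2).sum(1).argmin()))  # quantize
    return codes
```

Two images can then be compared by how many of their codes agree, which is what drives the memory/mAP trade-off on the next slide.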

SLIDE 59

Results

  • Achieves about 2× lower memory at a similar mAP
  • Improves quality by using multiple miniBOFs, while keeping memory low

SLIDE 60

Next Time…

  • Novel applications