WST665/CS770A: Web-Scale Image Retrieval
Recent Image Retrieval Techniques
Sung-Eui Yoon (윤성의)
Course URL: http://sglab.kaist.ac.kr/~sungeui/IR


SLIDE 1

WST665/CS770A: Web-Scale Image Retrieval

Recent Image Retrieval Techniques

Sung-Eui Yoon (윤성의)

Course URL: http://sglab.kaist.ac.kr/~sungeui/IR

SLIDE 2

Today

  • Go over some recent image retrieval techniques

SLIDE 3

Video Google: A Text Retrieval Approach to Object Matching in Videos

Josef Sivic and Andrew Zisserman
Robotics Research Group, Department of Engineering Science, University of Oxford, United Kingdom
ICCV 2003. Citations: over 1,300 as of 2011.

SLIDE 4

Motivations

  • Retrieve key frames and shots of a video containing a particular object
  • Investigate whether a text retrieval approach can be successful for object recognition

SLIDE 5

Viewpoint Invariant Description

  • Find viewpoint covariant regions
  • Produce elliptical affine-invariant regions, e.g., Shape Adapted (SA) and Maximally Stable (MS)
  • SA regions are centered on corner-like features
  • MS regions correspond to high contrast with respect to their surroundings (dark window, gray wall, …)
  • Compute a SIFT descriptor for each region

SLIDE 6

MSER (Maximally Stable Extremal Regions)

  • Affinely-invariant stable regions in the image
  • Can be used to localize regions around keypoints
  • We will use only the SIFT descriptors that are inside MSER regions

SLIDE 7

SLIDE 8

Visual Vocabulary

  • Quantize descriptor vectors into clusters, which are the visual 'words' for text retrieval
  • Performed with k-means clustering
  • Produce about 6K and 10K clusters for Shape Adapted and Maximally Stable regions, respectively
  • Chosen empirically to maximize retrieval results

SLIDE 9

K-Means Clustering

  • Minimize the within-cluster sum of squares (WCSS)
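The clustering step can be sketched with plain Lloyd's k-means, which alternates assignment and update steps, each of which can only decrease the WCSS. This is a minimal NumPy sketch, not the papers' implementation; function and variable names are illustrative:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Lloyd's k-means: assign each point to its nearest center, then
    move each center to the mean of its points; both steps decrease
    the within-cluster sum of squares (WCSS)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest center per point (squared Euclidean).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Update step: each center becomes the mean of its cluster.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    wcss = ((X - centers[labels]) ** 2).sum()
    return centers, labels, wcss
```

In the Video Google setting, X would hold the 128-dimensional SIFT descriptors and k the vocabulary size (about 6K-10K per region type).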

SLIDE 10

Distance Function

  • Use the Mahalanobis distance as the distance function for clustering: d(x1, x2) = sqrt((x1 − x2)^T S^(−1) (x1 − x2)), where S is the covariance matrix
  • If S is the identity matrix, it reduces to the Euclidean distance
  • Decorrelates the components of SIFT
  • Instead, the Euclidean distance may be used
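The distance above is a direct translation of the formula; this sketch (illustrative names, not the paper's code) also shows the reduction to the Euclidean case:

```python
import numpy as np

def mahalanobis(x1, x2, S):
    """Mahalanobis distance sqrt((x1-x2)^T S^{-1} (x1-x2)), where S is
    the covariance matrix of the descriptor components."""
    diff = x1 - x2
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))
```

With S equal to the identity this is exactly `np.linalg.norm(x1 - x2)`. In practice one would whiten the descriptors once with S^(−1/2) and then cluster with the plain Euclidean distance, rather than invert S per comparison.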

SLIDE 11

Visual Indexing

  • Each document is represented by a k-vector of word frequencies
  • Weighting by tf-idf: t_i = (n_id / n_d) · log(N / n_i), i.e., term frequency × log(inverse document frequency)
  • n_id: # of occurrences of word i in document d
  • n_d: total # of words in document d
  • n_i: # of occurrences of term i in the whole database
  • N: # of documents in the whole database
  • At the retrieval stage, documents are ranked by the normalized scalar product between the query vector v_q and each database vector v_d
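The weighting and ranking on this slide can be sketched as follows (a minimal NumPy sketch using the slide's definitions; names are illustrative):

```python
import numpy as np

def tfidf_vectors(docs, n_words):
    """docs: list of visual-word id lists, one list per image.
    Returns L2-normalized tf-idf vectors t_i = (n_id/n_d) * log(N/n_i)."""
    N = len(docs)
    counts = np.zeros((N, n_words))
    for d, words in enumerate(docs):
        for w in words:
            counts[d, w] += 1
    tf = counts / np.maximum(counts.sum(1, keepdims=True), 1)  # n_id / n_d
    n_i = counts.sum(0)              # occurrences of word i in the database
    idf = np.log(N / np.maximum(n_i, 1))
    v = tf * idf
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.maximum(norms, 1e-12)

def rank(query_vec, db_vecs):
    """Rank documents by normalized scalar product with the query."""
    return np.argsort(-(db_vecs @ query_vec))
```

Since all vectors are L2-normalized, the scalar product is the cosine similarity.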

SLIDE 12

Video Google [Sivic et al. ICCV 2003]

  • mAP: mean average precision

SLIDE 13

Video Google [Sivic et al. ICCV 2003]

  • Performance depends highly on the number of visual words k: not scalable

SLIDE 14

Scalable Recognition with a Vocabulary Tree

David Nistér et al. CVPR 2006. Citations: over 1,000 as of 2011.

SLIDE 15

Vocabulary Tree [Nister et al. CVPR 06]

  • Hierarchical k-means clustering
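Hierarchical k-means first clusters all descriptors into `branch` groups, then recurses on each group, so a descriptor is quantized by descending the tree. A toy NumPy sketch, with an inlined minimal k-means; the dict-based tree layout is illustrative, not the paper's data structure:

```python
import numpy as np

def _kmeans(X, k, iters=10, rng=None):
    # Minimal Lloyd's k-means used at every node of the tree.
    rng = rng or np.random.default_rng(0)
    C = X[rng.choice(len(X), size=min(k, len(X)), replace=False)]
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(len(C)):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C, lab

def build_tree(X, branch=10, depth=3, rng=None):
    """Hierarchical k-means: split descriptors into `branch` clusters,
    then recurse on each cluster until `depth` levels are built."""
    rng = rng or np.random.default_rng(0)
    if depth == 0 or len(X) < branch:
        return {"centers": None, "children": []}
    C, lab = _kmeans(X, branch, rng=rng)
    children = [build_tree(X[lab == j], branch, depth - 1, rng)
                for j in range(len(C))]
    return {"centers": C, "children": children}

def quantize(tree, x):
    """Descend the tree; the path of branch indices is the visual word."""
    path = []
    while tree["centers"] is not None:
        j = int(((tree["centers"] - x) ** 2).sum(-1).argmin())
        path.append(j)
        tree = tree["children"][j]
    return tuple(path)
```

With branch factor 10 and 6 levels (slide 53), quantizing a descriptor costs only 10 x 6 distance computations instead of a flat search over a million words.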

SLIDE 16

Vocabulary tree with branch factor 10

SLIDE 17

Inverted File

SLIDE 18

Retrieval Algorithm

  • Compute a histogram of visual words from the SIFT descriptors
  • Identify images that contain words of the input query image
  • Can be done with the inverted file
  • Sort images based on a similarity function
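The three steps above can be sketched as follows. This is a toy sketch: real systems score candidates with tf-idf-weighted vectors, whereas here the similarity is simply the number of shared distinct words:

```python
from collections import defaultdict

def build_inverted_file(db_words):
    """db_words: {image_id: list of visual-word ids}.
    The inverted file maps each word to the images containing it."""
    inv = defaultdict(set)
    for img, words in db_words.items():
        for w in words:
            inv[w].add(img)
    return inv

def retrieve(query_words, db_words, inv):
    """Score only images sharing at least one word with the query
    (found via the inverted file), then sort by similarity."""
    candidates = set()
    for w in set(query_words):
        candidates |= inv.get(w, set())
    scores = {img: len(set(query_words) & set(db_words[img]))
              for img in candidates}
    return sorted(scores, key=scores.get, reverse=True)
```

The inverted file is what makes this scale: images sharing no word with the query are never touched.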

SLIDE 19

Vocabulary Tree [Nister et al. CVPR 06]

  • On an 8 GB RAM machine (40,000 images), queries took 1 s; database creation took 2.5 days

SLIDE 20

Vocabulary Tree

  • Benefits:
  • Allows faster image retrieval (and pre-computation)
  • Scales efficiently to a large number of images
  • Problems:
  • Too much memory required
  • Quantization effects

SLIDE 21

Object Retrieval with Large Vocabularies and Fast Spatial Matching

Philbin et al. CVPR 2007. Citations: over 350 as of 2011.

SLIDE 22

Approximating K-Means

  • Use a forest of 8 randomized k-d trees
  • Randomize the splitting dimension among a set of the dimensions with the highest variance
  • Randomly choose a point close to the median as the split value
  • Helps to mitigate quantization effects
  • As each tree is descended to a leaf, distances from the splitting boundaries are recorded in a priority queue
  • Similar to best-bin-first search
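The randomized node split described above can be sketched as one function (a NumPy sketch under the slide's description; parameter names and the amount of median jitter are illustrative assumptions, not the paper's exact choices):

```python
import numpy as np

def randomized_split(X, n_top=5, rng=None):
    """One node split of a randomized k-d tree: pick the split dimension
    at random among the n_top highest-variance dimensions, and pick the
    split value as a point close to the median along that dimension."""
    rng = rng or np.random.default_rng(0)
    variances = X.var(axis=0)
    top_dims = np.argsort(-variances)[:n_top]
    dim = int(rng.choice(top_dims))
    vals = np.sort(X[:, dim])
    mid = len(vals) // 2
    # A data point within a few positions of the median.
    jitter = int(rng.integers(-2, 3))
    split_val = float(vals[np.clip(mid + jitter, 0, len(vals) - 1)])
    left = X[X[:, dim] <= split_val]
    right = X[X[:, dim] > split_val]
    return dim, split_val, left, right
```

Because each of the 8 trees makes different random choices, a query descriptor that falls on the wrong side of one tree's boundary is usually recovered by another tree; a shared priority queue over boundary distances then drives the best-bin-first style search.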

SLIDE 23

Approximate K-Means

  • Algorithmic complexity of a single k-means iteration
  • Reduces from O(NK) to O(N log K), where N is the # of features
  • Achieved by multiple random k-d trees
  • Find images with k-d trees too
  • Even using approximate k-means, retrieval performance is superior
  • Due to the reduction of quantization effects

SLIDE 24

Spatial Re-Ranking with RANSAC

  • Generate hypotheses with pairs of corresponding features
  • Assume a restricted transformation, since many images on the web are captured in particular (axis-aligned) ways
  • Evaluate other pairs and measure errors
  • Re-rank images by scoring the # of inliers
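The re-ranking loop can be sketched as follows. This is a simplified sketch: the restricted transform here is an isotropic scale plus translation fitted from two correspondences, whereas the paper generates hypotheses from single correspondences using the regions' shapes; names are illustrative:

```python
import numpy as np

def ransac_inliers(p, q, n_iter=200, tol=3.0, rng=None):
    """p, q: (N, 2) corresponding keypoint locations (query, database).
    Hypothesize a restricted transform q ~ s*p + t (scale plus
    translation, no rotation) from two random correspondences, then
    count how many other correspondences agree within tol pixels."""
    rng = rng or np.random.default_rng(0)
    best = 0
    for _ in range(n_iter):
        i, j = rng.choice(len(p), size=2, replace=False)
        dp, dq = p[j] - p[i], q[j] - q[i]
        denom = float(np.dot(dp, dp))
        if denom < 1e-9:
            continue
        s = float(np.dot(dp, dq)) / denom      # least-squares scale
        t = q[i] - s * p[i]
        err = np.linalg.norm(q - (s * p + t), axis=1)
        best = max(best, int((err < tol).sum()))
    return best

def rerank(results):
    """results: list of (image_id, p, q). Re-rank by inlier count."""
    return sorted(results, key=lambda r: ransac_inliers(r[1], r[2]),
                  reverse=True)
```

Images whose matches are geometrically consistent with one transform rise to the top; images that matched only on scattered, inconsistent words fall.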

SLIDE 25

Results

SLIDE 26

Results

SLIDE 27

Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval

Chum et al. ICCV 2007. Citations: over 150 as of 2011.

SLIDE 28

Query Expansion

  • Improve recall by re-querying with a combination of the original query and spatially verified results

(figure: query input and database results)

SLIDE 29

Query Expansion

  • Spatial verification
  • Similar to the technique used in [Philbin et al. 07]; uses a RANSAC-like algorithm
  • Identifies a set of images that are very similar to the original query image

SLIDE 30

BoW Interpreted Probabilistically

  • Extract a generative model of an object from the query region
  • Compute a response set that is likely to have been generated from the model
  • The generative model:
  • Spatial configuration of visual words with background clutter

SLIDE 31

Generative Models

  • Query expansion baseline
  • Average the term frequency vectors of the top 5 results, without verification
  • Transitive closure expansion
  • A priority queue of verified images is keyed by # of inliers
  • Take the top image and issue it as a new query
  • Average query expansion
  • A new query is constructed by averaging the top 50 verified results (d_i is the term frequency vector of the i-th verified image)
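Average query expansion is a one-liner once the verified tf vectors are available. A minimal NumPy sketch, assuming (as in the averaging formula of Chum et al.) that the original query vector is included in the average:

```python
import numpy as np

def average_query_expansion(q0, verified, m=50):
    """Build a new query by averaging the original tf vector q0 with
    the tf vectors d_i of the top (at most m) spatially verified
    results, then L2-normalize it for cosine ranking."""
    top = verified[:m]
    q_avg = (q0 + np.sum(top, axis=0)) / (len(top) + 1)
    return q_avg / max(np.linalg.norm(q_avg), 1e-12)
```

The expanded query now carries visual words that appear on the object but were missing from the original view, which is what recovers the extra true positives on slide 34.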

SLIDE 32

Generative Models

  • Multiple image resolution expansion
  • Consider images at different resolutions; higher resolutions give more detailed information
  • Use resolution bands of (0, 4/5), (2/3, 3/2), and (5/4, ∞)
  • Use averaged queries for each resolution band
  • Shows the best results

SLIDE 33

Results

(figure: mAP comparison)

SLIDE 34

Results

(figure: original query, top 4 images, and expanded results that were not identified by the original query)

SLIDE 35

Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Philbin et al. CVPR 2008. Citations: over 175 as of 2011.

SLIDE 36

Soft Quantization [Philbin et al. CVPR 08]

  • 3 and 4 will never be matched under hard assignment
  • No way of distinguishing that 2 and 3 are closer than 1 and 2
  • Soft assignment: use a weight vector
  • A weight to each nearby cluster is assigned as a decreasing function of the distance between the descriptor and the cluster center
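Soft assignment can be sketched as follows, assuming a Gaussian weighting exp(−d²/2σ²) over the r nearest centers as in Philbin et al.; the σ parameter is dataset-dependent, so the value here is illustrative:

```python
import numpy as np

def soft_assign(desc, centers, r=3, sigma=1.0):
    """Soft quantization: instead of one hard visual word, spread unit
    weight over the r nearest cluster centers, with weights
    proportional to exp(-d^2 / (2 sigma^2)), normalized to sum to 1."""
    d2 = ((centers - desc) ** 2).sum(axis=1)
    nearest = np.argsort(d2)[:r]
    w = np.exp(-d2[nearest] / (2.0 * sigma ** 2))
    w /= w.sum()
    return dict(zip(nearest.tolist(), w.tolist()))
```

A descriptor halfway between words 3 and 4 now contributes to both, so the two images on this slide can match; the cost is the larger index shown on slide 38.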

SLIDE 37

Results

SLIDE 38

Effect of Vocabulary Size and Number of Images

  • For the Oxford dataset with a 1M vocabulary, the hard-assignment index costs 36 MB and the soft-assignment index costs 108 MB with compression

SLIDE 39

City-Scale Location Recognition

Schindler et al. CVPR 2007. Citations: over 135 as of 2011.

SLIDE 40

City-Scale Location Recognition

SLIDE 41

Example Image Database

SLIDE 42

Challenges and Main Ideas

  • Too many images
  • Storage-space and search-time problems
  • Main approaches:
  • Use a vocabulary tree to organize millions of feature descriptors
  • Choose more informative image sets for identifying locations, instead of organizing all the images

SLIDE 43

Informative Features

  • Want to find features that:
  • Occur in all images of a specific location
  • But rarely or never occur anywhere outside of that single location
  • Can be captured formally by information gain
  • How much uncertainty is removed by additional knowledge

SLIDE 44

Information Gain

  • How much uncertainty is removed by additional knowledge
  • Information gain = entropy − conditional entropy, where the location L is a binary value (are we at a particular location?) and the conditioning knowledge W is the presence of a visual word; we want to minimize the conditional entropy
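For binary L (at the location or not) and binary W (word observed or not), the gain IG(L; W) = H(L) − H(L | W) works out directly from Bayes' rule. A minimal sketch (function names are illustrative); maximizing this gain is the same as minimizing the conditional entropy on the slide:

```python
import math

def entropy(p):
    """Entropy of a binary variable with P(true) = p, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(p_l, p_w_given_l, p_w_given_not_l):
    """IG(L; W) = H(L) - H(L | W) for binary location L and binary
    word observation W."""
    p_w = p_l * p_w_given_l + (1 - p_l) * p_w_given_not_l
    if p_w in (0.0, 1.0):
        return 0.0  # W is constant, so it carries no information
    # P(L | W) and P(L | not W) via Bayes' rule.
    p_l_w = p_l * p_w_given_l / p_w
    p_l_nw = p_l * (1 - p_w_given_l) / (1 - p_w)
    h_cond = p_w * entropy(p_l_w) + (1 - p_w) * entropy(p_l_nw)
    return entropy(p_l) - h_cond
```

A word seen in every image of one location and nowhere else gives the maximal gain H(L); a word seen equally often everywhere gives zero, matching the criteria on slide 43.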

SLIDE 45

SLIDE 46

SLIDE 47

SLIDE 48

SLIDE 49

SLIDE 50

SLIDE 51

SLIDE 52

Informative Features

  • = a, = b
  • : # of images in the database
  • : # of images in each location
SLIDE 53

Results

  • 1M vocabulary tree, k = 10, L = 6, 7.5 million feature points

SLIDE 54

Results

  • 278 query images, 324 VT, 30K-image subset database associated with GPS coordinates, 0.2 s query time

SLIDE 55

Packing Bag-of-Features

Jégou et al. ICCV 2009. Citations: over 27 as of 2011.

SLIDE 56

Binary BOF

  • A binary BOF is good for large vocabulary sizes
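Binarization simply keeps word presence/absence; with a large vocabulary most counts are 0 or 1 anyway, so little is lost and each image shrinks to one bit per visual word. A minimal sketch (the similarity here, shared-word count, is one simple choice, not necessarily the paper's scoring):

```python
import numpy as np

def binary_bof(bof):
    """Binary BOF: keep only the presence/absence of each visual word."""
    return (np.asarray(bof) > 0).astype(np.uint8)

def binary_similarity(b1, b2):
    """Number of visual words present in both images."""
    return int(np.sum(b1 & b2))
```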

SLIDE 57

Memory Usage

  • 10 KB per image for the raw binary BOF; 1-2 KB for the compressed inverted file

SLIDE 58

MiniBOFs

  • Split the BOF vector and project each piece (aggregation: dimension reduction from k to d)
  • Quantize each with k-means: use 4 bytes
  • For better results, use Hamming Embedding [ECCV 2008] for each descriptor: a few more bits to encode the location within each cluster
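The split-project-quantize pipeline can be sketched as below. This is a structural sketch only: the projection matrices and codebooks here are random stand-ins for the learned aggregation and k-means quantizers, and Hamming Embedding is omitted:

```python
import numpy as np

def minibof_encode(bof, projections, codebooks):
    """MiniBOF-style encoding: split the BOF vector into equal pieces,
    project each piece down (dimension reduction from k/m to d), and
    quantize each projection to its nearest codebook centroid, so an
    image is stored as a handful of small integer codes."""
    m = len(projections)
    pieces = np.array_split(bof, m)
    codes = []
    for piece, P, C in zip(pieces, projections, codebooks):
        y = P @ piece                                  # k/m -> d projection
        codes.append(int(((C - y) ** 2).sum(1).argmin()))  # quantize
    return codes
```

Two images can then be compared by how many of their codes agree, which is what drives the memory/mAP trade-off on the next slide.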

SLIDE 59

Results

  • Achieves about 2× lower memory at a similar mAP
  • Improves quality by using multiple miniBOFs, while keeping memory low

SLIDE 60

Next Time…

  • Novel applications