Large-Scale Video Retrieval Using Image Queries, André Filgueiras de Araujo

SLIDE 1

1 Andre Araujo – Large-Scale Video Retrieval Using Image Queries

Large-Scale Video Retrieval Using Image Queries

André Filgueiras de Araujo Department of Electrical Engineering Stanford University

SLIDE 2

The “Dark Matter” of the Digital Age

  • 85% of data in the form of multimedia
  • 400+ hours of video uploaded per minute
  • 8+ billion video views per day
  • 100+ hours of video uploaded per minute

Key problem: How can we make sense of these data?

SLIDE 3

Automatic Visual Recognition

Image classification

  • Is this an urban landscape?

Object detection

  • Does this image contain a bus? Where?

Instance recognition (a.k.a. “visual search”)

  • Does this image contain the “Wicked” billboard?

SLIDE 4

Visual Search

Product recognition [Tsai et al., MM’08, MM’10]

Location recognition [Chen et al., CVPR’11]

Commercial applications

Diagram: an image query is sent to a retrieval system, which searches a database of images.

SLIDE 5

Video Retrieval Using Image Queries

Diagram: an image query is sent to a retrieval system, which searches a database of video clips.

Applications:

  • Brand monitoring: search YouTube using product images
  • News videos: search event footage using photos
  • Online education: search lectures using slides

SLIDE 6

Online Prototype http://videosearch.stanford.edu

SLIDE 7

Simple Architecture

Diagram: query image → query descriptor → query-to-frames comparison (frame index) → frame short-list → feature matching (feature index) and geometric verification → final result

Too many frames → does not scale

SLIDE 8

Large-Scale Architecture

Diagram: query image → query descriptor → query-to-clips comparison (clip index) → clip short-list → query-to-frames comparison (frame index) → frame short-list → feature matching (feature index) and geometric verification → final result

Focus of this work: the query-to-clips stage, which produces the clip short-list
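
The coarse-to-fine flow in this diagram can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the system's actual code: `retrieve`, `top_k`, the dictionary layout of `clip_index` and `frame_index`, and the plain inner product used as similarity are all hypothetical, and geometric verification is left out.

```python
def dot(a, b):
    """Inner product; stands in for whatever descriptor similarity is used."""
    return sum(x * y for x, y in zip(a, b))


def top_k(scored, k):
    """Return the k highest-scoring (id, score) pairs, best first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def retrieve(query, clip_index, frame_index, num_clips=2, num_frames=3):
    """Two-stage search: short-list clips first, then score only their frames.

    clip_index:  {clip_id: clip descriptor}          (hypothetical layout)
    frame_index: {clip_id: {frame_id: descriptor}}   (hypothetical layout)
    """
    clip_scores = [(cid, dot(query, desc)) for cid, desc in clip_index.items()]
    clip_shortlist = [cid for cid, _ in top_k(clip_scores, num_clips)]

    frame_scores = [
        ((cid, fid), dot(query, desc))
        for cid in clip_shortlist
        for fid, desc in frame_index[cid].items()
    ]
    # Feature matching and geometric verification would re-rank this list.
    return top_k(frame_scores, num_frames)
```

Only frames from short-listed clips are scored, which is what lets the clip stage cut the cost of the frame stage.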

SLIDE 9

Video Retrieval Using Image Queries

Diagram: query descriptor → query-to-clips comparison (clip index) → clip short-list

Main challenges:

  • Asymmetry: how can we compare images to videos?
  • Temporal aggregation: how can we describe a video clip for query-by-image retrieval?

SLIDE 10

Contributions

Fisher Vector Comparisons

  • Asymmetric comparisons for Fisher vectors
  • Cluttered query or database images

Fisher Vector Aggregation

  • Fisher vector descriptors for video segments
  • Compact database for large-scale retrieval

Bloom Filter Aggregation

  • Bloom filter descriptors for video segments
  • Fast and accurate large-scale retrieval

SLIDE 11

Related Work: Visual Search

  • Image query, image database (traditional visual search): SIFT [Lowe, ’04], BoW [Sivic et al., ’03], FV [Perronnin et al., ’07]
  • Video query, image database (augmented reality): TCD [Makar et al., ’12], Hybrid Vis. Search [Chen et al., ’14]
  • Video query, video database (content tracking): Frame Mat. + ST [Douze et al., ’10], TRECVID-CCD [Over et al., ’12]
  • Image query, video database (video retrieval by image): discussed on next slide

SLIDE 12

Related Work: Video Retrieval Using Images

  • Early work
      – BoW retrieval of movie frames [Sivic and Zisserman, ICCV’03]
      – Object-level retrieval of movie shots [Sivic et al., ECCV’04]
  • TRECVID Instance Search Challenge [Over et al., TRECVID’10-15]
      – Frame-based BoW with Color SIFT [Le et al., ’10-11]
      – Shot-based aggregation using BoW [Zhu et al., ’13] [Ballas et al., ’14]
      – BoW query-adaptive asymmetrical dissimilarities [Zhu et al., ’13]
  • Object localization in videos
      – SURF-based matching per shot [Apostolidis et al., ICME’13]
      – Optimal path using dynamic programming [Meng et al., ICIP’15]

SLIDE 13

Background: Pairwise Image Matching

Diagram: for both the query image and the database image: interest point detection → local descriptor extraction (image features: descriptor 1, descriptor 2, …, descriptor n) → descriptor matching between the two images

SLIDE 14

Background: Fisher Vector (FV) [Perronnin and Dance, CVPR’07]

  • State-of-the-art technique for large-scale retrieval
  • Key property: represent a set of local descriptors by a compact fixed-length vector
      → Two images can be compared by comparing their Fisher vectors
  • Construction: describe an image with aggregated Fisher scores of its local descriptors
      – Local descriptor distribution: Gaussian Mixture Model (GMM)
      – Usually only Gaussian means are taken into account
  • Extension of Bag-of-Words technique [Sivic and Zisserman, ICCV’03]
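
The construction above can be sketched in plain Python as follows. This is an illustrative sketch, not the presenter's implementation: it assumes a diagonal-covariance GMM whose parameters (`weights`, `means`, `sigmas`) are already trained, keeps only the gradient with respect to the Gaussian means, and finishes with L2 normalization. The function names are hypothetical.

```python
import math


def gmm_posteriors(x, weights, means, sigmas):
    """Soft-assign descriptor x to each diagonal-covariance Gaussian."""
    log_p = []
    for w, mu, sg in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, mi, si in zip(x, mu, sg):
            # The (2*pi) constant is the same for every Gaussian, so it is dropped.
            ll -= 0.5 * ((xi - mi) / si) ** 2 + math.log(si)
        log_p.append(ll)
    top = max(log_p)
    p = [math.exp(v - top) for v in log_p]
    total = sum(p)
    return [v / total for v in p]


def fisher_vector(descriptors, weights, means, sigmas):
    """Aggregate mean-gradient Fisher scores into one vector of length K * D."""
    num_gauss, dim = len(means), len(means[0])
    acc = [[0.0] * dim for _ in range(num_gauss)]
    for x in descriptors:
        gamma = gmm_posteriors(x, weights, means, sigmas)
        for k in range(num_gauss):
            for d in range(dim):
                acc[k][d] += gamma[k] * (x[d] - means[k][d]) / sigmas[k][d]
    count = max(len(descriptors), 1)
    flat = [
        value / (count * math.sqrt(weights[k]))
        for k in range(num_gauss)
        for value in acc[k]
    ]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

Because every image maps to a vector of the same length, two images can then be compared with a single inner product regardless of how many local descriptors each one has.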

SLIDE 15

Background: Fisher Vector (FV) [Perronnin and Dance, CVPR’07]

Figure: descriptor space for a query image and two database images, with the resulting real-valued vectors labeled Query FV, DB Im. 1 FV, and DB Im. 2 FV.

SLIDE 16

Background: Binarized Fisher Vector (FV*) [Perronnin et al., CVPR’10]

Figure: descriptor space for a query image and two database images, with the resulting binary vectors labeled Query FV*, DB Im. 1 FV*, and DB Im. 2 FV*.
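
A minimal sketch of binarization and comparison, assuming the binary vector keeps only the sign of each FV entry and that binarized vectors are compared by Hamming distance. The slide does not spell out either rule, so both are assumptions, and the values in the usage note are made up.

```python
def binarize(fv):
    """Keep one bit per entry: 1 for a positive value, 0 otherwise."""
    return [1 if value > 0 else 0 for value in fv]


def hamming(a, b):
    """Number of positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))
```

For example, `hamming(binarize([-0.2, 0.2, 0.8]), binarize([0.3, 0.3, -0.6]))` compares the bit vectors `[0, 1, 1]` and `[1, 1, 0]`, which differ in two positions.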

SLIDE 17

Contribution 1

Fisher Vector Comparisons

  • Asymmetric comparisons for Fisher vectors
  • Cluttered query or database images

Fisher Vector Aggregation

  • Fisher vector descriptors for video segments
  • Compact database for large-scale retrieval

Bloom Filter Aggregation

  • Bloom filter descriptors for video segments
  • Fast and accurate large-scale retrieval

SLIDE 18

Asymmetric Image Comparison

How can we incorporate asymmetry in FV comparisons?

Figure: query image and database image examples for an object retrieval application and a video bookmarking application.

SLIDE 19

Asymmetric Comparison for FV

Fisher vector = [v1, v2, …, vK]

Figure: two image regions have different statistics → features from one region are usually not present in the other

SLIDE 20

Asymmetric Comparison for FV

  • FV comparison metric: cosine similarity
  • We want: θ1 < θ2
  • Common failure case: θ1 > θ2 but θ1’ < θ2
  • Insight: compare query and database based on their projections onto the x-y plane (i.e., using only Gaussians visited by the query)

Figure (axes x, y, z; vectors q, m, n, m’):

  • q: query
  • m: correct match in database
  • n: incorrect match in database
  • θ1 = angle(q, m)
  • θ2 = angle(q, n)
  • θ1’ = angle(q, m’)

SLIDE 21

Asymmetric Comparison for FV

Figure: descriptor space for an image, with its original FV and its modified FV. The entries of the Gaussian not visited by this image are set to zero, and the remaining entries are re-normalized.
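
The zero-then-renormalize step can be sketched as below. This is a hedged illustration rather than the method's exact formulation: the FV is assumed to be stored as consecutive blocks of `dims_per_gaussian` entries, one block per Gaussian, and the `visited` mask is supplied by the caller (how a Gaussian is declared "visited" is not specified here). All function names are hypothetical.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Plain (symmetric) cosine similarity."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def project(fv, visited, dims_per_gaussian):
    """Zero the blocks of unvisited Gaussians, then re-normalize."""
    out = []
    for k, keep in enumerate(visited):
        block = fv[k * dims_per_gaussian:(k + 1) * dims_per_gaussian]
        out.extend(block if keep else [0.0] * dims_per_gaussian)
    return l2_normalize(out)


def asymmetric_similarity(query_fv, db_fv, visited, dims_per_gaussian):
    """Cosine similarity restricted to the subspace selected by `visited`."""
    q = project(query_fv, visited, dims_per_gaussian)
    m = project(db_fv, visited, dims_per_gaussian)
    return sum(x * y for x, y in zip(q, m))
```

With `q = [1, 0, 0, 0]`, `m = [0.6, 0, 0.8, 0]`, two entries per Gaussian and only the first Gaussian visited, the symmetric cosine is about 0.6 while the asymmetric score is about 1.0: energy that the database vector spends on Gaussians the query never visits no longer lowers the score.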

SLIDE 22

Asymmetric Comparison for FV

  • Two retrieval problems
      – Query contained in database (query image defines projection)
          All database images compared to query based on the same subspace
      – Database contained in query (database image defines projection)
          Problem: each database image is compared to the query based on different subspaces
          Solution: introduce weight to favor database images with more visited Gaussians

SLIDE 23

Dataset: Query Contained in Database

Figure: example query, reference, clutter, and distractor images, split into query and database sides. Counts shown: 200 and 9,800. From 0 to 40 clutter images are added.

SLIDE 24

Dataset: Database Contained in Query

Figure: example query, reference, clutter, and distractor images, split into query and database sides. Counts shown: 200 and 9,800. From 0 to 40 clutter images are added.

SLIDE 25

Experiments: Asymmetric FV Comparisons

Figure: two panels, "Query contained in database" and "Database contained in query". Each plots mAP (%) against the number of clutter images (log scale, 10^0 to 10^1) using 2048 Gaussians, with curves for FV Asym., FV⋆ Asym., FV Baseline, and FV⋆ Baseline. Each panel carries a 25% annotation.

SLIDE 26

Contribution 2

Fisher Vector Comparisons

  • Asymmetric comparisons for Fisher vectors
  • Cluttered query or database images

Fisher Vector Aggregation

  • Fisher vector descriptors for video segments
  • Compact database for large-scale retrieval

Bloom Filter Aggregation

  • Bloom filter descriptors for video segments
  • Fast and accurate large-scale retrieval

SLIDE 27

Large-Scale Architecture

Diagram: query image → query descriptor → query-to-clips comparison (clip index) → clip short-list → query-to-frames comparison (frame index) → frame short-list → feature matching (feature index) and geometric verification → final result

SLIDE 28

Temporal Structure

  • Frames: 1 fps
  • Shots: contain similar frames; length of seconds
  • Clips: contain diverse shots; length of minutes

SLIDE 29

Clip Fisher Vector

Figure: frames → local descriptors → FV aggregation → clip FV

SLIDE 30

Clip Fisher Vector with Tracked Features

Figure: frames → local descriptors → feature tracking + aggregation → FV aggregation → clip FV-TF

SLIDE 31

Averaged Shot Fisher Vectors

Figure: frames → local descriptors → FV aggregation per shot → shot FVs → Avg. Shot FV
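
The difference between a clip FV and an averaged shot FV can be sketched as follows. This is only an illustration of the two aggregation orders: `toy_aggregate` is a hypothetical stand-in for real FV aggregation (it just takes a normalized mean), a clip is assumed to be a list of shots, a shot a list of frames, and a frame a list of local descriptors.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def toy_aggregate(descriptors):
    """Placeholder for FV aggregation: normalized mean of the descriptors."""
    dim = len(descriptors[0])
    mean = [sum(d[i] for d in descriptors) / len(descriptors) for i in range(dim)]
    return l2_normalize(mean)


def clip_fv(shots, aggregate=toy_aggregate):
    """Pool every local descriptor in the clip, then aggregate once."""
    pooled = [d for shot in shots for frame in shot for d in frame]
    return aggregate(pooled)


def averaged_shot_fv(shots, aggregate=toy_aggregate):
    """Aggregate each shot separately, then average the shot vectors."""
    shot_fvs = [aggregate([d for frame in shot for d in frame]) for shot in shots]
    dim = len(shot_fvs[0])
    mean = [sum(fv[i] for fv in shot_fvs) / len(shot_fvs) for i in range(dim)]
    return l2_normalize(mean)
```

In this toy setup a long shot contributes more descriptors and therefore dominates `clip_fv`, while `averaged_shot_fv` gives every shot the same weight.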

SLIDE 32

Datasets

News Videos

  • 229 queries (from web)
  • 2.7 minutes/clip
  • 50.6 shots/clip
  • Versions
      – 600k frames, 164h, 3.4k clips
      – 4M frames, 1,079h, 24.3k clips

Video Bookmarking

  • 282 queries (smartphone pics)
  • 2.7 minutes/clip
  • 50.6 shots/clip
  • Versions
      – 600k frames, 164h, 3.4k clips
      – 4M frames, 1,079h, 24.3k clips

Lecture Videos

  • 258 queries (slides)
  • 8.2 minutes/clip
  • 58.8 shots/clip
  • Versions
      – 600k frames, 169h, 1.1k clips
      – 1.5M frames, 408h, 2.9k clips

SLIDE 33

Experiments: Comparison of Techniques

Figure: three bar charts of mAP (%) for the News, Vid. Bookm., and Lectures datasets, comparing Clip FV⋆, Clip FV⋆-TF, and Avg. Shot FV⋆.

SLIDE 34

Experiments: Lecture Videos Dataset

Figure: mAP (%) against number of Gaussians (500 to 2000) for Clip FV⋆ Asym. and Clip FV⋆ Sym., with a 30% annotation.

SLIDE 35

Experiments: Lecture Videos Dataset

Figure: mAP (%) against index size (bytes, log scale, 10^6 to 10^9) for Clip FV⋆ and Frame FV⋆ (Baseline), with a ~100X annotation.

SLIDE 36

Experiments: Video Bookmarking Dataset

Figure: mAP (%) against index size (bytes, log scale, 10^6 to 10^9) for Clip FV⋆ and Frame FV⋆ (Baseline), with ~43X and 26% annotations.

SLIDE 37

Experiments: News Videos Dataset

Figure: mAP (%) against index size (bytes, log scale, 10^6 to 10^9) for Clip FV⋆ and Frame FV⋆ (Baseline), with ~43X and 33% annotations.

SLIDE 38

Contribution 3

Fisher Vector Comparisons

  • Asymmetric comparisons for Fisher vectors
  • Cluttered query or database images

Fisher Vector Aggregation

  • Fisher vector descriptors for video segments
  • Compact database for large-scale retrieval

Bloom Filter Aggregation

  • Bloom filter descriptors for video segments
  • Fast and accurate large-scale retrieval

SLIDE 39

Aggregation Methods

Figure: descriptor space showing frame residuals and clip residuals, with a zoomed view.

SLIDE 40

Aggregation Methods

Clip residuals

1) Clip FV → increase number of Gaussians
2) Frame FV + temporal aggregation by hashing
3) Spatio-temporal hashing

Aggregation using Bloom filters

SLIDE 41

Bloom Filter (BF)

Figure: the set { d1, d2, d3 } is hashed by h1 (red) and h2 (blue) into two 8-bit arrays, b1 and b2. Query q1 gets 2 matches (TP); query q2 also gets 2 matches (FP).

Bits shown: 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 0
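
A Bloom filter with one bit array per hash function, as in the figure, can be sketched as below. This is a generic illustration, not the descriptor-specific version used for video clips: items are plain strings, the class name is hypothetical, and the salted SHA-256 hash is an arbitrary choice made only to keep the example deterministic.

```python
import hashlib


class PartitionedBloomFilter:
    """One bit array per hash function."""

    def __init__(self, num_hashes=2, bits_per_hash=8):
        self.num_hashes = num_hashes
        self.bits_per_hash = bits_per_hash
        self.arrays = [[0] * bits_per_hash for _ in range(num_hashes)]

    def _index(self, item, m):
        # Salt the item with the hash number m to get M different hash functions.
        digest = hashlib.sha256(f"{m}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.bits_per_hash

    def add(self, item):
        for m in range(self.num_hashes):
            self.arrays[m][self._index(item, m)] = 1

    def matches(self, item):
        """Number of hash functions whose bit is set for this item."""
        return sum(self.arrays[m][self._index(item, m)] for m in range(self.num_hashes))

    def __contains__(self, item):
        return self.matches(item) == self.num_hashes
```

Inserted items always match on every hash, so there are no false negatives. An item that was never inserted can still match on every hash when its bits collide with those of stored items, which is the false positive case (q2) in the figure.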

SLIDE 42

BF using Global Descriptors (BF-GD)

Figure: video clip → frames → features per frame → Fisher embedding per feature → FV aggregation per frame → hash functions hm(v), m = 1,…, M → Bloom filter

SLIDE 43

BF using Point-Indexed Descriptors (BF-PI)

Figure: video clip → frames → features per frame → Fisher embedding per feature → hash functions hm(v), m = 1,…, M → Bloom filter

SLIDE 44

Hash Functions

  • Locality-Sensitive Hashing (LSH): random hyperplanes; one hyperplane per bit
  • Vector Quantization (VQ): trained using Approximate K-Means; # bits = log2(# centroids)
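
Both hash families can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: hyperplanes are drawn from a standard Gaussian with a fixed seed, the VQ centroids are given rather than trained with approximate k-means, and the nearest centroid is found by exhaustive search. All function names are hypothetical.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random hyperplane (normal vector) per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_hash(v, hyperplanes):
    """Each bit records which side of a hyperplane the vector falls on."""
    code = 0
    for plane in hyperplanes:
        bit = 1 if sum(p * x for p, x in zip(plane, v)) >= 0 else 0
        code = (code << 1) | bit
    return code


def vq_hash(v, centroids):
    """Hash value is the index of the nearest centroid (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((x - c) ** 2 for x, c in zip(v, centroids[i])),
    )


def vq_bits(num_centroids):
    """Bits needed to store a centroid index when the count is a power of two."""
    return int(math.log2(num_centroids))
```

The LSH code depends only on the direction of the vector, so rescaling a descriptor by a positive factor leaves its hash unchanged. The VQ hash adapts to the data through its centroids, at the cost of a training step.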

SLIDE 45

Experiments: BF-GD vs BF-PI

Dataset: News Videos – 600k

Figure: mAP (%) against number of bits per hash (5 to 30) for BF-PI LSH and BF-GD LSH, annotated with "Visual Discriminativeness" and "Visual Invariance".

SLIDE 46

Experiments: BF-PI with Different Hashes

Dataset: News Videos – 600k

Figure: mAP (%) against number of bits per hash (5 to 30) for BF-PI VQ and BF-PI LSH.

SLIDE 47

Experiments: Results on 600k Datasets

Figure: bar chart of mAP (%) for the News, Vid. Bookm., and Lectures datasets, comparing BF-PI and Clip FV⋆, with 26% and 10% annotations.

SLIDE 48

Experiments: Large-Scale with Re-Ranking

Dataset: News Videos – 4M

Figure: three bar charts comparing BF-PI, Clip FV⋆, and Frame FV⋆ on index size (GB), mAP (%), and retrieval latency (secs). Annotations: 24%, 10X, 3X, 2X, 7X.

BF-PI and Clip FV* results are re-ranked using Shot-FV* descriptors

SLIDE 49

Experiments: Large-Scale with Re-Ranking

Dataset: Lecture Videos – 1.5M

Figure: three bar charts comparing BF-PI, Clip FV⋆, and Frame FV⋆ on index size (GB), mAP (%), and retrieval latency (secs). Annotations: 6X, 5X, 6X, 18X.

BF-PI and Clip FV* results are re-ranked using Shot-FV* descriptors

SLIDE 50

Conclusions

Fisher Vector Comparisons

  • Asymmetric comparisons by projecting cluttered FVs
  • Studied two asymmetric retrieval problems
  • Large retrieval gains (up to 25% mAP) in both cases

Fisher Vector Aggregation

  • Fisher vector aggregation over video segments
  • Simple aggregation outperforms other techniques
  • Effective retrieval with 100X compression for lectures dataset

Bloom Filter Aggregation

  • Bloom filter aggregation over video segments
  • Studied hash functions and spatio-temporal aggregation schemes
  • Lighter, faster than frame-based schemes, with similar accuracy