Leveraging Internet Data. IM2GPS: Estimating Geographic Information from a Single Image (PowerPoint presentation)

SLIDE 1

Leveraging Internet Data

IM2GPS: Estimating Geographic Information from a Single Image

(by James Hays and Alexei Efros)

Adriana Kovashka CS PhD Student

SLIDE 2

Where is this?

Italy

SLIDE 3

… and this?

Wales

SLIDE 4

Overview of IM2GPS

Intuition

“What is it like?” vs. “What is it?”

Data

6 million geo-tagged images from Flickr

Method

Represent images in 6 ways, compare

Result

Estimated image location

SLIDE 5

Representations in IM2GPS

Tiny Images

Color histograms

Texton histograms

Line features

Gist descriptor with color

Geometric context

SLIDE 6

IM2GPS Results

Hays 2008

SLIDE 7

Note on the Task

This is not scene categorization

Specific locations used

“Urban vs. natural” insufficient

Can think of current task as place recognition*

SLIDE 8

Demo Overview

Data

50096 images (incl. 237 test images)

100 most populated cities in the world

Representations

Gist, color, Tiny Images

Comparison

k-nearest neighbors (k-NN)

SLIDE 9

Procedure

Use code by Hays to query/download Flickr images

about 3 days

Download, modify, run Gist code

about 30 hours

Test

about 6 hours for 7000 images

10 min for 237 test images

SLIDE 10

Representations

Gist (512 dim)

Used Torralba’s scene recognition code

Color (32 dim)

Computed histograms in L*a*b* color space

4 bins for L, 14 each for a and b

Tiny Images (768 dim)

Resized images to 16x16x3

Vectors of color pixels
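The dimension counts above can be checked with a minimal Python sketch. It is not the demo's actual code (the references point to Matlab tools). It assumes the 32 color dimensions come from three separate per-channel histograms (4 + 14 + 14), that L lies in [0, 100] and a, b in [-128, 128], and that pixels are already converted to L*a*b*. The function names are hypothetical.

```python
def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] using equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp values on or beyond the edges
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * bins


def color_feature(lab_pixels):
    """lab_pixels: list of (L, a, b) tuples. Returns a 4 + 14 + 14 = 32-dim vector.

    Assumed channel ranges: L in [0, 100], a and b in [-128, 128].
    """
    l_vals = [p[0] for p in lab_pixels]
    a_vals = [p[1] for p in lab_pixels]
    b_vals = [p[2] for p in lab_pixels]
    return (histogram(l_vals, 4, 0.0, 100.0)
            + histogram(a_vals, 14, -128.0, 128.0)
            + histogram(b_vals, 14, -128.0, 128.0))


def tiny_image_feature(rgb_16x16):
    """rgb_16x16: 16 rows of 16 (r, g, b) tuples. Returns a 16*16*3 = 768-dim vector."""
    return [float(c) for row in rgb_16x16 for px in row for c in px]
```

A joint 4 x 14 x 14 histogram would have 784 bins, so the per-channel reading is the one consistent with the stated 32 dimensions.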

SLIDE 11

Comparison Methods

Method One

Sim(x, y) = inner product between the concatenations of the three representations of x and y

Method Two*

Sim(x, y) = exp(-distA/σA) * exp(-distB/σB) * exp(-distC/σC)

distA = Euclidean distance between representations A of x and y

σA = mean of distances for representation A
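Both similarity measures can be written out as a short Python sketch. This is an illustration under assumptions, not the demo's code: each image is assumed to be given as a list of three feature vectors (for example Gist, color, Tiny Images), and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sim_method_one(x_feats, y_feats):
    """Inner product of the concatenated representations of x and y."""
    x_cat = [v for feat in x_feats for v in feat]
    y_cat = [v for feat in y_feats for v in feat]
    return sum(a * b for a, b in zip(x_cat, y_cat))


def sim_method_two(x_feats, y_feats, sigmas):
    """Product over representations of exp(-dist / sigma).

    sigmas holds one scale per representation (sigma_A, sigma_B, sigma_C).
    """
    sim = 1.0
    for fx, fy, sigma in zip(x_feats, y_feats, sigmas):
        sim *= math.exp(-euclidean(fx, fy) / sigma)
    return sim
```

Method Two equals 1 for identical images and decays toward 0 as any one representation moves apart, with each σ setting how quickly that representation's distance is penalized.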

SLIDE 12

Note on the Computation of σ

Current computation

X: matrix of n-dim features for all m images

Subtract mean(X) from all rows of X

Square result

Sum rows

Take square roots of sums

Take mean of resulting column

Better computation

Average of Euclidean distance between i and j for each pair of images (i, j)

Computationally very expensive
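The two ways of computing σ can be compared in a small Python sketch. It assumes mean(X) is the per-dimension mean over all images (a single n-dim row vector, as a column-wise mean would give), so the current computation is the mean distance to the centroid. Function names are hypothetical.

```python
import math
from itertools import combinations


def sigma_from_centroid(X):
    """Current computation: mean Euclidean distance of each row to the mean row."""
    m, n = len(X), len(X[0])
    centroid = [sum(row[j] for row in X) / m for j in range(n)]
    dists = [math.sqrt(sum((row[j] - centroid[j]) ** 2 for j in range(n)))
             for row in X]
    return sum(dists) / m


def sigma_pairwise(X):
    """Better computation: mean Euclidean distance over all pairs (i, j), i < j."""
    total, count = 0.0, 0
    for a, b in combinations(X, 2):
        total += math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        count += 1
    return total / count
```

The pairwise version needs m(m-1)/2 distance evaluations. For the 50096 images used here that is roughly 1.25 billion pairs per representation, which is why it is described as very expensive.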

SLIDE 13

Dataset

Queried for 104 city tags

Negative tags to remove duplicates, noise

Downloaded images uploaded over 2 weeks

50096 images from Flickr (237 test)

6M in IM2GPS (more tags, time)

Disproportionate image set sizes per city!

SLIDE 14

'Abidjan' [0]           'Chongqing' [37]        'London' [2891]         'RiodeJaneiro' [1135]
'Ahmedabad' [3]         'Dallas' [459]          'LosAngeles' [1442]     'Riverside' [215]
'Alexandria' [152]      'Delhi' [169]           'Madras' [1]            'Riyadh' [1]
'Ankara' [10]           'Detroit' [263]         'Madrid' [1822]         'Rome' [1328]
'Athens' [213]          'Dhaka' [55]            'Manila' [230]          'Ruhr' [53]
'Atlanta' [843]         'Dongguan' [0]          'Medellin' [0]          'Saigon' [252]
'Baghdad' [3]           'Guadalajara' [71]      'Melbourne' [529]       'SaintPetersburg' [44]
'Bandung' [114]         'Guangzhou' [68]        'MexicoCity' [59]       'Salvador' [867]
'Bangalore' [477]       'Guiyang' [0]           'Miami' [1280]          'SanFrancisco' [2204]
'Bangkok' [428]         'Hanoi' [158]           'Milan' [362]           'Santiago' [365]
'Barcelona' [2221]      'Harbin' [76]           'Monterrey' [26]        'SaoPaulo' [229]
'Beijing' [658]         'HoChiMinhCity' [9]     'Montreal' [0]          'Seoul' [364]
'BeloHorizonte' [3]     'HongKong' [835]        'Moscow' [291]          'Shanghai' [118]
'Berlin' [1655]         'Houston' [461]         'Mumbai' [270]          'Shenyang' [0]
'Bogota' [404]          'Hyderabad' [19]        'NYC' [2383]            'Shenzhen' [12]
'Bombay' [16]           'Istanbul' [681]        'Nagoya' [23]           'Singapore' [1118]
'Boston' [1631]         'Jakarta' [50]          'Nanjing' [17]          'Surat' [0]
'Brasilia' [97]         'Johannesburg' [300]    'NewYorkCity' [483]     'Sydney' [1541]
'BuenosAires' [132]     'Karachi' [9]           'Osaka' [222]           'Taipei' [546]
'Busan' [0]             'Khartoum' [6]          'Paris' [3052]          'Tehran' [19]
'Cairo' [107]           'Kinshasa' [0]          'Philadelphia' [883]    'Tianjin' [8]
'Calcutta' [4]          'Kolkata' [91]          'Phoenix' [504]         'Tokyo' [1992]
'Chengdu' [225]         'KualaLumpur' [56]      'PortoAlegre' [69]      'Toronto' [2009]
'Chennai' [114]         'Lagos' [25]            'Pune' [5]              'WashingtonDC' [2031]
'Chicago' [2796]        'Lahore' [8]            'Pyongyang' [13]        'Wuhan' [18]
'Chittagong' [0]        'Lima' [97]             'Recife' [221]          'Yangon' [3]

SLIDE 15

Bangalore

SLIDE 16

Boston

SLIDE 17

Boston

SLIDE 18

Cairo

SLIDE 19

Istanbul

SLIDE 20

London

SLIDE 21

London

SLIDE 22

Los Angeles

SLIDE 23

Madrid

SLIDE 24

Milan

SLIDE 25

Moscow

SLIDE 26

Mumbai

SLIDE 27

Paris

SLIDE 28

Rome

SLIDE 29

San Francisco

SLIDE 30

San Francisco

SLIDE 31

Sao Paulo

SLIDE 32

Tokyo

SLIDE 33

Tokyo

SLIDE 34

Query 1 - Greece

SLIDE 35

Query 2 - Arizona

SLIDE 36

Query 3 - Switzerland

SLIDE 37

Overview of Results

Evaluation

Percentage of correct classifications

Percentage of top m neighbors within n km of query image

Average distance of neighbors

Tests

237 test images

7000 images from dataset
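The two distance-based evaluation measures can be sketched in Python as follows. This is a sketch under stated assumptions: it uses the haversine great-circle formula with a mean Earth radius of 6371 km, which is a common choice but not necessarily the formula used in the demo, and it assumes each neighbor is a (latitude, longitude) pair in degrees. Function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(query, neighbors, m, n_km):
    """Fraction of the top m neighbors lying within n_km of the query location."""
    top = neighbors[:m]
    if not top:
        return 0.0
    hits = sum(1 for lat, lon in top
               if haversine_km(query[0], query[1], lat, lon) <= n_km)
    return hits / len(top)


def average_distance(query, neighbors):
    """Mean distance in km from the query location to the given neighbors."""
    dists = [haversine_km(query[0], query[1], lat, lon) for lat, lon in neighbors]
    return sum(dists) / len(dists) if dists else 0.0
```

For example, with a query in London and neighbors in Paris and Tokyo, `fraction_within(query, neighbors, 2, 500)` counts only Paris as a hit.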

SLIDE 38

Chance for Test Images (200km)

[Plot: chance per image over all k (y-axis) vs. images 1 to 237 (x-axis)]

Chance is pretty low for this data.

SLIDE 39

Chance for Test Images (cont’d)

[Plot: average chance per run over all k (y-axis) vs. run number (x-axis)]

Chance is pretty low for this data.

SLIDE 40

Test Images, % w/in 200km, M1

[Bar chart: % within 200km (y-axis) by feature type (x-axis: Gist, Color, Tiny Images, Gist + Color, Gist + Tiny Images, Color + Tiny Images, All), one bar per k = 1, 4, 8, 12, 16]

Gist seems to perform best with M1.

SLIDE 41

Test Images, % w/in 200km, M2

[Bar chart: % within 200km (y-axis) by feature type (x-axis: Gist, Color, Tiny Images, Gist + Color, Gist + Tiny Images, Color + Tiny Images, All), one bar per k = 1, 4, 8, 12, 16]

M2 works worse than M1.

SLIDE 42

Test Images, % w/in 1000km, M1

[Bar chart: % within 1000km (y-axis) by feature type (x-axis: Gist, Color, Tiny Images, Gist + Color, Gist + Tiny Images, Color + Tiny Images, All), one bar per k = 1, 4, 8, 12, 16]

Results are naturally much better with larger distance allowed.

SLIDE 43

IM2GPS Results

Hays 2008

SLIDE 44

Dataset, Accuracy, M1

[Bar chart: accuracy (y-axis) using all feature types, for k = 1, 4, 8, 12, 16; series: Images 501-4000 and Images 4001-7500]

Results are much better with more test images.

SLIDE 45

Dataset, Accuracy, M2

[Bar chart: accuracy (y-axis) using all feature types, for k = 1, 4, 8, 12, 16; series: Images 501-4000]

M2 performs worse than M1.

SLIDE 46

Dataset, % w/in 200km, M1

[Bar chart: % within 200km (y-axis) using all feature types, for k = 1, 4, 8, 12, 16; series: Images 501-4000 and Images 4001-7500]

Again, with more test images, results are more similar to the authors’.

SLIDE 47

Dataset, % w/in 500km, M1

[Bar chart: % within 500km (y-axis) using all feature types, for k = 1, 4, 8, 12, 16; series: Images 501-4000 and Images 4001-7500]

As expected, results improve when larger distance allowed.

SLIDE 48

Dataset, % w/in 1000km, M1

[Bar chart: % within 1000km (y-axis) using all feature types, for k = 1, 4, 8, 12, 16; series: Images 501-4000 and Images 4001-7500]

As expected, results improve when larger distance allowed.

SLIDE 49

Query Image (Argentina/Paraguay/Brazil)

Features: Tiny Images

Sydney

Cairo

SLIDE 50

Query Image (Barcelona)

Features: Tiny Images

Chicago

Toronto

SLIDE 51

Query Image (Barcelona)

Features: Tiny Images

Recife

Tokyo

SLIDE 52

Query Image (Nassau, near Havana)

Features: Tiny Images

Sydney

Sydney

SLIDE 53

Query Image (Hyderabad)

Features: Tiny Images

Washington DC

Boston

SLIDE 54

Query Image (Athens)

Features: Gist

Dallas

Rome

SLIDE 55

Query Image (Guatemala)

Features: Gist

Rio de Janeiro

Barcelona

SLIDE 56

Query Image (Barcelona)

Features: Gist

Barcelona

Barcelona

SLIDE 57

Query Image (Aruba)

Features: Gist

Chicago

Chicago

SLIDE 58

Query Image (Florida)

Features: Gist

Paris

Moscow

SLIDE 59

Query Image (Iceland)

Features: Gist

Los Angeles

Melbourne

SLIDE 60

Query Image (Germany)

Features: Color

Toronto

Toronto

SLIDE 61

Hays 2008

SLIDE 62

Hays 2008

SLIDE 63

Hays 2008

SLIDE 64

Hays 2008

SLIDE 65

Hays 2008

SLIDE 66

Observations

The image set is rather difficult

Some suggestions are useful in various ways, some are very bad

Scaling might improve results with a differently set σ

This approach requires an enormous dataset to work well!

SLIDE 67

Discussion

In what ways are the returned suggestions useful?

Can we say the dataset is “noisy”?

How can this method be improved?

SLIDE 68

References and Links

J. Hays and A. Efros. IM2GPS: Estimating Geographic Information from a Single Image. CVPR 2008.

http://graphics.cs.cmu.edu/projects/im2gps/

A. Torralba, R. Fergus, and W. Freeman. 80 Million Tiny Images: a Large Dataset for Non-Parametric Object and Scene Recognition. PAMI 2008.

http://people.csail.mit.edu/torralba/tinyimages/

A. Oliva and A. Torralba. Modeling the Shape of the Scene: a Holistic Representation of the Spatial Envelope. IJCV 2001.

http://people.csail.mit.edu/torralba/code/spatialenvelope/

SLIDE 69

References and Links (cont’d)

P. Getreuer. Color Space Converter. Matlab Central.

http://www.mathworks.com/matlabcentral/fileexchange/7744

Distance Calculation. Meridian World Data.

http://www.meridianworlddata.com/Distance-Calculation.asp

Online Conversion – Unix time conversion.

http://www.onlineconversion.com/unix_time.htm

A. Mehrtash. demo links.

http://users.ece.utexas.edu/~mehrtash/SceneRecognitionDemo/

A. Kovashka. IM2GPS (Hays and Efros) Demo.

http://www.cs.utexas.edu/~adriana/im2gps_demo.html
