Different Modes of Semantic Representation in Image Retrieval

Mar 10, 2024

SLIDE 1

Different Modes of Semantic Representation in Image Retrieval

By Rory Bennett Advisor: Kristina Striegnitz

SLIDE 2

Image Retrieval

dog war

SLIDE 3

Concreteness & Imageability

Abstract (less concrete), less imageable: concept
Abstract, more imageable: plead
Concrete, less imageable: argue
Concrete, more imageable:

SLIDE 4

Text-based Image Retrieval (TBIR)

Caption: "This woman is giving her dog a kiss"
Tags: dog; kiss
Diagram labels: Images with captions; Text-based image retrieval system

SLIDE 5

Text-based Image Retrieval (TBIR)

Caption: "This woman is giving her dog a kiss"
Tags: dog; kiss
Diagram labels: Images with captions; Text-based image retrieval system
Query terms: love; war ???
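A minimal sketch of direct tag retrieval, to make the "???" concrete: abstract queries such as "love" or "war" rarely appear in captions, so nothing comes back. The file names, the second caption, and the stop-word list are assumptions made up for illustration, and this simple tagger also keeps "woman", which the slide does not list.

```python
# Toy captioned-image collection. File names and the second caption are
# hypothetical; only the first caption comes from the slide.
CAPTIONS = {
    "img_001.jpg": "This woman is giving her dog a kiss",
    "img_002.jpg": "A dog sleeps on the porch",
}

# Small hand-picked stop list (an assumption, not the presenter's list).
STOP_WORDS = {"a", "the", "this", "is", "her", "on", "giving"}


def caption_tags(caption):
    """Lower-case the caption and keep the words that are not stop words."""
    return {word for word in caption.lower().split() if word not in STOP_WORDS}


def retrieve_by_tag(query, captions):
    """Return, sorted by name, the images whose caption tags contain the query."""
    query = query.lower()
    return sorted(
        name for name, caption in captions.items()
        if query in caption_tags(caption)
    )


if __name__ == "__main__":
    print(retrieve_by_tag("dog", CAPTIONS))   # both toy images are tagged "dog"
    print(retrieve_by_tag("love", CAPTIONS))  # no caption mentions "love"
```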

SLIDE 6

Retrieval Based on Word Similarity

Query term: elegant
Diagram labels: Image database; Text-based image retrieval system; Word comparison technique; Words returned by comparison technique that also tag images
Example caption: "The tuxedo is the perfect formal garb."

SLIDE 7

Semantic Vector Representations

elegant: [-0.081428, 0.102486, -0.198815, -0.145852, -0.148051, …]
tuxedo: [-0.116671, -0.163012, -0.094523, -0.108007, 0.084851, …]
fear: [0.121500, -0.413079, -0.040310, 0.113604, -0.353846, …]

Word pairs compared: elegant and tuxedo; elegant and fear

SLIDE 8

Semantic Vector Representations (cont.)

All vectors are mapped to a common vector space, so that vector cosines can be compared and words with similar meanings found.

*a, b represent cosine distances between semantic vectors
Plot on x and y axes: elegant, majestic, tuxedo, swan, chocolate, fear, with distances a and b marked
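A small sketch of the cosine comparison, in plain Python. The three vectors use only the first five components printed on the earlier slide; the real vectors are longer, so the numbers here are illustrative and say nothing reliable about the full-length vectors. The function names are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, taken here as 1 minus the cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Truncated to the five components shown on the slide (an assumption made
# purely so the example runs; the elided components are not reconstructed).
ELEGANT = [-0.081428, 0.102486, -0.198815, -0.145852, -0.148051]
TUXEDO = [-0.116671, -0.163012, -0.094523, -0.108007, 0.084851]
FEAR = [0.121500, -0.413079, -0.040310, 0.113604, -0.353846]

if __name__ == "__main__":
    print("elegant vs tuxedo:", cosine_distance(ELEGANT, TUXEDO))
    print("elegant vs fear:  ", cosine_distance(ELEGANT, FEAR))
```

A smaller distance means the two words are treated as closer in meaning.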

SLIDE 9

Vector Comparison, Approach A

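Query term: mapped to the query term's semantic vector
Entire image dataset: Image 1 ... Image n
Each image: Caption word 1, Caption word 2, ..., Caption word k
Each caption word: Semantic Vector 1, ..., Semantic Vector k
Each image's semantic vectors: combined into a normalized average semantic vector
Vector comparison: query term's semantic vector against each image's normalized average semantic vector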
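One way to read Approach A as code, under stated assumptions: every image in the dataset is scored, a caption is represented by the normalized mean of its known word vectors, and the comparison is cosine similarity. The two-dimensional vectors, image names, and function names are toy values invented for the sketch, not data from the experiment.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def caption_vector(words, vectors):
    """Normalized average of the semantic vectors of the known caption words."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    mean = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    return normalize(mean)


def cosine(u, v):
    """Cosine similarity, computed as the dot product of unit vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def approach_a(query, image_captions, vectors, top_n=10):
    """Rank every image by similarity between the query and its caption vector."""
    if query not in vectors:
        return []
    scored = []
    for name, words in image_captions.items():
        cap_vec = caption_vector(words, vectors)
        if cap_vec is not None:
            scored.append((cosine(vectors[query], cap_vec), name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [name for _, name in scored[:top_n]]


# Toy 2-D vectors and captions (hypothetical values).
VECTORS = {
    "elegant": [1.0, 0.0], "tuxedo": [0.9, 0.1], "formal": [0.8, 0.2],
    "fear": [-0.2, 1.0], "dark": [0.0, 1.0],
}
IMAGES = {"tux.jpg": ["tuxedo", "formal"], "cave.jpg": ["fear", "dark"]}

if __name__ == "__main__":
    print(approach_a("elegant", IMAGES, VECTORS))
```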

SLIDE 10

Vector Comparison, Approach B

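Query term: mapped to the query term's semantic vector
Candidate images: images directly tagged by words most similar to query term (Image 1 ... Image i ... Image n)
Each candidate image: Caption word 1, Caption word 2, ..., Caption word k
Each caption word: Semantic Vector 1, ..., Semantic Vector k
Each image's semantic vectors: combined into a normalized average semantic vector
Vector comparison: query term's semantic vector against each candidate image's normalized average semantic vector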
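A sketch of how Approach B might differ from Approach A, assuming the diagram means: first find the k words closest to the query, keep only images whose captions contain one of those words, then rank those candidates by the same caption-vector comparison. The value of k, the exclusion of the query word from its own neighbours, and all toy data are assumptions.

```python
import math


def unit(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    """Cosine similarity, computed as the dot product of unit vectors."""
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def mean_vector(words, vectors):
    """Normalized average of the vectors of the known words, or None."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return unit([sum(vec[i] for vec in known) / len(known) for i in range(dim)])


def most_similar_words(query, vectors, k):
    """The k vocabulary words (other than the query) closest to the query."""
    others = [w for w in vectors if w != query]
    others.sort(key=lambda w: (-cosine(vectors[query], vectors[w]), w))
    return others[:k]


def approach_b(query, image_captions, vectors, k=3, top_n=10):
    """Rank only the images tagged by one of the query's nearest words."""
    if query not in vectors:
        return []
    neighbours = set(most_similar_words(query, vectors, k))
    scored = []
    for name, words in image_captions.items():
        if not neighbours & set(words):
            continue  # image is not tagged by any similar word
        cap_vec = mean_vector(words, vectors)
        if cap_vec is not None:
            scored.append((cosine(vectors[query], cap_vec), name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Toy 2-D vectors and captions (hypothetical values).
VECTORS = {
    "elegant": [1.0, 0.0], "tuxedo": [0.9, 0.1], "formal": [0.8, 0.2],
    "fear": [-0.2, 1.0], "dark": [0.0, 1.0],
}
IMAGES = {"tux.jpg": ["tuxedo", "formal"], "cave.jpg": ["fear", "dark"]}

if __name__ == "__main__":
    print(approach_b("elegant", IMAGES, VECTORS, k=2))
```

With k=2 the toy neighbours of "elegant" are "tuxedo" and "formal", so the cave image is never scored; in Approach A it would be scored and ranked last.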

SLIDE 11

Abstract Words’ Meanings Encapsulate Concrete Words’ Meanings

Lawrence W. Barsalou, Katja Wiemer-Hastings: abstract terms provide more general, overarching descriptions of images related to concrete terms

Google query for abstract term, “love”:

SLIDE 12

Augmenting Textual Data With Perceptual Information

Felix Hill and Anna Korhonen used the Text8 textual corpus together with perceptual datasets comprising captioned images and feature annotations of cue words.

Images with captions, e.g. "The dog sits happily on the porch ..."
Perceptual words: dog, fur, tail, kibble, ...
Insert words into text corpus
Text Corpus
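A rough sketch of the "insert words into text corpus" step. It assumes the simplest possible reading of the diagram: each cue word and its perceptual words form a pseudo-sentence that is appended to the tokenized corpus before vectors are trained. This is not claimed to be Hill and Korhonen's actual procedure, and `augment_corpus` is a hypothetical helper name.

```python
def augment_corpus(sentences, perceptual_features):
    """Return a copy of the corpus with one pseudo-sentence per cue word.

    sentences: list of token lists (the textual corpus).
    perceptual_features: dict mapping a cue word to its perceptual words.
    """
    augmented = [list(sentence) for sentence in sentences]  # leave input intact
    for cue, features in perceptual_features.items():
        # Pseudo-sentence: the cue word followed by its perceptual words, so
        # that they co-occur when word vectors are trained on the corpus.
        augmented.append([cue] + list(features))
    return augmented


if __name__ == "__main__":
    corpus = [["the", "dog", "sits", "happily", "on", "the", "porch"]]
    features = {"dog": ["fur", "tail", "kibble"]}
    for sentence in augment_corpus(corpus, features):
        print(" ".join(sentence))
```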

SLIDE 13

Experiment – Five Approaches

Retrieve images directly tagged by the query term
Apply Approach A on the plain Text8 corpus
Apply Approach B on the plain Text8 corpus
Apply Approach A on the augmented Text8 corpus
Apply Approach B on the augmented Text8 corpus

SLIDE 14

Experiment – Query Terms

Less concrete, less imageable nouns
Less concrete, more imageable nouns
More concrete, less imageable nouns
More concrete, more imageable nouns
More concrete, less imageable verbs
Less concrete, more imageable verbs
Less concrete, less imageable verbs
More concrete, more imageable verbs

SLIDE 15

Experiment – Results, Part I

SLIDE 16

Results – Part II

SLIDE 17

Results – Part III

SLIDE 18

Conclusions

Using perceptual information to form semantic vectors does not significantly reduce, and can actually improve, the relevance of returned images.

There is at least some (if insignificant) increase in the relevance of retrieved images when switching from Approach A to Approach B on a single textual corpus.

If we assume that results from direct tagging are ideal, regardless of their paucity, then the results indicate that including perceptual data brings retrieval closer to this ideal.

SLIDE 19

Future Work

Focus on vector representations for words whose part of speech is typically very abstract, e.g., adverbs

Better account for the representation of words with multiple diverse