Harbin Institute of Technology Microsoft Research Asia Microsoft - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Dong Liu, Xian-Sheng Hua, Meng Wang and Hong-Jiang Zhang Harbin Institute of Technology Microsoft Research Asia Microsoft Advanced Technology Center

ACM MM 2010

slide-2
SLIDE 2

medici chapel, Firenze, Italy...
Loggia dei lanzi, sword, honeymoon, ...
Statue, building, sky, Italy, ...
Cathedral, tower, Italy...

slide-3
SLIDE 3

Social tags are good, but they are:

  • Noisy
  • Ambiguous
  • Incomplete
  • No relevance information

Two directions to improve tag quality:

  • Tag Ranking (Liu, Hua, Zhang. Tag Ranking. WWW 09)
  • Retagging (Liu, Hua, Wang, Zhang. Image Retagging. MM 10)

slide-4
SLIDE 4

kitty boy top101 young lovely

Imprecise tags, subjective tags, missing tags

Tags associated with social images are imprecise, subjective and incomplete.

grass flower cat animal


slide-5
SLIDE 5

To improve:

  • Tag-based image search
  • Image annotation (automatic tagging)

What we are going to do:

  • Improve the quality of the tags to better describe content.

dog house tree sky ground cloud top 101 tour tiger sweet big cloud

slide-6
SLIDE 6

But how can we do it? Automatically.

slide-7
SLIDE 7

Visual and semantic consistency: similar images have similar tags.

Prior knowledge: user-provided tags correlate with the image content with high probability.

bear water wildlife nature river bear animal bath

power boy zoo tiger father Nikon cat animal garden rabit

slide-8
SLIDE 8

Tag Refinement

The consistency between visual similarity and semantic similarity should be maximized. The deviation from the initially user-provided tags should be minimized.
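The two criteria above can be combined into a single cost. The sketch below is one plausible form under stated assumptions, not necessarily the formulation used in the talk: `W` (visual similarity between images), `S` (semantic similarity between tags), `Y_init` (user-provided tag indicators), `Y` (refined confidence scores) and the trade-off weight `c` are all hypothetical names.

```python
import numpy as np

def retagging_cost(Y, Y_init, W, S, c=1.0):
    """Illustrative cost (assumed form): consistency term plus deviation term."""
    # Similarity between images i and j implied by their tag scores.
    semantic = Y @ S @ Y.T
    # Penalize disagreement between visual and tag-implied similarity.
    consistency = np.sum((W - semantic) ** 2)
    # Penalize drifting away from the user-provided tags.
    deviation = np.sum((Y - Y_init) ** 2)
    return consistency + c * deviation

# Two visually similar images that share no tags: the consistency term is large.
W = np.array([[1.0, 0.9], [0.9, 1.0]])
S = np.eye(2)
Y_init = np.eye(2)
print(retagging_cost(Y_init, Y_init, W, S))
```

A lower cost means the tag scores agree better with visual similarity while staying close to what users typed; `c` controls how much the initial tags are trusted.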


slide-9
SLIDE 9


Notations

slide-10
SLIDE 10

Modeling the basic assumptions

  • Visual and semantic consistency
  • User-provided tags are relevant with high probability

Overall formulation

slide-11
SLIDE 11

Optimizing with iterative updating

  • Bound the objective function
  • Derive the solution
  • Iterative updating until convergence
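The slide's bound-and-update derivation is not reproduced in the transcript. As a stand-in, the sketch below uses a different technique, projected gradient descent with backtracking, to show what "iterate until convergence" can look like. The cost function, the names `W`, `S`, `Y_init`, `c`, and the clipping of scores to [0, 1] are all assumptions.

```python
import numpy as np

def cost(Y, Y_init, W, S, c):
    """Assumed cost: visual/semantic consistency plus deviation from initial tags."""
    R = W - Y @ S @ Y.T
    return np.sum(R ** 2) + c * np.sum((Y - Y_init) ** 2)

def refine_tags(Y_init, W, S, c=1.0, max_iter=200, tol=1e-8):
    """Projected gradient descent with backtracking (illustrative stand-in)."""
    Y0 = np.asarray(Y_init, dtype=float).copy()
    Y = Y0.copy()
    step = 0.1
    current = cost(Y, Y0, W, S, c)
    for _ in range(max_iter):
        R = W - Y @ S @ Y.T
        grad = -2.0 * (R @ Y @ S.T + R.T @ Y @ S) + 2.0 * c * (Y - Y0)
        # Shrink the step until the projected update lowers the cost.
        while step > 1e-12:
            candidate = np.clip(Y - step * grad, 0.0, 1.0)
            new = cost(candidate, Y0, W, S, c)
            if new < current:
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as converged
        improvement = current - new
        Y, current = candidate, new
        if improvement < tol:
            break
    return Y
```

Because a step is only accepted when it lowers the cost, the cost never increases from one iteration to the next, which is the property the iterative scheme on the slide also relies on.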

slide-12
SLIDE 12

Is It Reliable?

slide-13
SLIDE 13

Content-Related Tag

  • Describe the REAL visual content of the images.
  • Informative for ALL general users.

Content-Unrelated Tag

  • Describe the CONTEXTUAL information about the images.
  • Only informative to the image owners.

Example tags shown on the slide: bird, beach, bike, sunset, flower, animal baby, autumn, night, dog, ocean, cat, grass, fun, deleteme, top101, best Nikon photo, old, my macro raw, science, live

slide-14
SLIDE 14

Recall: similar images have similar tags. This assumption is only applicable to “content-related” tags.

Involving the content-unrelated tags will:

  • Introduce a lot of noise.
  • Degrade the algorithmic performance.

These tags should be removed from the automatic learning procedure.

slide-15
SLIDE 15

Filter out all content-unrelated tags.

  • Construct a content-related tag dictionary by using the lexical and domain knowledge
  • Traverse along the path until one pre-defined category is matched

Diagram: All words -> Non-noun / Noun -> content-unrelated / content-related

Pre-defined categories: Organism, Natural Pho., Thing, Artifact, Color

Example paths:

  • kitty -> feline -> mammal -> animal -> organism
  • building -> structure -> artifact
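The path traversal described above can be sketched with a toy lexicon. This is a minimal illustration, not the authors' dictionary: the `HYPERNYM` table is a hand-written stand-in for a real lexicon, the function name is hypothetical, and "Natural Pho." is assumed to abbreviate "natural phenomenon".

```python
# Pre-defined categories from the slide ("natural phenomenon" is an assumed expansion).
CONTENT_CATEGORIES = {"organism", "natural phenomenon", "thing", "artifact", "color"}

# Toy hypernym links standing in for a real lexicon.
HYPERNYM = {
    "kitty": "feline", "feline": "mammal", "mammal": "animal", "animal": "organism",
    "building": "structure", "structure": "artifact",
}

def is_content_related(tag, hypernym=HYPERNYM, categories=CONTENT_CATEGORIES):
    """Walk up the hypernym path until a pre-defined category is matched."""
    seen = set()
    word = tag.lower()
    while word is not None and word not in seen:
        if word in categories:
            return True
        seen.add(word)  # guard against cycles in the toy table
        word = hypernym.get(word)
    return False  # unknown words and non-nouns fall through as content-unrelated

def filter_tags(tags):
    return [t for t in tags if is_content_related(t)]

print(filter_tags(["kitty", "top101", "building", "deleteme"]))  # ['kitty', 'building']
```

Tags missing from the lexicon are treated as content-unrelated here, which is a simplification; a real system would first decide whether the word is a noun.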

slide-16
SLIDE 16


Is It Enough?

slide-17
SLIDE 17

Figure: the tag "kitty" with its synonyms and hypernyms (kitten, cat, pussy, feline, animal)

The absence of such tags will degrade the performance of tag-based applications.

slide-18
SLIDE 18

Make use of the WordNet lexicon

Figure: WordNet neighborhood of "kitty": kitty-cat, kitten, pussy, pussycat, cat, domestic cat, feline, vertebrate, chordate, animal, organism

Use each tag to perform tag-based image search on Flickr. The tags with more than 10,000 returned images are retained.
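A minimal sketch of the enrichment step, assuming candidate synonyms and hypernyms have already been looked up. The `SYNONYMS` and `HYPERNYMS` tables are toy stand-ins for the lexicon, and the numbers in `SEARCH_COUNTS` are made up for illustration; they are not real Flickr result counts.

```python
# Toy lexicon entries (stand-ins for a real synonym/hypernym lookup).
SYNONYMS = {"kitty": ["kitten", "pussycat", "kitty-cat"]}
HYPERNYMS = {"kitty": ["cat", "domestic cat", "feline", "vertebrate",
                       "chordate", "animal", "organism"]}

# Made-up result counts standing in for tag-based image searches.
SEARCH_COUNTS = {
    "kitten": 250_000, "pussycat": 12_000, "kitty-cat": 800,
    "cat": 900_000, "domestic cat": 3_000, "feline": 60_000,
    "vertebrate": 500, "chordate": 40, "animal": 700_000, "organism": 2_000,
}

def enrich(tags, min_count=10_000):
    """Add synonyms and hypernyms that are popular enough to be useful as tags."""
    enriched = list(tags)
    for tag in tags:
        for candidate in SYNONYMS.get(tag, []) + HYPERNYMS.get(tag, []):
            popular = SEARCH_COUNTS.get(candidate, 0) > min_count
            if popular and candidate not in enriched:
                enriched.append(candidate)
    return enriched

print(enrich(["kitty"]))
# ['kitty', 'kitten', 'pussycat', 'cat', 'feline', 'animal']
```

The popularity threshold drops technically correct but rarely used terms such as "chordate", which users are unlikely to type as queries.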

slide-19
SLIDE 19


Three-step strategy

slide-20
SLIDE 20

In terms of average precision, recall and F1-measure

50,000 Flickr images with 4,556 content-related tags. 2,500 test images.

Figure: chart of precision, recall and F1-measure for Original, CBAR and Our Method
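For reference, the three measures can be computed per image and then averaged. Whether the talk averages per image or pools all tags is not stated, so the per-image averaging below is an assumption, and the function names are hypothetical.

```python
def prf1(predicted, relevant):
    """Precision, recall and F1 for one image's tag set."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def average_prf1(pairs):
    """Average the three measures over (predicted, relevant) pairs, one per image."""
    scores = [prf1(p, r) for p, r in pairs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# Two of four assigned tags are relevant; one relevant tag is missing.
p, r, f = prf1(["cat", "grass", "top101", "fun"], ["cat", "grass", "animal"])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```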


slide-21
SLIDE 21
slide-22
SLIDE 22

Method               Precision   Recall   F1-measure   Relevant tag num
Before Enrichment    0.71        0.34     0.46         3.09 (4.80 in all)
After Enrichment     0.90        0.66     0.76         9.34 (10.38 in all)

Tagging quality is further improved after the tag enrichment procedure.

slide-23
SLIDE 23

Use the learnt confidence scores as the relevance measure

Ranking results for the query “cat”
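Ranking by confidence score amounts to a sort. The sketch below assumes the learnt scores are available as a nested mapping from image id to per-tag confidence; that layout, the image ids and the scores are all hypothetical.

```python
def rank_images(confidence, query):
    """Return image ids carrying `query`, highest confidence first."""
    scored = [(scores[query], image_id)
              for image_id, scores in confidence.items()
              if query in scores]
    # Sort by descending score, breaking ties by image id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored]

# Hypothetical scores for three images.
confidence = {
    "img_a": {"cat": 0.40, "grass": 0.90},
    "img_b": {"cat": 0.95},
    "img_c": {"dog": 0.80},
}
print(rank_images(confidence, "cat"))  # ['img_b', 'img_a']
```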


slide-24
SLIDE 24

Our confidence-score-based ranking strategy outperforms the other image ranking strategies on Flickr.

slide-25
SLIDE 25

Use the top tags of the images after retagging to predict the tags of unlabeled images.
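The transcript does not say which predictor is used. One simple possibility, shown here purely as an assumption, is a nearest-neighbor vote: find the retagged images closest to the unlabeled one in feature space and let their top tags vote, weighted by closeness. All names, features and tags below are hypothetical.

```python
import numpy as np

def predict_tags(query_feature, features, top_tags, k=3, n_tags=2):
    """Predict tags for an unlabeled image from its k nearest retagged images."""
    distances = np.linalg.norm(features - query_feature, axis=1)
    neighbors = np.argsort(distances)[:k]
    votes = {}
    for idx in neighbors:
        weight = 1.0 / (1.0 + distances[idx])  # closer neighbors count more
        for tag in top_tags[idx]:
            votes[tag] = votes.get(tag, 0.0) + weight
    # Highest vote first, ties broken alphabetically.
    ranked = sorted(votes, key=lambda t: (-votes[t], t))
    return ranked[:n_tags]

# Toy 2-D features and top tags for three retagged images.
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
top_tags = [["cat", "animal"], ["cat", "grass"], ["dog", "animal"]]
print(predict_tags(np.array([0.0, 0.05]), features, top_tags, k=2))
# ['cat', 'animal']
```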


slide-26
SLIDE 26

Using the top tags after image retagging yields better results than using the original tags directly.

slide-27
SLIDE 27

User-provided tags are imprecise and incomplete, which limits the performance of tag-based applications.

We propose an image retagging strategy to solve this problem:

  • Tag filtering to remove the content-unrelated tags
  • Tag refinement to automatically refine the tags
  • Tag enrichment to expand the tags with synonyms and hypernyms

Image retagging benefits a series of tag-based applications.

slide-28
SLIDE 28

  • Extend it to online videos
  • Use richer information cues such as image regions and surrounding text

slide-29
SLIDE 29
