MediaMeter: A Global Monitor for Online News Coverage



slide-1
SLIDE 1

MediaMeter: A Global Monitor for Online News Coverage

Tadashi Nomoto National Institute of Japanese Literature

slide-2
SLIDE 2

Finding novel topics in news streams

So far not much success in the literature (extraction, machine learning)

What we are aiming at

slide-3
SLIDE 3

[Figure: frequency of topic descriptors (y-axis, 0 to 600) plotted against rank of topic descriptor (x-axis, 0 to 8000)]

Frequencies of (manually assigned) topic descriptors that appeared in the New York Times from June to December, 2013.

Problem

slide-4
SLIDE 4

Statistics

Frequency          ≤ 1     ≤ 2     ≤ 5     ≤ 10
Cumulative share   42.3%   62.0%   78.9%   87.4%

Frequency > 10: 12.6%

slide-5
SLIDE 5

SVM cannot handle a huge taxonomy (Liu, 2005).

The number of unique topics in NYT over 6 months exceeds 8,000.

slide-6
SLIDE 6

Memory-Based Topic Label Generation

WikiLabel

Approach

slide-7
SLIDE 7
  • 1. Look up Wikipedia to find pages most relevant to a news story
  • 2. Generate label candidates from page titles
  • 3. Pick those that are deemed most fit to represent the content

How it works: Overview
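
The three steps above can be sketched end to end on toy data. This is only an illustration, not the WikiLabel implementation: `TOY_PAGES` is a hypothetical stand-in for a Wikipedia index, and the bag-of-words overlap used for relevance and fitness is an assumed placeholder for the scoring defined on the later slides.

```python
from collections import Counter

# Hypothetical stand-in for a Wikipedia index: page title -> page text.
TOY_PAGES = {
    "2009 detention of American hikers by Iran":
        "three american hikers were detained by iran near the border in 2009",
    "North Korea nuclear program":
        "north korea conducted a nuclear test and expanded its weapons program",
    "Tax system":
        "a tax system sets how taxes are levied and collected by a state",
}


def bag(text):
    """Lowercased bag of whitespace-separated tokens."""
    return Counter(text.lower().split())


def overlap(a, b):
    """Number of shared tokens between two bags (a crude relevance proxy)."""
    return sum((a & b).values())


def label_story(story, pages, top_pages=2):
    story_bag = bag(story)
    # Step 1: rank pages by relevance to the news story.
    ranked = sorted(pages,
                    key=lambda title: overlap(bag(pages[title]), story_bag),
                    reverse=True)
    # Step 2: titles of the most relevant pages become label candidates.
    candidates = ranked[:top_pages]
    # Step 3: pick the candidate deemed most fit (assumed here: page overlap
    # plus overlap between the title itself and the story).
    return max(candidates,
               key=lambda title: overlap(bag(pages[title]), story_bag)
               + overlap(bag(title), story_bag))


story = "Iran released two American hikers detained since 2009"
print(label_story(story, TOY_PAGES))
```

With the toy index, only the first page shares any tokens with the story, so its title is returned as the label.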

slide-8
SLIDE 8

WikiLabel: Concept Generation with Wikipedia

slide-9
SLIDE 9

WikiLabel: Concept Generation with Wikipedia

slide-10
SLIDE 10

WikiLabel: Concept Generation with Wikipedia

slide-11
SLIDE 11

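\[ l^{*}_{\vec{\theta}} = \arg\max_{l:\, p[l] \in U} \mathrm{Prox}(p[l], \vec{\theta} \mid N) \]

\[ \mathrm{Prox}(p[l], \vec{\theta} \mid N) = \lambda\, \mathrm{Sr}(p[l], \vec{\theta} \mid N) + (1 - \lambda)\, \mathrm{Lo}(l, \vec{\theta}) \]

Sr: content similarity
Lo: relevance of label
U: Concept Dictionary
\vec{\theta}: news story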

Mechanics

slide-12
SLIDE 12

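\[ \mathrm{Sr}(r, q) = \left( 1 + \sum_{t}^{N} \bigl( q(t) - r(t) \bigr)^{2} \right)^{-1} \]

\[ \mathrm{Lo}(l, v) = \frac{\sum_{i}^{|l|} I(l[i], v)}{|l|} - 1 \]

\[ I(w, v) = \begin{cases} 1 & \text{if } w \in v \\ 0 & \text{otherwise.} \end{cases} \]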
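
A minimal sketch of the scoring on slides 11 and 12, under stated assumptions: vectors are plain dicts from term to weight, the mixing weight is called `lam` and defaults to 0.5 (the slides give no value), and Lo is read as the share of label words found in the story minus 1. The toy dictionary and story are hypothetical.

```python
def sr(r, q, terms):
    """Content similarity: inverse of 1 + squared distance over `terms`."""
    return 1.0 / (1.0 + sum((q.get(t, 0.0) - r.get(t, 0.0)) ** 2 for t in terms))


def lo(label, story_words):
    """Label relevance: share of label words present in the story, minus 1.

    Assumed reading of the slide; the value lies in [-1, 0], with 0 best.
    """
    words = label.split()
    return sum(1 for w in words if w in story_words) / len(words) - 1.0


def prox(page_vec, label, story_vec, terms, lam=0.5):
    """Weighted mix of content similarity and label relevance."""
    return (lam * sr(page_vec, story_vec, terms)
            + (1.0 - lam) * lo(label, set(story_vec)))


def best_label(dictionary, story_vec, terms, lam=0.5):
    """Arg max of prox over all labels in the concept dictionary."""
    return max(dictionary,
               key=lambda l: prox(dictionary[l], l, story_vec, terms, lam))


# Hypothetical toy data: label -> term vector of the page behind it.
terms = ["iran", "hikers", "detention", "tax"]
story = {"iran": 1.0, "hikers": 1.0, "detention": 1.0}
concept_dictionary = {
    "detention of hikers": {"iran": 1.0, "hikers": 1.0, "detention": 1.0},
    "tax system": {"tax": 1.0},
}
print(best_label(concept_dictionary, story, terms))
```

Here `terms` plays the role of the N terms over which the distance is summed; how those terms are selected is not specified on the slides.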
slide-13
SLIDE 13

Use sentence compression to generalize

What if Wikipedia does not know the event ...?

slide-14
SLIDE 14

Example

2009 detention of American hikers by Iran

  • detention
  • detention by Iran
  • detention of hikers
  • detention of hikers by Iran
  • detention of American hikers by Iran
  • 2009 detention
  • 2009 detention by Iran
  • 2009 detention of hikers
  • 2009 detention of hikers by Iran
  • 2009 detention of American hikers by Iran

Making it shorter makes it more general

slide-15
SLIDE 15

[Figure: dependency tree of "2009 detention of American hikers by Iran", shown twice; the second copy marks three constituents C1, C2, C3]

Dependency pruning
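
One way to picture dependency pruning is to enumerate every connected subtree that keeps the root word. The sketch below is an illustration under assumptions, not the method used in the talk: the head indices are hand-written rather than produced by a parser, and the rule that a preposition survives only together with its object (`REQUIRES`) is a hypothetical constraint added to avoid dangling "of" and "by".

```python
from itertools import combinations

# Hand-annotated dependency tree for the example title (an assumption).
TOKENS = ["2009", "detention", "of", "American", "hikers", "by", "Iran"]
HEADS = [1, -1, 1, 4, 2, 1, 5]   # index of each token's head; -1 marks the root
REQUIRES = {2: 4, 5: 6}          # a kept preposition must keep its object


def compressions(tokens, heads, requires):
    """Return all shortened titles obtainable by pruning subtrees."""
    root = heads.index(-1)
    others = [i for i in range(len(tokens)) if i != root]
    results = []
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            kept = set(subset) | {root}
            # Every kept token must hang off a kept head (connected subtree).
            connected = all(heads[i] in kept for i in subset)
            # Prepositions are kept only together with their objects.
            complete = all(requires[i] in kept for i in kept if i in requires)
            if connected and complete:
                results.append(" ".join(tokens[i] for i in sorted(kept)))
    return results


for title in compressions(TOKENS, HEADS, REQUIRES):
    print(title)
```

Under these assumptions the enumeration yields 12 candidates, from "detention" up to the full title. That is a superset of the ten listed in the slide 14 example, which omits "detention of American hikers" and "2009 detention of American hikers".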

slide-16
SLIDE 16

[Figure: dependency tree of "2009 detention of American hikers by Iran"]

Use every NP in the title as a resource

slide-17
SLIDE 17

  • American hikers
  • detention
  • hikers
  • Iran
  • 2009 detention of American hikers by Iran
  • 2009 detention of hikers by Iran
  • 2009 detention of hikers
  • 2009 detention by Iran
  • detention of hikers
  • 2009 detention

Figure annotations: "you start here"; "What you get with extension"; "Original approach"

slide-18
SLIDE 18

country      #outlets  #stories         media outlets
us/uk        9         2,230 (239,844)  the new york times, yahoo, cnn, msnbc, fox, washington post, abc, bbc, reuters
south-korea  2         2,271 (19,008)   joongang ilbo (English edition), chosun ilbo (English edition)
japan        11        2,815 (259,364)  asahi, jcast, jiji.com, mainichi, nhk, nikkei, sankei, tbs, tokyo, tv-asahi, yomiuri

Testing it out in the field

slide-19
SLIDE 19

[Figure: Topic Popularity (South Korea). Bar chart, axis 0 to 0.2.]
Values, in the order extracted: 0.001, 0.001, 0.001, 0.001, 0.002, 0.003, 0.006, 0.008, 0.011, 0.011, 0.014, 0.015, 0.016, 0.017, 0.024, 0.035, 0.069, 0.09, 0.144
Topics, in the order extracted: North-Korean abductions of Japanese citizens; North-Koreans; province North-Korea; rocket North-Korea; North-Korean famine; North-Korean floods; North-Korean nuclear test; North-Korea weapons of mass destruction; North-Korean abductions; North-Korean test; North-Korean missile test; North-Korea program; North-Korea United-States relations; North-Korea weapons; North-Korea South-Korea relations; North-Korea Russia relations; North-Korean defectors; North-Korea nuclear program; North-Korea relations

[Figure: Topic Popularity (US). Bar chart, axis 0 to 0.2.]
Values, in the order extracted: 0.001, 0.002, 0.002, 0.002, 0.003, 0.004, 0.005, 0.008, 0.009, 0.012, 0.018, 0.018, 0.021, 0.025, 0.028, 0.032, 0.039, 0.143, 0.18
Topics, in the order extracted: People's Republic North-Korea relations; Japan North-Korea relations; North-Korea women's team; North-Korean abductions of Japanese citizens; North-Korean floods; North-Korea weapons of mass destruction; North-Korean abductions; North-Korea program; North-Korean famine; North-Korean nuclear test; North-Korea South-Korea relations; North-Korean defectors; North-Korea weapons; North-Korean missile test; North-Korean test; North-Korea United-States relations; North-Korea Russia relations; North-Korea relations; North-Korea nuclear program

[Figure: Topic Popularity (Japan). Bar chart, axis 0 to 0.8.]
Values, in the order extracted: 0.002, 0.003, 0.003, 0.003, 0.005, 0.005, 0.008, 0.01, 0.013, 0.014, 0.015, 0.022, 0.023, 0.024, 0.033, 0.049, 0.071, 0.073, 0.585
Topics, in the order extracted: North-Korean Intelligence Agencies; Korean Language; Mount Kumgang>>Tourist Region; North Korean abductions of Japanese citizens>>Victims; North-South Summit; Prisons in North-Korea; First Secretary of the Workers' Party of Korea; North-Korea sponsored schools in Japan; North-Korean abductions; Human rights in North-Korea; North-Korean nuclear test; North-Korea United-States relations; North-Korean missile test; Yeonpyeongdo>>bombardment; Korean War; North-Korean abductions of Japanese citizens; North-Korean defectors; Workers' Party of Korea; North-Korean nuclear issue

North-Korean Agenda

[Figure: News Coverage Ratio, Japan vs. South-Korea. Bar chart, axis 0 to 0.6.]
Values, in the order extracted: 0.21, 0.082, 0.079, 0.317, 0.216, 0.025, 0.011, 0.039, 0.06, 0.071, 0.31, 0.427
Topics, in the order extracted: North-South relations; Culture in North-Korea; Kim Jong-il's visit to China; Ryongchon disaster; North-Korean nuclear issues; Abductions of Japanese

slide-20
SLIDE 20

rating  explanation
5       Title is one of the major topics in Article. Article gives particular attention to Title.
4       Part of Article deals with Title. Article makes a clear reference to Title.
3       Part of Title has some relevance to a dominant theme of Article. Example: Title ‘European Tax System’ is partially relevant to an article discussing US Tax System.
2       Article makes a reference to part of Title.
1       Title has no relevance to Article, in whatever way.

language  rating  #instances
english   4.63    97
japanese  4.41    92

Human Evaluation

slide-21
SLIDE 21
slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24

s1                            s2                            rouge-w
The United States of America  The United States of America  1
The United States             The United States of America  0.529
States                        The United States of America  0.077

Evaluation Metric: ROUGE-W

\[ S(C|k, l) = \frac{1}{k} \sum_{c \in C|k} \text{rouge-w}(c, l) \]
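
The three scores in the table on this slide can be reproduced with a weighted longest common subsequence. This is a sketch under assumptions: the weighting f(k) = k², the F1 combination, and the omission of the inverse weighting function are choices made here because they match the table, not details stated on the slide. Reading C|k as the first k candidates is also an assumption.

```python
def wlcs(x, y, f=lambda k: k * k):
    """Weighted LCS score: consecutive matches are rewarded through f."""
    m, n = len(x), len(y)
    c = [[0.0] * (n + 1) for _ in range(m + 1)]   # best weighted score so far
    w = [[0] * (n + 1) for _ in range(m + 1)]     # current consecutive-match run
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                k = w[i - 1][j - 1]
                c[i][j] = c[i - 1][j - 1] + f(k + 1) - f(k)
                w[i][j] = k + 1
            else:
                # No match: carry the better score; the run length stays 0.
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]


def rouge_w(candidate, label, f=lambda k: k * k):
    """F1 of weighted-LCS precision and recall over whitespace tokens."""
    x, y = candidate.split(), label.split()
    if not x or not y:
        return 0.0
    score = wlcs(x, y, f)
    if score == 0:
        return 0.0
    recall = score / f(len(y))
    precision = score / f(len(x))
    return 2 * precision * recall / (precision + recall)


def mean_rouge_w(candidates, label, k):
    """S(C|k, l): average ROUGE-W of the first k candidates against label l."""
    return sum(rouge_w(c, label) for c in candidates[:k]) / k


reference = "The United States of America"
for s1 in ["The United States of America", "The United States", "States"]:
    print(s1, "->", round(rouge_w(s1, reference), 3))
```

With these choices the three rows evaluate to 1.0, 0.529 and 0.077, matching the table.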

slide-25
SLIDE 25

       trank   rm0     rm1     rm1/x
NYT    0.000   0.056   0.056   0.069
TDT    0.030   0.042   0.048   0.051
FOX?   0.231   0.264   0.264   0.298

Results

New York Times (2013): 19,952
TDT (1994): 15,863
FOX (2015): 11,014
Wikipedia (2012)

TextRank vs. WikiLabel

slide-26
SLIDE 26
  • Talked about topic detection using WikiLabel
  • Leveraging Wikipedia
  • Generalizing concept with sentence compression
  • Use of sentence compression led to a huge improvement, producing performance twice as good as that of TextRank

  • Online topic learning seems promising

Summary

slide-27
SLIDE 27

[Figure: frequency of topic descriptors (y-axis, 0 to 600) plotted against rank of topic descriptor (x-axis, 0 to 8000)]

Frequencies of (manually assigned) topic descriptors that appeared in the New York Times from June to December, 2013.

Solution to Problem

(Online) Learning WikiLabel
