The Wikipedia Location Network: Overcoming Borders and Oceans


SLIDE 1

The Wikipedia Location Network: Overcoming Borders and Oceans

Johanna Geiß¹, Andreas Spitz¹, Jannik Strötgen¹,², and Michael Gertz¹

¹ Heidelberg University, Institute of Computer Science, Database Systems Research Group, Heidelberg

² Max-Planck-Institute for Informatics, Databases and Information Systems, Saarbrücken

{geiss, spitz, stroetgen, gertz}@informatik.uni-heidelberg.de

9th GIR Workshop Paris, November 26, 2015

SLIDE 2

Motivation Network Construction Properties and Applications Summary

What’s the difference between France and Illinois?

The Wikipedia Location Network Andreas Spitz 1 of 16

SLIDE 3

Implicit Networks

[Figure: road map of northern France with motorways and national roads (e.g. A10, A11, A26, N31, N154) linking Rouen, Auxerre, Troyes, Le Mans, Angers, Nevers, Chartres, Orléans, Amiens, Abbeville, Reims, Le Havre, Tours, Caen, Bourges and Calais; scale bar 200 km / 100 mi.]


SLIDE 4

Overview

1. Motivation
2. Network Construction
3. Properties and Applications
4. Summary


SLIDE 5

Foundations of Implicit Networks

“Most of the circuits currently in use are specially constructed for competition. The current street circuits are Monaco, Melbourne, Montreal, Singapore and Sochi, although races in other urban locations come and go (Las Vegas and Detroit, for example) and proposals for such races are often discussed – most recently New Jersey.”

en.wikipedia.org/wiki/Formula_One


SLIDE 6

Multi-Graph Extraction

s(v, w) := distance in sentences between toponyms v and w

$$d(v, w) := \exp\left(-\frac{s(v, w)}{2}\right)$$
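A minimal sketch of how the weight d(v, w) could be computed for every pair of toponym mentions in one text. The function names, the input format (toponym plus sentence index), and the sample mentions are assumptions made for illustration; the slide only defines s(v, w) and d(v, w).

```python
import math
from itertools import combinations


def edge_weight(sentence_distance: int) -> float:
    """d(v, w) = exp(-s(v, w) / 2), with s the distance in sentences."""
    return math.exp(-sentence_distance / 2)


def extract_multigraph(mentions):
    """Build one weighted edge per pair of mentions of different toponyms.

    `mentions` is a list of (toponym, sentence_index) tuples (assumed format).
    Repeated co-occurrences of the same pair yield parallel edges.
    """
    edges = []
    for (v, sent_v), (w, sent_w) in combinations(mentions, 2):
        if v == w:
            continue  # no self-loops
        s = abs(sent_v - sent_w)
        edges.append((v, w, edge_weight(s)))
    return edges


# Hypothetical mentions: two toponyms in sentence 0, one in sentence 2.
sample_mentions = [("Monaco", 0), ("Melbourne", 0), ("Detroit", 2)]
sample_edges = extract_multigraph(sample_mentions)
for v, w, weight in sample_edges:
    print(f"{v} - {w}: {weight:.3f}")
```

Toponyms in the same sentence get the maximum weight of 1, and the weight decays exponentially as the mentions move apart.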


SLIDE 8

Edge Aggregation

Distance-based cosine for nodes v and w:

$$\mathrm{dicos}(v, w) := \frac{\sum_i d_i(v)\, d_i(w)}{\sqrt{\sum_i d_i(v)^2}\, \sqrt{\sum_i d_i(w)^2}}$$
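The distance-based cosine has the shape of an ordinary cosine similarity, which can be sketched as follows. This assumes that d_i(v) and d_i(w) are the i-th components of two equal-length weight vectors; how those vectors are assembled from the multigraph is not stated on the slide, so the inputs here are purely illustrative.

```python
import math


def dicos(d_v, d_w):
    """Cosine of two equal-length weight vectors d_v and d_w (assumed inputs)."""
    if len(d_v) != len(d_w):
        raise ValueError("weight vectors must have the same length")
    numerator = sum(a * b for a, b in zip(d_v, d_w))
    norm = math.sqrt(sum(a * a for a in d_v)) * math.sqrt(sum(b * b for b in d_w))
    # Treat an all-zero vector as having no similarity instead of dividing by zero.
    return numerator / norm if norm else 0.0


print(dicos([1.0, 0.0], [1.0, 0.0]))
print(dicos([1.0, 0.0], [0.0, 1.0]))
```

Because the weights d are non-negative, the result lies between 0 and 1.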


SLIDE 9

Toponym Extraction in Wikipedia


SLIDE 10

Network Overview

Node types: [chart not preserved]

Network statistics:

|V|                      723,779
|E|                      178,890,238
density                  6.8 · 10^-4
clustering coefficient   0.56
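As a quick consistency check, the reported density can be recomputed from the node and edge counts. This sketch assumes the aggregated network is an undirected simple graph, so that density = 2|E| / (|V| (|V| - 1)); the variable names are illustrative.

```python
# Node and edge counts as reported on the slide.
num_nodes = 723_779
num_edges = 178_890_238

# Density of an undirected simple graph (assumption about the network type).
density = 2 * num_edges / (num_nodes * (num_nodes - 1))
print(f"{density:.2e}")
```

Under that assumption the value agrees with the 6.8 · 10^-4 given in the statistics.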


SLIDE 11

Network Properties

[Figure: four panels showing % of remaining edges, clustering coefficient, number of components, and assortativity as the network metric (y-axis) against the dicos threshold (x-axis, 0.0000 to 0.0025).]
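A sketch of the kind of threshold sweep the figure suggests, limited to two of the four metrics: the share of remaining edges and the number of connected components. It assumes that edges with a dicos weight below the threshold are dropped and that all nodes are kept even when isolated; the toy edges and function names are hypothetical.

```python
def prune(edges, threshold):
    """Keep only edges whose weight is at least `threshold` (assumed rule)."""
    return [(v, w, x) for v, w, x in edges if x >= threshold]


def count_components(nodes, edges):
    """Count connected components with a small union-find structure."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for v, w, _ in edges:
        parent[find(v)] = find(w)
    return len({find(n) for n in nodes})


# Hypothetical toy network with dicos-like edge weights.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B", 0.0020), ("B", "C", 0.0004), ("C", "D", 0.0012)]

for threshold in (0.0, 0.0005, 0.0015):
    kept = prune(toy_edges, threshold)
    share = 100 * len(kept) / len(toy_edges)
    print(threshold, f"{share:.0f}% edges", count_components(toy_nodes, kept), "components")
```

Raising the threshold removes weak edges first, so the network fragments into more components as fewer edges remain.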


SLIDE 13

Hierarchical Evaluation

Does the network contain classic geographical relations?

  1. Extract hierarchical relations from Wikidata.
  2. Correspondence of the highest-weighted incident edge in the network with the link to the parent in the hierarchy:
     • cities: 81.6% precision for link to parent country
     • countries: 80.3% precision for link to parent continent
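The second step can be sketched as follows: for each child node, take the neighbour on its highest-weighted incident edge and check whether it matches the parent in the hierarchy. The toy edges, the parent mapping, and the function names are hypothetical, and details such as candidate filtering or tie-breaking are assumptions not covered by the slide.

```python
def strongest_neighbor(node, edges):
    """Return the neighbour on the highest-weighted edge incident to `node`."""
    best, best_weight = None, float("-inf")
    for v, w, weight in edges:
        if node in (v, w) and weight > best_weight:
            best = w if v == node else v
            best_weight = weight
    return best


def parent_precision(children, edges, parent_of):
    """Share of children whose strongest neighbour is their hierarchy parent."""
    hits = total = 0
    for child in children:
        guess = strongest_neighbor(child, edges)
        if guess is None:
            continue  # node has no incident edge
        total += 1
        hits += guess == parent_of[child]
    return hits / total if total else 0.0


# Hypothetical network edges and parent links.
toy_edges = [
    ("Paris", "France", 0.9),
    ("Paris", "London", 0.4),
    ("Lyon", "France", 0.7),
    ("Chicago", "Illinois", 0.8),
    ("Chicago", "United States", 0.6),
]
parent_of = {"Paris": "France", "Lyon": "France", "Chicago": "United States"}

precision = parent_precision(list(parent_of), toy_edges, parent_of)
print(f"{precision:.3f}")
```

In the toy data, Chicago links most strongly to Illinois rather than to its parent country, so it counts as a miss.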


SLIDE 14

The Network at a Glance

What’s the difference between France and Illinois?


SLIDE 16

Applications in NLP

Support for NLP tasks:

  • Disambiguation
  • Coreference analysis
  • Cross- and multilingual analysis

Data analysis by clustering of the network:

  • Finding place similarities
  • Categorization of places
  • Extraction of hierarchies


SLIDE 17

Applications in Event Analysis

[Figure: calendar page for November.]

The network supports spatial components of:

  • Event detection
  • Event extraction
  • Event correlation
  • Event similarity


SLIDE 18

Summary

New method for implicit network extraction that is

  • based on text distances of toponyms,
  • applicable to any geo-tagged corpus.

Application to Wikipedia / Wikidata results in

  • negligible number of mistags,
  • accurate and reliable network,
  • useful resource for NLP tasks.


SLIDE 19

Thank you! Questions?

The Wikipedia Location Network is available for download:
http://dbs.ifi.uni-heidelberg.de/index.php?id=data

