YAGO3: A Knowledge Base from Multilingual Wikipedias



slide-1
SLIDE 1

YAGO3:

A Knowledge Base from Multilingual Wikipedias

Farzaneh Mahdisoltani Joanna Biega Fabian M. Suchanek

CIDR 2015

slide-8
SLIDE 8

John_Coltrane wasBornOnDate “1926-09-23”
John_Coltrane wasBornIn Hamlet_(Town)
John_Coltrane label “John William Coltrane”
John_Coltrane type American_Jazz_Composer
Hamlet_(Town) locatedIn United_States
United_States locatedIn North_America
American_Jazz_Composer subclassOf wordnet_composer
wordnet_composer subclassOf wordnet_musician

120M facts, 10M entities, 100 relations, 95% precision
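
The facts on this slide can be read as subject-predicate-object triples. The following is a minimal sketch, not YAGO's actual storage format: it assumes the locatedIn and subclassOf edges chain as Hamlet_(Town) → United_States → North_America and American_Jazz_Composer → wordnet_composer → wordnet_musician, and the helper names `objects` and `all_classes` are made up for illustration.

```python
# Facts from the slide as (subject, predicate, object) triples.
# Assumption: the chaining of the locatedIn / subclassOf edges is read off
# the slide layout and is not stated explicitly in the text.
FACTS = {
    ("John_Coltrane", "wasBornOnDate", "1926-09-23"),
    ("John_Coltrane", "wasBornIn", "Hamlet_(Town)"),
    ("John_Coltrane", "label", "John William Coltrane"),
    ("John_Coltrane", "type", "American_Jazz_Composer"),
    ("Hamlet_(Town)", "locatedIn", "United_States"),
    ("United_States", "locatedIn", "North_America"),
    ("American_Jazz_Composer", "subclassOf", "wordnet_composer"),
    ("wordnet_composer", "subclassOf", "wordnet_musician"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return {o for s, p, o in FACTS if s == subject and p == predicate}


def all_classes(entity):
    """Direct types of an entity plus every superclass reachable via subclassOf."""
    found = set()
    frontier = list(objects(entity, "type"))
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.add(cls)
            frontier.extend(objects(cls, "subclassOf"))
    return found


print(sorted(all_classes("John_Coltrane")))
```

Following subclassOf transitively is what lets a query for musicians also return an entity typed only as American_Jazz_Composer.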

slide-11
SLIDE 11

YAGO can be used in many ways

Named Entity Disambiguation

  • J. Hoffart et al., Robust Disambiguation of Named Entities in Text, EMNLP2011

Semantic Culturomics

  • F. M. Suchanek, N. Preda, Semantic Culturomics, VLDB2014
  • T. Huet, J. Biega, F. M. Suchanek, Mining History with Le Monde, AKBC2013

Extending YAGO coverage would yield better results!

slide-14
SLIDE 14

Multilingual Wikipedias

Local entities: Izabella_Olszewska, Tadeusz_Jurasz

Local facts: Izabella_Olszewska isMarriedTo Tadeusz_Jurasz

slide-18
SLIDE 18

Running YAGO on multilingual Wikipedias

Extraction EN  ?

  • Duplicate entities
  • Entities with no type discarded
  • No facts extracted from foreign infoboxes

slide-20
SLIDE 20

Running YAGO on multilingual Wikipedias

[Diagram: Extractors and Themes arranged in two stages, raw extraction and clean-up]

slide-21
SLIDE 21

Tasks

  • 1. Entities
  • 2. Types
  • 3. Facts

slide-22
SLIDE 22
  • 1. Set of Entities

[Figure: entities from different Wikipedia editions, compared with "=?"]

slide-23
SLIDE 23
  • 1. Set of Entities

specifies the abstraction classes


slide-28
SLIDE 28
  • 2. Taxonomy construction

en/John_Coltrane inCategory "Jazz Music"
en/John_Coltrane inCategory "American Composers"
en/John_Coltrane type American_Composer
American_Composer subclassOf wordnet_composer

English-centric!

slide-32
SLIDE 32

  • 2. Taxonomy construction

pl/John_Coltrane inCategory pl/Amerykańscy_Jazzmani
en/John_Coltrane inCategory en/American_Jazzmen
en/John_Coltrane inCategory "Jazz Music"
en/John_Coltrane inCategory "American Composers"
en/John_Coltrane type American_Composer
en/John_Coltrane type American_Jazzman
American_Composer subclassOf wordnet_composer
American_Jazzman subclassOf wordnet_jazzman
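
The chain on this slide (foreign category → English category → class → WordNet class) can be sketched as a sequence of lookups. This is only an illustration under stated assumptions: the three tables below are hypothetical stand-ins hard-coded from the slide, and the function name `derive_types` is made up; how YAGO3 actually obtains these mappings is not shown in the slides.

```python
# Hypothetical lookup tables, hard-coded from the slide for illustration only.
INTERLANGUAGE = {"pl/Amerykańscy_Jazzmani": "en/American_Jazzmen"}
CATEGORY_TO_CLASS = {
    "en/American_Jazzmen": "American_Jazzman",
    "American Composers": "American_Composer",
    # "Jazz Music" has no entry: on the slide it does not yield a class.
}
SUBCLASS_OF = {
    "American_Composer": "wordnet_composer",
    "American_Jazzman": "wordnet_jazzman",
}


def derive_types(entity, categories):
    """Turn category memberships into type and subclassOf triples."""
    triples = set()
    for category in categories:
        # Translate a foreign category to its English counterpart if one is known.
        english = INTERLANGUAGE.get(category, category)
        cls = CATEGORY_TO_CLASS.get(english)
        if cls is None:
            continue  # category does not correspond to a class
        triples.add((entity, "type", cls))
        if cls in SUBCLASS_OF:
            triples.add((cls, "subclassOf", SUBCLASS_OF[cls]))
    return triples


for triple in sorted(derive_types(
        "en/John_Coltrane",
        ["pl/Amerykańscy_Jazzmani", "Jazz Music", "American Composers"])):
    print(triple)
```

Routing the Polish category through its English counterpart is what keeps the result in one taxonomy instead of creating a separate class hierarchy per language.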

slide-34
SLIDE 34
  • 3. Fact extraction

en/infobox/married → isMarriedTo (manually defined in YAGO-EN)

slide-36
SLIDE 36
  • 3. Fact extraction

en/infobox/married → isMarriedTo
pl/infobox/małżonek → isMarriedTo? hasChild? wasBornOnDate?

slide-38
SLIDE 38

Infobox attributes mapping

pl/infobox/małżonek =? isMarriedTo

$E_{\text{isMarriedTo}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Alice_Coltrane)
$F_{\text{małżonek}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Ravi_Coltrane), (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)

Corresponding attributes will share some subject-object pairs

slide-40
SLIDE 40

Infobox attributes mapping

pl/infobox/małżonek =? isMarriedTo

$E_{\text{isMarriedTo}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Alice_Coltrane)
$F_{\text{małżonek}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Ravi_Coltrane), (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)

$\mathrm{support}(F_a, E_r) = |\mathrm{matches}(F_a, E_r)|$

Too restrictive for attributes with few contributions
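
As a worked example of the support measure, the sketch below counts the pairs shared by the two sets from the slide. It assumes that matches(F_a, E_r) is the plain set intersection of the pairs; the variable and function names are illustrative.

```python
# Pairs known in YAGO for isMarriedTo (E_r) and pairs found in the
# Polish infobox attribute małżonek (F_a), as listed on the slide.
E_IS_MARRIED_TO = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Alice_Coltrane"),
}
F_MALZONEK = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Ravi_Coltrane"),
    ("pl/Izabella_Olszewska", "pl/Tadeusz_Jurasz"),
}


def matches(f_a, e_r):
    """Assumption: a match is a subject-object pair present in both sets."""
    return f_a & e_r


def support(f_a, e_r):
    """Absolute number of matching pairs."""
    return len(matches(f_a, e_r))


print(support(F_MALZONEK, E_IS_MARRIED_TO))  # 2
```

Because support is an absolute count, any fixed threshold penalises attributes that occur in only a handful of infoboxes, which is the limitation the slide points out.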

slide-42
SLIDE 42

Infobox attributes mapping

pl/infobox/małżonek =? isMarriedTo

$E_{\text{isMarriedTo}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Alice_Coltrane)
$F_{\text{małżonek}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Ravi_Coltrane), (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz), (pl/Krystyna_Pyrkosz, pl/Witold_Pyrkosz), (pl/Grażyna_Torbicka, pl/Adam_Torbicki), (pl/Szymon_Majewski, pl/Magda_Majewska)

$\mathrm{confidence}(F_a, E_r) = \frac{|\mathrm{matches}(F_a, E_r)|}{|\mathrm{contrib}(F_a)|}$

Too restrictive for attributes with a lot of new facts but few matches
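
The effect of new facts on confidence can be checked on the slide's example. This is a sketch under two assumptions that the slide does not spell out: matches is the set intersection, and contrib(F_a) is taken to be every pair the attribute contributes, i.e. all of F_a.

```python
E_IS_MARRIED_TO = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Alice_Coltrane"),
}
F_MALZONEK = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Ravi_Coltrane"),
    ("pl/Izabella_Olszewska", "pl/Tadeusz_Jurasz"),
    ("pl/Krystyna_Pyrkosz", "pl/Witold_Pyrkosz"),
    ("pl/Grażyna_Torbicka", "pl/Adam_Torbicki"),
    ("pl/Szymon_Majewski", "pl/Magda_Majewska"),
}


def confidence(f_a, e_r):
    """Share of the attribute's pairs that already exist in the target relation.

    Assumption: contrib(F_a) is the whole set F_a.
    """
    if not f_a:
        return 0.0
    return len(f_a & e_r) / len(f_a)


print(round(confidence(F_MALZONEK, E_IS_MARRIED_TO), 3))  # 0.286
```

Only 2 of the 7 pairs match, so the four genuinely new Polish couples pull the score down even though they are exactly the facts the mapping is meant to add.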

slide-44
SLIDE 44

Infobox attributes mapping

pl/infobox/małżonek =? isMarriedTo

$E_{\text{isMarriedTo}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Alice_Coltrane)
$F_{\text{małżonek}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Ravi_Coltrane), (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz), (pl/Krystyna_Pyrkosz, pl/Witold_Pyrkosz), (pl/Grażyna_Torbicka, pl/Adam_Torbicki), (pl/Szymon_Majewski, pl/Magda_Majewska)

$\mathrm{pca}(F_a, E_r) = \frac{|\mathrm{matches}(F_a, E_r)|}{|\mathrm{matches}(F_a, E_r)| + |\mathrm{clashes}(F_a, E_r)|}$

Open-world assumption

  • L. Galarraga, C. Teflioudi, K. Hose, F. M. Suchanek, AMIE: Association Rule Mining under Incomplete Evidence in Ontological Knowledge Bases, WWW2013

Can be misled by clashes

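
A small sketch of the PCA measure on the slide's example follows. The definition of a clash is an assumption here (the slide does not define it): a pair counts as a clash when its subject already has facts in E_r but with a different object. Pairs whose subject is unknown to E_r are ignored, in line with the open-world assumption.

```python
E_IS_MARRIED_TO = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Alice_Coltrane"),
}
F_MALZONEK = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Ravi_Coltrane"),
    ("pl/Izabella_Olszewska", "pl/Tadeusz_Jurasz"),
    ("pl/Krystyna_Pyrkosz", "pl/Witold_Pyrkosz"),
    ("pl/Grażyna_Torbicka", "pl/Adam_Torbicki"),
    ("pl/Szymon_Majewski", "pl/Magda_Majewska"),
}


def clashes(f_a, e_r):
    """Assumed definition: subject is known in E_r, but never with this object."""
    known = {}
    for subject, obj in e_r:
        known.setdefault(subject, set()).add(obj)
    return {(s, o) for s, o in f_a if s in known and o not in known[s]}


def pca(f_a, e_r):
    """Matches divided by matches plus clashes; new subjects do not count."""
    n_matches = len(f_a & e_r)
    total = n_matches + len(clashes(f_a, e_r))
    # Design choice for this sketch: no evidence at all scores 0.0.
    return n_matches / total if total else 0.0


print(clashes(F_MALZONEK, E_IS_MARRIED_TO))
print(round(pca(F_MALZONEK, E_IS_MARRIED_TO), 3))  # 0.667
```

The four Polish-only couples no longer lower the score, but the single (John_Coltrane, Ravi_Coltrane) pair removes a third of it, which illustrates how a few clashes can dominate the measure.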
slide-45
SLIDE 45

Infobox attributes mapping

pl/infobox/małżonek =? isMarriedTo

$E_{\text{isMarriedTo}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Alice_Coltrane)
$F_{\text{małżonek}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Ravi_Coltrane), (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)

Random sample: $F_{\text{małżonek}}$ drawn from $F^*_{\text{małżonek}}$

slide-46
SLIDE 46

Infobox attributes mapping

pl/infobox/małżonek =? isMarriedTo

$E_{\text{isMarriedTo}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Alice_Coltrane)
$F_{\text{małżonek}}$: (Barack_Obama, Michelle_Obama), (Elvis_Presley, Priscilla_Presley), (John_Coltrane, Ravi_Coltrane), (pl/Izabella_Olszewska, pl/Tadeusz_Jurasz)

$F^*_{\text{małżonek}}$

$\mathrm{wilson}(F_a, E_r) = c - \delta$

With 95% probability, the true proportion of matches falls into $[c - \delta, c + \delta]$
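
The lower bound c − δ can be computed with the standard Wilson score interval. The sketch below makes assumptions that the slide does not state: the observed proportion is |matches| / |F_a|, the sample size is |F_a|, and z = 1.96 is used for the 95% level.

```python
import math

E_IS_MARRIED_TO = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Alice_Coltrane"),
}
F_MALZONEK = {
    ("Barack_Obama", "Michelle_Obama"),
    ("Elvis_Presley", "Priscilla_Presley"),
    ("John_Coltrane", "Ravi_Coltrane"),
    ("pl/Izabella_Olszewska", "pl/Tadeusz_Jurasz"),
}


def wilson_interval(successes, n, z=1.96):
    """Center c and half-width delta of the Wilson score interval."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    delta = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center, delta


def wilson(f_a, e_r):
    """Lower bound c - delta. Assumption: successes = matches, n = |F_a|."""
    center, delta = wilson_interval(len(f_a & e_r), len(f_a))
    return center - delta


print(round(wilson(F_MALZONEK, E_IS_MARRIED_TO), 2))  # 0.15
```

With only four pairs the interval is wide, so the lower bound (about 0.15) sits well below the observed proportion of 0.5; a larger sample with the same proportion would move the bound closer to it.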

slide-48
SLIDE 48
  • 3. Fact extraction

en/infobox/married → isMarriedTo
pl/infobox/małżonek → isMarriedTo

slide-49
SLIDE 49

Mapping quality

Thresholds: Confidence 16%, Wilson 4%

  • Estimated on a manually annotated sample
  • Good performance across different languages
  • Chosen so that we get high recall at precision > 95%

slide-52
SLIDE 52

Mapping quality (precision, recall and F1 in %)

            Confidence 16%         Wilson 4%
Language    Prec   Rec   F1        Prec   Rec   F1
ar          100    73    85        100    82    90
de          100    37    54         98    56    72
es           96    19    32         95    29    45
fa          100    49    66         97    54    69
fr          100    16    27        100    69    82
it          100     7    12         98    23    37
nl          100    19    32        100    22    36
pl           95    10    19         97    64    77
ro           96    52    67         95    70    81

  • High precision consistent across languages.
  • Higher recall for smaller Wikipedias.
  • Lower threshold for Wilson helps increase recall.

slide-58
SLIDE 58

YAGO3

Large, clean knowledge base from multilingual Wikipedias.
Single coherent taxonomy.
Mapping of infobox attributes to YAGO relations.

de/Kirdorf (Bedburg), hasNumberOfPeople, "1204"^^xsd:integer
fr/Château de Montcony, isLocatedIn, Burgundy
pl/Henryk Pietras, wasBornIn, de/Debiensko

1M new entities (3.5M for English)
2.5M new facts (6.5M for English)

slide-59
SLIDE 59

YAGO3

http://yago-knowledge.org

Thank you!