SLIDE 1

Introduction MINELex Methodology Evaluation Conclusions

Automatic acquisition of Named Entities for Rule-Based Machine Translation

Antonio Toral, Andy Way – DCU

2nd International Workshop on Free/Open-Source Rule-Based Machine Translation

2011/01/20

Antonio Toral, Andy Way – DCU Automatic acquisition of NEs for RBMT

SLIDE 2

Contents

1. Introduction
2. MINELex
3. Methodology: Motivation, Procedure, Example
4. Evaluation: Environment, Experiments
5. Conclusions

SLIDE 8

Named Entities (NEs) refer to proper nouns (e.g. person, location, organization).
  Information Extraction, MUC

Distribution of NEs compared to other PoS
  English Europarl, tagged with Freeling
  Mean: 1 NEs, 3 common nouns, 7 verbs
  Avg num occurrences: 24 NEs, 295 common nouns, 888 verbs
  Num different instances: 88k NEs, 26k common nouns, 7k verbs
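The per-category figures above (number of different instances, average number of occurrences) can be derived from any tagged corpus. The following is a minimal sketch, assuming the tagger output has already been reduced to (form, category) pairs; the function name, the category labels and the toy corpus are hypothetical and do not reproduce the Freeling output format or the Europarl data.

```python
from collections import Counter, defaultdict


def pos_distribution(tagged_tokens):
    """Return {category: (num_different_instances, avg_occurrences)}.

    tagged_tokens is an iterable of (form, category) pairs; this input
    format is an assumption made for illustration only.
    """
    counts = defaultdict(Counter)
    for form, category in tagged_tokens:
        counts[category][form] += 1
    return {
        category: (len(counter), sum(counter.values()) / len(counter))
        for category, counter in counts.items()
    }


# Toy data, invented for the example.
toy_corpus = [
    ("Dublin", "NE"), ("Madrid", "NE"), ("Dublin", "NE"),
    ("house", "noun"), ("house", "noun"), ("house", "noun"), ("car", "noun"),
    ("be", "verb"), ("be", "verb"), ("be", "verb"), ("be", "verb"),
]
stats = pos_distribution(toy_corpus)
```

On real data the pattern reported on the slide would show up as many different NE instances with few occurrences each, and few verb instances with many occurrences each.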

SLIDE 16

Multilingual and Interoperable Named Entity Lexicon (MINELex)

NEs acquired from Wikipedia for 11 languages and connected to LRs:
  Semantic units of dictionaries (en, es, it, ar). E.g. Tim Robbins instance-of actor, film director
  Nodes of ontologies (SUMO, SIMPLE). E.g. Tim Robbins + Position, believes
Equivalent NEs in different languages connected by interlingual links
NEs associated with confidence scores (num occurrences, % occurs capitalised)

                      English    Spanish
NEs                   948,410     99,330
Variants            1,541,993    128,796
Instance relations  1,366,899    128,796

SLIDE 23

Aim: automatically add NEs to RBMT dictionaries

Reasons:
  Distributional properties + dynamic nature of NEs → impractical to build dictionaries manually
  1/3 of the entries in the Apertium English–Spanish dictionary are NEs
  Simpler morphology of NEs*

SLIDE 28

Extract bilingual pairs of NEs from MINELex and insert them into Apertium dictionaries

Subset of NEs that satisfy restrictions:
  Min num of occurrences
  Min % of occurrences that are capitalised
Insert relevant data in Apertium dictionaries (sl, tl, bi)
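The selection step can be sketched as a simple filter over candidate pairs. This is an illustrative sketch, not the authors' implementation: the NEPair record, the select_pairs function and the inclusive (>=) comparison are assumptions, and every candidate other than Yekaterinburg is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NEPair:
    """Hypothetical record for one bilingual NE pair with its confidence scores."""
    source: str
    target: str
    occurrences: int
    capitalised_ratio: float  # share of occurrences that are capitalised, 0..1


def select_pairs(pairs, min_occurrences, min_capitalised):
    """Keep only the pairs that meet both thresholds (inclusive, by assumption)."""
    return [
        p for p in pairs
        if p.occurrences >= min_occurrences
        and p.capitalised_ratio >= min_capitalised
    ]


candidates = [
    NEPair("Yekaterinburg", "Ekaterimburgo", 190, 0.95),
    NEPair("Rare Place", "Lugar Raro", 12, 0.99),  # invented: too few occurrences
    NEPair("Apple", "Manzana", 500, 0.40),         # invented: rarely capitalised
]
selected = select_pairs(candidates, min_occurrences=100, min_capitalised=0.8)
```

With thresholds of 100 occurrences and 0.8 capitalisation, only the Yekaterinburg pair survives. Raising either threshold trades coverage for precision.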

SLIDE 33

NE English = Yekaterinburg
NE Spanish = Ekaterimburgo
Number occurrences = 190
Percentage capitalised = .95

Source-language (English) dictionary:

    <pardef n="Aachen__np">
      <e>
        <p>
          <l/>
          <r><s n="np"/><s n="al"/><s n="sp"/></r>
        </p>
      </e>
    </pardef>
    <e lm="Yekaterinburg">
      <i>Yekaterinburg</i>
      <par n="Aachen__np"/>
    </e>

Target-language (Spanish) dictionary:

    <pardef n="Aquisgran__np">
      <e>
        <p>
          <l/>
          <r><s n="np"/><s n="al"/><s n="mf"/><s n="sp"/></r>
        </p>
      </e>
    </pardef>
    <e lm="Ekaterimburgo">
      <i>Ekaterimburgo</i>
      <par n="Aquisgran__np"/>
    </e>

Bilingual dictionary:

    <e>
      <p>
        <l>Yekaterinburg<s n="np"/><s n="al"/></l>
        <r>Ekaterimburgo<s n="np"/><s n="al"/><s n="mf"/></r>
      </p>
    </e>
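Entries of this shape can be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree module. It is not the minelex2apertium tool: the helper names are hypothetical, and the paradigm name Aachen__np is the slide's example with the underscores assumed to have been lost in extraction.

```python
import xml.etree.ElementTree as ET


def monolingual_entry(lemma, paradigm):
    """Build <e lm=...><i>lemma</i><par n=.../></e> for a monolingual dictionary."""
    entry = ET.Element("e", {"lm": lemma})
    invariant = ET.SubElement(entry, "i")
    invariant.text = lemma
    ET.SubElement(entry, "par", {"n": paradigm})
    return entry


def bilingual_entry(source, target, source_tags, target_tags):
    """Build <e><p><l>source + tags</l><r>target + tags</r></p></e>."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side_name, text, tags in (("l", source, source_tags),
                                  ("r", target, target_tags)):
        side = ET.SubElement(pair, side_name)
        side.text = text
        for tag in tags:
            ET.SubElement(side, "s", {"n": tag})
    return entry


en_xml = ET.tostring(monolingual_entry("Yekaterinburg", "Aachen__np"),
                     encoding="unicode")
bi = bilingual_entry("Yekaterinburg", "Ekaterimburgo",
                     ["np", "al"], ["np", "al", "mf"])
bi_xml = ET.tostring(bi, encoding="unicode")
```

The sketch only covers single-token names. Multiword NEs would need the blank element that Apertium dictionaries use between words, which is left out here.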

SLIDE 38

System: Apertium es–en 0.7.1
Baselines: Apertium without NEs (no nes) and Apertium with handtagged NEs (nes)
Test set: nc-2007 (WMT’08)
Metrics: UNK, BLEU, NIST, TER, GTM
Parameters: min occurrences {25, 50, 100, 200}, min % occurrences capitalised {.7, .75, .8, .85}
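The two parameter sets define a grid of up to 16 configurations. A minimal sketch of enumerating that grid follows. The names are hypothetical, and the slides do not state whether every combination was actually run; the result tables report only some of them.

```python
from itertools import product

MIN_OCCURRENCES = (25, 50, 100, 200)
MIN_CAPITALISED = (0.7, 0.75, 0.8, 0.85)


def configurations():
    """All (min occurrences, min capitalised ratio) combinations."""
    return list(product(MIN_OCCURRENCES, MIN_CAPITALISED))


def label(min_occurrences, min_capitalised):
    """Format a configuration the way the result tables do, e.g. '100,.75'."""
    return f"{min_occurrences},{format(min_capitalised, 'g').lstrip('0')}"


grid = configurations()
labels = [label(occ, cap) for occ, cap in grid]
```

Each label can then be matched against a row name in the result tables, such as 100,.75 or 25,.8.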

SLIDE 43

What is the importance of handtagged NEs in Apertium’s dictionaries?
Apertium with handtagged NEs vs Apertium without NEs

System        UNK   BLEU    NIST    TER     GTM
en→es no nes  3440  0.1976  6.5389  0.6222  0.4917
en→es nes     2285  0.2119  6.7641  0.6084  0.5054
es→en no nes  3027  0.2016  6.1521  0.7091  0.5073
es→en nes     1936  0.2127  6.3277  0.6969  0.5182
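The effect of the handtagged NEs on unknown words can be read off the UNK column as a relative reduction. The snippet below is a small worked calculation using only the UNK counts from the table above; the function and variable names are illustrative.

```python
def relative_reduction(before, after):
    """Percentage by which a count drops from 'before' to 'after'."""
    return 100.0 * (before - after) / before


# UNK counts (no nes, nes) per translation direction, taken from the table.
unk_counts = {"en-es": (3440, 2285), "es-en": (3027, 1936)}
reductions = {
    direction: round(relative_reduction(before, after), 1)
    for direction, (before, after) in unk_counts.items()
}
```

This gives a reduction of 33.6% for en→es and 36.0% for es→en, in line with a reduction of about one third of the unknown terms.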

SLIDE 45

Experiments to answer two questions:

1. Can NEs from MINELex obtain comparable performance to handtagged NEs?
2. Can NEs from MINELex add significant value to handtagged NEs?

SLIDE 48

Adding NEs to Apertium without NEs (en→es)

System   UNK   BLEU   NIST    TER    GTM
100,.75  2334  .2060  6.6879  .6173  .5007
100,.8   2372  .2061  6.6882  .6168  .5010
200,.75  2441  .2058  6.6903  .6164  .5006
200,.8   2481  .2059  6.6899  .6158  .5009
no nes   3440  .1976  6.5389  .6222  .4917
nes      2285  .2119  6.7641  .6084  .5054

Automatic NEs >> no nes for all metrics
Handtagged NEs >> Automatic NEs for all metrics

SLIDE 51

Adding NEs to Apertium without NEs (es→en)

System   UNK   BLEU   NIST    TER    GTM
100,.75  2078  .2100  6.2882  .7017  .5152
100,.8   2100  .2099  6.2831  .7019  .5149
200,.75  2303  .2097  6.2826  .7021  .5146
200,.8   2325  .2096  6.2790  .7023  .5144
no nes   3027  .2016  6.1521  .7091  .5073
nes      1936  .2127  6.3277  .6969  .5182

Automatic NEs >> no nes for all metrics
Handtagged NEs comparable to Automatic (BLEU, NIST)

SLIDE 54

Adding NEs to Apertium with NEs (en→es)

System   UNK   BLEU   NIST    TER    GTM
25,.8    2027  .2105  6.7122  .6144  .5028
100,.75  2089  .2113  6.7472  .6097  .5051
100,.8   2089  .2113  6.7482  .6096  .5052
200,.75  2141  .2117  6.7568  .6088  .5054
200,.8   2141  .2117  6.7577  .6087  .5055
nes      2285  .212   6.764   .608   .505

Automatic+Handtagged NEs comparable to Handtagged
UNK can be reduced up to 11.3% (25,.8)

SLIDE 57

Adding NEs to Apertium with NEs (es→en)

System   UNK   BLEU   NIST    TER    GTM
25,.8    1725  .2133  6.3291  .6979  .5184
100,.75  1789  .2135  6.3368  .6968  .5188
100,.8   1789  .2135  6.3362  .6968  .5187
200,.75  1830  .2135  6.3362  .6968  .5187
200,.8   1830  .2135  6.3356  .6969  .5186
nes      1936  .2127  6.3277  .6969  .5182

Automatic+Handtagged NEs >> Handtagged
UNK can be reduced up to 10.9% (25,.8)

SLIDE 68

Importance of NEs in RBMT (en–es) has been studied
  Improvement across a set of MT evaluation metrics
  Reduction by 33% of unknown terms

Method for enriching RBMT with automatically acquired NEs
  System with automatic NEs outperforms system without NEs
  Mixed results when comparing/adding automatic NEs to handtagged NEs

Software developed
  minelex2plain → exports a subset of NEs for a language pair to a plain text tabbed format
  minelex2apertium → inserts the output of minelex2plain into Apertium dictionaries
  http://www.computing.dcu.ie/~atoral/#Resources

SLIDE 69

Thanks! Questions?