SLIDE 1

Learning to Recognize Discontiguous Entities

Aldrian Obaja Muis and Wei Lu
Singapore University of Technology and Design
aldrian_muis@sutd.edu.sg, luwei@sutd.edu.sg

Outline: Introduction, Our Model, Experiments, Ambiguity, Conclusion, Appendix

SLIDE 2

Introduction

SLIDE 10

Previous Works in Entity Recognition

Assuming non-overlapping and contiguous entities:
- Mostly using the BIO/BILOU tagset

Allowing overlaps/nesting, but still assuming contiguous entities:
1. Tag n-grams instead of words (Byrne, 2007)[1]
2. Tag in multiple layers (Alex, Haddow, and Grover, 2007)[2]
3. Treat it as a parsing task (Finkel and Manning, 2009)[3]
4. Use a mention hypergraph (Lu and Roth, 2015)[4]

How about discontiguous entities?

[1] Kate Byrne (2007). “Nested Named Entity Recognition in Historical Archive Text”. In: IEEE ICSC 2007. IEEE Computer Society, pp. 589–596.
[2] Beatrice Alex, Barry Haddow, and Claire Grover (2007). “Recognising Nested Named Entities in Biomedical Text”. In: BioNLP Workshop 2007, pp. 65–72.
[3] Jenny Rose Finkel and Christopher D. Manning (2009). “Nested named entity recognition”. In: Proc. of EMNLP 2009. Vol. 1, pp. 141–150.
[4] Wei Lu and Dan Roth (2015). “Joint Mention Extraction and Classification with Mention Hypergraphs”. In: Proc. of EMNLP 2015, pp. 857–867.
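The contiguity assumption can be made concrete with a short sketch (mine, not from the slides): a BIO decoder can only ever emit contiguous, non-overlapping spans, which is exactly the limitation the rest of the deck relaxes.

```python
# Minimal sketch (not the authors' code): decoding a BIO tag sequence can
# only ever yield contiguous, non-overlapping (start, end) spans.

def decode_bio(tokens, tags):
    """Turn parallel token/BIO-tag lists into (start, end) entity spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":                    # a new entity begins here
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":                  # outside any entity
            if start is not None:
                spans.append((start, i))
            start = None
        # tag == "I": continue the current entity
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = ["EGD", "showed", "hiatal", "hernia", "and", "lac"]
tags   = ["O",   "O",      "B",      "I",      "O",   "B"]
print(decode_bio(tokens, tags))  # [(2, 4), (5, 6)]
```

No tag assignment under this scheme can express an entity with a gap, such as "laceration ... esophagus".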

SLIDE 20

Discontiguous Entity Recognition

Definition: the task of recognizing entities in text, where the entities can be discontiguous (and possibly overlapping with each other).

Examples from SemEval 2014 Task 7 (Analysis of Clinical Text):

“EGD showed hiatal hernia and vertical laceration in distal esophagus with blood in stomach and overlying lac.”
1. hiatal hernia
2. laceration ... esophagus
3. blood in stomach
4. stomach ... lac

“Infarctions either water shed or embolic”
1. Infarctions
2. Infarctions ... water shed
3. Infarctions ... embolic

SLIDE 28

Previous Approaches

In SemEval 2014 Task 7, only two teams could handle discontiguous and overlapping entities:

1. Pathak et al. (2014)[5]: standard NER using the BIO tagset, pipelined with an SVM to combine the spans
2. Zhang et al. (2014)[6] (best team): an extended BIO tagset coupled with heuristics[7]
   - B, I for contiguous tokens
   - BD, ID for discontiguous tokens
   - BH, IH for overlapping tokens

[5] Parth Pathak et al. (2014). “ezDI: A Hybrid CRF and SVM based Model for Detecting and Encoding Disorder Mentions in Clinical Notes”. In: SemEval 2014.
[6] Yaoyun Zhang et al. (2014). “UTH_CCB: A report for SemEval 2014 – Task 7 Analysis of Clinical Text”. In: SemEval 2014.
[7] Buzhou Tang et al. (2013). “Recognizing and Encoding Disorder Concepts in Clinical Text using Machine Learning and Vector Space”. In: ShARe/CLEF Eval. Lab.

SLIDE 37

Encoding in the Model of Zhang et al.

[[[Infarctions]1]2]3 either [water shed]1 or [embolic]2
 BH                  O       BD    ID     O   BD

1. Infarctions ... water shed
2. Infarctions ... embolic
3. Infarctions

This is the canonical encoding of this particular set of entities.

Example taken from the full sentence: “... protocol to evaluate for any infarctions, either water shed or embolic, ...”

SLIDE 44

Decoding in the Model of Zhang et al.

Infarctions either water shed or embolic
BH          O      BD    ID   O  BD

The same tag sequence admits several readings, e.g.:

Infarctions either [water shed]1 or [embolic]1

[Infarctions]1 either [water shed]1 or embolic
1. Infarctions ... water shed

[[Infarctions]1]2 either [water shed]1 or [embolic]2
1. Infarctions ... water shed
2. Infarctions ... embolic
3. Infarctions (?)

Ambiguous!
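The ambiguity can be reproduced with a small sketch. The `tag_families` function below is my own simplified reading of the extended tagset (it collapses the B/I distinction into coarse families H/D/C/O and is not the authors' code): two different entity sets over the example sentence receive the same tag sequence, so the tags alone cannot recover which set was meant.

```python
# Sketch (assumed simplification of the extended BIO scheme): tag each token
# by how many entities cover it and whether those entities have gaps.

def tag_families(n_tokens, entities):
    """entities: sorted tuples of token indices, possibly with gaps."""
    tags = []
    for i in range(n_tokens):
        covering = [e for e in entities if i in e]
        discont = [e for e in covering if e[-1] - e[0] + 1 > len(e)]
        if not covering:
            tags.append("O")
        elif len(covering) > 1:
            tags.append("H")          # shared by overlapping entities
        elif discont:
            tags.append("D")          # part of a discontiguous entity
        else:
            tags.append("C")          # ordinary contiguous entity token
    return tags

# "Infarctions either water shed or embolic" (tokens 0..5)
two_entities   = [(0, 2, 3), (0, 5)]        # without the bare "Infarctions"
three_entities = [(0,), (0, 2, 3), (0, 5)]  # with it
assert tag_families(6, two_entities) == tag_families(6, three_entities)
print(tag_families(6, two_entities))  # ['H', 'O', 'D', 'D', 'O', 'D']
```

The shared sequence mirrors the slide's BH O BD ID O BD: a decoder seeing it cannot tell whether the standalone "Infarctions" entity is present.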

SLIDE 46

Number of Entity Combinations

In a sentence with n words, there are:

1. 2^n − 1 possible discontiguous entities
2. 2^(2^n − 1) possible combinations of discontiguous entities*
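These counts follow because a discontiguous entity is any non-empty subset of the n tokens, and a combination is any subset of those entities; a brute-force check (mine, not from the slides):

```python
# Verify 2^n - 1 by enumerating non-empty token subsets directly.
from itertools import combinations

def count_entities(n):
    """Number of non-empty token subsets of an n-word sentence."""
    return sum(1 for k in range(1, n + 1)
               for _ in combinations(range(n), k))

for n in range(1, 8):
    assert count_entities(n) == 2**n - 1   # 2^n - 1 possible entities

n = 4
print(2**n - 1)        # 15 possible entities
print(2**(2**n - 1))   # 32768 possible entity combinations
```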

SLIDE 48

Discontiguous Entity Recognition

1. How can we efficiently model these discontiguous (and possibly overlapping) entities?
2. How can we compare the ambiguity between models for discontiguous entities?

SLIDE 50

Contributions

In this paper, we contribute:

1. A new hypergraph-based model that better handles discontiguous entities
2. A simple theoretical framework for comparing ambiguity between models

SLIDE 52

Our Hypergraph-based Model

[Figure: a hypergraph over the sentence “Infarctions either water shed or embolic”, with node types A, E, T, B0, B1, O1, and X at each token position. The highlighted subgraph encodes three entities: Infarctions; Infarctions ... water shed; Infarctions ... embolic.]

SLIDE 55

Our Hypergraph-based Model

Key ideas:

1. Build a hypergraph that can encode any entity combination
2. For any sentence annotated with entities, there is a unique subgraph that represents it (the canonical encoding)
3. Each entity is represented as a path in the entity-encoded hypergraph, where the B-nodes indicate which tokens are part of the entity

SLIDE 68

Our Hypergraph-based Model

[Figure: step-by-step construction of the hypergraph encoding for “Infarctions either water shed or embolic”. B0, O1, and B1 nodes are added token by token until the subgraph encodes all three entities: Infarctions; Infarctions ... water shed; Infarctions ... embolic.]

SLIDE 70

Our Hypergraph-based Model

Training and predicting:

1. Training: maximize the conditional log-likelihood of the training data
2. Predicting: use the Viterbi algorithm to find the highest-scoring subgraph
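The slides run Viterbi over a hypergraph; as a simpler stand-in (my sketch with made-up scores, not the paper's feature model), here is the chain-structured version of the same max-sum recurrence:

```python
# Chain Viterbi: the max-sum recurrence the hypergraph decoder generalizes.
# emit[(i, t)] scores tag t at position i; trans[(s, t)] scores moving s -> t.

def viterbi(n_steps, tags, emit, trans):
    """Return the highest-scoring tag path of length n_steps."""
    best = {t: emit[(0, t)] for t in tags}
    back = []
    for i in range(1, n_steps):
        prev, best, ptr = best, {}, {}
        for t in tags:
            s = max(tags, key=lambda s: prev[s] + trans[(s, t)])
            best[t] = prev[s] + trans[(s, t)] + emit[(i, t)]
            ptr[t] = s
        back.append(ptr)
    t = max(tags, key=lambda t: best[t])
    path = [t]
    for ptr in reversed(back):       # follow backpointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))

tags = ["B", "O"]
emit = {(0, "B"): 2, (0, "O"): 0, (1, "B"): 0, (1, "O"): 1,
        (2, "B"): 3, (2, "O"): 0}
trans = {("B", "B"): 0, ("B", "O"): 1, ("O", "B"): 1, ("O", "O"): 0}
print(viterbi(3, tags, emit, trans))  # ['B', 'O', 'B']
```

In the hypergraph model the same dynamic program runs over hyperedges instead of chain transitions, but the best-score-plus-backpointer structure is unchanged.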

SLIDE 71

Experiments

SLIDE 75

Experimental Setup

- Dataset taken from SemEval 2014 Task 7, keeping the sentences that contain discontiguous entities
- Two setups for the training set: “Discontiguous” (smaller) and “Original” (larger)
- Models optimized for F1-score on the dev set by varying λ
- Features followed Tang et al. (2013): words, POS tags, Brown clusters, semantic categories, ...

SLIDE 76

Results Using the Smaller Training Set

Model    Precision (%)   Recall (%)   F1-score (%)
Li-Enh       54.70          41.20         47.00
Li-All       15.20          44.90         22.70
Sh-Enh       76.90          40.10         52.70
Sh-All       76.00          40.50         52.80
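As a sanity check on the chart values (my check, not part of the slides), each reported (precision, recall, F1) triple for the smaller training set should satisfy F1 = 2PR/(P + R):

```python
# Verify the reported triples against the F1 definition, allowing for the
# rounding in the chart labels.

def f1(p, r):
    return 2 * p * r / (p + r)

reported = [            # (precision, recall, F1) as read off the chart
    (54.70, 41.20, 47.00),
    (15.20, 44.90, 22.70),
    (76.90, 40.10, 52.70),
    (76.00, 40.50, 52.80),
]
for p, r, f in reported:
    assert abs(f1(p, r) - f) < 0.1   # consistent up to chart rounding
print(round(f1(54.70, 41.20), 2))    # 47.0
```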

SLIDE 77

Results Using the Larger Training Set

Model    Precision (%)   Recall (%)   F1-score (%)
Li-Enh       64.10          46.50         53.90
Li-All       52.80          49.40         51.10
Sh-Enh       73.90          49.10         59.00
Sh-All       73.40          49.50         59.10

SLIDE 78

Ambiguity

SLIDE 79

Ambiguity

One encoding can have multiple interpretations (sets of entities).

Our model, on “apparent [atrial [pacemaker]2 artifact]1 without [capture]2”:
- Reading A: 1. atrial pacemaker artifact; 2. pacemaker ... capture
- Reading B: 1. pacemaker artifact; 2. atrial pacemaker ... capture

Baseline model, on “Infarctions either water shed or embolic” (BH O BD ID O BD):
- Reading A: 1. infarctions ... water shed; 2. infarctions ... embolic
- Reading B: 1. infarctions; 2. infarctions ... water shed; 3. infarctions ... embolic

SLIDE 83

Ambiguity

The models need further processing after prediction to generate a single set of entities.

We compare two heuristics:

1. All: return the union of all possible interpretations
2. Enough: return one possible interpretation


slide-85
SLIDE 85


Ambiguity

Definition: The ambiguity level A(M) of a model M is the average number of interpretations of each canonical encoding in the model.

23 / 37


slide-93
SLIDE 93


Counting Number of Encodings

How many canonical encodings do the models have?

For the baseline model:
  • There are 7 possible tags per word (B, I, BD, ID, BH, IH, O)
  • The model can output any combination of those: 7^n
  • Not all are canonical, so: M_Li(n) < 7^n < 2^(3n)

For our hypergraph-based model:
  • Number of canonical encodings = number of subgraphs
  • Q: How to calculate the number of subgraphs?

24 / 37


slide-97
SLIDE 97


Counting Number of Encodings

A: Use dynamic programming on combinations of nodes

  • Fig. 1: Simplified graph to illustrate subgraph counting
  • Fig. 2: State transitions

f_11(n) = 2 · f_11(n − 1) + f_01(n − 1)   (1)

25 / 37
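A minimal sketch of how a recurrence like (1) turns subgraph counting into a linear-time dynamic program. Only the f_11 recurrence comes from the slide; the base cases and the (trivial) update for f_01 below are assumptions made purely to keep the sketch runnable, not the paper's actual transition system.

```python
# Linear-time DP sketch around recurrence (1):
#     f11(n) = 2 * f11(n-1) + f01(n-1)
# Only this recurrence is from the slide; the base cases and the update
# for f01 are ASSUMED here so the sketch runs end to end.

def f11(n):
    f11_prev, f01_prev = 1, 1  # assumed base cases at n = 1
    for _ in range(n - 1):
        # each state at position n depends only on states at n - 1
        f11_prev, f01_prev = 2 * f11_prev + f01_prev, f01_prev
    return f11_prev
```

The point is structural: because each state depends only on the previous position, the number of subgraphs is obtained in O(n) time instead of by enumerating subgraphs explicitly.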


slide-100
SLIDE 100


Counting Number of Encodings

How many canonical encodings do the models have?

For the baseline:
  • There are 7 possible tags per word (B, I, BD, ID, BH, IH, O)
  • The model can output any combination of those: 7^n
  • Not all are canonical, so: M_Li(n) < 7^n < 2^(3n)

For our hypergraph-based model:
  • Number of canonical encodings = number of subgraphs
  • After more calculations: M_Sh(n) > C · 2^(10n)

So our model is less ambiguous than the baseline model.

26 / 37
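The two bounds can be checked numerically; a quick sketch (the constant C is dropped, which only matters for very small n):

```python
# Compare the baseline's upper bound with the hypergraph model's lower
# bound: at most 7 encodings per word vs. at least 2^10 per word.
# More distinct canonical encodings means fewer entity sets are forced
# to share an encoding, i.e. less ambiguity.
for n in range(1, 20):
    baseline_upper = 7 ** n           # M_Li(n) < 7^n < 2^(3n)
    hypergraph_lower = 2 ** (10 * n)  # M_Sh(n) > C * 2^(10n), C dropped
    assert baseline_upper < hypergraph_lower
```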

slide-101
SLIDE 101


Empirical Ambiguity

        |        Discontiguous        |            Original
        | Prec Err      | Rec Err     | Prec Err        | Rec Err
Li-all  | 63.66%        | 0.00%*      | 23.81%          | 0.00%*
        | (3,478/5,463) | (0/1,985)   | (3,484/14,632)  | (0/11,147)
Sh-all  | 1.73%         | 0.00%*      | 0.35%           | 0.00%*
        | (35/2,020)    | (0/1,985)   | (39/11,186)     | (0/11,147)
Li-enh  | 2.74%         | 3.82%       | 0.52%           | 0.90%
        | (54/1,969)    | (76/1,991)  | (58/11,123)     | (101/11,166)
Sh-enh  | 1.21%         | 1.46%       | 0.25%           | 0.38%
        | (24/1,986)    | (29/1,991)  | (28/11,152)     | (42/11,166)

Table 1: Precision and recall errors (%) of each model in the "Discontiguous" and "Original" training data when given the gold output structures. Lower numbers are better.

27 / 37


slide-105
SLIDE 105


Conclusion

28 / 37

SLIDE 107


Conclusion

The hypergraph-based model we proposed is better at recognizing discontiguous and overlapping spans than a strong baseline.

Our theoretical analysis (by counting encodings) shows that our model is less ambiguous in representing discontiguous entities, which matches the results of the ambiguity experiments.

29 / 37


slide-110
SLIDE 110


Future Work

  • Explore applications of discontiguous span recognition to other tasks
  • Explore more extensions of this model, similar to semi-Markov CRF
  • Explore other training procedures (SSVM, max-margin)

30 / 37

slide-111
SLIDE 111

Introduction Our Model Experiments Ambiguity Conclusion Appendix

Thank You

Code available at: http://statnlp.org/research/ie/

Aldrian Obaja Muis and Wei Lu

Singapore University of Technology and Design


slide-116
SLIDE 116


Ambiguity in Our Model

Encoding: A A A A A A  E E E E E E  T T T T T T  B0 B0 B0 O1 O1 B1  X X X X X X
apparent [atrial [pacemaker]2 artifact]1 without [capture]2

Possible entities under this encoding:
  • atrial pacemaker artifact
  • pacemaker artifact
  • pacemaker . . . capture
  • atrial pacemaker . . . capture

31 / 37

slide-117
SLIDE 117


Double Counting in Naive DP

32 / 37


slide-122
SLIDE 122


Counting Number of Encodings

n | M_Li(n)  | M_Sh(n) | N(n)
1 | 2        | 2       | 2^1 = 2
2 | 8        | 8       | 2^3 = 8
3 | 46       | 80      | 2^7 = 128
4 | < 2401   | 3584    | 2^15 = 32768
5 | < 16807  | 533504  | 2^31 = 2147483648

Table 2: The number of possible encodings for small values of n

33 / 37
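The N(n) column in Table 2 follows a clear pattern: the exponents are 1, 3, 7, 15, 31, i.e. 2^n − 1, so N(n) = 2^(2^n − 1). A quick check against the table's own values:

```python
# N(n) from Table 2: its exponents 1, 3, 7, 15, 31 are 2^n - 1,
# so N(n) = 2^(2^n - 1).
def N(n):
    return 2 ** (2 ** n - 1)

[N(n) for n in range(1, 6)]
# matches the table column: 2, 8, 128, 32768, 2147483648
```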


slide-124
SLIDE 124


Ambiguity Level

Definition: The relative ambiguity Ar(M1, M2) between models M1 and M2 is the ratio of the logs of the numbers of canonical encodings:

Ar(M1, M2) = lim_{n→∞} [ log Σ_{i=1}^{n} M_{M2}(i) ] / [ log Σ_{i=1}^{n} M_{M1}(i) ]

where M_M(i) is the number of encodings in model M for a sequence of length i.

This results in:

Ar(Li, Sh) ≥ lim_{n→∞} (log C + 10n log 2) / (3n log 2) = 10/3 > 1

34 / 37
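The limit can be checked numerically; the constant C washes out, and the ratio tends to 10/3 (the value of C below is arbitrary):

```python
import math

# Numeric check of the limit (log C + 10n log 2) / (3n log 2) -> 10/3.
# The constant C is arbitrary; it only contributes an O(1/n) term.
C = 5.0

def ratio(n):
    return (math.log(C) + 10 * n * math.log(2)) / (3 * n * math.log(2))

abs(ratio(10**6) - 10 / 3)  # vanishingly small for large n
```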

slide-125
SLIDE 125


Dataset Statistics

      |        | Number of entities
Split | #Sents | 1 part | 2 parts | 3 parts | Total
Train | 534    | 544    | 607     | 44      | 1,195
Dev   | 303    | 357    | 421     | 18      | 796
Test  | 430    | 584    | 610     | 16      | 1,210

35 / 37