

SLIDE 1

Inspecting the Structural Biases of Dependency Parsing Algorithms

Yoav Goldberg and Michael Elhadad

Ben Gurion University

CoNLL 2010, Sweden

SLIDE 2

There are many ways to parse a sentence

SLIDE 3

There are many ways to parse a sentence

Transition-Based Parsers

  • Covington
  • Multiple Passes
  • Arc-Eager
  • Arc-Standard
  • With Swap Operator
  • First-Best Parser
  • DAG Parsing
  • With a Beam
  • With Dynamic Programming
  • With Tree Revision
  • Left-to-right
  • Right-to-left

Easy-First Parsing (check out our NAACL 2010 paper)

Graph-Based Parsers

  • First Order
  • Second Order (two children / with grandparent)
  • Third Order
  • MST Algorithm / Matrix-Tree Theorem
  • Eisner Algorithm
  • Belief Propagation
  • With Global Constraints (ILP / Gibbs sampling)

Combinations

  • Voted Ensembles (Sagae’s way, Attardi’s way)
  • Stacked Learning

SLIDES 4–7

We can build many reasonably accurate parsers

Parser combinations work

⇒ every parser has its strong points

Different parsers behave differently

SLIDES 8–10

Open questions

WHY do they behave as they do?
WHAT are the differences between them?

SLIDES 11–13

More open questions

Which linguistic phenomena are hard for parser X?
What kinds of errors are common for parser Y?
Which parsing approach is most suitable for language Z?

SLIDES 20–23

Previously

McDonald and Nivre 2007: “Characterizing the Errors of Data-Driven Dependency Parsing Models”

◮ Focus on single-edge errors
◮ MST better for long edges, MALT better for short
◮ MST better near root, MALT better away from root
◮ MALT better at nouns and pronouns, MST better at others
◮ . . . but all these differences are very small

SLIDE 24

we do something a bit different

SLIDE 25

Assumptions

◮ Parsers fail in predictable ways
◮ these failures can be analyzed
◮ analysis should be done by inspecting trends rather than individual decisions

SLIDES 26–29

Note: We do not do error analysis

◮ Error analysis is complicated
  ◮ one error can yield another / hide another
◮ Error analysis is local to one tree
  ◮ many factors may be involved in that single error

we are aiming at more global trends

SLIDE 30

Structural Preferences

SLIDES 31–32

Structural preferences

for a given language + syntactic theory

◮ Some structures are more common than others
  ◮ (think right branching for English)
◮ Some structures are very rare
  ◮ (think non-projectivity, OSV constituent order)

SLIDES 33–36

Structural preferences

parsers also exhibit structural preferences

◮ Some are explicit / by design
  ◮ e.g. projectivity
◮ Some are implicit, stemming from
  ◮ features
  ◮ modeling
  ◮ data
  ◮ interactions
  ◮ and other factors

These trends are interesting!

SLIDE 37

Structural Bias

SLIDES 38–39

Structural bias

“The difference between the structural preferences of two languages”

For us: which structures tend to occur more in the language than in the parser’s output?

SLIDES 40–42

Bias vs. Error

related, but not the same

Parser X makes many PP attachment errors
◮ a claim about an error pattern

Parser X tends to attach PPs low, while language Y tends to attach them high
◮ a claim about structural bias (and also about errors)

Parser X can never produce structure Y
◮ a claim about structural bias
slide-43
SLIDE 43

Formulating Structural Bias

“given a tree, can we say where it came from?” ?

slide-44
SLIDE 44

Formulating Structural Bias

“given two trees of the same sentence, can we tell which parser produced each parse?” ?

slide-45
SLIDE 45

Formulating Structural Bias

“which parser produced which tree?” ? any predictor that can help us answer this question is an indicator of structural bias

slide-46
SLIDE 46

Formulating Structural Bias

“which parser produced which tree?” ? any predictor that can help us answer this question is an indicator of structural bias

slide-47
SLIDE 47

Formulating Structural Bias

“which parser produced which tree?” ? any predictor that can help us answer this question is an indicator of structural bias uncovering structural bias = searching for good predictors

SLIDE 48

Method

◮ start with two sets of parses for the same set of sentences
◮ look for predictors that allow us to distinguish between the trees in each group
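
To make the setup concrete, here is a minimal sketch in Python, assuming each group of parses arrives as a plain list of trees; `extract_predictors` is a placeholder for the subtree predictors introduced on the following slides, and all names here are illustrative rather than the authors' actual tooling.

```python
# Build a binary classification dataset from two sets of parses of the
# same sentences. Any predictor that separates the two labels well is
# an indicator of structural bias.
def build_bias_dataset(trees_a, trees_b, extract_predictors):
    examples = []
    for tree in trees_a:                     # e.g. gold trees
        examples.append((extract_predictors(tree), +1))
    for tree in trees_b:                     # e.g. parser output
        examples.append((extract_predictors(tree), -1))
    return examples
```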

SLIDES 49–52

Our Predictors

◮ all possible subtrees
◮ always encode:
  ◮ parts of speech
  ◮ relations
  ◮ direction
◮ can also encode:
  ◮ lexical items
  ◮ distance to parent

[figure: an example subtree over JJ, NN, VB, IN, shown first with parts of speech only, then lexicalized (IN/with), then with distances to parent (JJ: 4, IN: 2)]
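
As a sketch of one way such predictors could be represented, here is an illustrative encoding in Python; the `Node` class and the bracketed output format are my own, not the authors' exact representation.

```python
# A subtree predictor always encodes POS, relations, and direction,
# and may optionally add the lexical item and the distance to parent.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    pos: str                   # part of speech (always encoded)
    direction: str             # arc direction, "→" or "←" (always encoded)
    word: str | None = None    # optional lexical item
    dist: int | None = None    # optional distance to parent
    children: list[Node] = field(default_factory=list)

def encode(node: Node) -> str:
    """Render a subtree as a bracketed predictor string."""
    label = node.pos + node.direction
    if node.word is not None:
        label += "/" + node.word
    if node.dist is not None:
        label += f":{node.dist}"
    return "(" + label + "".join(" " + encode(c) for c in node.children) + ")"

# the slide's running example: a VB head with an NN dependent (whose JJ
# child sits at distance 4) and a lexicalized IN/with at distance 2
example = Node("VB", "←", children=[
    Node("NN", "→", children=[Node("JJ", "→", dist=4)]),
    Node("IN", "←", word="with", dist=2),
])
print(encode(example))  # (VB← (NN→ (JJ→:4)) (IN←/with:2))
```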

SLIDES 53–56

Search Procedure

boosting with subtree features

the algorithm is from Kudo and Matsumoto (2004); very briefly:

◮ input: two sets of constituency trees
◮ while not done:
  ◮ choose a subtree that classifies most trees correctly
  ◮ re-weight trees based on errors
◮ output: weighted subtrees (= a linear classifier)
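
The loop above is essentially AdaBoost with subtree-occurrence tests as weak learners. Here is a minimal sketch, assuming the candidate subtrees are supplied as a plain list of predicates; Kudo and Matsumoto's actual algorithm enumerates candidate subtrees efficiently rather than scanning a fixed list.

```python
# AdaBoost-style sketch of the search loop: repeatedly pick the
# subtree that best separates the two sets of trees, then re-weight.
import math

def boost(trees, labels, candidate_subtrees, rounds=100):
    """trees: parse trees; labels: +1/-1 origin of each tree;
    candidate_subtrees: predicates tree -> bool ("does it occur?")."""
    n = len(trees)
    weights = [1.0 / n] * n
    model = []  # (subtree, alpha) pairs: the weighted-subtree classifier
    for _ in range(rounds):
        # choose the subtree that classifies most (weighted) trees correctly
        best, best_err = None, 0.5
        for s in candidate_subtrees:
            err = sum(w for w, t, y in zip(weights, trees, labels)
                      if (1 if s(t) else -1) != y)
            if err < best_err:
                best, best_err = s, err
        if best is None:   # no subtree beats chance: stop
            break
        alpha = 0.5 * math.log((1.0 - best_err) / max(best_err, 1e-12))
        model.append((best, alpha))
        # re-weight trees based on errors, then renormalize
        for i, (t, y) in enumerate(zip(trees, labels)):
            weights[i] *= math.exp(-alpha * y * (1 if best(t) else -1))
        total = sum(weights)
        weights = [w / total for w in weights]
    return model  # output: weighted subtrees (= a linear classifier)

def classify(model, tree):
    score = sum(alpha * (1 if s(tree) else -1) for s, alpha in model)
    return +1 if score >= 0 else -1
```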

SLIDES 62–64

conversion to constituency

[figure: the example subtree (JJ at distance 3, NN, VB, IN/with at distance 2) rendered as a constituency tree with node labels JJ→, VB←, NN→, IN← and leaves d:3, w:with, d:2]

  • mandatory information at the node label
  • optional information as leaves
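
A minimal sketch of that encoding, assuming a simple dict representation of dependency nodes; the s-expression output is my own formatting, arranged as the slide describes: mandatory information (POS + direction) on the node label, optional information (word, distance) as leaf children.

```python
# Encode a dependency subtree as a constituency tree, suitable as
# input to a subtree-boosting tool over constituency trees.
def to_constituency(node):
    """node: dict with 'pos', 'dir' ('→'/'←'), optional 'word' and
    'dist', and 'children' (a list of similar dicts)."""
    parts = [node["pos"] + node["dir"]]          # mandatory info: node label
    if "word" in node:
        parts.append(f"(w:{node['word']})")      # optional info: a leaf
    if "dist" in node:
        parts.append(f"(d:{node['dist']})")      # optional info: a leaf
    parts += [to_constituency(c) for c in node.get("children", [])]
    return "(" + " ".join(parts) + ")"

# the slide's example: VB with dependents NN (which has a JJ child at
# distance 3) and a lexicalized IN/with at distance 2
tree = {"pos": "VB", "dir": "←", "children": [
    {"pos": "NN", "dir": "→",
     "children": [{"pos": "JJ", "dir": "→", "dist": 3}]},
    {"pos": "IN", "dir": "←", "word": "with", "dist": 2},
]}
print(to_constituency(tree))  # (VB← (NN→ (JJ→ (d:3))) (IN← (w:with) (d:2)))
```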

SLIDES 65–66

Experiments

Analyzed Parsers

◮ Malt Eager
◮ Malt Standard
◮ Mst 1
◮ Mst 2

Data

◮ WSJ (converted using Johansson and Nugues)
◮ splits: parse-train (15–18), boost-train (10–11), boost-val (4–7)
◮ gold pos-tags

SLIDES 67–69

Quantitative Results

Q: Are the parsers biased with respect to English?
A: Yes

Parser   Train Accuracy   Val Accuracy
MST1     65.4             57.8
MST2     62.8             56.6
MALTE    69.2             65.3
MALTS    65.1             60.1

Table: Distinguishing parser output from gold trees based on structural information (all well above the 50% chance baseline for this balanced task)

SLIDES 70–72

Qualitative Results (teasers)

Over-produced by ArcEager:

ROOT→“   ROOT→DT   ROOT→WP

[figure: a ROOT subtree over VBD, VBD]

(we knew it’s bad at the root, now we know how!)

SLIDES 73–74

Qualitative Results (teasers)

Over-produced by ArcEager and ArcStandard:

VBD → VBD (distance 9+)
VBD → VBD (distance 5–7)
ROOT→VBZ→VBZ

(prefer the first verb above the second one: because of left-to-right processing?)

SLIDES 75–76

Qualitative Results (teasers)

Over-produced by MST1:

[figure: a subtree over IN, NN, NN, NN, NN, VBZ]

(independence assumptions failing)

SLIDES 77–78

Qualitative Results (teasers)

Under-produced by MST1 and MST2:

[figure: a coordination subtree over NN, IN, CC, NN]

(a hard time coordinating “heavy” NPs: due to the pos-in-between feature?)

SLIDES 79–80

Qualitative Results (teasers)

More in the paper
You should read it

Software available
Try it with your language / parser

SLIDE 81

To Conclude

◮ understanding HOW parsers behave and WHY is important
◮ we should do more of that
◮ we defined structural bias as a way of characterizing parser behaviour
◮ we presented an algorithm for uncovering structural bias
◮ applied it to English, with interesting results