

SLIDE 1

Span-Based Constituency Parsing with Provably Optimal Dynamic Oracles

James Cross and Liang Huang, Oregon State University. EMNLP, Austin, TX, November 2, 2016.

SLIDE 3

Dependency vs. Constituency

Dependency parsing (UAS):

Parser                        Search   UAS
Zhang & Nivre (2011)          beam     92.9
Chen & Manning (2014)         greedy   91.8
Zhou et al. (2015)            beam     93.3
Weiss et al. (2015)           beam     94.0
Our work (ACL 2016)           greedy   93.4
Andor et al. (2016)           beam     94.4

Constituency parsing (F1):

Parser                        Search   F1
Carreras et al. (2008)        cubic    91.1
Shindo et al. (2012)          cubic    91.1
Thang et al. (2015) (A*)      ~cubic   91.1
Watanabe et al. (2015)        beam     90.7
Vinyals et al. (2015) (WSJ)   beam     90.5
This work                     greedy   91.3

(Red in the original slide marks neural systems.)

[constituency figure: (S (NP (PRP I)) (VP (MD do) (VP (VBP like) (S (VP (VBG eating) (NP (NN fish)))))))]
[dependency figure: I/PRP do/MD like/VBP eating/VBG fish/NN, with arcs ROOT, subj, aux, dobj, dobj]

SLIDE 4

Outline

  • Span-Based Constituency Parsing
  • Bi-Directional LSTM Span Features
  • Provably Optimal Dynamic Oracle
  • Experiments

SLIDE 5

Span-Based Parsing

  • Previous work uses tree structures on the stack
  • We simplify to operate directly on sentence spans
  • Simple-to-implement linear-time parsing

[figure: previous work keeps partial trees (e.g., VP', NP) on the stack; in our work the stack holds bare sentence spans, e.g., Stack: [I/PRP] [do/MD like/VBP], Queue: [eating/VBG fish/NN]]
SLIDES 6-12

Running example: parsing "I do like eating fish" (word boundaries 0 to 5). Actions alternate between structural (even steps: Shift, Combine) and label (odd steps: Label-X, No-Label); t is the set of brackets built so far.

[gold tree figure: (S (NP (PRP I)) (VP (MD do) (VP (VBP like) (S (VP (VBG eating) (NP (NN fish)))))))]

  • Shift (I/PRP)        t = {}
  • Label-NP             t = {0NP1}
  • Shift (do/MD)        t = {0NP1}
  • No-Label             t = {0NP1}
  • Shift (like/VBP)     t = {0NP1}
  • No-Label             t = {0NP1}

SLIDES 13-19

The derivation continues:

  • Combine (do like, span 1-3)    t = {0NP1}
  • No-Label                       t = {0NP1}
  • Shift (eating/VBG)             t = {0NP1}
  • No-Label                       t = {0NP1}
  • Shift (fish/NN)                t = {0NP1}
  • Label-NP                       t = {0NP1, 4NP5}

SLIDES 20-26

The derivation concludes:

  • Combine (eating fish, span 3-5)          t = {0NP1, 4NP5}
  • Label-S-VP (unary chain)                 t = {0NP1, 4NP5, 3S5, 3VP5}
  • Combine (do like eating fish, span 1-5)
  • Label-VP                                 t = {0NP1, 4NP5, 3S5, 3VP5, 1VP5}
  • Combine (the whole sentence, span 0-5)
  • Label-S                                  t = {0NP1, 4NP5, 3S5, 3VP5, 1VP5, 0S5}

The final bracket set matches the gold tree exactly.
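The full derivation can be replayed with a minimal sketch of the transition system (hypothetical function and action names, not the authors' code): the stack holds bare spans (i, j), Shift and Combine alternate with label actions, and each Label-X adds brackets over the top span.

```python
def replay(actions):
    """Replay (action, labels) pairs; return the bracket set t."""
    stack, q, t = [], 0, set()   # spans (i, j); queue index; brackets
    for act, labels in actions:
        if act == "shift":                   # push the next single-word span
            stack.append((q, q + 1))
            q += 1
        elif act == "combine":               # merge the top two spans
            (i, _), (_, j) = stack[-2], stack[-1]
            stack[-2:] = [(i, j)]
        elif act == "label":                 # Label-X (may be a unary chain)
            i, j = stack[-1]
            t |= {(i, x, j) for x in labels}
        # "no-label" leaves the state unchanged
    return t

# The 18-step derivation of "I do like eating fish" from the slides:
acts = [("shift", None), ("label", ["NP"]),         # I           -> 0NP1
        ("shift", None), ("no-label", None),        # do
        ("shift", None), ("no-label", None),        # like
        ("combine", None), ("no-label", None),      # do like
        ("shift", None), ("no-label", None),        # eating
        ("shift", None), ("label", ["NP"]),         # fish        -> 4NP5
        ("combine", None), ("label", ["S", "VP"]),  # eating fish -> 3S5, 3VP5
        ("combine", None), ("label", ["VP"]),       # span 1-5    -> 1VP5
        ("combine", None), ("label", ["S"])]        # span 0-5    -> 0S5

print(sorted(replay(acts)))
```

Note that the structural and label steps strictly alternate, so a sentence of n words always takes exactly 4n - 2 actions, which is what makes the step count fixed.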

SLIDE 28

Advantages of Span-Based System

  • Linear time and a fixed number of steps (well-suited for beam search)
  • Separates prediction of structure and labels
  • Predicts rules of arbitrary arity with no binarization

[figure: Stack: [I/PRP] [do/MD like/VBP], Queue: [eating/VBG fish/NN]]

Q: How do we decide which action to take? What features represent spans?

SLIDE 29

Outline

  • Span-Based Constituency Parsing
  • Bi-Directional LSTM Span Features
  • Provably Optimal Dynamic Oracle
  • Experiments

SLIDE 30

Bi-LSTM Span Features

The sentence segment "eating fish" (span 3-5) is represented by two vectors:

  • Forward component: f5 - f3 (Wang and Chang, ACL 2016)
  • Backward component: b3 - b5

[figure: bi-directional LSTM over <s> I do like eating fish </s>, producing forward states f0..f5 and backward states b0..b5 at the word boundaries 0..5]
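As a toy numeric check of this representation (array shapes are assumptions for illustration; this is not the authors' code):

```python
import numpy as np

# Toy bi-LSTM states: f[k] and b[k] stand in for the forward/backward
# vectors at word boundary k (0..5 for "I do like eating fish").
rng = np.random.default_rng(0)
n, d = 5, 4
f = rng.standard_normal((n + 1, d))    # forward states f0..f5
b = rng.standard_normal((n + 1, d))    # backward states b0..b5

def span_feature(i, j):
    """Span (i, j): concatenation of f_j - f_i and b_i - b_j."""
    return np.concatenate([f[j] - f[i], b[i] - b[j]])

v = span_feature(3, 5)                 # "eating fish"
assert v.shape == (2 * d,)
```

The subtraction means a span's feature depends only on the LSTM states at its two boundaries, so all O(n²) spans are available after a single O(n) encoding pass.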

SLIDE 31

Span Features for Structure Action

To predict Combine: 4 bi-LSTM span features, one each for pre-s1, s1, s0, and the queue (no tree-structure information used).

[figure: "I do like eating fish ." divided into the regions pre-s1 | s1 | s0 | queue]

SLIDE 32

Span Features for Label Action

To predict Label-VP: 3 bi-LSTM span features, one each for pre-s0, s0, and the queue (no tree-structure information used).

[figure: "I do like eating fish ." divided into the regions pre-s0 | s0 | queue]
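Both classifier inputs are just concatenations of span features over adjacent boundary points. A sketch with a stand-in feature function and hypothetical example boundaries (the real features come from the bi-LSTM differences above):

```python
import numpy as np

def concat_spans(feat, boundaries):
    """One span feature per adjacent boundary pair, concatenated."""
    return np.concatenate([feat(i, j) for i, j in zip(boundaries, boundaries[1:])])

d = 4

def feat(i, j):
    """Stand-in span feature; a real model would use bi-LSTM differences."""
    return np.full(d, 10 * i + j, dtype=float)

# Structure action: 4 features over pre-s1 | s1 | s0 | queue,
# e.g. a state with boundaries 0, 1, 3, 5, 6.
x_struct = concat_spans(feat, [0, 1, 3, 5, 6])

# Label action: 3 features over pre-s0 | s0 | queue,
# e.g. boundaries 0, 1, 5, 6.
x_label = concat_spans(feat, [0, 1, 5, 6])

assert x_struct.shape == (4 * d,) and x_label.shape == (3 * d,)
```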

SLIDE 34

Training Scheme: Local

  • Every parser state is paired with a correct action
  • Separate multilayer perceptron for each action type
  • The baseline training scheme (static oracle) uses the canonical order with short-stack preference

[figure: gold tree with its static action sequence: shift, label-NP, shift, no-label, shift, no-label, combine, no-label, combine, ...]

But what is the correct action after a mistake?

SLIDE 35

Outline

  • Span-Based Constituency Parsing
  • Bi-Directional LSTM Span Features
  • Provably Optimal Dynamic Oracle
  • Experiments

SLIDE 37

Dynamic Oracle: Motivation

  • Static-oracle training assumes all previous actions are correct
  • What should the parser do after a decoding mistake?
  • We need a way to decide the best action in an arbitrary state: a dynamic oracle (an everywhere-defined optimal policy)

[figure: a parse path that has strayed from the gold path; the dynamic oracle answers "best action?" from any state]

SLIDES 38-43

Dynamic Oracle: Example

[figure: gold tree for "I do like eating fish ." with the top two stack spans s1 and s0 marked; highlighted are the smallest reachable gold bracket containing s0 and the next reachable bracket]

Dynamic oracle: Shift or Combine (either keeps the gold brackets reachable in this state).

SLIDES 44-47

Dynamic Oracle: Example (continued)

[figure: the same gold tree one step later, with the new s1 and s0; again the smallest reachable gold bracket containing s0 and the next reachable bracket are highlighted]

Dynamic oracle: Combine.

SLIDES 48-51

Dynamic Oracle: Example (continued)

[figure: the same gold tree in a later state; the smallest reachable gold bracket containing s0 and the next reachable bracket are highlighted]

Dynamic oracle: Shift.

SLIDES 52-55

Dynamic Oracle: Full Definition

  • Structural actions depend on the next reachable bracket in the gold tree
  • All non-bracket label states → No-Label
  • All gold-bracket label states → the correct label(s)

[figure: three cases for s0 relative to the next reachable gold bracket: Shift only; Combine or Shift; Combine only. Shown against the gold brackets, the current brackets, and the reachable optimal tree]
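The three cases can be read off the figure as a small decision rule. This is a hedged sketch of the structural half of the oracle, inferred from the diagram (see the paper for the formal statement): with s0 = (i, j) and the smallest reachable gold bracket (l, r) containing s0, Shift stays safe while j < r, and Combine while l < i.

```python
def structural_oracle(s0, next_reachable):
    """Optimal structural actions, given s0 = (i, j) and the smallest
    reachable gold bracket (l, r) with l <= i <= j <= r."""
    (i, j), (l, r) = s0, next_reachable
    actions = set()
    if j < r:        # room to the right: shifting keeps the bracket reachable
        actions.add("shift")
    if l < i:        # bracket extends left past s0: combining with s1 is safe
        actions.add("combine")
    return actions

assert structural_oracle((2, 3), (2, 5)) == {"shift"}             # Shift only
assert structural_oracle((2, 3), (1, 5)) == {"shift", "combine"}  # either
assert structural_oracle((2, 5), (1, 5)) == {"combine"}           # Combine only
```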

SLIDES 56-58

Dynamic Oracle: Optimality/Complexity

  • First provably optimal oracle for constituency parsing (optimal in both precision and recall)
  • After each action, the next reachable bracket may (or may not) be updated by tracing a parent link in the gold tree
  • O(n) steps overall, hence amortized O(1) time per action
  • By contrast, the arc-standard dependency-parsing oracle takes worst-case O(n³) time per step

[figure: a state where the oracle prescribes shift but the parser takes an incorrect combine]

SLIDES 59-64

Training with Dynamic Oracle

  • Basic dynamic oracle: follow the current model's own actions
  • Problem: the model overfits the training data, making fewer mistakes during training than at test time
  • Exploration: sample actions from the softmax distribution (Ballesteros et al., 2016) to encourage more mistakes

Scores on PTB section 22:

Training                Recall   Prec.   F1
Static Oracle           91.34    91.43   91.38
Dynamic Oracle          91.14    91.61   91.38
Dynamic + Exploration   91.07    92.22   91.64
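The exploration policy can be sketched as follows (hypothetical function names; in the real trainer, the model is updated toward an oracle action at every visited state):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over action scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def choose_action(scores, explore):
    """With exploration, sample the next action from the softmax
    (Ballesteros et al., 2016); otherwise act greedily."""
    p = softmax(np.asarray(scores, dtype=float))
    return int(rng.choice(len(p), p=p)) if explore else int(np.argmax(p))

# Greedy decoding never leaves the model's preferred path...
assert choose_action([3.0, 0.0, 0.0], explore=False) == 0
# ...while sampling occasionally takes lower-scoring actions, producing
# the mistake-laden states the dynamic oracle is needed for.
samples = {choose_action([3.0, 0.0, 0.0], explore=True) for _ in range(200)}
```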

SLIDE 65

Outline

  • Span-Based Constituency Parsing
  • Bi-Directional LSTM Span Features
  • Provably Optimal Dynamic Oracle
  • Experiments

SLIDE 66

Architecture

  • 50-dim word and 20-dim tag embeddings
  • No pre-training
  • Each LSTM layer has 200 units in each direction
  • 200 ReLU units for each of the structure and label predictors
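The sizes above imply the following shape arithmetic (the wiring of span features into the predictors follows the earlier slides; treat this as a sketch, not the exact implementation):

```python
# Shape arithmetic for the architecture described on this slide.
word_dim, tag_dim = 50, 20        # embeddings, trained from scratch
lstm_units = 200                  # per direction, per layer
input_dim = word_dim + tag_dim    # per-word input to the bi-LSTM
span_dim = 2 * lstm_units         # forward-diff + backward-diff vectors
structure_input = 4 * span_dim    # 4 span features for structural actions
label_input = 3 * span_dim        # 3 span features for label actions
hidden = 200                      # ReLU units in each predictor

assert (input_dim, span_dim, structure_input, label_input) == (70, 400, 1600, 1200)
```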

SLIDE 67

Results on Penn Treebank

Parser                   Search   Recall   Prec.   F1
Carreras et al. (2008)   cubic    90.7     91.4    91.1
Shindo et al. (2012)     cubic                     91.1
Thang et al. (2015)      ~cubic                    91.1
Watanabe et al. (2015)   beam                      90.7
Static Oracle            greedy   90.7     91.4    91.0
Dynamic + Exploration    greedy   90.5     92.1    91.3

  • State of the art despite a simple system with greedy actions and small embeddings trained from scratch

SLIDE 68

Parsing Morphologically Rich Languages

[figure: French Treebank tree for "Quelles sont les perspectives ?": (SENT (NP-ATS (DETWH Quelles)) (VN (V sont)) (NP-SUJ (DET les) (NC perspectives)) (PONCT ?)); the noun "perspectives" carries morphological features: lemma = perspective, coarse_POS = N, gender = feminine, number = plural, subcategory = common]

SLIDE 69

Results on French Treebank

  • Morphological feature embeddings (10 dimensions each)
  • Provided as additional input to the recurrent network
  • For French, we used the SPMRL 2014 predicted features

Parser                     Recall   Prec.   F1
Björkelund et al. (2014)                    82.53
Static Oracle              83.50    82.87   83.18
Dynamic + Exploration      81.90    84.77   83.31

SLIDE 70

Summary

  • Simple, easy-to-implement span-based parsing system
  • No tree or label information in the features (a good candidate for dynamic programming)
  • Linear-time parsing with greedy decoding
  • No pre-trained embeddings, a small architecture, and minimal hyper-parameter tuning (trained on CPU)
  • First optimal dynamic oracle for constituency parsing

SLIDE 71

Thank You!