

SLIDE 1

Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold

Mark Johnson Microsoft Research Brown University ACL 2007

1 / 22

SLIDE 2

Motivation and summary

◮ What’s the relationship between CKY parsing and the Eisner/Satta O(n³) PBDG parsing algorithm? (cf. McAllester 1999)
  ◮ split-head encoding, collecting left and right dependents separately
  ◮ unfold-fold transform reorganizes grammar for efficient CKY parsing
◮ Approach generalizes to 2nd-order dependencies
  ◮ predict argument given governor and sibling (McDonald 2006)
  ◮ predict argument given governor and governor’s governor
◮ In principle can use any CFG parsing or estimation algorithm for PBDGs
  ◮ transformed grammars typically too large to enumerate
  ◮ my CKY implementations transform the grammar on the fly

SLIDE 3

Outline

Projective Bilexical Dependency Grammars
Simple split-head encoding
O(n³) split-head CFGs via Unfold-Fold
Transformations capturing 2nd-order dependencies
Conclusion


SLIDE 4

Projective Bilexical Dependency Grammars

◮ Projective Bilexical Dependency Grammar (PBDG), given as head–dependent arcs:

0 ⟶ gave   Sandy ⟵ gave   gave ⟶ dog   the ⟵ dog   gave ⟶ bone   a ⟵ bone

◮ A dependency parse generated by the PBDG:

[Figure: dependency arcs drawn over “0 Sandy gave the dog a bone”]

◮ Weights can be attached to dependencies (and preserved in the CFG transforms)
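As an aside (not from the slides): a parse like the one above can be stored directly as (head, dependent) index pairs, and projectivity is just the no-crossing-arcs condition. A minimal sketch, with hypothetical names:

```python
# A projective dependency parse as (head, dependent) index pairs.
# 0 is the artificial root; words are numbered from 1.
words = ["Sandy", "gave", "the", "dog", "a", "bone"]
arcs = {(0, 2), (2, 1), (2, 4), (4, 3), (2, 6), (6, 5)}

def is_projective(arcs):
    """True iff no two arcs cross, i.e. the intervals they cover
    are pairwise nested or disjoint."""
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    return not any(a < c < b < d
                   for a, b in spans
                   for c, d in spans)
```

The example parse passes the check; swapping in a crossing pair such as (1, 3) and (2, 4) would fail it.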


SLIDE 5

A naive encoding of PBDGs as CFGs

S → Xu       where 0 ⟶ u
Xu → u
Xu → Xv Xu   where v ⟵ u
Xu → Xu Xv   where u ⟶ v

[Figure: naive-encoding CFG parse of “Sandy gave the dog a bone”, rooted in S, with nonterminals XSandy, Xgave, Xthe, Xdog, Xa, Xbone]
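The schema above can be instantiated mechanically from a set of arcs; a minimal sketch (the representation is hypothetical, using word positions as identifiers):

```python
def naive_cfg(words, arcs):
    """Instantiate the naive encoding from (head, dependent) arcs:
    one nonterminal X_u per word position u, with
      X_u -> X_v X_u  for a left dependent v, and
      X_u -> X_u X_v  for a right dependent v."""
    rules = {(f"X{i}", (w,)) for i, w in enumerate(words, 1)}  # X_u -> u
    for h, d in arcs:
        if h == 0:
            rules.add(("S", (f"X{d}",)))               # root attachment
        elif d < h:
            rules.add((f"X{h}", (f"X{d}", f"X{h}")))   # left dependent
        else:
            rules.add((f"X{h}", (f"X{h}", f"X{d}")))   # right dependent
    return rules
```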


SLIDE 6

Spurious ambiguity in naive encoding

◮ Naive encoding allows dependencies on different sides of the head to be freely reordered ⇒ spurious ambiguity in CFG parses (not present in PBDG parses)

[Figure: two distinct naive-encoding CFG parse trees for “Sandy gave the dog a bone”, both encoding the same dependency parse]


SLIDE 7

Parsing naive CFG encoding takes O(n⁵) time

◮ A production schema such as Xu → Xu Xv has 5 variables, and so can match the input in O(n⁵) different ways

[Figure: Xu spans positions i–k with head u; split point j; right child Xv spans j–k with head v]
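One way to see the O(n⁵) bound is by brute-force enumeration; this toy counter (an illustration, not the parser) counts the tuples (i, j, k, u, v) the schema can instantiate on a length-n input:

```python
def match_count(n):
    """Count the ways the schema Xu -> Xu Xv can match a length-n
    string: span [i, k) split at j, head u in [i, j), head v in [j, k)."""
    return sum(1
               for i in range(n)
               for j in range(i + 1, n)
               for k in range(j + 1, n + 1)
               for u in range(i, j)
               for v in range(j, k))
```

match_count(n) is the number of choices of i ≤ u < j ≤ v < k, which grows as Θ(n⁵).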


SLIDE 8

Outline

Projective Bilexical Dependency Grammars
Simple split-head encoding
O(n³) split-head CFGs via Unfold-Fold
Transformations capturing 2nd-order dependencies
Conclusion


SLIDE 9

Simple split-head encoding

◮ Replace each input word u with a left variant uℓ and a right variant ur (can be avoided in practice with fancy book-keeping)

Sandy gave the dog a bone
⇓
Sandyℓ Sandyr gaveℓ gaver theℓ ther dogℓ dogr aℓ ar boneℓ boner

◮ PCFG separately collects left dependencies and right dependencies

[Figure: split-head parse fragment — Lgave collects XSandy on the left; gaveR collects Xdog and Xbone on the right; S dominates Xgave]

S → Xu       where 0 ⟶ u
Xu → Lu uR   where u ∈ Σ
Lu → uℓ
Lu → Xv Lu   where v ⟵ u
uR → ur
uR → uR Xv   where u ⟶ v
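The split-head schema can be instantiated the same way as the naive one; a sketch under the same hypothetical representation (uR is written Ru here):

```python
def split_head_cfg(words, arcs):
    """Instantiate the simple split-head encoding: word u yields
    terminals u_l and u_r, nonterminals Lu (left dependents) and Ru
    (right dependents; written uR on the slides), and Xu -> Lu Ru."""
    rules = set()
    for i, w in enumerate(words, 1):
        rules |= {(f"X{i}", (f"L{i}", f"R{i}")),
                  (f"L{i}", (f"{w}_l",)),
                  (f"R{i}", (f"{w}_r",))}
    for h, d in arcs:
        if h == 0:
            rules.add(("S", (f"X{d}",)))
        elif d < h:
            rules.add((f"L{h}", (f"X{d}", f"L{h}")))   # collect left dep
        else:
            rules.add((f"R{h}", (f"R{h}", f"X{d}")))   # collect right dep
    return rules
```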


SLIDE 10

Simple split-head CFG parse

[Figure: full simple split-head parse of “Sandy gave the dog a bone” — each Xu expands to Lu uR, e.g. Xgave → Lgave gaveR with Lgave collecting XSandy and gaveR collecting Xdog and Xbone]

SLIDE 11

Lu and uR heads are phrase-peripheral ⇒ O(n⁴)

◮ Heads of Lu and uR are always at the right (left) edge

[Figure: Lu with head uℓ at its right edge below left dependents Xv1, Xv2; uR with head ur at its left edge below right dependents Xv3, Xv4]

S → Xu       where 0 ⟶ u
Xu → Lu uR   where u ∈ Σ
Lu → uℓ
Lu → Xv Lu   where v ⟵ u
uR → ur
uR → uR Xv   where u ⟶ v

◮ Xu → Lu uR takes O(n³)
◮ uR → uR Xv takes O(n⁴)

[Figure: uR spans i–k with i = u; split point j; Xv spans j–k with head v]


SLIDE 12

Outline

Projective Bilexical Dependency Grammars
Simple split-head encoding
O(n³) split-head CFGs via Unfold-Fold
Transformations capturing 2nd-order dependencies
Conclusion


SLIDE 13

The Unfold-Fold transform

◮ Unfold-fold was originally proposed for transforming recursive programs; used here to transform CFGs into new CFGs

◮ Unfolding a nonterminal replaces it with each of its expansions

A → α B γ, B → β1, B → β2, . . .
⇒ A → α β1 γ, A → α β2 γ, B → β1, B → β2, . . .

◮ Folding is the inverse of unfolding (replace an occurrence of B’s expansion β in a right-hand side with B)

A → α β γ, B → β, . . .
⇒ A → α B γ, B → β, . . .

◮ Transformed grammar generates same language (Sato 1992)
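Since unfolding is a purely mechanical rewrite, it is easy to state as code; a minimal sketch on a grammar stored as (lhs, rhs-tuple) pairs (the representation is hypothetical):

```python
def unfold(rules, target, b):
    """Unfold nonterminal b in rule `target` = (A, alpha + (b,) + gamma):
    replace that one rule by a copy per expansion of b, keeping the
    rules for b and everything else unchanged."""
    a, rhs = target
    i = rhs.index(b)                       # position of b in the RHS
    out = [r for r in rules if r != target]
    for lhs, beta in rules:
        if lhs == b:
            out.append((a, rhs[:i] + beta + rhs[i + 1:]))
    return out
```

For example, unfolding B in A → a B c against B → b1 and B → b2 yields A → a b1 c and A → a b2 c while preserving the B rules.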


SLIDE 14

Unfold-fold converts the O(n⁴) grammar to an O(n³) grammar

◮ Unfold the Xv responsible for O(n⁴) parse time:

Lu → uℓ, Lu → Xv Lu, Xv → Lv vR
⇒ Lu → uℓ, Lu → Lv vR Lu

◮ Introduce new nonterminals xMy → xR Ly (doesn’t change the language)

◮ Fold the last two children of Lu into vMu:

Lu → uℓ, Lu → Lv vR Lu, xMy → xR Ly
⇒ Lu → uℓ, Lu → Lv vMu, xMy → xR Ly

SLIDE 15

Transformed grammar collects left and right dependencies separately

[Figure: before/after trees — before: Lu → Xv Lu with Xv → Lv vR; after: Lu → Lv vMu with vMu → vR Lu, so the head annotations are phrase-peripheral]

◮ Xv constituents (which cause O(n⁴) parse time) no longer used
◮ Head annotations now all phrase-peripheral ⇒ O(n³) parse time
◮ Dependencies can be recovered from the parse tree
◮ Basically the same as the Eisner and Satta O(n³) algorithm
  ◮ explains why the Inside–Outside sanity check fails for Eisner/Satta
  ◮ two copies of each terminal ⇒ each terminal’s Outside probability is double the Inside sentence probability


SLIDE 16

Parse using O(n³) transformed split-head grammar

[Figure: parse of “Sandy gave the dog a bone” under the transformed grammar, using xMy constituents SandyMgave, gaveMdog, theMdog, gaveMbone, and aMbone]

0 Sandy gave the dog a bone


SLIDE 17

Parsing time of CFG encodings of same PBDG

CFG schemata                        sentences parsed/second
Naive O(n⁵) CFG                                  45.4
O(n⁴) simple split-head CFG                     406.2
O(n³) transformed split-head CFG               3580.0

◮ Weighted PBDG; all pairs of heads have some dependency weight
◮ Dependency weights precomputed before parsing begins
◮ Timing results on a 3.6GHz Pentium 4 machine parsing section 24 of the PTB
◮ CKY parsers with grammars hard-coded in C (no rule lookup)
◮ Dependency accuracy of Viterbi parses = 0.8918 for all grammars
◮ Feature extraction is much slower than even the naive CFG


SLIDE 18

Outline

Projective Bilexical Dependency Grammars
Simple split-head encoding
O(n³) split-head CFGs via Unfold-Fold
Transformations capturing 2nd-order dependencies
Conclusion


SLIDE 19

Predict argument based on governor and sibling

[Figure: second-order parse of “Sandy gave the dog a bone”, with sibling-annotated M constituents such as dogMbone, so that each argument is predicted from its governor and the preceding sibling]

◮ Very similar to the second-order algorithm given by McDonald (2006)


SLIDE 20

Predict argument based on governor and governor’s governor

[Figure: parse of “Sandy gave the dog a bone”, using constituents such as gaveMthe and gaveMa that pair a word with its governor’s governor]

◮ Because left and right dependencies are assembled separately, this only captures 2nd-order dependencies where one dependency is leftward and the other is rightward


SLIDE 21

Outline

Projective Bilexical Dependency Grammars
Simple split-head encoding
O(n³) split-head CFGs via Unfold-Fold
Transformations capturing 2nd-order dependencies
Conclusion


SLIDE 22

Conclusion and future work

◮ Presented a reduction from PBDGs to O(n³)-parsable CFGs
  ◮ split-head CFG representation of PBDGs
  ◮ unfold-fold transform
◮ CKY algorithm on the resulting CFG simulates the Eisner/Satta algorithm on the original PBDG
◮ Makes CFG techniques applicable to PBDGs
  ◮ max marginal parsing (Goodman 1996) and other CFG parsing and estimation algorithms
◮ Can capture different dependencies, yielding different PDG models
  ◮ 2nd-order “horizontal” dependencies (McDonald 2006)
  ◮ what other combinations of dependencies can we capture (if we permit O(n⁴) parse time)?
  ◮ do any of these improve parsing accuracy?