Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold
Mark Johnson, Microsoft Research and Brown University, ACL 2007
Motivation and summary
◮ What’s the relationship between CKY parsing and the Eisner/Satta dependency parsing algorithms?
◮ split-head encoding, collecting left and right dependents
◮ unfold-fold transform reorganizes grammar for efficient CKY
◮ Approach generalizes to 2nd-order dependencies
◮ predict argument given governor and sibling (McDonald 2006)
◮ predict argument given governor and governor’s governor
◮ In principle can use any CFG parsing or estimation algorithm for PBDGs
◮ transformed grammars typically too large to enumerate
◮ my CKY implementations transform the grammar on the fly
◮ Projective Bilexical Dependency Grammar (PBDG)
◮ A dependency parse generated by the PBDG
◮ Weights can be attached to dependencies (and preserved in the CFG encoding)
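As a concrete illustration of a weighted PBDG, here is a minimal sketch (the representation and names are mine, not the paper's): a parse is a head vector, projectivity is a no-crossing-arcs condition, and the score of a parse is the sum of its dependency weights.

```python
# Sketch of a weighted PBDG (illustrative representation, not the paper's).
# Words are numbered 1..n and 0 is the artificial root; a parse is a
# vector `heads` where heads[d-1] is the governor of word d.

def is_projective(heads):
    """True iff no two dependency arcs cross."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    return not any(l1 < l2 < r1 < r2
                   for (l1, r1) in arcs for (l2, r2) in arcs)

def parse_weight(heads, weight):
    """Score of a parse: the sum of the weights of its dependencies,
    where weight maps (governor, dependent) position pairs to floats."""
    return sum(weight[(h, d)] for d, h in enumerate(heads, start=1))

# "Sandy gave the dog a bone":  Sandy=1 gave=2 the=3 dog=4 a=5 bone=6
heads = [2, 0, 4, 2, 6, 2]
```

A Viterbi parser would then search for the projective head vector maximizing `parse_weight`; the CFG encodings below preserve exactly these per-dependency weights on the corresponding rules.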
◮ Naive encoding allows dependencies on different sides of the head to be collected in any relative order
[Figure: two naive-encoding parse trees for “Sandy gave the dog a bone”, illustrating the spurious ambiguity]
◮ A production schema such as Xu → Xu Xv leaves the attachment order of left and right dependents unconstrained, so a single dependency analysis receives several parse trees
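The spurious ambiguity can be checked by brute force. The sketch below (illustrative; `naive_parse_count` and its representation are mine) counts the parse trees that the naive CFG, with one nonterminal Xu per word position, assigns to a single dependency analysis.

```python
# Naive encoding, sketched: X_h -> h, X_h -> X_v X_h (v a left dependent
# of h), X_h -> X_h X_v (v a right dependent).  Illustrative code.
from functools import lru_cache

def naive_parse_count(n, deps, root):
    """Number of naive-CFG parse trees of the sentence 1..n for the
    single dependency analysis `deps` = {(head, dependent)} (1-based)."""
    left = {h: [d for g, d in deps if g == h and d < h] for h in range(1, n + 1)}
    right = {h: [d for g, d in deps if g == h and d > h] for h in range(1, n + 1)}

    @lru_cache(maxsize=None)
    def count(i, j, h):          # ways X_h derives words i..j (inclusive)
        total = 1 if (i, j) == (h, h) else 0
        for v in left[h]:        # X_h -> X_v X_h, split after position k
            for k in range(i, j):
                if i <= v <= k < h <= j:
                    total += count(i, k, v) * count(k + 1, j, h)
        for v in right[h]:       # X_h -> X_h X_v
            for k in range(i, j):
                if i <= h <= k < v <= j:
                    total += count(i, k, h) * count(k + 1, j, v)
        return total

    return count(1, n, root)

# "Sandy gave the dog a bone" (positions 1..6, gave = root):
deps = {(2, 1), (4, 3), (2, 4), (6, 5), (2, 6)}
```

For this analysis the count is 3, not 1: gave’s left dependent (Sandy) can attach before, between, or after its two right dependents (dog, bone). The split-head encoding eliminates exactly this ambiguity.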
◮ Replace input word u with a left variant uℓ and a right variant ur
◮ PCFG separately collects left dependencies and right dependencies
◮ Split-head CFG schema:

        S  → Xu              (u the root)
        Xu → Lu uR
        Lu → uℓ  |  Xv Lu    (v a left dependent of u)
        uR → ur  |  uR Xv    (v a right dependent of u)
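The schema can be sketched as a grammar builder (the nonterminal spelling below is mine): each word u is split into terminals u_l and u_r, with L_u collecting u's left dependents and uR its right dependents.

```python
# Split-head grammar as data (illustrative; nonterminal spelling mine):
# nonterminals are tuples like ("L", u), ("R", u), ("X", u); terminals
# are the split words "u_l" and "u_r".

def split_head_grammar(n, deps, root):
    """Rules of the split-head CFG for `deps` = {(head, dependent)}:
       S -> X_root;  X_u -> L_u uR;  L_u -> u_l | X_v L_u (v < u);
       uR -> u_r | uR X_v (u < v)."""
    rules = {(("S",), (("X", root),))}
    for u in range(1, n + 1):
        rules.add((("X", u), (("L", u), ("R", u))))
        rules.add((("L", u), (f"{u}_l",)))            # L_u -> u_l
        rules.add((("R", u), (f"{u}_r",)))            # uR  -> u_r
    for h, d in deps:
        if d < h:                                     # left dependent of h
            rules.add((("L", h), (("X", d), ("L", h))))
        else:                                         # right dependent of h
            rules.add((("R", h), (("R", h), ("X", d))))
    return rules

# "Sandy gave the dog a bone" (positions 1..6, gave = root)
g = split_head_grammar(6, {(2, 1), (4, 3), (2, 4), (6, 5), (2, 6)}, root=2)
```

Because left and right dependents are collected by separate nonterminals, each attaching inside-out, each dependency analysis now yields exactly one tree.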
[Figure: split-head parse tree for “Sandy gave the dog a bone”, with Lu constituents collecting left dependents and uR constituents collecting right dependents]
◮ Heads of Lu (uR) are always at the right (left) edge
◮ But Xu → Lu uR places the head u in the middle of Xu
◮ uR → uR Xv combines uR with an Xv constituent whose head v is likewise internal
◮ These Xv constituents keep CKY parse time at O(n^4)
◮ Unfold-fold was originally proposed for transforming recursive programs (Burstall and Darlington 1977); here it is applied to CFG rules
◮ Unfolding a nonterminal replaces it with its expansion
◮ Folding is the inverse of unfolding (replace RHS with nonterminal)
◮ Transformed grammar generates same language (Sato 1992)
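The two operations are easy to state on CFG rules. The sketch below (illustrative code; the rule representation is mine) implements unfold and fold and checks, on an Lu → Xv Lu pattern like the one used in the transform, that the bounded-length string language is unchanged.

```python
# Unfold and fold on CFG rules, sketched.  A grammar is a dict mapping
# each nonterminal to a list of right-hand sides (tuples of symbols);
# anything not in the dict is a terminal.  Illustrative code.

def unfold(g, a, b):
    """Replace every occurrence of nonterminal b in a's rules by each
    of b's expansions (b's own rules are kept)."""
    new_rhss = []
    for rhs in g[a]:
        forms = [()]
        for sym in rhs:
            if sym == b:
                forms = [f + exp for f in forms for exp in g[b]]
            else:
                forms = [f + (sym,) for f in forms]
        new_rhss.extend(forms)
    return {**g, a: new_rhss}

def fold(g, a, pair, m):
    """Introduce m -> pair and replace adjacent `pair` in a's rules by m."""
    def replace(rhs):
        out, i = [], 0
        while i < len(rhs):
            if tuple(rhs[i:i + len(pair)]) == pair:
                out.append(m); i += len(pair)
            else:
                out.append(rhs[i]); i += 1
        return tuple(out)
    return {**g, a: [replace(r) for r in g[a]], m: [pair]}

def strings(g, start, max_len):
    """All terminal strings of length <= max_len (assumes no empty RHS)."""
    out, agenda, seen = set(), [(start,)], set()
    while agenda:
        form = agenda.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        nt = next((i for i, s in enumerate(form) if s in g), None)
        if nt is None:
            out.add(form)
        else:
            agenda += [form[:nt] + tuple(r) + form[nt + 1:] for r in g[form[nt]]]
    return out

# Pattern from the transform: Lu -> Xv Lu | ul, with Xv -> Lv vR.
g = {"Lu": [("Xv", "Lu"), ("ul",)],
     "Xv": [("Lv", "vR")],
     "Lv": [("vl",)],
     "vR": [("vr",)]}
g2 = unfold(g, "Lu", "Xv")                 # Lu -> Lv vR Lu | ul
g3 = fold(g2, "Lu", ("vR", "Lu"), "vMu")   # Lu -> Lv vMu | ul; vMu -> vR Lu
```

Enumerating derivations up to a length bound gives the same string set before and after both steps, matching the language-preservation result.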
◮ Unfold Xv (responsible for the O(n^4) parse time) in Lu → Xv Lu:

        Lu → Xv Lu   becomes   Lu → Lv vR Lu

◮ Introduce new non-terminals xMy → xR Ly (doesn’t change the language)
◮ Fold the last two children of Lu into vMu:

        Lu → Lv vR Lu   becomes   Lu → Lv vMu,   with   vMu → vR Lu
◮ Applying the same steps to uR gives the transformed grammar schema:

        S   → Lu uR              (u the root)
        Lu  → uℓ  |  Lv vMu      (v a left dependent of u)
        uR  → ur  |  uMv vR      (v a right dependent of u)
        xMy → xR Ly
◮ Xv constituents (which cause the O(n^4) parse time) no longer used
◮ Head annotations now all phrase-peripheral ⇒ O(n^3) parse time
◮ Dependencies can be recovered from the parse tree
◮ Basically the same as the Eisner and Satta O(n^3) algorithm
◮ explains why the Inside-Outside sanity check fails for Eisner/Satta
◮ two copies of each terminal ⇒ each terminal’s Outside probability no longer obeys the usual single-copy identity
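The transformed grammar can be generated mechanically, and a brute-force CKY-style parse counter confirms that it is unambiguous on the running example. This is a sketch under my own nonterminal encoding (tuples for nonterminals, split words "u_l"/"u_r" as terminals), not the paper's implementation.

```python
from functools import lru_cache

def transformed_grammar(n, deps, root):
    """Rules of the transformed (O(n^3)) grammar for `deps` = {(head, dep)}:
       S -> L_root rootR;  L_u -> u_l | L_v vMu (v a left dependent of u);
       uR -> u_r | uMv vR (v a right dependent);  xMy -> xR L_y."""
    rules = {(("S",), (("L", root), ("R", root)))}
    for u in range(1, n + 1):
        rules.add((("L", u), (f"{u}_l",)))
        rules.add((("R", u), (f"{u}_r",)))
    for h, d in deps:
        if d < h:   # left dependent:  L_h -> L_d dMh,  dMh -> dR L_h
            rules.add((("L", h), (("L", d), ("M", d, h))))
            rules.add((("M", d, h), (("R", d), ("L", h))))
        else:       # right dependent: hR -> hMd dR,  hMd -> hR L_d
            rules.add((("R", h), (("M", h, d), ("R", d))))
            rules.add((("M", h, d), (("R", h), ("L", d))))
    return rules

def parse_count(rules, start, sent):
    """Brute-force count of parse trees of `sent` (a tuple of terminal
    strings); nonterminals are tuples, terminals are plain strings."""
    by_lhs = {}
    for lhs, rhs in rules:
        by_lhs.setdefault(lhs, []).append(rhs)

    @lru_cache(maxsize=None)
    def count(sym, i, j):
        if isinstance(sym, str):                      # terminal symbol
            return 1 if j - i == 1 and sent[i] == sym else 0
        total = 0
        for rhs in by_lhs.get(sym, []):
            if len(rhs) == 1:                         # e.g. L_u -> u_l
                total += count(rhs[0], i, j)
            else:                                     # binary rule
                b, c = rhs
                for k in range(i + 1, j):
                    total += count(b, i, k) * count(c, k, j)
        return total

    return count(start, 0, len(sent))

# "Sandy gave the dog a bone" (positions 1..6, gave = root):
deps = {(2, 1), (4, 3), (2, 4), (6, 5), (2, 6)}
sent = tuple(f"{i}_{side}" for i in range(1, 7) for side in ("l", "r"))
```

On the doubled string, the transformed grammar assigns this analysis exactly one parse, whereas the naive encoding gives three for the same dependency set.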
[Figure: parse tree for “Sandy gave the dog a bone” under the transformed grammar, with xMy constituents such as theMdog, gaveMdog, aMbone, gaveMbone and SandyMgave]
◮ Weighted PBDG; all pairs of heads have some dependency weight
◮ Dependency weights precomputed before parsing begins
◮ Timing results on a 3.6GHz Pentium 4 machine parsing section 24
◮ CKY parsers with grammars hard-coded in C (no rule lookup)
◮ Dependency accuracy of Viterbi parses = 0.8918 for all grammars
◮ Feature extraction is much slower than even naive CFG parsing
[Figure: second-order split-head parse tree for “Sandy gave the dog a bone”, with sibling-annotated constituents such as theMLdog and gaveMRdog]
◮ Very similar to the second-order algorithm given by McDonald (2006)
[Figure: second-order parse tree for “Sandy gave the dog a bone”, with constituents such as gaveMRbone, gaveMthe, theMLdog, gaveMa and aMLbone]
◮ Because left and right dependencies are assembled separately, only pairs of dependencies on the same side of the head are captured as siblings
◮ Presented a reduction from PBDGs to O(n^3)-parsable CFGs
◮ split-head CFG representation of PBDGs
◮ Unfold-fold transform
◮ CKY algorithm on resulting CFG simulates Eisner/Satta algorithm
◮ Makes CFG techniques applicable to PBDGs
◮ max marginal parsing (Goodman 1996)
◮ Can capture different dependencies, yielding different PDG models
◮ 2nd-order “horizontal” dependencies (McDonald 2006)
◮ what other combinations of dependencies can we capture?
◮ do any of these improve parsing accuracy?