SLIDE 1

Weighted Tree Transducers in Natural Language Processing

Andreas Maletti

Universitat Rovira i Virgili, Tarragona, Spain. Email: andreas.maletti@urv.cat

Wrocław — May 17, 2010

SLIDE 2

Collaborators

Joint work with

JOOST ENGELFRIET, LIACS, Leiden, The Netherlands
ZOLTÁN FÜLÖP, University of Szeged, Hungary
JONATHAN GRAEHL, USC, Los Angeles, CA, USA
MARK HOPKINS, Language Weaver Inc., Los Angeles, CA, USA
KEVIN KNIGHT, USC, Los Angeles, CA, USA
ERIC LILIN, Université de Lille, France
GIORGIO SATTA, University of Padua, Italy
HEIKO VOGLER, TU Dresden, Germany

SLIDE 3

Table of Contents

1. Machine Translation
2. Weighted Tree Transducer
3. Expressive Power
4. Standard Algorithms
5. Implementation

SLIDE 4

Motivation

Example (Input in Catalan)

Benvolguda i benvolgut membre de la comunitat universitària,
Avui dilluns es duu a terme el darrer Consell de Govern del meu mandat com a rector; el proper dia 6 de maig, com correspon, hi haurà una nova elecció on tota la comunitat universitària podrà escollir nou rector o rectora. Aquest darrer consell té, naturalment, un caràcter marcadament tècnic; l’ordre del dia complet el trobaràs adjunt al final d’aquest text. A continuació et comento només els punts que, al meu parer, poden ser més del teu interès.

Translation (GOOGLE TRANSLATE) to English

Dear and beloved member of the university community, Today is Monday carried out by the Governing Council last of my term as rector, the next day, May 6, as appropriate, there will be another election where the entire university community can choose new rector. This last advice is, of course, a markedly technician complete agenda can be found attached to the end of this text. Then I said only the points that I believe may be of interest.

SLIDE 6

Machine Translation System

Input sentence (Benvolguda i benvolgut ...)
  ⇓
Translation system
  ⇓
Output sentence (Dear and beloved ...)

SLIDE 7

Machine Translation System

Input sentence f (Benvolguda i benvolgut ...)
  ⇓
Translation system
  ⇓
Output sentence e (Dear and beloved ...)

Statistical translation system

ê = argmax_e p(e|f)

SLIDE 9

Noisy Channel Viewpoint

Input sentence f (Benvolguda i benvolgut ...)
  ⇓
Identity translation  ⇐  Error signal (Noise)
  ⇓
Output sentence e (Dear and beloved ...)

Bayes’ theorem

ê = argmax_e p(e|f) = argmax_e (p(f|e) · p(e)) / p(f) = argmax_e p(f|e) · p(e)

SLIDE 10

Components

Optimization problem

ê = argmax_e p(f|e) · p(e)

Required models

p(e) — language model
p(f|e) — translation model

Input sentence f ⇐ Translation model p(f|e) ⇐ Language model p(e) ⇐ Output sentence e
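As a concrete illustration of this decision rule, here is a minimal Python sketch. The probability tables are invented stand-ins for trained models, and a real decoder searches a huge hypothesis space instead of scoring a fixed candidate list.

# Noisy-channel decoding sketch: pick the e maximizing p(f|e) * p(e).
# All probabilities below are made-up toy values, not trained models.

def language_model(e):
    """p(e): stub language model over English sentences."""
    table = {"dear and beloved member": 0.6,
             "beloved and dear member": 0.4}
    return table.get(e, 1e-9)

def translation_model(f, e):
    """p(f|e): stub channel model from English back to Catalan."""
    table = {("benvolguda i benvolgut membre", "dear and beloved member"): 0.5,
             ("benvolguda i benvolgut membre", "beloved and dear member"): 0.5}
    return table.get((f, e), 1e-9)

def decode(f, candidates):
    """Return argmax_e p(f|e) * p(e) over the given candidates."""
    return max(candidates, key=lambda e: translation_model(f, e) * language_model(e))

print(decode("benvolguda i benvolgut membre",
             ["dear and beloved member", "beloved and dear member"]))
# -> "dear and beloved member" (the language model breaks the tie)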

SLIDE 11

Translation Approach

Overview

[Figure: the translation pyramid, with Foreign and English at the base and phrase, syntax, and semantics as increasingly abstract transfer levels]

SLIDE 17

Why Syntax?

Example

She saw the boy with the telescope.

S(NP(She), VP(VB(saw), NP(NP(the boy), PP(PREP(with), NP(the telescope)))))

(the PP attaches to the NP: the boy has the telescope)

SLIDE 18

Why Syntax?

Example

She saw the boy with the telescope.

S(NP(She), VP(VP(VB(saw), NP(the boy)), PP(PREP(with), NP(the telescope))))

(the PP attaches to the VP: the telescope is the instrument of seeing)

SLIDE 19

Syntactic Analysis

Output sentence

Holly picks flowers to tie them around July’s neck.

Parser output

S(NN(Holly), VP(VB(picks), NN(flowers), ATO(TO(to), VP(VB(tie), PP(them), WHOBJ(PRP(around), NN3(July’s), NN(neck))))))

SLIDE 20

Syntax-based Machine Translation

S(NN(Holly), VP(VB(picks), NN(flowers), ATO(TO(to), VP(VB(tie), PP(them), WHOBJ(PRP(around), NN3(July’s), NN(neck))))))

SLIDE 21

Syntax-based Machine Translation

S(NN(Holly), VP(VB(pflückt), NN(Blumen), ATO(TO(to), VP(VB(tie), PP(them), WHOBJ(PRP(around), NN3(July’s), NN(neck))))))

SLIDE 22

Syntax-based Machine Translation

S(NN(Holly), VP(VB(pflückt), NN(Blumen), ATO(TO(, um), VP(VB1(tie), PP2(them), WHOBJ3(PRP(around), NN3(July’s), NN(neck))))))

SLIDE 23

Syntax-based Machine Translation

S(NN(Holly), VP(VB(pflückt), NN(Blumen), ATO(TO(, um), VP(PP2(them), WHOBJ3(PRP(around), NN3(July’s), NN(neck)), VB1(tie)))))

SLIDE 24

Syntax-based Machine Translation

S(NN(Holly), VP(VB(pflückt), NN(Blumen), ATO(TO(, um), VP(PP(sie), WHOBJ(PRP(um), NN3(Julys), NN(Hals)), VB(zu binden)))))

SLIDE 25

Table of Contents

1. Machine Translation
2. Weighted Tree Transducer
3. Expressive Power
4. Standard Algorithms
5. Implementation

SLIDE 26

Weight Structure

Definition

(A, +, ·, 0, 1) is a (commutative) semiring if (A, +, 0) and (A, ·, 1) are commutative monoids, · distributes over +, and a · 0 = 0 for every a ∈ A.

Example

({0, 1}, max, min, 0, 1) — the BOOLEAN semiring
(R, +, ·, 0, 1) — the semiring of real numbers
(N ∪ {∞}, min, +, ∞, 0) — the tropical semiring
any field, ring, etc.
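A small Python sketch of this interface; packaging the semiring as a record of operations is just one convenient encoding, not the only one.

# A commutative semiring (A, +, ·, 0, 1) as a record of operations.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]   # the "+" operation
    mul: Callable[[Any, Any], Any]   # the "·" operation
    zero: Any                        # neutral for add, absorbing for mul
    one: Any                         # neutral for mul

BOOLEAN  = Semiring(add=max, mul=min, zero=0, one=1)            # ({0,1}, max, min, 0, 1)
REAL     = Semiring(add=lambda a, b: a + b,
                    mul=lambda a, b: a * b, zero=0.0, one=1.0)  # (R, +, ·, 0, 1)
TROPICAL = Semiring(add=min, mul=lambda a, b: a + b,
                    zero=float("inf"), one=0)                   # (N ∪ {∞}, min, +, ∞, 0)

def semiring_sum(s, weights):
    """Fold a sequence of weights with the semiring addition."""
    acc = s.zero
    for w in weights:
        acc = s.add(acc, w)
    return acc

print(semiring_sum(TROPICAL, [3, 7, 2]))  # -> 2 (min plays the role of +)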

SLIDE 27

Syntax

Definition

(Q, Σ, ∆, I, R) (weighted) extended (top-down) tree transducer (xtt)

Q — finite set of states (considered unary)
Σ and ∆ — ranked alphabets
I : Q → A — initial weight distribution
R : Q(TΣ(X)) × T∆(Q(X)) → A — rule weight assignment such that
◮ supp(R) is finite
◮ for every (l, r) ∈ supp(R) there is k ∈ N such that l ∈ Q(CΣ(Xk)) and r ∈ T∆(Q(Xk))
◮ {l, r} ⊈ Q(X) for every (l, r) ∈ supp(R)

References

ARNOLD, DAUCHET: Bi-transductions de forêts. ICALP 1976
GRAEHL, KNIGHT: Training tree transducers. HLT-NAACL 2004

SLIDE 28

Syntax — Example

S(NP(DT(the), N(boy)), VP(V(saw), NP(DT(the), N(door)))) ⇒∗ S(CONJ(wa-) [and], S′(V(ra’aa) [saw], NP(N(atefl)) [the boy], NP(N(albab)) [the door]))

Question

How can we implement this English → Arabic translation using an xtt?

SLIDE 29

Syntax — Example (cont’d)

Example

States {q, qS, qV, qNP}, of which only q is initial:

q(x1) → qS(x1)                                          (r1)
q(x1) → S(CONJ(wa-), qS(x1))                            (r2)
qS(S(x1, VP(x2, x3))) → S′(qV(x2), qNP(x1), qNP(x3))    (r3)
qV(V(saw)) → V(ra’aa)                                   (r4)
qNP(NP(DT(the), N(boy))) → NP(N(atefl))                 (r5)
qNP(NP(DT(the), N(door))) → NP(N(albab))                (r6)
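To make the rule format and the rewriting semantics (next slides) concrete, here is a toy, unweighted Python sketch. Trees are encoded as nested tuples (label, child, ...), variables are the strings "x1", "x2", ..., and rules are tried in a fixed order, so the nondeterministic choice between r1 and r2 is resolved greedily; a weighted implementation would instead sum I(q) · a1 · ... · an over all left-most derivations.

# Toy xtt interpreter for rules r1-r6. Leaves are strings; variables are
# strings starting with "x" (an encoding assumption: no terminal does).
STATES = {"q", "qS", "qV", "qNP"}

def match(pat, tree, sub):
    """Match a left-hand-side pattern against a tree, binding variables."""
    if isinstance(pat, str):
        if pat.startswith("x"):
            sub[pat] = tree
            return True
        return pat == tree
    return (isinstance(tree, tuple) and pat[0] == tree[0]
            and len(pat) == len(tree)
            and all(match(p, t, sub) for p, t in zip(pat[1:], tree[1:])))

def instantiate(rhs, sub):
    """Plug the bound subtrees into a right-hand side."""
    if isinstance(rhs, str):
        return sub.get(rhs, rhs)
    return (rhs[0],) + tuple(instantiate(c, sub) for c in rhs[1:])

RULES = [  # r2 before r1 so the greedy demo takes the productive branch
    (("q", "x1"), ("S", ("CONJ", "wa-"), ("qS", "x1"))),                     # r2
    (("qS", ("S", "x1", ("VP", "x2", "x3"))),
     ("S'", ("qV", "x2"), ("qNP", "x1"), ("qNP", "x3"))),                    # r3
    (("qV", ("V", "saw")), ("V", "ra'aa")),                                  # r4
    (("qNP", ("NP", ("DT", "the"), ("N", "boy"))), ("NP", ("N", "atefl"))),  # r5
    (("qNP", ("NP", ("DT", "the"), ("N", "door"))), ("NP", ("N", "albab"))), # r6
    (("q", "x1"), ("qS", "x1")),                                             # r1
]

def step(xi):
    """One left-most derivation step: rewrite the left-most state node."""
    if isinstance(xi, str):
        return None
    if xi[0] in STATES:
        for lhs, rhs in RULES:
            sub = {}
            if lhs[0] == xi[0] and match(lhs[1], xi[1], sub):
                return instantiate(rhs, sub)
        return None
    for i in range(1, len(xi)):
        out = step(xi[i])
        if out is not None:
            return xi[:i] + (out,) + xi[i + 1:]
    return None

tree = ("S", ("NP", ("DT", "the"), ("N", "boy")),
        ("VP", ("V", "saw"), ("NP", ("DT", "the"), ("N", "door"))))
xi = ("q", tree)
while (nxt := step(xi)) is not None:
    xi = nxt
print(xi)  # S(CONJ(wa-), S'(V(ra'aa), NP(N(atefl)), NP(N(albab))))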

SLIDE 32

Syntax — Example (cont’d)

Example

1. Nondeterminism and epsilon rules (rules r1 and r2)
2. Deep attachment of variables (rule r3)
3. Finite look-ahead (rules r4 and r5)

SLIDE 33

Semantics

Definition

Let ξ, ζ ∈ T∆(Q(TΣ)). Then ξ ⇒M^a ζ (a derivation step of weight a) if there exist

1. a rule (q(t), u) with R(q(t), u) = a ≠ 0,
2. a substitution θ : X → TΣ, and
3. a position w ∈ pos(ξ) such that ξ|w = q(tθ) and ζ = ξ[uθ]w.

Definition

Computed transformation (t ∈ TΣ and u ∈ T∆):

τM(t, u) = ∑ I(q) · a1 · … · an, where the sum ranges over all q ∈ Q and all left-most derivations q(t) ⇒M^a1 · · · ⇒M^an u.

SLIDE 35

Semantics — Example

Rule

q(x1) → S(CONJ(wa-), qS(x1))

Example

q(S(NP(DT(the), N(boy)), VP(V(saw), NP(DT(the), N(door)))))

SLIDE 36

Semantics — Example

Rule

qS(S(x1, VP(x2, x3))) → S′(qV(x2), qNP(x1), qNP(x3))

Example

S(CONJ(wa-), qS(S(NP(DT(the), N(boy)), VP(V(saw), NP(DT(the), N(door))))))

SLIDE 37

Semantics — Example

Rule

qV(V(saw)) → V(ra’aa)

Example

S(CONJ(wa-), S′(qV(V(saw)), qNP(NP(DT(the), N(boy))), qNP(NP(DT(the), N(door)))))

SLIDE 38

Semantics — Example

Rule

qNP(NP(DT(the), N(boy))) → NP(N(atefl))

Example

S(CONJ(wa-), S′(V(ra’aa), qNP(NP(DT(the), N(boy))), qNP(NP(DT(the), N(door)))))

SLIDE 39

Semantics — Example

Rule

qNP(NP(DT(the), N(door))) → NP(N(albab))

Example

S(CONJ(wa-), S′(V(ra’aa), NP(N(atefl)), qNP(NP(DT(the), N(door)))))

SLIDE 40

Semantics — Example

Rule (none; the derivation is complete)

Example

S(CONJ(wa-), S′(V(ra’aa), NP(N(atefl)), NP(N(albab))))

SLIDE 41

Table of Contents

1. Machine Translation
2. Weighted Tree Transducer
3. Expressive Power
4. Standard Algorithms
5. Implementation

SLIDE 42

Syntactic Restrictions

Definition

linear: no right-hand side contains a duplicate variable
non-deleting: all right-hand sides contain all variables of their left-hand side
epsilon-free: no rules of the form q(x) → u

Definition

tdtt: every left-hand side is of the form q(σ(x1, . . . , xk))

Abbreviations

ln-tdtt: linear non-deleting tdtt
ln-xtt: linear non-deleting xtt
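These restrictions are easy to check mechanically. A sketch, reusing the tuple encoding of trees from the earlier example (rules are (lhs, rhs) pairs; variables are strings starting with "x", an encoding assumption):

def variables(tree):
    """List the variable occurrences in a tuple-encoded tree."""
    if isinstance(tree, str):
        return [tree] if tree.startswith("x") else []
    return [v for child in tree[1:] for v in variables(child)]

def is_linear(rule):
    """linear: no variable occurs twice in the right-hand side."""
    _, rhs = rule
    occ = variables(rhs)
    return len(occ) == len(set(occ))

def is_nondeleting(rule):
    """non-deleting: every lhs variable also occurs in the rhs."""
    lhs, rhs = rule
    return set(variables(lhs)) <= set(variables(rhs))

# r3 from the earlier example is linear and non-deleting:
r3 = (("qS", ("S", "x1", ("VP", "x2", "x3"))),
      ("S'", ("qV", "x2"), ("qNP", "x1"), ("qNP", "x3")))
print(is_linear(r3), is_nondeleting(r3))  # True True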

SLIDE 45

Wanted Expressivity

Criteria

1. Generalize FST including epsilon rules (ln-tdtt: no, ln-xtt: yes)
2. Efficiently trainable (ln-tdtt: yes, ln-xtt: yes)
3. Can handle rotations (ln-tdtt: no, ln-xtt: yes): σ(σ(s, t), u) ⇒∗ σ(s, σ(t, u))
4. Can handle flattenings (ln-tdtt: no, ln-xtt: yes): σ(σ(s, t), u) ⇒∗ δ(s, t, u)
5. Preservation of recognizability (ln-tdtt: yes, ln-xtt: yes)
6. Closure under composition (ln-tdtt: yes, ln-xtt: no)

Summary

Criterion \ Model    ln-tdtt  ln-xtt
FST generalization      –       x
trainable               x       x
rotations               –       x
flattenings             –       x
pres. recog.            x       x
composition             x       –

SLIDE 51

Features of xtt

Discriminative features

Finite look-ahead
Epsilon rules
Deep attachment of variables

SLIDE 54

Hasse Diagram (if the weight structure is not a ring)

[Figure: Hasse diagram ordering the classes XTOP, XTOPR, l-XTOP, l-XTOPR, ln-XTOP, e-XTOP, TOPR, le-XTOP, le-XTOPR, lne-XTOP, l-TOPF, l-TOPR, TOP, l-TOP, and ln-TOP by expressive power]

SLIDE 55

Table of Contents

1. Machine Translation
2. Weighted Tree Transducer
3. Expressive Power
4. Standard Algorithms
5. Implementation

SLIDE 56

Training

S(NP(DT(the), N(boy)), VP(V(saw), NP(DT(the), N(door)))) ⇒∗ S(CONJ(wa-) [and], S′(V(ra’aa) [saw], NP(N(atefl)) [the boy], NP(N(albab)) [the door]))

References

GALLEY, HOPKINS, KNIGHT, MARCU: What’s in a translation rule? HLT-NAACL 2004
GRAEHL, KNIGHT, MAY: Training tree transducers. Comput. Ling. 34, 2008

SLIDE 58

Training (Cont’d)

Alignment (word-aligned sentence pair)

the boy saw the door ↔ wa- ra’aa atefl albab

Generate rules

S(NP(DT(the), N(boy)), VP(V(saw), NP(DT(the), N(door)))) ⇒∗ S(CONJ(wa-) [and], S′(V(ra’aa) [saw], NP(N(atefl)) [the boy], NP(N(albab)) [the door]))

SLIDE 59

Training (Cont’d)

Generated STSG rules

NP(DT(the), N(boy)) → NP(N(atefl))
NP(DT(the), N(door)) → NP(N(albab))
V(saw) → V(ra’aa)
S(NP, VP(V, NP)) → S(V, NP, NP)
S → S(CONJ(wa-), S)

Conclusion

ln-xtt are efficiently trainable
Can we use states? Nonlinearity? Deletion? ...

SLIDE 61

Training (Cont’d)

Setting the weights

count how often each obtained rule is used in a corpus (absolute frequency)
set the rule weight to its relative frequency
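A sketch of the relative-frequency estimate. It assumes rules are keyed by (state, rule-id) and normalizes per state; what exactly one normalizes over (state, left-hand side, ...) is a modeling choice, not fixed by the slide.

from collections import Counter

def relative_frequencies(observed_rules):
    """observed_rules: one entry per rule occurrence in the corpus,
    each of the form (state, rule_id). Returns rule -> weight."""
    counts = Counter(observed_rules)
    per_state = Counter()
    for (state, _), c in counts.items():
        per_state[state] += c
    return {rule: c / per_state[rule[0]] for rule, c in counts.items()}

# qNP rules seen 3 and 1 times -> weights 0.75 and 0.25:
print(relative_frequencies([("qNP", "r5")] * 3 + [("qNP", "r6")] + [("qV", "r4")] * 2))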

Optimize the weights

EM algorithm [DEMPSTER, LAIRD, RUBIN 1977]
Gradient descent
etc.

SLIDE 62

Composition

Theorem

No class L with l-TOP ⊆ L ⊆ XTOP is closed under composition.

Proof.

The composition closure of l-TOP is l-TOPR, and by the diagram l-TOPR ⊈ XTOP; hence no such L can contain its own composition closure.

Reference

ARNOLD, DAUCHET: Morphismes et bimorphismes d’arbres. Theoret. Comput. Sci. 20, 1982

SLIDE 63

Composition (Cont’d)

Theorem

No class L with ln-TOP ⊆ L ⊆ l-XTOPR that contains rotations or flattenings is closed under composition.

Proof.

Prove ln-TOP ; {τflat} ⊈ l-XTOPR using, e.g.,

σ(γ^k(σ(s, t)), u) ⇒∗ σ(σ(s, t), u) ⇒∗ δ(s, t, u)

SLIDE 64

Composition (Cont’d)

Theorem

XTOPR is not closed under composition.

Proof.

Follow classical proof for TOPR.

Conclusion or Bad news

No (mentioned) class of xtt computes a composition-closed class of transformations.

SLIDE 66

Composition (Cont’d)

Problem

Compositions are extremely important (e.g., for a framework)!

Questions

1. Identify suitable subclasses that are closed under composition (expressiveness vs. closure)
2. Determine whether two given l-xtt can be composed
3. What is the composition closure of l-XTOP?
4. Identify superclasses that are closed under composition and still preserve recognizability (preservation vs. closure)

Reference

∼, GRAEHL, HOPKINS, KNIGHT: The power of extended top-down tree transducers. SIAM J. Comput. 39, 2009

SLIDE 68

Binarization

Definition

An xtt is binarized if there are at most 3 states per rule.

Example

q(σ(σ(x1, x2), σ(x3, x4))) → σ(σ(q(x2), q(x4)), σ(q(x1), q(x3)))

Conclusions

Linear xtt are not binarizable [AHO, ULLMAN 1972]
What about non-linear xtt?

SLIDE 70

Binarization (Cont’d)

Example

q(σ(σ(x1, x2), σ(x3, x4))) → σ(σ(q(x2), q(x4)), σ(q(x1), q(x3)))

Binarization

q(x1) → σ(1(x1), 2(x1))
1(σ(σ(x1, x2), σ(x3, x4))) → σ(q(x2), q(x4))
2(σ(σ(x1, x2), σ(x3, x4))) → σ(q(x1), q(x3))

⇒ xtt can be binarized using non-linearity
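A quick consistency check that the three binarized rules reproduce the original rule, hand-coded for exactly this rule shape with the same tuple tree encoding as before; the copying of the input in q(x1) → σ(1(x1), 2(x1)) is where non-linearity enters.

# Verify on one input that composing the binarized rules equals the original.
def original(t):
    # q(σ(σ(x1,x2), σ(x3,x4))) -> σ(σ(q(x2), q(x4)), σ(q(x1), q(x3)))
    (_, (_, x1, x2), (_, x3, x4)) = t
    return ("σ", ("σ", ("q", x2), ("q", x4)), ("σ", ("q", x1), ("q", x3)))

def binarized(t):
    # q(x1) -> σ(1(x1), 2(x1)): the argument t is used twice (non-linear)
    def state1(t):  # 1(σ(σ(x1,x2), σ(x3,x4))) -> σ(q(x2), q(x4))
        (_, (_, _, x2), (_, _, x4)) = t
        return ("σ", ("q", x2), ("q", x4))
    def state2(t):  # 2(σ(σ(x1,x2), σ(x3,x4))) -> σ(q(x1), q(x3))
        (_, (_, x1, _), (_, x3, _)) = t
        return ("σ", ("q", x1), ("q", x3))
    return ("σ", state1(t), state2(t))

t = ("σ", ("σ", "a", "b"), ("σ", "c", "d"))
assert original(t) == binarized(t)
print(binarized(t))  # σ(σ(q(b), q(d)), σ(q(a), q(c)))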

SLIDE 71

Input Product

Definition

Given τ : TΣ × T∆ → A and ϕ : TΣ → A, let ϕ ⊳ τ : TΣ × T∆ → A be

(ϕ ⊳ τ)(t, u) = ϕ(t) · τ(t, u)

Theorem

ϕ ⊳ τ ∈ n-XTOP for every ϕ ∈ Rec and τ ∈ n-XTOP
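The pointwise definition translates directly into code. A minimal sketch over the real semiring, with stub functions standing in for a recognizable tree series ϕ and an xtt-computed transformation τ; note this only mirrors the defining equation, while the theorem's content is that the product is again computable by an n-XTOP.

def input_product(phi, tau):
    """(ϕ ⊳ τ)(t, u) = ϕ(t) · τ(t, u), here with real-number weights."""
    return lambda t, u: phi(t) * tau(t, u)

phi = lambda t: 0.5                        # stub tree series
tau = lambda t, u: 0.8 if u == t else 0.0  # stub transformation (identity-like)
p = input_product(phi, tau)
print(p("tree", "tree"))  # 0.4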

SLIDE 72

Input Product (Cont’d)

Parsing complexity

ln-xtt M and input word w: O(|M| · |w|^(2·rk(M)+5))

References

∼, SATTA: Parsing and translation algorithms based on weighted extended tree transducers. ATANLP 2010
∼: Why synchronous tree substitution grammars? HLT-NAACL 2010

SLIDE 73

Input Product (Cont’d)

Deleting xtt

How to obtain input products for deleting xtt?

Partial solutions

for idempotent semirings
for rings
but they do not work for the (non-linear) xtt obtained from binarization

References

∼: Input products for weighted extended top-down tree transducers. DLT 2010

SLIDE 75

Preservation of Recognizability

Definition

Given τ : TΣ × T∆ → A and ϕ : TΣ → A, let τ(ϕ), range(τ) : T∆ → A be

(τ(ϕ))(u) = ∑_{t ∈ TΣ} ϕ(t) · τ(t, u)
(range(τ))(u) = ∑_{t ∈ TΣ} τ(t, u)

References

FÜLÖP, ∼, VOGLER: Backward and forward application of extended tree series transformations. WATA 2010
MAY, KNIGHT, VOGLER: Efficient inference through cascades of weighted tree transducers. ACL 2010

SLIDE 76

Preservation of Recognizability (Cont’d)

Theorem

Given τ : TΣ × T∆ → A and ϕ : TΣ → A,

τ(ϕ) = range(ϕ ⊳ τ)

Proof.

(τ(ϕ))(u) = ∑_{t ∈ TΣ} ϕ(t) · τ(t, u) = ∑_{t ∈ TΣ} (ϕ ⊳ τ)(t, u) = (range(ϕ ⊳ τ))(u)

SLIDE 77

Table of Contents

1. Machine Translation
2. Weighted Tree Transducer
3. Expressive Power
4. Standard Algorithms
5. Implementation

SLIDE 78

Tiburon

Features

Implements xtt (and tree automata; everything also weighted)
Framework with command-line interface
Optimized for machine translation

Algorithms

Application of xtt to an input tree/language
Backward application of xtt to an output language
Composition (for some xtt)
. . .

Reference

MAY, KNIGHT: Tiburon: A Weighted Tree Automata Toolkit. CIAA 2006

SLIDE 79

Tiburon (Cont’d)

Generated STSG rules

NP(DT(the), N(boy)) → NP(N(atefl))
NP(DT(the), N(door)) → NP(N(albab))
V(saw) → V(ra’aa)
S(NP, VP(V, NP)) → S(V, NP, NP)
S → S(CONJ(wa-), S)

Example (Tiburon rule file; the first line names the start state)

q
qNP.NP(DT(the) N(boy)) -> NP(N(atefl))
qNP.NP(DT(the) N(door)) -> NP(N(albab))
qV.V(saw) -> V(ra’aa)
qS.S(x0: VP(x1: x2:)) -> S(qV.x1 qNP.x0 qNP.x2)
q.x0: -> S(CONJ(wa-) qS.x0)

SLIDE 80

Summary

Criteria

(a) Generalize FST; in particular, epsilon transitions
(b) Efficient training
(c) Handles rotations
(d) Closed under composition
(e) Preserves recognizability

Models

Model \ Criterion                      (a) (b) (c) (d) (e)
Top-down tree transducer                –   x   –   x   x
Synchronous context-free grammar        x   x   –   x   x
Synchronous tree substitution grammar   x   x   x   –   x
Synchronous tree adjoining grammar      x   x   x   –   –
Multi bottom-up tree transducer         x   ?   x   x   –

SLIDE 81

References

ARNOLD, DAUCHET: Bi-transductions de forêts. ICALP 1976
BAKER: Composition of top-down and bottom-up tree transducers. Inform. Control 41, 1979
ENGELFRIET: Bottom-up and top-down tree transformations—a comparison. Math. Syst. Theory 9, 1975
ENGELFRIET: Top-down tree transducers with regular look-ahead. Math. Syst. Theory 10, 1976
MAY, KNIGHT: Tiburon: A Weighted Tree Automata Toolkit. CIAA 2006
∼, GRAEHL, HOPKINS, KNIGHT: The power of extended top-down tree transducers. SIAM J. Comput. 39, 2009

Thank You for your attention!
