SLIDE 1

Hyper-minimization for deterministic weighted tree automata

Andreas Maletti and Daniel Quernheim

Institute of Computer Science, Universität Leipzig, Germany maletti@informatik.uni-leipzig.de

May 29, 2014

A. Maletti and D. Quernheim

Hyper-minimization for deterministic WTA 2

SLIDE 4

Overview

Weighted Tree Language

◮ Assigns a weight (e.g., a probability) to each tree
◮ Weights are drawn from a commutative semiring, e.g., (ℚ, +, ·, 0, 1)

Weighted Tree Automaton

◮ Finitely represents a weighted tree language
◮ Defines the recognizable weighted tree languages

Application

◮ Re-ranker for parse trees
◮ Representation of parses

large models


SLIDE 5

Basics


SLIDE 7

Semiring

Definition

A commutative semiring is an algebraic structure A = (A, +, ·, 0, 1) such that

◮ (A, +, 0) is a commutative monoid
◮ (A, ·, 1) is a commutative monoid
◮ · distributes over +: a · (a1 + a2) = (a · a1) + (a · a2)
◮ 0 · a = 0 for all a ∈ A

Examples: (ℕ, +, ·, 0, 1) and (ℚ, +, ·, 0, 1)

Definition

A commutative semifield is a commutative semiring A = (A, +, ·, 0, 1) such that

◮ for all a ∈ A \ {0} there exists a⁻¹ ∈ A with a · a⁻¹ = 1

Example: (ℚ, +, ·, 0, 1)
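A small sketch of the semifield laws, spot-checked on a handful of rationals. The function name `check_semifield_laws` and the sample values are illustrative assumptions, not part of the talk; Python's `fractions.Fraction` stands in for ℚ.

```python
from fractions import Fraction
from itertools import product


def check_semifield_laws(samples):
    """Spot-check the commutative semifield laws on a finite sample of rationals."""
    zero, one = Fraction(0), Fraction(1)
    for a, b, c in product(samples, repeat=3):
        assert a + b == b + a and a * b == b * a      # both operations commute
        assert (a + b) + c == a + (b + c)             # + is associative
        assert (a * b) * c == a * (b * c)             # · is associative
        assert a * (b + c) == (a * b) + (a * c)       # · distributes over +
    for a in samples:
        assert a + zero == a and a * one == a         # neutral elements 0 and 1
        assert zero * a == zero                       # 0 is absorbing
        if a != zero:
            assert a * (one / a) == one               # multiplicative inverse
    return True


samples = [Fraction(0), Fraction(1), Fraction(1, 2), Fraction(3, 10), Fraction(7, 3)]
print(check_semifield_laws(samples))
```

The inverse check is what separates the two examples: 2 has no multiplicative inverse inside ℕ, so (ℕ, +, ·, 0, 1) is a commutative semiring but not a semifield.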


SLIDE 10

Syntax

Definition

A weighted tree automaton (WTA) is a tuple (Q, Σ, A, F, µ) where

◮ Q is a finite set (states)
◮ Σ is a ranked alphabet (input symbols)
◮ A = (A, +, ·, 0, 1) is a commutative semiring (weight structure)
◮ F ⊆ Q (final states)
◮ µ = (µ_k)_{k∈ℕ} with µ_k : Σ_k → A^{Q×Q^k} (weighted transitions)

Sample Transition

σ(q1, …, qk) → q with weight µ_k(σ)_{q,q1···qk}
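One possible way to hold such a tuple in code is sketched below. The class name `WTA`, its field names, and the dictionary encoding of µ are assumptions made for illustration; the example automaton only borrows the labels a/1, b/1 and f/0.3 from the slides and is not claimed to be the automaton of the figure.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class WTA:
    states: set                              # Q
    ranks: dict                              # ranked alphabet: symbol -> rank k
    final: set                               # F, a subset of Q
    mu: dict = field(default_factory=dict)   # (symbol, (q1..qk), q) -> weight

    def weight(self, symbol, sources, target):
        """Return the weight of symbol(q1..qk) -> q; absent transitions weigh 0."""
        return self.mu.get((symbol, tuple(sources), target), Fraction(0))

    def transitions(self):
        """Yield every transition that carries a non-zero weight."""
        for (symbol, sources, target), w in self.mu.items():
            if w != 0:
                yield symbol, sources, target, w


# Hypothetical three-state automaton over the rationals.
example = WTA(
    states={1, 2, 3},
    ranks={"a": 0, "b": 0, "f": 2},
    final={3},
    mu={
        ("a", (), 1): Fraction(1),
        ("b", (), 2): Fraction(1),
        ("f", (1, 2), 3): Fraction(3, 10),
    },
)
print(example.weight("f", (1, 2), 3))   # 3/10
```

Storing only the non-zero entries of µ keeps the representation proportional to the number of transitions rather than to |Q|^(k+1).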


SLIDE 11

Syntax — Illustration

Sample Automaton

[Figure: sample automaton with states 1–6 and transitions labeled f/0.5, f/0.3, a/1, b/1, a/1, a/1]


SLIDE 14

Semantics

Definition

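Let t ∈ T_Σ(Q) and W = pos(t).

◮ Run on t: map r : W → Q with r(w) = t(w) if t(w) ∈ Q
◮ Weight of r:

\[ \mathrm{wt}(r) = \prod_{\substack{w \in W \\ t(w) \in \Sigma}} \mu_k(t(w))_{r(w),\, r(w1) \cdots r(wk)} \]

◮ Recognized weighted tree language:

\[ M(t) = \sum_{\substack{r \text{ run on } t \\ r(\varepsilon) \in F}} \mathrm{wt}(r) \]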
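The two formulas can be evaluated bottom-up without enumerating runs explicitly, because distributivity lets the sum over runs be pushed into the subtrees. The sketch below assumes trees over Σ only (no state-labeled leaves), encodes a tree as a nested tuple, and uses the hypothetical names `state_weights` and `recognize`; none of this encoding comes from the talk.

```python
from fractions import Fraction


def state_weights(tree, mu):
    """Map each state q to the summed weight of all runs on `tree` with root state q.

    `tree` is a nested tuple (symbol, child_1, ..., child_k);
    `mu` maps (symbol, (q1, ..., qk), q) to a weight, absent entries meaning 0.
    """
    symbol, *children = tree
    child_maps = [state_weights(child, mu) for child in children]
    result = {}
    for (sym, sources, target), weight in mu.items():
        if sym != symbol or len(sources) != len(children):
            continue
        total = weight
        for child_map, q in zip(child_maps, sources):
            total *= child_map.get(q, Fraction(0))
        if total != 0:
            result[target] = result.get(target, Fraction(0)) + total
    return result


def recognize(tree, mu, final):
    """Compute M(tree): add up the weights of all runs whose root state is final."""
    weights = state_weights(tree, mu)
    return sum((weights.get(q, Fraction(0)) for q in final), Fraction(0))


# Hypothetical automaton: a -> 1 or 2, f(1,1) -> 3 and f(2,2) -> 3.
mu_example = {
    ("a", (), 1): Fraction(1),
    ("a", (), 2): Fraction(1, 2),
    ("f", (1, 1), 3): Fraction(1),
    ("f", (2, 2), 3): Fraction(1),
}
tree = ("f", ("a",), ("a",))
print(recognize(tree, mu_example, {3}))   # 5/4
```

In the example two runs end in the final state 3: one through states (1, 1) with weight 1 · 1 · 1 and one through (2, 2) with weight 1 · 1/2 · 1/2, so M(f(a, a)) = 1 + 1/4.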


SLIDE 15

Semantics — Illustration

Sample Automaton

[Figure: sample automaton with states 1–6 and transitions labeled f/0.5, f/0.3, a/1, b/1, a/1, a/1]

Sample Runs

Input tree: f(a, b)

Runs:

◮ 6(1, 2) with weight 0
◮ 3(1, 2) with weight 0.3, built up step by step on the slides: f(a, b), then f(1, b) with weight 1, then f(1, 2) with weight 1, then 3(1, 2) with weight 0.3
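As a worked instance of the weight formula, consider the run 3(1, 2) on the input tree f(a, b). The transition weights below are read off the labels a/1, b/1 and f/0.3; which states each transition connects is an assumption, since the figure itself is not recoverable from the transcript.

```latex
\[
  \mathrm{wt}(r)
  = \mu_0(a)_{1} \cdot \mu_0(b)_{2} \cdot \mu_2(f)_{3,\,12}
  = 1 \cdot 1 \cdot 0.3
  = 0.3
\]
```

The run 6(1, 2) gets weight 0 because, under the same reading, the transition µ_2(f)_{6,12} has weight 0 and a single zero factor annihilates the product.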


SLIDE 24

Determinism

Definition

Deterministic WTA: for every σ ∈ Σ_k and w ∈ Q^k there exists exactly one q ∈ Q such that µ_k(σ)_{q,w} ≠ 0

Notes

◮ A deterministic WTA does not use addition
◮ Recognizable ≠ deterministically recognizable
◮ Determinization is possible in locally finite semirings [Borchardt, Vogler 2003]
◮ Partial determinization for probabilities [May, Knight 2006]
◮ Systematic presentation [Büchse, Vogler 2009]
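The determinism condition can be checked directly from a transition table. This is a minimal sketch under assumed conventions: µ is a dictionary from (symbol, source tuple, target) to a weight, absent entries count as 0, and the name `is_deterministic` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def is_deterministic(states, ranks, mu):
    """Check that every (symbol, source tuple) has exactly one non-zero target."""
    targets = defaultdict(set)
    for (symbol, sources, target), weight in mu.items():
        if weight != 0:
            targets[(symbol, sources)].add(target)
    for symbol, k in ranks.items():
        # Enumerate all of Q^k; for k = 0 this yields the single empty tuple.
        for sources in product(sorted(states), repeat=k):
            if len(targets[(symbol, sources)]) != 1:
                return False
    return True


# Hypothetical total automaton with a sink state 0.
states = {0, 1}
ranks = {"a": 0, "f": 2}
mu = {
    ("a", (), 1): Fraction(1),
    ("f", (1, 1), 1): Fraction(1, 2),
    ("f", (0, 0), 0): Fraction(1),
    ("f", (0, 1), 0): Fraction(1),
    ("f", (1, 0), 0): Fraction(1),
}
print(is_deterministic(states, ranks, mu))   # True
```

Because each tree then has exactly one run with non-zero weight, the sum in M(t) has at most one non-zero summand, which is why addition is never needed.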


SLIDE 25

Hyper-minimization


SLIDE 26

Assumption

We assume a commutative semifield A = (A, +, ·, 0, 1)


SLIDE 27

Minimization

equivalent = same recognized weighted tree language

Problem

Given a deterministic WTA, return an

◮ equivalent deterministic WTA such that
◮ no equivalent deterministic WTA is smaller


SLIDE 29

Minimization

equivalent = same recognized weighted tree language

Problem

Given a deterministic WTA, return a deterministic WTA that is

◮ equivalent
◮ minimal

Theorem (M., Q. 2011)

Minimization of deterministic WTA can be done in time O(m log n), where

◮ m = size of the automaton
◮ n = number of states


SLIDE 31

Minimization

context = tree with exactly one occurrence of the special symbol □
c[t] = tree obtained from the context c by replacing □ by t

Definition

States p and q are equivalent if there exists a ∈ A \ {0} such that M(c[p]) = a · M(c[q]) for all contexts c ∈ C_Σ

Theorem (Borchardt 2003)

A trim deterministic WTA is minimal ⟺ it has no pair of different, but equivalent states


SLIDE 32

Hyper-minimization

Definition

Languages L and L′ are almost equal if their symmetric difference (L \ L′) ∪ (L′ \ L) is finite

Problem [Badr et al. 2009]

Given a DFA, return a

◮ DFA recognizing an almost equal language such that
◮ no smaller DFA recognizes an almost equal language


SLIDE 34

Hyper-minimization

Definition

Languages L and L′ are almost equal if their symmetric difference (L \ L′) ∪ (L′ \ L) is finite

Problem [Badr et al. 2009]

Given a DFA, return a DFA that is

◮ recognizing an almost equal language
◮ hyper-minimal

Theorem (Holzer, M. 2009; Gawrychowski, Jeż 2009)

DFA hyper-minimization can be done in time O(n log n)


SLIDE 39

Weighted hyper-minimization

supp(τ) = {t ∈ T_Σ | τ(t) ≠ 0} for τ : T_Σ → A

Three variants

Two weighted tree languages τ1, τ2 : T_Σ → A are almost equal if

◮ supp(τ1) and supp(τ2) are almost equal (reduces to the unweighted case)
◮ {t ∈ T_Σ | τ1(t) ≠ τ2(t)} is finite (← discussed here)
◮ ∑_{t∈T_Σ} d(τ1(t), τ2(t)) ≤ n for some distance d and n ∈ ℕ (difficult)


SLIDE 40

Weighted hyper-minimization

Definition

Two weighted tree languages τ1 and τ2 are almost equal if τ1(t) = τ2(t) for almost all t ∈ TΣ


SLIDE 43

Weighted hyper-minimization

Definition

States p and q are almost equivalent if there exists a ∈ A \ {0} such that M(c[p]) = a · M(c[q]) for almost all contexts c ∈ C_Σ

Definition

◮ q-run = non-zero weighted run with root label q
◮ preamble state q = state with only finitely many q-runs

Theorem

A minimal deterministic WTA is hyper-minimal ⟺ it has no pair of different, but almost equivalent states, of which one is a preamble state
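One way to compute the preamble states is sketched below. It relies on an observation that is an assumption of this sketch rather than a statement from the slides: over a semifield (no zero divisors), a state has infinitely many q-runs exactly when it can be reached from a state that lies on a cycle of usable transitions. The function name `preamble_states` and the dictionary encoding of µ are hypothetical, and the sketch is not tuned to any particular running time.

```python
from fractions import Fraction


def preamble_states(states, mu):
    """Return the states that have only finitely many non-zero runs."""
    # Step 1: states reachable by some non-zero run (least fixed point).
    reachable = set()
    changed = True
    while changed:
        changed = False
        for (symbol, sources, target), weight in mu.items():
            if (weight != 0 and target not in reachable
                    and all(s in reachable for s in sources)):
                reachable.add(target)
                changed = True

    # Step 2: edge source -> target for every usable non-zero transition.
    successors = {q: set() for q in states}
    for (symbol, sources, target), weight in mu.items():
        if weight != 0 and all(s in reachable for s in sources):
            for s in sources:
                successors[s].add(target)

    def reach_from(start):
        """States reachable from `start` via at least one edge."""
        seen, stack = set(), list(successors[start])
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(successors[q])
        return seen

    # Step 3: everything reachable from a state on a cycle is a kernel state.
    kernel = set()
    for q in states:
        forward = reach_from(q)
        if q in forward:          # q lies on a cycle
            kernel |= forward
    return set(states) - kernel


# Hypothetical automaton: the loop g(2) -> 2 makes states 2 and 4 kernel states.
mu_example = {
    ("a", (), 1): Fraction(1),
    ("g", (1,), 2): Fraction(1, 2),
    ("g", (2,), 2): Fraction(1, 3),
    ("f", (1, 1), 3): Fraction(1),
    ("f", (2, 1), 4): Fraction(1),
}
print(sorted(preamble_states({1, 2, 3, 4}, mu_example)))   # [1, 3]
```

In the example, state 1 is reached only by the tree a and state 3 only by f(a, a), whereas g(a), g(g(a)), … all reach state 2.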

SLIDE 44

Algorithm


SLIDE 47

Overview

Hyper-minimization algorithm

1. Minimize: O(m log n)
2. Compute preamble states: O(m)
3. Compute co-preamble states: O(m)
4. Identify almost equivalent states: O(m log n)
5. Merge preamble states that are almost equivalent to another state: O(m)

Definition

Co-preamble state q = state with only finitely many contexts c ∈ C_Σ such that M(c[q]) ≠ 0


SLIDE 50

Identification of almost equivalent states

Definition

A transition context c is of the shape σ(t1, …, tk) with

◮ t1, …, tk ∈ Q ∪ {□}
◮ □ occurs exactly once

Assumptions

◮ a total order on transition contexts
◮ for each q ∈ Q, c_q is the smallest transition context such that c_q[q] evaluates to a co-kernel (i.e., not a co-preamble) state


SLIDE 52

Signature

Definition

Signature of q: {⟨c, q′, a′⟩ | ⋯} where

◮ c = transition context
◮ q′ = evaluation of c[q]
◮ a′ = transition weight of c[q]

Definition

Standardized signature of q: {⟨c, q′, a′⟩ | q′ co-kernel state, ⋯} where

◮ c = transition context
◮ q′ = evaluation of c[q]
◮ a′ = transition weight of c[q] “divided by” the transition weight of c_q[q]
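A rough sketch of how standardized signatures might be computed and compared. Several choices here are assumptions, not taken from the slides: a transition context is encoded as (symbol, hole position, remaining states), Python's tuple ordering serves as the total order, only contexts with an explicit non-zero transition are considered, the set of co-kernel states is supplied by the caller, and all weights are `Fraction` values. The names `standardized_signature` and `states_by_signature` are hypothetical.

```python
from fractions import Fraction


def standardized_signature(q, mu, co_kernel):
    """Collect (context, target, scaled weight) for all transitions that read q."""
    entries = []
    for (symbol, sources, target), weight in mu.items():
        if weight == 0 or target not in co_kernel:
            continue
        for i, source in enumerate(sources):
            if source == q:
                # Context = the transition with position i replaced by the hole.
                context = (symbol, i, sources[:i] + sources[i + 1:])
                entries.append((context, target, Fraction(weight)))
    if not entries:
        return frozenset()
    entries.sort(key=lambda entry: entry[0])
    scale = entries[0][2]     # weight at the smallest context c_q
    return frozenset((c, target, w / scale) for c, target, w in entries)


def states_by_signature(states, mu, co_kernel):
    """Group states that share a standardized signature (groups of size > 1)."""
    groups = {}
    for q in sorted(states):
        signature = standardized_signature(q, mu, co_kernel)
        groups.setdefault(signature, []).append(q)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical automaton: state 2 behaves like state 1 scaled by 1/2.
mu_example = {
    ("g", (1,), 3): Fraction(1, 2),
    ("g", (2,), 3): Fraction(1, 4),
    ("g", (3,), 3): Fraction(1),
    ("f", (1, 3), 3): Fraction(1),
    ("f", (2, 3), 3): Fraction(1, 2),
}
print(states_by_signature({1, 2, 3}, mu_example, co_kernel={3}))   # [[1, 2]]
```

Dividing by the weight at c_q removes the scaling factor a from the comparison, so states 1 and 2 end up with identical signatures even though every weight of state 2 is half that of state 1.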


SLIDE 54

Signature

Lemma

If two states have the same signature, then they are almost equivalent

Lemma

If two different states are almost equivalent, then there exist two different states that have the same signature


SLIDE 55

Finding almost equivalent states

Approach

1. Find two different states of equal signature
2. Merge them (using a scaling factor)
3. Go to 1.

This merges more states than desired, but it identifies the almost equivalent states


SLIDE 57

Hyper-minimization algorithm

Theorem

We can hyper-minimize deterministic WTA in time O(m log n)


SLIDE 58

Summary

Solved

◮ Hyper-minimization for deterministic WTA over semifields
◮ Almost equality = finitely many trees with different weight

Open

◮ Error optimization
◮ Stronger “almost equality”
◮ Avoiding requirements (semifield; commutativity; determinism; etc.)


SLIDE 59

Thank you!

References

1. Badr, Geffert, Shipman: Hyper-minimizing minimized deterministic finite state automata. ITA 43, 2009
2. Borchardt: The Myhill-Nerode theorem for recognizable tree series. Proc. DLT 2003
3. Borchardt, Vogler: Determinization of finite state weighted tree automata. JALC 8, 2003
4. Büchse, May, Vogler: Determinization of weighted tree automata using factorizations. JALC 15, 2010
5. Gawrychowski, Jeż: Hyper-minimisation made efficient. Proc. MFCS 2009
6. Holzer, Maletti: An n log n algorithm for hyper-minimizing states in a (minimized) deterministic automaton. Proc. CIAA 2009
7. Maletti, Quernheim: Pushing for weighted tree automata. Proc. MFCS 2011
8. May, Knight: A better n-best list: practical determinization of weighted finite tree automata. Proc. HLT-NAACL 2006
