Hyper-minimization for deterministic weighted tree automata
Andreas Maletti and Daniel Quernheim
Institute of Computer Science, Universität Leipzig, Germany
maletti@informatik.uni-leipzig.de
May 29, 2014
Overview
Weighted Tree Language
◮ Assigns weight (e.g. a probability) to each tree
◮ Weight drawn from commutative semiring; e.g. (ℚ, +, ·, 0, 1)
Weighted Tree Automaton
◮ Finitely represents weighted tree language
◮ Defines the recognizable weighted tree languages
Application
◮ Re-ranker for parse trees
◮ Representation of parses
large models
- A. Maletti and D. Quernheim
Hyper-minimization for deterministic WTA 2
Basics
Semiring
Definition
A commutative semiring is an algebraic structure A = (A, +, ·, 0, 1) such that
◮ (A, +, 0) is a commutative monoid
◮ (A, ·, 1) is a commutative monoid
◮ · distributes over +: a · (a1 + a2) = (a · a1) + (a · a2)
◮ 0 · a = 0 for all a ∈ A
Examples: (ℕ, +, ·, 0, 1) and (ℚ, +, ·, 0, 1)
Definition
A commutative semifield is a commutative semiring A = (A, +, ·, 0, 1) such that
◮ for all a ∈ A \ {0} there exists a⁻¹ ∈ A with a · a⁻¹ = 1
Example: (ℚ, +, ·, 0, 1)
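The two definitions can be spot-checked on small samples. The Python sketch below is only an illustration: the helper names `satisfies_semiring_axioms` and `has_inverses_within` are hypothetical, and testing finitely many elements illustrates the laws without proving them.

```python
from fractions import Fraction
from itertools import product

def satisfies_semiring_axioms(sample, zero, one):
    """Check the commutative-semiring laws on all triples drawn from `sample`."""
    for a, b, c in product(sample, repeat=3):
        if a + b != b + a or a * b != b * a:
            return False  # commutativity of + and ·
        if (a + b) + c != a + (b + c) or (a * b) * c != a * (b * c):
            return False  # associativity of + and ·
        if a * (b + c) != (a * b) + (a * c):
            return False  # distributivity
    # neutral elements and absorbing zero
    return all(zero + a == a and one * a == a and zero * a == zero for a in sample)

def has_inverses_within(sample, candidates, zero, one):
    """Semifield law: every non-zero a in `sample` has an inverse among `candidates`."""
    return all(any(a * b == one for b in candidates) for a in sample if a != zero)

# Sample of rationals that is closed under taking inverses, and a sample of naturals.
rationals = [Fraction(0), Fraction(1), Fraction(1, 2), Fraction(2),
             Fraction(2, 3), Fraction(3, 2)]
naturals = [0, 1, 2, 3]

print(satisfies_semiring_axioms(rationals, Fraction(0), Fraction(1)))
print(has_inverses_within(rationals, rationals, Fraction(0), Fraction(1)))
print(has_inverses_within(naturals, range(100), 0, 1))
```

The last call fails for the naturals because 2 and 3 have no multiplicative inverse there, which is why (ℕ, +, ·, 0, 1) is a semiring but not a semifield.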
Syntax
Definition
A weighted tree automaton (WTA) is a tuple (Q, Σ, A, F, µ) where
◮ Q is a finite set of states
◮ Σ is a ranked alphabet of input symbols
◮ A = (A, +, ·, 0, 1) is a commutative semiring (weight structure)
◮ F ⊆ Q is the set of final states
◮ $\mu = (\mu_k)_{k \in \mathbb{N}}$ with $\mu_k \colon \Sigma_k \to A^{Q \times Q^k}$ are the weighted transitions
Sample Transition
σ(q1, . . . , qk) → q with weight $\mu_k(\sigma)_{q,\, q_1 \cdots q_k}$
Syntax — Illustration
Sample Automaton
[Figure: sample automaton with states 1 to 6 and transitions labeled f/0.5, f/0.3, a/1, b/1, a/1, a/1]
Semantics
Definition
Let t ∈ TΣ(Q) and W = pos(t).
◮ Run on t: map r : W → Q with r(w) = t(w) if t(w) ∈ Q
◮ Weight of r
$\mathrm{wt}(r) = \prod_{w \in W,\; t(w) \in \Sigma} \mu_k(t(w))_{r(w),\, r(w1) \cdots r(wk)}$
◮ Recognized weighted tree language
$M(t) = \sum_{r \text{ run on } t,\; r(\varepsilon) \in F} \mathrm{wt}(r)$
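The semantics can be turned into a direct, if inefficient, computation. The sketch below enumerates all runs on a tree over Σ and sums the weights of those ending in a final state. It makes several assumptions: trees are nested tuples `(symbol, subtree_1, ..., subtree_k)`, `mu` is a dictionary keyed by `(symbol, child_states, state)` with missing entries read as weight 0, weights are rationals, and the toy automaton is hypothetical, not the one from the slides.

```python
from fractions import Fraction
from itertools import product

def runs(tree, states, mu):
    """Yield (root_state, weight) for every run on `tree`, including weight-0 runs."""
    symbol, children = tree[0], tree[1:]
    child_runs = [list(runs(child, states, mu)) for child in children]
    for combo in product(*child_runs):
        child_states = tuple(q for q, _ in combo)
        weight_below = Fraction(1)
        for _, w in combo:
            weight_below *= w  # product of the transition weights in the subtrees
        for q in states:
            yield q, mu.get((symbol, child_states, q), Fraction(0)) * weight_below

def recognized_weight(tree, states, final, mu):
    """M(t): sum of wt(r) over all runs r whose root state is final."""
    return sum((w for q, w in runs(tree, states, mu) if q in final), Fraction(0))

# Hypothetical toy automaton: a -> 1, b -> 2, f(1, 2) -> 3 with weight 3/10.
STATES = (1, 2, 3)
FINAL = {3}
MU = {
    ("a", (), 1): Fraction(1),
    ("b", (), 2): Fraction(1),
    ("f", (1, 2), 3): Fraction(3, 10),
}

print(recognized_weight(("f", ("a",), ("b",)), STATES, FINAL, MU))
```

Enumerating runs is exponential in the size of the tree; it is used here only because it mirrors the definition term by term.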
Semantics — Illustration
Sample Automaton
[Figure: sample automaton with states 1 to 6 and transitions labeled f/0.5, f/0.3, a/1, b/1, a/1, a/1]
Sample Runs
Input tree: f(a, b)
◮ Run 6(1, 2) with weight 0
◮ Run 3(1, 2) with weight 0.3, shown step by step: f(a, b), then f(1, b) with weight 1, then f(1, 2) with weight 1, then 3(1, 2) with weight 0.3
Determinism
Definition
Deterministic WTA: for every σ ∈ Σk and w ∈ Q^k there exists exactly one q ∈ Q such that $\mu_k(\sigma)_{q,w} \neq 0$
Notes
◮ Deterministic WTA does not use addition
◮ Recognizable ≠ deterministically recognizable
◮ Determinization possible in locally-finite semirings [Borchardt, Vogler 2003]
◮ Partial determinization for probabilities [May, Knight 2006]
◮ Systematic presentation [Büchse, Vogler 2009]
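The first note can be made concrete: in a deterministic WTA every tree has a single run of non-zero weight, so evaluation only multiplies. The sketch below checks the determinism condition and evaluates bottom-up. The dictionary encoding of µ (keys `(symbol, child_states, state)`, missing entries read as 0), the sink state 0, and the toy automaton are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def is_deterministic(states, ranks, mu):
    """For every symbol and child-state tuple, exactly one target has non-zero weight."""
    for symbol, rank in ranks.items():
        for w in product(states, repeat=rank):
            targets = [q for q in states if mu.get((symbol, w, q), 0) != 0]
            if len(targets) != 1:
                return False
    return True

def evaluate(tree, states, mu):
    """Return (state, weight) of the unique non-zero run; uses multiplication only."""
    symbol, children = tree[0], tree[1:]
    weight = Fraction(1)
    child_states = []
    for child in children:
        q, w = evaluate(child, states, mu)
        child_states.append(q)
        weight *= w
    w_tuple = tuple(child_states)
    # Unpacking raises ValueError if the automaton is not deterministic here.
    (target,) = [q for q in states if mu.get((symbol, w_tuple, q), 0) != 0]
    return target, weight * mu[(symbol, w_tuple, target)]

def recognized_weight(tree, states, final, mu):
    q, weight = evaluate(tree, states, mu)
    return weight if q in final else Fraction(0)

# Hypothetical toy automaton: state 1 halves the weight at every f, state 0 is a sink.
STATES = (0, 1)
RANKS = {"a": 0, "f": 2}
MU = {
    ("a", (), 1): Fraction(1),
    ("f", (1, 1), 1): Fraction(1, 2),
    ("f", (0, 0), 0): Fraction(1),
    ("f", (0, 1), 0): Fraction(1),
    ("f", (1, 0), 0): Fraction(1),
}

tree = ("f", ("f", ("a",), ("a",)), ("a",))
print(is_deterministic(STATES, RANKS, MU), recognized_weight(tree, STATES, {1}, MU))
```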
Hyper-minimization
Assumption
We assume a commutative semifield A = (A, +, ·, 0, 1)
Minimization
equivalent = same recognized weighted tree language
Problem
Given deterministic WTA, return
◮ an equivalent deterministic WTA such that
◮ no equivalent deterministic WTA is smaller
Theorem (M., Q. 2011)
Minimization of deterministic WTA can be done in time O(m log n)
◮ m = size of automaton
◮ n = number of states
Minimization
context = tree with exactly one occurrence of the special symbol □
c[t] = tree obtained from context c by replacing □ by t
Definition
States p and q are equivalent if there exists a ∈ A \ {0} such that M(c[p]) = a · M(c[q]) for all contexts c ∈ CΣ
Theorem (Borchardt 2003)
A trim deterministic WTA is minimal ⇔ it has no pair of different, but equivalent states
Hyper-minimization
Definition
Languages L and L′ are almost equal if their difference (L \ L′) ∪ (L′ \ L) is finite
Problem [Badr et al. 2009]
Given DFA, return
◮ a DFA recognizing an almost equal language such that
◮ no smaller DFA recognizes an almost equal language
Theorem (Holzer, M. 2009; Gawrychowski, Jeż 2009)
DFA hyper-minimization can be done in time O(n log n)
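Almost equality of two DFA languages can be tested on the product automaton: the difference is finite exactly when no cycle lies on a path from the start pair to a pair of states that disagree on acceptance. The sketch below implements this test. It is not one of the cited hyper-minimization algorithms, and the encoding (complete DFAs as triples `(start, accepting, delta)`) and the three example automata are assumptions.

```python
def almost_equal(dfa1, dfa2, alphabet):
    """Return True iff the two complete DFAs accept languages with finite difference.

    Each DFA is a triple (start, accepting, delta) with delta[(state, symbol)] -> state.
    """
    (start1, acc1, delta1), (start2, acc2, delta2) = dfa1, dfa2

    # Successor sets of all reachable states of the product automaton.
    succ, stack = {}, [(start1, start2)]
    while stack:
        pair = stack.pop()
        if pair not in succ:
            succ[pair] = {(delta1[pair[0], a], delta2[pair[1], a]) for a in alphabet}
            stack.extend(succ[pair])

    # Product states from which a pair disagreeing on acceptance is reachable.
    useful = {p for p in succ if (p[0] in acc1) != (p[1] in acc2)}
    changed = True
    while changed:
        changed = False
        for pair, targets in succ.items():
            if pair not in useful and targets & useful:
                useful.add(pair)
                changed = True

    # The difference is finite iff the useful part of the product is acyclic,
    # checked by repeatedly removing states without remaining useful predecessors.
    indegree = {p: 0 for p in useful}
    for pair in useful:
        for target in succ[pair] & useful:
            indegree[target] += 1
    queue = [p for p in useful if indegree[p] == 0]
    removed = 0
    while queue:
        pair = queue.pop()
        removed += 1
        for target in succ[pair] & useful:
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    return removed == len(useful)

# Hypothetical examples over the alphabet {a}: all words, non-empty words, even length.
ALL = (0, {0}, {(0, "a"): 0})
NONEMPTY = (0, {1}, {(0, "a"): 1, (1, "a"): 1})
EVEN = (0, {0}, {(0, "a"): 1, (1, "a"): 0})

print(almost_equal(ALL, NONEMPTY, "a"), almost_equal(ALL, EVEN, "a"))
```

ALL and NONEMPTY differ only on the empty word, so they are almost equal; ALL and EVEN differ on every word of odd length.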
Weighted hyper-minimization
supp(τ) = {t ∈ TΣ | τ(t) ≠ 0} for τ : TΣ → A
Three variants
Two weighted tree languages τ1, τ2 : TΣ → A are almost equal if
◮ supp(τ1) and supp(τ2) are almost equal (reduces to the unweighted case)
◮ {t ∈ TΣ | τ1(t) ≠ τ2(t)} is finite (← discussed here)
◮ $\sum_{t \in T_\Sigma} d(\tau_1(t), \tau_2(t)) \leq n$ for some distance d and n ∈ ℕ (difficult)
Weighted hyper-minimization
Definition
Two weighted tree languages τ1 and τ2 are almost equal if τ1(t) = τ2(t) for almost all t ∈ TΣ
Weighted hyper-minimization
Definition
States p and q are almost equivalent if there exists a ∈ A \ {0} such that M(c[p]) = a · M(c[q]) for almost all contexts c ∈ CΣ
Definition
◮ q-run = non-zero weighted run with root label q
◮ preamble state q = finitely many q-runs
Theorem
A minimal deterministic WTA is hyper-minimal ⇔ it has no pair of different, but almost equivalent states of which one is a preamble state
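Preamble states can be found by a reachability argument. In a trim deterministic WTA over a semifield, a state has infinitely many q-runs exactly when it is reachable from a state that lies on a cycle of the transition graph. The sketch below follows that idea under stated assumptions: transitions are a dictionary from `(symbol, child_states)` to `(target, weight)`, every state is assumed reachable, and the simple search is not the linear-time procedure mentioned in the talk.

```python
from fractions import Fraction

def preamble_states(states, transitions):
    """Return the states that are reached by only finitely many trees."""
    # Edge child -> target for every transition of non-zero weight.
    succ = {q: set() for q in states}
    for (_symbol, child_states), (target, weight) in transitions.items():
        if weight != 0:
            for child in child_states:
                succ[child].add(target)

    def reachable_from(sources):
        seen, stack = set(), list(sources)
        while stack:
            q = stack.pop()
            if q not in seen:
                seen.add(q)
                stack.extend(succ[q])
        return seen

    # A state lies on a cycle if it can be reached from one of its own successors.
    on_cycle = {q for q in states if q in reachable_from(succ[q])}
    kernel = reachable_from(on_cycle)  # states with infinitely many q-runs
    return set(states) - kernel

# Hypothetical toy automaton: a -> 1, b -> 2, f(1, 2) -> 3, g(3) -> 3.
TRANSITIONS = {
    ("a", ()): (1, Fraction(1)),
    ("b", ()): (2, Fraction(1)),
    ("f", (1, 2)): (3, Fraction(3, 10)),
    ("g", (3,)): (3, Fraction(1)),
}

print(preamble_states({1, 2, 3}, TRANSITIONS))
```

State 3 carries the loop g(3) → 3, so infinitely many trees reach it, while states 1 and 2 are each reached by a single tree.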
Algorithm
Overview
Hyper-minimization algorithm
1. Minimize: O(m log n)
2. Compute preamble states: O(m)
3. Compute co-preamble states: O(m)
4. Identify almost equivalent states: O(m log n)
5. Merge preamble states that are almost equivalent to another state: O(m)
Definition
Co-preamble state q = finitely many c ∈ CΣ such that M(c[q]) ≠ 0
Identification of almost equivalent states
Definition
Transition context c is of the shape σ(t1, . . . , tk) with
◮ t1, . . . , tk ∈ Q ∪ {□}
◮ exactly one □ occurs
Assumptions
◮ total order on transition contexts
◮ cq smallest transition context such that cq[q] evaluates to a co-kernel (i.e., not a co-preamble) state for each q ∈ Q
Signature
Definition
Signature of q: {⟨c, q′, a′⟩ | · · ·}
◮ c = transition context
◮ q′ = evaluation of c[q]
◮ a′ = transition weight of c[q]
Definition
Standardized signature of q: {⟨c, q′, a′⟩ | q′ co-kernel state, · · ·}
◮ c = transition context
◮ q′ = evaluation of c[q]
◮ a′ = transition weight of c[q] “divided by” transition weight of cq[q]
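A plain, non-standardized signature can be computed by scanning the transitions once. Because the side condition of the definition is elided on the slide, the sketch below assumes that the signature of q collects every transition context with non-zero weight in which q fills the hole. The encoding (a dictionary from `(symbol, child_states)` to `(target, weight)`), the `HOLE` marker, and the function names are hypothetical; the weight normalization of the standardized signature is not attempted.

```python
from collections import defaultdict
from fractions import Fraction

HOLE = "□"  # marker for the single hole of a transition context

def signatures(states, transitions):
    """Map each state q to the set of triples (c, q', a') with c[q] -> q' of weight a'."""
    sig = {q: set() for q in states}
    for (symbol, child_states), (target, weight) in transitions.items():
        if weight == 0:
            continue
        for i, q in enumerate(child_states):
            # Replace position i by the hole to obtain the transition context.
            context = (symbol, child_states[:i] + (HOLE,) + child_states[i + 1:])
            sig[q].add((context, target, weight))
    return {q: frozenset(s) for q, s in sig.items()}

def groups_of_equal_signature(states, transitions):
    """Return the sets of two or more states that share one signature."""
    groups = defaultdict(set)
    for q, s in signatures(states, transitions).items():
        groups[s].add(q)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical toy automaton: a -> 1, b -> 2, g(1) -> 3 and g(2) -> 3, both with weight 1/2.
TRANSITIONS = {
    ("a", ()): (1, Fraction(1)),
    ("b", ()): (2, Fraction(1)),
    ("g", (1,)): (3, Fraction(1, 2)),
    ("g", (2,)): (3, Fraction(1, 2)),
}

print(groups_of_equal_signature({1, 2, 3}, TRANSITIONS))
```

Grouping through a dictionary keyed by the signature is what makes repeated "find two states of equal signature" lookups cheap.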
Signature
Lemma
If two states have the same signature, then they are almost equivalent
Lemma
If two different states are almost equivalent, then there exist two different states that have the same signature
Finding almost equivalent states
Approach
1. Find two different states of equal signature
2. Merge them (using a scaling factor)
3. Go to 1.
This will merge more states than desired, but identifies almost equivalent states
Hyper-minimization algorithm
Theorem
We can hyper-minimize deterministic WTA in time O(m log n)
Summary
Solved
◮ hyper-minimization for deterministic WTA over semifields
◮ almost equality = finitely many trees with different weight
Open
◮ Error optimization
◮ Stronger “almost equality”
◮ Avoiding requirements (semifield; commutativity; determinism; etc.)
Thank you!
References
1. Badr, Geffert, Shipman: Hyper-minimizing minimized deterministic finite state automata. ITA 43, 2009
2. Borchardt: The Myhill-Nerode theorem for recognizable tree series. Proc. DLT 2003
3. Borchardt, Vogler: Determinization of finite state weighted tree automata. JALC 8, 2003
4. Büchse, May, Vogler: Determinization of weighted tree automata using factorizations. JALC 15, 2010
5. Gawrychowski, Jeż: Hyper-minimisation made efficient. Proc. MFCS 2009
6. Holzer, Maletti: An n log n algorithm for hyper-minimizing states in a (minimized) deterministic automaton. Proc. CIAA 2009
7. Maletti, Quernheim: Pushing for weighted tree automata. Proc. MFCS 2011
8. May, Knight: A better n-best list: practical determinization of weighted finite tree automata. Proc. HLT-NAACL 2006