SLIDE 1
Random Sampling of Ordered Trees according to the Number of Occurrences of a Pattern
Gwendal Collet, Julien David, Alice Jacquot
GASCom, June 2nd 2016
SLIDE 2
SLIDE 3
Definitions
S-trees: $T = (\mathrm{root}, (T_1, \dots, T_k))$ where $k \in S$, $0 \in S$, $S$ finite
ex: binary trees ($S = \{0, 2\}$), Motzkin trees ($S = \{0, 1, 2\}$), plane trees
Suffix
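To make the definition concrete, here is a minimal Python sketch of S-trees. The encoding is an illustrative assumption and is not taken from the talk: a node is the tuple of its children, so its arity is its length. The names `is_s_tree`, `size`, `BINARY` and `MOTZKIN` are also hypothetical.

```python
# Hypothetical encoding (not from the talk): a node is the tuple of its
# children, so a leaf is () and the arity of a node is len(node).
Tree = tuple

BINARY = frozenset({0, 2})
MOTZKIN = frozenset({0, 1, 2})


def is_s_tree(t: Tree, S: frozenset) -> bool:
    """Return True if every node of t has its arity in S (iterative DFS)."""
    stack = [t]
    while stack:
        node = stack.pop()
        if len(node) not in S:
            return False
        stack.extend(node)
    return True


def size(t: Tree) -> int:
    """Number of nodes of t."""
    count, stack = 0, [t]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node)
    return count


leaf = ()
cherry = (leaf, leaf)     # one binary node with two leaves
path = ((leaf,),)         # three nodes in a chain (uses arity 1)
print(is_s_tree(cherry, BINARY), is_s_tree(path, BINARY), is_s_tree(path, MOTZKIN))
# prints: True False True
```

The traversals are iterative so that large random trees do not hit Python's recursion limit.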
SLIDE 4
Definitions
Pattern = prefix of a suffix
SLIDE 5
Definitions
Not a pattern!
SLIDE 6
Definitions
Pattern P: 2 occurrences
SLIDE 7
Definitions
Pattern P: 2 overlapping occurrences + 1 occurrence
SLIDE 8
Definitions
Pattern P: 2 overlapping occurrences + 1 occurrence
Problem: given S and an S-tree P, how can we sample at random an S-tree with n nodes and exactly k occurrences of P?
[Chyzak, Drmota, Klausner, Kok, 2008] The number of occurrences is asymptotically Gaussian in unordered trees
[Flouri, Melichar, Janousek, 2009] Linear-time algorithm to count the number of occurrences
SLIDE 9
Idea of the algorithm
Precomputation: an algorithm that generates a tree-language specification which, given an S-tree P:
→ recognizes any S-tree
→ marks each occurrence of P
Random sampler:
→ translate the specification into a system of algebraic equations on generating series
→ build a bivariate Boltzmann sampler based on these equations
⇒ adapt the Aho-Corasick algorithm on words to tree structures
SLIDE 10
Idea of the algorithm
Remark on Boltzmann samplers:
→ built quasi-automatically from the generating series (+ extraction of the singularity)
→ uniform among elements of the same size
→ linear time in approximate size
→ quadratic time in exact size, by rejection
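The remark above can be illustrated on the simplest case: a univariate Boltzmann sampler for plain S-trees, whose generating series satisfies $T(z) = z \sum_{k \in S} T(z)^k$. This is only a sketch and not the bivariate sampler of the talk, which works on the generated grammar. The function names, the fixed-point solver and the size window [n_min, n_max] are assumptions. Trees are returned as preorder sequences of arities.

```python
import random


def boltzmann_value(z: float, S: frozenset, tol: float = 1e-13,
                    max_iter: int = 1_000_000) -> float:
    """Solve T = z * sum_{k in S} T**k by fixed-point iteration from T = 0.

    The iterates increase monotonically to T(z) when z is below the
    singularity; a ValueError is raised if they blow up or stall.
    """
    t = 0.0
    for _ in range(max_iter):
        new = z * sum(t ** k for k in S)
        if new > 1e9:
            raise ValueError("z is beyond the radius of convergence")
        if abs(new - t) < tol:
            return new
        t = new
    raise ValueError("fixed-point iteration did not converge")


def sample_lukasiewicz(z: float, S: frozenset, n_min: int, n_max: int,
                       rng: random.Random) -> list[int]:
    """Free Boltzmann sampler for S-trees with rejection on the size.

    Returns the preorder sequence of arities (Lukasiewicz word) of an S-tree
    whose size lies in [n_min, n_max]. Each node independently gets arity k
    with probability z * T**k / T, so a tree of size m has probability z**m / T,
    which depends only on m: the output is uniform among trees of its size.
    """
    T = boltzmann_value(z, S)
    arities = sorted(S)
    weights = [z * T ** k / T for k in arities]
    while True:
        word, pending = [], 1   # pending = nodes whose arity is not drawn yet
        while pending > 0 and len(word) <= n_max:
            k = rng.choices(arities, weights)[0]
            word.append(k)
            pending += k - 1
        if pending == 0 and n_min <= len(word) <= n_max:
            return word          # otherwise reject and retry


rng = random.Random(2016)
w = sample_lukasiewicz(0.49, frozenset({0, 2}), 40, 60, rng)
print(len(w), w[:10])
```

Choosing z closer to the singularity shifts the size distribution toward larger trees, but it also slows the convergence of the fixed-point iteration. The singularity is 1/2 for binary trees and 1/3 for Motzkin trees.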
SLIDE 11
Idea of the algorithm
At a given height: does a node belong to an occurrence of P?
Read the tree from top to bottom:
→ depends on nodes above, at a bounded distance ($h(P)$)
→ depends on neighbors, at a bounded distance ($\max(\mathrm{arity})^{h(P)}$)
→ depends on nodes below (to check later)
⇒ only need to check a subtree of bounded size
Strong dependencies between nodes at the same height
⇒ need to consider tuples of nodes simultaneously
SLIDE 12
Idea of the algorithm
Build a grammar where:
→ non-terminals correspond to tuples of nodes associated with a subtree that is a candidate for containing an occurrence
→ rules describe what happens when this subtree grows
SLIDE 13
Generalized tree grammar
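Let $G = (N, A, S, R)$ be a grammar where:
- $N$ = set of non-terminals
- $A$ = axiom (starting non-terminal)
- $S$ = terminals (here, arities)
- $R$ = set of rules $r = (n, (s_1, \dots, s_{|n|}), \lambda, (n_1, \dots, n_{|\lambda|}))$ with $n, n_j \in N$, $s_i \in S$ and $\lambda$ a partition of $\{1, 2, \dots, \sum_{i=1}^{|n|} s_i\}$, where $|n|$ is the number of nodes in $n$, $|\lambda|$ the number of parts in $\lambda$, and $|\lambda_j| = |n_j|$
ex: $r = (n, (2, 1, 0, 3), \{13|245|6\}, (n_1, n_2, n_3))$: the four nodes of $n$ have arities 2, 1, 0, 3, hence 6 children numbered 1 to 6; $\lambda$ groups them into $\{1, 3\}$, $\{2, 4, 5\}$ and $\{6\}$, which become the nodes of $n_1$, $n_2$ and $n_3$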
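The sketch below shows one way to check that a rule respects the definition, replaying the example rule above. The `Rule` class, the `width` table giving |n| for each non-terminal, and the use of 1-based child positions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical encoding of r = (n, (s_1, ..., s_|n|), lambda, (n_1, ..., n_|lambda|)).

    Non-terminals are referred to by name; their sizes |n| live in a separate
    table. lam lists the parts of lambda as tuples of 1-based child positions.
    """
    lhs: str
    arities: tuple[int, ...]
    lam: tuple[tuple[int, ...], ...]
    rhs: tuple[str, ...]


def check_rule(r: Rule, S: frozenset, width: dict[str, int]) -> None:
    """Raise ValueError if r violates the constraints of the definition."""
    if len(r.arities) != width[r.lhs]:
        raise ValueError("need one arity per node of n")
    if any(s not in S for s in r.arities):
        raise ValueError("arities must belong to S")
    m = sum(r.arities)
    # Sorting all positions detects both overlaps and gaps in lambda.
    if sorted(i for part in r.lam for i in part) != list(range(1, m + 1)):
        raise ValueError("lambda must partition {1, ..., sum of s_i}")
    if len(r.lam) != len(r.rhs):
        raise ValueError("one non-terminal per part of lambda")
    for part, nj in zip(r.lam, r.rhs):
        if len(part) != width[nj]:
            raise ValueError("|lambda_j| must equal |n_j|")


S = frozenset({0, 1, 2, 3})
width = {"n": 4, "n1": 2, "n2": 3, "n3": 1}   # hypothetical sizes |n|, |n_j|
r = Rule("n", (2, 1, 0, 3), ((1, 3), (2, 4, 5), (6,)), ("n1", "n2", "n3"))
check_rule(r, S, width)   # passes silently
```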
SLIDE 14
Generalized tree grammar
A given pattern ($S = \{0, 1, 2, 3\}$) and the grammar we obtain: * marks a rule that produces an occurrence of the pattern.
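The grammar translates into one algebraic equation per non-terminal, under the reading that each rule application creates the |n| nodes of its left-hand side and hands their children over to n_1, ..., n_{|λ|}. In this sketch, z marks nodes and u marks occurrences. $N_n(z,u)$ is the generating series of the forests derived from n. $c(r)$ is the number of occurrences produced by rule r: 1 for a rule marked *, 0 otherwise, assuming each marked rule produces exactly one. These notations are not from the talk, and the counting interpretation assumes the grammar is unambiguous.

```latex
\begin{aligned}
N_n(z,u) &= \sum_{r = (n,\,(s_1,\dots,s_{|n|}),\,\lambda,\,(n_1,\dots,n_{|\lambda|})) \in R}
  z^{|n|}\, u^{c(r)} \prod_{j=1}^{|\lambda|} N_{n_j}(z,u),
\qquad T(z,u) = N_A(z,u), \\
\Pr(r \mid n) &= \frac{z^{|n|}\, u^{c(r)} \prod_{j=1}^{|\lambda|} N_{n_j}(z,u)}{N_n(z,u)}.
\end{aligned}
```

The coefficient $[z^n u^k]\,T(z,u)$ then counts S-trees with n nodes and k occurrences of P. The second line is the probability with which a bivariate Boltzmann sampler, with z and u fixed, would choose rule r at non-terminal n. As a sanity check, a single non-terminal A with the two rules A → (0) and A → (2) with λ = {1|2} gives $N_A = z + z N_A^2$ when u = 1, which is the equation of binary trees.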
SLIDE 15
Dealing with overlaps
Pattern P: double comb
SLIDE 16
Dealing with overlaps
Pattern P: double comb
SLIDE 17
Dealing with overlaps
Pattern P: double comb
SLIDE 18
Dealing with overlaps
Pattern P: double comb
Node belonging to two different prefixes of P
SLIDE 19
Dealing with overlaps
Pattern P: double comb
Disjoint nodes belonging to two overlapping prefixes
SLIDE 20
Dealing with overlaps
Pattern P: double comb. New non-terminal!
SLIDE 21
Dealing with overlaps
Pattern P: double comb. New non-terminal!
⇒ If two prefixes share at least one leaf, all their leaves must be placed in the same part of $\lambda$
→ may create new non-terminals by superposing prefixes of P
→ possible exponential explosion of the number of non-terminals in pathological cases (like the double comb)
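The merging rule on this slide behaves like a union-find: each prefix glues its leaves together, and prefixes that share a leaf end up in the same part. This is a minimal sketch. It assumes children are numbered 1..m as in the rule definition, and that positions covered by no prefix form singleton parts. The function `partition_children` and its input format are hypothetical. With the illustrative leaf sets {1,3}, {2,4} and {4,5}, it reproduces the partition {13|245|6} of the rule example.

```python
def partition_children(m: int, prefix_leaves: list[set[int]]) -> list[tuple[int, ...]]:
    """Group child positions 1..m into the parts of lambda (union-find).

    Assumed rule: the leaves of one prefix go into the same part, so two
    prefixes sharing a leaf are merged; uncovered positions stay alone.
    """
    parent = list(range(m + 1))   # parent[0] is unused

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for leaves in prefix_leaves:
        if not leaves:
            continue
        first, *rest = sorted(leaves)
        for y in rest:
            parent[find(y)] = find(first)   # glue the leaves of one prefix

    parts: dict[int, list[int]] = {}
    for i in range(1, m + 1):
        parts.setdefault(find(i), []).append(i)
    return [tuple(p) for p in parts.values()]


print(partition_children(6, [{1, 3}, {2, 4}, {4, 5}]))  # [(1, 3), (2, 4, 5), (6,)]
```

Parts come out ordered by their smallest position, which matches the way λ is written on the slides.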
SLIDE 22
Dealing with overlaps
Remark:
→ costly precomputation in some cases (in the size of P), but in practice the pattern is small compared to the generated trees
→ the Boltzmann sampler is still linear, but at a cost in memory, due to the size of the generated grammar
SLIDE 23
Backbone of the algorithm
Input: an S-tree $P$
Output: a grammar $G = (N, U, A, S, R)$
$N \leftarrow \{A\}$, $U \leftarrow \emptyset$, $R \leftarrow \emptyset$
For each non-terminal $n \in N$ do
    For each $(s_1, \dots, s_{|n|}) \in S^{|n|}$ do
        Compute the new tree $T$
        Compute the new prefixes of $P$ in $T$
        Compute the partition $\lambda$ of independent nodes
        Compute the subtree associated with each part
        If a new subtree $T' \notin N$ then add $T'$ to $N$
        If $\mathrm{height}(T') = \mathrm{height}(P)$ then add $T'$ to $U$
        Add rule $(n, (s_1, \dots, s_{|n|}), \lambda, (n_1, \dots, n_{|\lambda|}))$ to $R$
Return $(N, U, A, S, R)$
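The backbone above is a worklist closure: non-terminals are expanded until no new one appears. This sketch keeps only that skeleton. `expand` is a hypothetical callback standing for the "compute new tree / prefixes / partition / subtrees" steps, and `is_complete` stands for the test height(T′) = height(P). The toy grammar at the end is an illustrative re-encoding of binary trees, not a grammar produced for an actual pattern.

```python
from collections import deque
from collections.abc import Callable, Hashable, Iterable

# A rule is a tuple (lhs, arities, lam, rhs), following
# r = (n, (s_1, ..., s_|n|), lambda, (n_1, ..., n_|lambda|)).
Rule = tuple


def build_grammar(axiom: Hashable,
                  expand: Callable[[Hashable], Iterable[Rule]],
                  is_complete: Callable[[Hashable], bool]):
    """Worklist closure following the backbone pseudocode.

    expand(n) (hypothetical) yields one rule per arity tuple in S^|n|;
    is_complete(t) stands for the test height(t) = height(P) deciding U.
    Returns (N, U, A, R); S is left implicit in expand.
    """
    N, U, R = {axiom}, set(), []
    todo = deque([axiom])          # non-terminals not expanded yet
    while todo:
        n = todo.popleft()
        for rule in expand(n):
            R.append(rule)
            for child in rule[3]:  # right-hand side n_1, ..., n_|lambda|
                if child not in N:
                    N.add(child)
                    todo.append(child)
                    if is_complete(child):
                        U.add(child)
    return N, U, axiom, R


# Toy grammar for binary trees (S = {0, 2}): "A" is one node, "B" a pair of siblings.
toy = {
    "A": [("A", (0,), (), ()),
          ("A", (2,), ((1, 2),), ("B",))],
    "B": [("B", (0, 0), (), ()),
          ("B", (0, 2), ((1,), (2,)), ("A", "A")),
          ("B", (2, 0), ((1,), (2,)), ("A", "A")),
          ("B", (2, 2), ((1, 2), (3, 4)), ("B", "B"))],
}
N, U, A, R = build_grammar("A", lambda n: toy[n], lambda n: n == "B")
print(sorted(N), sorted(U), len(R))   # ['A', 'B'] ['B'] 6
```

A FIFO queue makes each non-terminal expand exactly once, so the loop terminates as soon as the set N stops growing. That happens after finitely many steps whenever the number of candidate subtrees is finite, as it is here because they have bounded size.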
SLIDE 24
Experimental results
Figure: size of the generated grammar for 100 random patterns (panels: binary trees, Motzkin trees)
SLIDE 25