


slide-1
SLIDE 1

Random Sampling of Ordered Trees according to the Number of Occurrences of a Pattern

Gwendal Collet, Julien David, Alice Jacquot GASCom, June 2nd 2016

slide-2
SLIDE 2

Definitions

S-trees: T = (root, (T1, ..., Tk)) where k ∈ S, with 0 ∈ S and S finite.

Examples: binary trees (S = {0, 2}), Motzkin trees (S = {0, 1, 2}), plane trees.

[Figure: example of a prefix]

slide-3
SLIDE 3

Definitions

[Figure: example of a suffix]

slide-4
SLIDE 4

Definitions

Pattern = prefix of a suffix

slide-5
SLIDE 5

Definitions

[Figure: counterexample] Not a pattern!

slide-6
SLIDE 6

Definitions

[Figure: pattern P with 2 occurrences]

slide-7
SLIDE 7

Definitions

[Figure: pattern P with 2 overlapping occurrences + 1 occurrence]

slide-8
SLIDE 8

Definitions

Problem: Given S and an S-tree P, how can we randomly sample an S-tree with n nodes and exactly k occurrences of P?

[Chyzak, Drmota, Klausner, Kok 2008] The number of occurrences is asymptotically Gaussian in unordered trees

[Flouri, Melichar, Janousek 2009] Linear algorithm to count the number of occurrences
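To make "number of occurrences" concrete, here is a naive counting sketch. It is not the linear algorithm cited above: it runs in time proportional to |P| times |T|. It assumes the usual matching convention, which the slides only show in figures: an internal node of P must match a node of the same arity, while a leaf of P matches any subtree. Trees are encoded as nested tuples (a node is the tuple of its children), and the function names are hypothetical.

```python
LEAF = ()  # assumed encoding: a node is the tuple of its children


def occurs_at(pattern, tree):
    """True if `pattern` is a prefix of `tree` (assumed matching convention)."""
    if len(pattern) == 0:
        return True  # a pattern leaf matches any subtree
    return len(pattern) == len(tree) and all(
        occurs_at(p_child, t_child) for p_child, t_child in zip(pattern, tree)
    )


def count_occurrences(pattern, tree):
    """Count the nodes of `tree` at which `pattern` occurs (naive version)."""
    here = 1 if occurs_at(pattern, tree) else 0
    return here + sum(count_occurrences(pattern, child) for child in tree)


# Pattern: a binary node whose left child is also a binary node.
P = ((LEAF, LEAF), LEAF)
# Tree: a left comb with three binary nodes.
T = (((LEAF, LEAF), LEAF), LEAF)

print(count_occurrences(P, T))  # the two occurrences overlap on one binary node
```

In this example the pattern occurs at the root and at its left child, and the two occurrences share a node, which is the overlapping situation the sampler has to handle.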

slide-9
SLIDE 9

Idea of the algorithm

Precomputation: algorithm to generate a tree language specification. Given an S-tree P, the specification:
→ recognizes any S-tree
→ marks each occurrence of P

Random sampler:
→ translate the specification into a system of algebraic equations on generating series
→ build a bivariate Boltzmann sampler based on these equations

⇒ Adapt the Aho-Corasick algorithm on words to tree structures

slide-10
SLIDE 10

Idea of the algorithm

Remark on Boltzmann samplers:

  • quasi-automatically built from generating series (+ singularity extraction)
  • uniform among elements of the same size
  • linear in approximate size
  • quadratic in exact size, by rejection
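As an illustration of the remark, here is a minimal univariate Boltzmann sampler for binary trees (S = {0, 2}) with rejection on an approximate size window. It is a simplified sketch, not the bivariate sampler of the talk: it ignores pattern occurrences entirely. It relies on the standard generating series B(x) = x + x·B(x)², so a node is a leaf with probability x / B(x). The tree is returned as its preorder list of arities, and all function names are hypothetical.

```python
import math
import random


def leaf_probability(x):
    """Probability x / B(x) of drawing a leaf, where B(x) = x + x * B(x)**2."""
    b = (1 - math.sqrt(1 - 4 * x * x)) / (2 * x)
    return x / b


def boltzmann_binary(x, max_size, rng):
    """Draw a binary tree as a preorder list of arities; None if it grows too big."""
    p_leaf = leaf_probability(x)
    word = []
    pending = 1  # number of nodes that still have to be generated
    while pending > 0:
        if len(word) >= max_size:
            return None  # early abort: the tree would exceed max_size nodes
        arity = 0 if rng.random() < p_leaf else 2
        word.append(arity)
        pending += arity - 1
    return word


def sample_binary_tree(n, eps, x=0.5, rng=random):
    """Rejection sampling: retry until the size lies in [n(1 - eps), n(1 + eps)]."""
    lo, hi = n * (1 - eps), n * (1 + eps)
    while True:
        word = boltzmann_binary(x, int(hi), rng)
        if word is not None and lo <= len(word) <= hi:
            return word


tree = sample_binary_tree(101, 0.2)
print(len(tree))  # some odd size within the requested window
```

The default x = 0.5 is the singularity of B(x), where the expected size is infinite; the early abort keeps each attempt bounded. Two trees of the same size have the same probability, so the output is uniform among binary trees of its size.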

slide-11
SLIDE 11

Idea of the algorithm

At a given height: does a node belong to an occurrence of P?

Read the tree from top to bottom:
→ depends on nodes above, at a bounded distance (h(P))
→ depends on neighbors, at a bounded distance (max(arity)^h(P))
→ depends on nodes below (to check later)
⇒ Only need to check a subtree of bounded size

Strong dependencies between nodes at the same height
⇒ Need to consider tuples of nodes simultaneously

slide-12
SLIDE 12

Idea of the algorithm

Build a grammar where:
→ Non-terminals correspond to tuples of nodes associated with a subtree that is a candidate to contain an occurrence
→ Rules describe what happens when this subtree grows

slide-13
SLIDE 13

Generalized tree grammar

G = (N, A, S, R) is a grammar if:

  • N = set of non-terminals
  • A = axiom (starting non-terminal)
  • S = terminals (here arities)
  • R = set of rules r such that:

$r = (n, (s_1, \dots, s_{|n|}), \lambda, (n_1, \dots, n_{|\lambda|}))$, with $n \in N$, $n_j \in N$, $s_i \in S$

  • |n| = number of nodes in n
  • λ = partition of $\{1, 2, \dots, \sum_{i=1}^{k} s_i\}$, where k = |n|
  • |λ| = number of parts in λ, and |λj| = |nj|

ex:

r = (n, (2, 1, 0, 3), {13|245|6}, (n1, n2, n3))

[Figure: non-terminal n, whose children are numbered 1 to 6 and grouped into n1, n2, n3]
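The constraints on a rule can be checked mechanically. The sketch below encodes a rule as a small record and verifies the three conditions stated above: one arity per node of n, λ is a partition of {1, ..., Σ s_i}, and each part has as many elements as its non-terminal has nodes. The `Rule` class, the `sizes` dictionary (mapping a non-terminal name to its number of nodes), and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # non-terminal n
    arities: tuple    # (s_1, ..., s_|n|)
    partition: tuple  # lambda, as a tuple of parts (tuples of child indices)
    targets: tuple    # (n_1, ..., n_|lambda|)


def is_well_formed(rule, sizes, terminals):
    """Check the constraints on a rule; sizes[x] is |x| for a non-terminal x."""
    if len(rule.arities) != sizes[rule.source]:
        return False  # need exactly one arity per node of n
    if any(s not in terminals for s in rule.arities):
        return False  # arities must be taken from S
    children = sorted(i for part in rule.partition for i in part)
    if children != list(range(1, sum(rule.arities) + 1)):
        return False  # lambda must partition {1, ..., sum of the s_i}
    if len(rule.partition) != len(rule.targets):
        return False  # one target non-terminal per part
    return all(len(part) == sizes[t]
               for part, t in zip(rule.partition, rule.targets))


# The example rule r = (n, (2, 1, 0, 3), {13|245|6}, (n1, n2, n3)).
sizes = {"n": 4, "n1": 2, "n2": 3, "n3": 1}
r = Rule("n", (2, 1, 0, 3), ((1, 3), (2, 4, 5), (6,)), ("n1", "n2", "n3"))
print(is_well_formed(r, sizes, {0, 1, 2, 3}))
```

In the example, the four nodes of n have 2 + 1 + 0 + 3 = 6 children, and the parts {1, 3}, {2, 4, 5}, {6} have sizes 2, 3, 1, matching |n1|, |n2|, |n3|.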

slide-14
SLIDE 14

Generalized tree grammar

A given pattern, with S = {0, 1, 2, 3}, and the grammar we obtain.

[Figure: the pattern and the resulting grammar; * marks a rule that produces an occurrence of the pattern]

slide-15
SLIDE 15

Dealing with overlappings

[Figure: pattern P and a double comb]

slide-18
SLIDE 18

Dealing with overlappings

[Figure: pattern P and a double comb]

Node belonging to two different prefixes of P

slide-19
SLIDE 19

Dealing with overlappings

[Figure: pattern P and a double comb]

Disjoint nodes belonging to two overlapping prefixes

slide-20
SLIDE 20

Dealing with overlappings

[Figure: pattern P and a double comb]

New non-terminal!

slide-21
SLIDE 21

Dealing with overlappings

⇒ If two prefixes share at least one leaf, all their leaves must be taken in the same part of λ
→ Might create new non-terminals by superposing prefixes of P
→ Possible exponential explosion of the number of non-terminals in pathological cases (like the double comb)

slide-22
SLIDE 22

Dealing with overlappings

Remark:
→ The precomputation is costly in some cases (in the size of P), but in practice the pattern is small compared to the generated trees
→ The Boltzmann sampler is still linear, but at a cost in memory space, due to the size of the generated grammar

slide-23
SLIDE 23

Backbone of the algorithm

Input: an S-tree P
Output: a grammar G = (N, U, A, S, R)

N ← {A}, U ← ∅, R ← ∅
For each non-terminal n ∈ N do
    For each (s1, ..., s|n|) ∈ S^|n| do
        Compute new tree T
        Compute new prefixes of P in T
        Compute partition λ of independent nodes
        Compute subtree associated to each part
        If new subtree T′ ∉ N
            Then add T′ to N
        If height(new subtree T′) = height(P)
            Then add T′ to U
        Add rule (n, (s1, ..., s|n|), λ, (n1, ..., n|λ|)) to R
Return (N, U, A, S, R)
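The loop above iterates over a set N that grows while it is being traversed, which is a worklist (fixed-point) computation. The sketch below shows only that control flow. The pattern-specific steps (computing the new tree, its prefixes of P, the partition λ, and the associated subtrees) are hidden behind a hypothetical `expand` callback, and the height test behind a hypothetical `is_marked` callback, so this is not the authors' implementation. The toy `expand_any` stands in for the simplest possible case, a grammar with one single-node non-terminal that recognizes every S-tree.

```python
from collections import deque
from itertools import product


def build_grammar(axiom, terminals, size_of, expand, is_marked):
    """Worklist skeleton; `expand` and `is_marked` are hypothetical callbacks."""
    N, U, R = [axiom], [], []
    queue = deque([axiom])
    while queue:
        n = queue.popleft()
        # Try every choice of arities (s_1, ..., s_|n|) for the nodes of n.
        for arities in product(sorted(terminals), repeat=size_of(n)):
            partition, targets = expand(n, arities)
            for t in targets:
                if t not in N:       # newly discovered non-terminal
                    N.append(t)
                    queue.append(t)
                    if is_marked(t):
                        U.append(t)
            R.append((n, arities, partition, targets))
    return N, U, axiom, terminals, R


def expand_any(n, arities):
    """Toy expansion: each child becomes its own part, sent back to 'A'."""
    total = sum(arities)
    partition = tuple((i,) for i in range(1, total + 1))
    return partition, ("A",) * total


N, U, A, S, R = build_grammar("A", {0, 1, 2}, lambda n: 1,
                              expand_any, lambda t: False)
for rule in R:
    print(rule)
```

With the toy callbacks, the result is one non-terminal and one rule per arity in S = {0, 1, 2}, that is, a grammar for all Motzkin trees with no marked rule. Termination of the loop in general depends on `expand` producing only finitely many distinct non-terminals.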

slide-24
SLIDE 24

Experimental results

Size of the generated grammar for 100 random patterns

[Figure: two panels, Binary and Motzkin]

slide-25
SLIDE 25

Thank you!