Analysis of patterns and minimal embeddings of non-Markovian sequences


SLIDE 1

Analysis of patterns and minimal embeddings of non-Markovian sequences

Manuel.Lladser@Colorado.EDU Department of Applied Mathematics University of Colorado Boulder

AofA - April 13 2008

SLIDE 2

NOTATION & TERMINOLOGY.

  • A is a finite alphabet
  • A∗ is the set of all words of finite length
  • A language is a set L ⊂ A∗
  • $X = (X_n)_{n \ge 1}$ is a sequence of A-valued random variables
  • X may be non-Markovian
  • $X_1 \cdots X_l$ models a random word of length l

SLIDE 3

PARADIGM. For various probabilistic models for X and languages L, the frequency statistics of L are asymptotically normal. Here

\[ S^L_n := \big(\text{number of prefixes in } X_1 \cdots X_n \text{ that belong to the language } L\big) \]

The paradigm applies for:

  • generalized patterns ⊕ i.i.d. models [BenKoch93]
  • simple patterns ⊕ stationary Markovian models [RegSzp98]
  • primitive patterns ⊕ k-order Markovian models [NicSalFla02, Nic03]
  • primitive patterns ⊕ nice dynamical sources [BouVal02, BouVal06]
  • hidden patterns ⊕ i.i.d. models [FlaSpaVal06]

SLIDE 4

THE MARKOV CHAIN EMBEDDING TECHNIQUE.

  • IF X is a homogeneous Markov chain
  • IF L is a regular language
  • IF G = (V, A, f, q, T) is a DFA that recognizes L
  • IF the embedding of X into G, i.e. the stochastic process $X^G_n := f(q, X_1 \cdots X_n)$, is a first-order homogeneous Markov chain

THEN

\[ S^L_n = \big(\text{number of visits the embedded process } X^G \text{ makes to } T \text{ in the first } n \text{ steps}\big) \]
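
As a minimal sketch of the identity above (not taken from the talk), the following Python runs a word through a DFA one letter at a time and counts the visits to the accepting set T. The function name `count_visits` and the toy two-state DFA for "words ending in a" are illustrative assumptions.

```python
def count_visits(delta, start, accepting, word):
    """Count the non-empty prefixes of `word` whose DFA state is accepting.

    delta: dict mapping (state, letter) -> state, i.e. the one-letter
           version of the transition function f.
    """
    state, visits = start, 0
    for letter in word:
        state = delta[(state, letter)]   # embedded process after one more letter
        visits += state in accepting     # a visit to T
    return visits


# Hypothetical toy DFA over {a, b}: state 1 means "the last letter was a".
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}
word = "abbab"
s_n = count_visits(delta, 0, {1}, word)

# Direct count of the prefixes that end in 'a', for comparison.
direct = sum(word[:n].endswith("a") for n in range(1, len(word) + 1))
```

Both counts agree because the DFA state after n letters already records whether the length-n prefix belongs to the language.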

SLIDE 5

EXAMPLE. Consider a first-order Markov chain X such that

  • $P[X_1 = a] = \mu$; $P[X_1 = b] = 1 - \mu$
  • $P[X_{n+1} = a \mid X_n = a] = p$; $P[X_{n+1} = b \mid X_n = a] = 1 - p$
  • $P[X_{n+1} = a \mid X_n = b] = q$; $P[X_{n+1} = b \mid X_n = b] = 1 - q$

Then the embedding of X into the Aho-Corasick automaton

  • Figure. Aho-Corasick automaton with states ε, a, b, ab, ba, abb, abba and transitions labeled a, b

that recognizes matches with the regular expression {a, b}∗{ba, abba}, i.e. all words of the form x = ...ba or x = ...abba, is a first-order Markov chain.
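
The automaton in this example can be sketched in a few lines of Python. This is an illustrative construction, not the speaker's code: states are the prefixes of the two patterns, and reading a letter moves to the longest suffix of the extended word that is again a pattern prefix. The names `step`, `matches` and `direct_count` are assumptions.

```python
PATTERNS = ("ba", "abba")

# States are the prefixes of the patterns: '', a, b, ab, ba, abb, abba.
STATES = sorted({p[:k] for p in PATTERNS for k in range(len(p) + 1)},
                key=lambda s: (len(s), s))

# A state is accepting when the text read so far ends with some pattern.
ACCEPTING = {s for s in STATES if s.endswith(PATTERNS)}


def step(state, letter):
    """Longest suffix of state+letter that is a prefix of some pattern."""
    w = state + letter
    while w not in STATES:   # terminates because '' is a state
        w = w[1:]
    return w


def matches(word):
    """Number of prefixes of `word` ending in 'ba' or 'abba', via the automaton."""
    state, hits = "", 0
    for letter in word:
        state = step(state, letter)
        hits += state in ACCEPTING
    return hits


def direct_count(word):
    """Same quantity computed straight from the definition."""
    return sum(word[:k].endswith(PATTERNS) for k in range(1, len(word) + 1))
```

Every non-empty state ends with the last letter read, so the law of the next state depends only on the current state when X is a first-order chain; this is the sense in which the embedded process stays first-order Markov.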

SLIDE 6

  • Figure. The Aho-Corasick automaton on states ε, a, b, ab, ba, abb, abba (numeric labels 1 to 6), with transitions labeled by the letters a, b and weighted by the probabilities µ, (1 − µ), p, (1 − p), q, (1 − q)

SLIDE 7

What about a completely general sequence X?

SLIDE 8
  • EXAMPLE. A seemingly unbiased coin.

Let 0 < p < 1/2. Consider the random binary sequence $X = (X_n)_{n \ge 1}$ such that

\[
X_{n+1} \overset{d}{=}
\begin{cases}
\text{Bernoulli}(p), & \frac{1}{n}\sum_{i=1}^{n} X_i > \frac{1}{2} \\
\text{Bernoulli}(1/2), & \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{2} \\
\text{Bernoulli}(1-p), & \frac{1}{n}\sum_{i=1}^{n} X_i < \frac{1}{2}
\end{cases}
\]

  • Question. Is there a Markovian structure into which X can be embedded in order to analyze the asymptotic distribution of the frequency statistics of a given language?
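
To get a feel for this sequence, here is a small simulation sketch in Python. It is an illustration only: the function names are hypothetical, and the treatment of the empty prefix (the first toss is taken to be fair) is an assumption, since the definition above only covers n ≥ 1.

```python
import random


def next_bias(ones, n, p):
    """P[X_{n+1} = 1] given that `ones` of the first n symbols equal 1."""
    if n == 0 or 2 * ones == n:      # assumption: empty prefix counts as balanced
        return 0.5
    return p if 2 * ones > n else 1 - p


def simulate(n_steps, p, seed=0):
    """Draw X_1, ..., X_{n_steps} with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    xs, ones = [], 0
    for n in range(n_steps):
        x = 1 if rng.random() < next_bias(ones, n, p) else 0
        xs.append(x)
        ones += x
    return xs


xs = simulate(10_000, p=0.2)
empirical_mean = sum(xs) / len(xs)
```

Because the coin always leans against whichever symbol is currently in the majority, the empirical mean stays close to 1/2, which is what makes the coin look unbiased.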

SLIDE 9

GENERAL SETTING.

Given

  • a possibly non-Markovian sequence X
  • a possibly non-regular language L
  • a transformation R : A∗ → S

define $X^R$ to be the stochastic process

\[ X^R_n := R(X_1 \cdots X_n) \]

Question 1. What conditions are necessary and sufficient in order for $X^R$ to be Markovian?

Question 2. Given a pattern L, is there a transformation R such that $X^R$ is Markovian but also informative of the distribution of the frequency statistics of L?

SLIDE 10

REMARK. The Markovianity or non-Markovianity of

\[ X^R_n := R(X_1 \cdots X_n), \quad n \ge 1, \]

does not really depend on the range of R.

This motivates thinking of R : A∗ → S as an equivalence relation over A∗:

\[ u \, R \, v \iff R(u) = R(v) \]

  • R(u) is the unique equivalence class of R that contains u
  • c ∈ R means that c is an equivalence class of R

SLIDE 11
  • DEFINITION. X is embeddable w.r.t. R provided that for all u, v ∈ A∗ and c ∈ R, if u R v then

\[
\sum_{\alpha \in A \,:\, R(u\alpha) = c} P[X = u\alpha\ldots \mid X = u\ldots]
=
\sum_{\alpha \in A \,:\, R(v\alpha) = c} P[X = v\alpha\ldots \mid X = v\ldots]
\]
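
The condition in this definition can be tested mechanically on words up to a fixed length. The Python below is a sketch under stated assumptions: `cond(u, a)` stands for P[X = ua... | X = u...] and is assumed to be defined for every word u, all names are hypothetical, and the coin used in the example takes the first toss to be fair. A finite-depth check can refute the condition but cannot prove it for all of A∗.

```python
from itertools import product


def class_transitions(u, R, cond, alphabet):
    """Map each class c to the sum of cond(u, a) over letters a with R(u+a) == c."""
    out = {}
    for a in alphabet:
        c = R(u + a)
        out[c] = out.get(c, 0.0) + cond(u, a)
    return out


def is_embeddable_up_to(R, cond, alphabet, max_len, tol=1e-12):
    """Check the defining condition for all words of length <= max_len."""
    seen = {}  # class -> class-transition law of the first word found in it
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            u = "".join(letters)
            law = class_transitions(u, R, cond, alphabet)
            ref = seen.setdefault(R(u), law)
            if any(abs(ref.get(c, 0.0) - law.get(c, 0.0)) > tol
                   for c in set(ref) | set(law)):
                return False
    return True


def coin_cond(u, a, p=0.2):
    """Next-symbol law of a majority-averse coin (first toss assumed fair)."""
    ones, n = u.count("1"), len(u)
    q = 0.5 if 2 * ones == n else (p if 2 * ones > n else 1 - p)
    return q if a == "1" else 1 - q


def balance(u):       # number of ones minus number of zeros
    return 2 * u.count("1") - len(u)


def last_letter(u):   # class of a word = its last letter ('' for the empty word)
    return u[-1:]
```

With these definitions, `is_embeddable_up_to(balance, coin_cond, "01", 6)` finds no violation, whereas `last_letter` fails already on the pair u = 1, v = 01: both end in 1, but the next symbol is 1 with probability p after u and with probability 1/2 after v.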

SLIDE 15

  • Figure. Schematic partition of {0, 1, 2}∗ into equivalence classes (marked words u, v and their one-letter extensions u0, u1, u2, v0, v1, v2; edge weights .4, .3, .7)

SLIDE 16


THEOREM A. X is embeddable w.r.t. R if and only if, for every x ∈ A∗, conditioned on X = x..., the stochastic process

\[ X^R_n := R(X_1 \cdots X_n), \quad n \ge |x|, \]

is a first-order homogeneous Markov chain with transition probabilities that do not depend on x.

THEOREM B. For each equivalence relation R on A∗, there exists a unique coarsest refinement R′ of R w.r.t. which X is embeddable.

SLIDE 17

APPLICATION/QUESTION. What is the smallest state-space for studying the frequency statistics of a language L in X?

→ X = a b b a b ... (original sequence)
→ $X^R$ = 1 1 ... (non-Markovian encoding)
$X^{R'}$ = 4 6 3 4 ... (optimal Markovian encoding)
$X^Q$ = 6 3 18 15 10 ... (any other Markovian encoding)

  • Figure. Partition R = {L, A∗ \ L} s.t. $X^R$ is non-Markovian (regions L and A∗ \ L; marked words a, ab, abb, abba, abbab)

SLIDE 18

→ $X^{R'}$ = 4 6 3 4 ... (optimal Markovian encoding)

  • Figure. Coarsest refinement R′ of R w.r.t. which X is embeddable (regions L and A∗ \ L; classes labeled (0) to (6); marked words a, ab, abb, abba, abbab)

SLIDE 19

→ $X^Q$ = 6 3 18 15 10 ... (any other Markovian encoding)

  • Figure. Arbitrary refinement Q of R w.r.t. which X is embeddable (regions L and A∗ \ L; classes labeled (0) to (18); marked words a, ab, abb, abba, abbab)

SLIDE 20
  • REMARK. The optimal refinement R′ of R w.r.t. which X is embeddable is obtained through a limiting process: this makes it almost impossible to characterize the equivalence classes of R′.

Motivated by this, we will introduce an embedding which, while not optimal, is analytically tractable (!)

SLIDE 21
  • DEFINITION. The Markov relation induced by X on A∗ is the equivalence relation defined as

\[ u \, R_X \, v \iff (\forall w \in A^*):\; P[X = uw\ldots \mid X = u\ldots] = P[X = vw\ldots \mid X = v\ldots] \]

SLIDE 22

  • Figure. Weighted tree visualization of definition with A = {0, 1} (node labels ε, 1, 00, 01, 10, 11; edge weights .8, .2, .4, .6, .5, .5; marked words u, v)

SLIDE 23


An equivalence relation R is said to be right-invariant if for all u, v ∈ A∗ and α ∈ A:

\[ R(u) = R(v) \implies R(u\alpha) = R(v\alpha) \]

THEOREM C. X is embeddable w.r.t. any right-invariant equivalence relation that is a refinement of $R_X$; in particular, X is embeddable w.r.t. $R_X$.
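
Right-invariance is a purely combinatorial property of R, so it can be probed by brute force on short words. The Python sketch below is illustrative only (all names are assumptions) and, being limited to words of bounded length, it can only detect violations rather than certify the property on all of A∗.

```python
from itertools import product


def is_right_invariant_up_to(R, alphabet, max_len):
    """Check R(u) == R(v)  =>  R(u+a) == R(v+a) for all words of length <= max_len."""
    seen = {}  # class -> classes of its one-letter extensions, in alphabet order
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            u = "".join(letters)
            signature = tuple(R(u + a) for a in alphabet)
            if seen.setdefault(R(u), signature) != signature:
                return False
    return True


def balance(u):        # number of ones minus number of zeros
    return 2 * u.count("1") - len(u)


def parity(u):         # parity of the number of ones
    return u.count("1") % 2


def ends_in_01(u):     # two-class partition {L, complement} for L = words ending in 01
    return u.endswith("01")
```

The first two maps pass the check. The two-class partition `ends_in_01` does not: the words 0 and 1 lie in the same class, yet appending 1 gives 01 (in the language) and 11 (not in the language).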

SLIDE 24
  • EXAMPLE. Back to the seemingly unbiased coin.

For 0 < p < 1/2, define

\[
X_{n+1} \overset{d}{=}
\begin{cases}
\text{Bernoulli}(p), & \frac{1}{n}\sum_{i=1}^{n} X_i > \frac{1}{2} \\
\text{Bernoulli}(1/2), & \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{2} \\
\text{Bernoulli}(1-p), & \frac{1}{n}\sum_{i=1}^{n} X_i < \frac{1}{2}
\end{cases}
\]

We aim to understand the frequency statistics of

\[ L_1 = \{0,1\}^*\{1\}, \qquad L_2 = \{0\}^*\{1\}\{0\}^*\big(\{1\}\{0\}^*\{1\}\{0\}^*\big)^* \]

within X.

SLIDE 25
  • PROPOSITION. R : {0, 1}∗ → Z defined as

\[
R(x) = 2\left\{ \sum_{i=1}^{|x|} x_i - \frac{|x|}{2} \right\} = \sum_{i=1}^{|x|} x_i - \sum_{i=1}^{|x|} (1 - x_i)
\]

is a right-invariant refinement of $R_X$. In particular, $X^R_n := R(X_1 \cdots X_n)$ is a first-order homogeneous Markov chain.

  • Figure. Transition diagram of $X^R$ (regions n > 0 and n < 0; edge weights 1/2, 1/2, p, (1 − p), p, (1 − p))

$X^R$ is recurrent, with period 2. Because 0 < p < 1/2, $X^R$ is positive recurrent; in particular, there exists a stationary distribution π. Observe that

\[ S^{L_1}_n = \sum_{i=1}^{n} X_i \]
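
The stationary distribution π is not written out on the slide. As a hedged sketch, the Python below checks a closed form obtained here from detailed balance, assuming the chain moves from k to k ± 1 with the probabilities dictated by the coin (up with probability p for k > 0, 1 − p for k < 0, 1/2 at k = 0). The formula and the function names are this sketch's own, not the speaker's.

```python
def up_prob(k, p):
    """P[X^R moves from k to k + 1], i.e. P[next symbol is 1] at balance k."""
    if k == 0:
        return 0.5
    return p if k > 0 else 1 - p


def stationary(k, p):
    """Candidate pi(k) from detailed balance: geometric tails with ratio p/(1-p)."""
    r = p / (1 - p)
    pi0 = (1 - 2 * p) / (2 * (1 - p))
    if k == 0:
        return pi0
    return pi0 / (2 * (1 - p)) * r ** (abs(k) - 1)


def inflow(k, p):
    """Probability mass entering state k in one step under `stationary`."""
    return (stationary(k - 1, p) * up_prob(k - 1, p)
            + stationary(k + 1, p) * (1 - up_prob(k + 1, p)))
```

For p = 0.2 the candidate sums to 1 over the integers (up to a negligible tail), satisfies the balance equations, and puts mass 1/2 on the even states, which matches the factor 2 appearing in P[U = n] = 2 · π(n).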

SLIDE 26

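\[ S^{L_1}_n = \sum_{i=1}^{n} X_i \]

COROLLARY A. If U and V are Z-valued random variables such that

\[
P[U = n] = 2 \cdot \pi(n), \quad n \equiv 0 \pmod 2; \qquad
P[V = n] = 2 \cdot \pi(n), \quad n \equiv 1 \pmod 2;
\]

then for L1 := {0, 1}∗{1} it holds that

\[
\lim_{\substack{n \to \infty \\ n \equiv 0 \ (\mathrm{mod}\ 2)}} 2n \cdot \left\{ \frac{S^{L_1}_n}{n} - \frac{1}{2} \right\} \overset{d}{=} U;
\qquad
\lim_{\substack{n \to \infty \\ n \equiv 1 \ (\mathrm{mod}\ 2)}} 2n \cdot \left\{ \frac{S^{L_1}_n}{n} - \frac{1}{2} \right\} \overset{d}{=} V.
\]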
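
One way to see where the normalization 2n comes from (a reasoning step not spelled out on the slide) is to combine the identity for $S^{L_1}_n$ with the definition of R from the Proposition:

```latex
\[
2n \cdot \left\{ \frac{S^{L_1}_n}{n} - \frac{1}{2} \right\}
  = 2 \sum_{i=1}^{n} X_i - n
  = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} (1 - X_i)
  = R(X_1 \cdots X_n)
  = X^R_n .
\]
```

Read this way, Corollary A describes the limit law of the embedded chain $X^R_n$ along even and along odd times, which is where the two phases U and V of the period-2 chain come from.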

SLIDE 27

L2 is recognized by the automaton:

  • Figure. Two-state automaton with states A and B (edge label 1)

According to the Myhill-Nerode theorem, Q : {0, 1}∗ → {A, B} defined as

\[ Q(x) := \big(\text{state in the automaton where the path associated with } x \text{ ends when starting at } A\big) \]

is right-invariant.

Hence R × Q is also right-invariant and a refinement of $R_X$. In particular, $X^{R \times Q}_n := (X^R_n, X^Q_n)$ is a first-order homogeneous Markov chain.
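
The product encoding can be explored by brute force. The sketch below assumes, consistently with the regular expression for L2, that the automaton tracks the parity of the number of 1s and that B is the state reached after an odd number of 1s; the function names are hypothetical. It lists the lengths of the words that bring the pair (R, Q) back to its starting value (0, A) and takes their gcd.

```python
from functools import reduce
from itertools import product
from math import gcd


def R(x):
    """Balance: number of ones minus number of zeros."""
    return 2 * x.count("1") - len(x)


def Q(x):
    """Assumed parity automaton: 'B' after an odd number of ones, else 'A'."""
    return "B" if x.count("1") % 2 else "A"


def return_lengths(max_len):
    """Lengths n <= max_len for which some word x has (R(x), Q(x)) == (0, 'A')."""
    lengths = []
    for n in range(1, max_len + 1):
        for letters in product("01", repeat=n):
            x = "".join(letters)
            if (R(x), Q(x)) == (0, "A"):
                lengths.append(n)
                break   # one witness per length is enough
    return lengths


lengths = return_lengths(12)
period = reduce(gcd, lengths)
```

A return to (0, A) needs as many 1s as 0s and an even number of 1s, so the word length must be a multiple of 4.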

SLIDE 28

  • Figure. Transition diagram of $X^{R \times Q}$ (automaton states A and B; marked states 1, 2, 3, 4; edge weights 1/2, p, (1 − p))

$X^{R \times Q}$ is positive recurrent, with period 4. Return times to a state have finite second moment. This allows us to use the central limit theorem for additive functionals of Markov chains to obtain the following result.

COROLLARY B. There exists σ > 0 such that

\[
\lim_{n \to \infty} \sqrt{n} \cdot \left\{ \frac{S^{L_2}_n}{n} - \frac{1}{2} \right\} \overset{d}{=} \sigma \cdot W,
\]

where W is a standard Normal random variable.

SLIDE 29
  • CONCLUSION. For the same non-Markovian sequence X, non-Gaussian (discrete w/phases) and Gaussian limits are obtained for the frequency statistics of different regular languages.

SLIDE 30

(More details in the 2008 ANALCO proceedings.)

... Thank you (!)
