[PPT] - An Introduction to Tries Kevin Leckey Monash University 21.09.2015 PowerPoint Presentation

SLIDE 1

An Introduction to Tries

Kevin Leckey

Monash University

21.09.2015

SLIDE 2

Introduction CS Background

Given: Words, e.g. in binary code Ξ1 = 11010 . . . , Ξ2 = 00011 . . . , Ξ3 = 01101 . . . , Ξ4 = 00000 . . . , Ξ5 = 11111 . . . , Ξ6 = 11100 . . .

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 2 / 19

SLIDE 5

Introduction CS Background

Given: Words, e.g. in binary code
Ξ1 = 11010..., Ξ2 = 00011..., Ξ3 = 01101..., Ξ4 = 00000..., Ξ5 = 11111..., Ξ6 = 11100...

Task: Storage that allows fast search and insert/delete operations

→ Use tree-like data structures such as a Trie (Information retrieval)

SLIDE 10

Introduction CS Background

Constructing a Trie

Ξ1 = 11010 . . . , Ξ2 = 00011 . . . , Ξ3 = 01101 . . . , Ξ4 = 00000 . . . , Ξ5 = 11111 . . . , Ξ6 = 11100 . . .

[Figure: binary trie storing the six words; leaves from left to right: Ξ4, Ξ2, Ξ3, Ξ1, Ξ6, Ξ5]
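The construction can be sketched in a few lines of Python. This is only a minimal sketch, assuming each word is truncated to its first five bits and that these truncations are pairwise distinct; the names `WORDS`, `build_trie` and `leaves` are illustrative and do not come from the talk.

```python
# The six example words, truncated to the five bits shown on the slide.
WORDS = {
    "Xi1": "11010", "Xi2": "00011", "Xi3": "01101",
    "Xi4": "00000", "Xi5": "11111", "Xi6": "11100",
}

def build_trie(words, depth=0):
    """Recursively split the words by their bit at position `depth`.

    A leaf is the name of the single word stored there, an internal node
    is a dict {"0": subtrie, "1": subtrie}, and None is an empty subtree.
    Assumption: the given bit strings are pairwise distinct.
    """
    if not words:
        return None
    if len(words) == 1:
        return next(iter(words))
    return {
        bit: build_trie(
            {name: w for name, w in words.items() if w[depth] == bit},
            depth + 1,
        )
        for bit in "01"
    }

def leaves(trie):
    """List the stored word names from left (bit 0) to right (bit 1)."""
    if trie is None:
        return []
    if isinstance(trie, str):
        return [trie]
    return leaves(trie["0"]) + leaves(trie["1"])

print(leaves(build_trie(WORDS)))
# ['Xi4', 'Xi2', 'Xi3', 'Xi1', 'Xi6', 'Xi5']
```

The left-to-right leaf order agrees with the order of the labels in the figure.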

SLIDE 20

Introduction CS Background

Searching

Search for Ξ1 = 11010 . . .

[Figure: search path for Ξ1 in the trie; leaves from left to right: Ξ4, Ξ2, Ξ3, Ξ1, Ξ6, Ξ5]

Searching cost = Depth of Ξ1 = length of the shortest prefix of Ξ1 not shared by Ξ2, ..., Ξ6 = 3
Worst case = Height of the Trie = 4
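Both values can be checked mechanically from the prefix characterisation. The following is a minimal sketch, assuming the six example words truncated to five bits; the helper names `depth` and `height` are hypothetical and not part of the talk.

```python
WORDS = {
    "Xi1": "11010", "Xi2": "00011", "Xi3": "01101",
    "Xi4": "00000", "Xi5": "11111", "Xi6": "11100",
}

def depth(name, words):
    """Length of the shortest prefix of words[name] shared by no other word."""
    word = words[name]
    others = [w for other, w in words.items() if other != name]
    k = 0
    while any(o.startswith(word[:k]) for o in others):
        if k == len(word):
            # The truncated word cannot be told apart from another word.
            raise ValueError("word is not unique within its given length")
        k += 1
    return k

def height(words):
    """Worst-case search cost: the largest depth over all stored words."""
    return max(depth(name, words) for name in words)

print(depth("Xi1", WORDS))  # 3
print(height(WORDS))        # 4
```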

SLIDE 26

Introduction The Probabilistic Model

Input Model

Generate the words Ξ1, Ξ2, ... to be stored → Probabilistic Model

The words Ξ1, Ξ2, ... are independent and identically distributed

Each word Ξi = ξ1ξ2ξ3ξ4... consists of letters ξ1, ξ2, ... that are:

independent,
P(ξj = 0) = 1/2 = P(ξj = 1)

More general models allow ξ1, ξ2, . . . to be dependent (e.g. evolving as a Markov chain)
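For experiments, this input model is easy to simulate. The sketch below assumes that the infinite words are truncated to a fixed number of bits (`length` is a parameter introduced here only for illustration), which is enough as long as the truncated words stay distinct.

```python
import random

def random_word(length, rng):
    """One word: `length` independent fair coin tosses, as a bit string."""
    return "".join(rng.choice("01") for _ in range(length))

def random_words(n, length, rng):
    """n independent, identically distributed words."""
    return [random_word(length, rng) for _ in range(n)]

rng = random.Random(2015)  # fixed seed so that runs are reproducible
for word in random_words(3, 16, rng):
    print(word)
```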

SLIDE 27

Introduction The resulting random Tree

A recursive construction of the Trie

[Figure (animation): the words Ξ1, ..., Ξ5 are added one at a time; words that end up in the same node (shown as "Ξ1, Ξ2", "Ξ1, Ξ4", "Ξ3, Ξ5") are split into the subtrees below it. Final leaves from left to right: Ξ2, Ξ1, Ξ4, Ξ3, Ξ5]
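The word-by-word construction can be sketched as an insertion routine. This is a minimal sketch under stated assumptions: the `Node` class and the functions `insert` and `leaves` are hypothetical names, the words are finite distinct bit strings, and the six words of the earlier example are used because the words behind this animation are not given.

```python
class Node:
    def __init__(self, word=None):
        self.word = word      # the stored word if this node is a leaf
        self.children = {}    # bit -> Node; non-empty for internal nodes

def insert(root, word):
    """Insert `word`; colliding words are pushed down until they separate."""
    node, depth = root, 0
    # Follow the bits of `word` through the internal nodes.
    while node.children:
        bit = word[depth]
        if bit not in node.children:
            node.children[bit] = Node(word)  # free slot: new leaf
            return
        node = node.children[bit]
        depth += 1
    if node.word is None:                    # very first word
        node.word = word
        return
    # Collision with the word stored in this leaf.
    other, node.word = node.word, None
    while other[depth] == word[depth]:
        node.children[word[depth]] = Node()
        node = node.children[word[depth]]
        depth += 1
    node.children[other[depth]] = Node(other)
    node.children[word[depth]] = Node(word)

def leaves(node):
    """Stored words from left (bit 0) to right (bit 1)."""
    if not node.children:
        return [node.word] if node.word is not None else []
    return [w for bit in "01" if bit in node.children
            for w in leaves(node.children[bit])]

root = Node()
for w in ["11010", "00011", "01101", "00000", "11111", "11100"]:
    insert(root, w)
print(leaves(root))
```

The resulting tree does not depend on the insertion order, so it agrees with the trie obtained by splitting all words at once.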

SLIDE 51

Analysis The Depth

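Consider n words Ξ1, ..., Ξn. What is the depth of the vertex Ξ1?

Recall: Depth $D_n$ = length of the shortest unique prefix of $\Xi_1 = \xi_1\xi_2\xi_3\ldots$

$$P(D_n \le k) = P(\Xi_2, \ldots, \Xi_n \text{ do not start with } \xi_1 \ldots \xi_k) = \left(1 - \left(\frac{1}{2}\right)^{k}\right)^{n-1}$$

Consequence:

$$P(D_n \le \alpha \log_2(n)) = \left(1 - n^{-\alpha}\right)^{n-1} \xrightarrow{n\to\infty} \begin{cases} 1, & \text{if } \alpha > 1, \\ 0, & \text{if } \alpha < 1. \end{cases}$$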
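A short derivation of both steps may help. It is a sketch that treats $\alpha\log_2(n)$ as an integer and takes $\alpha > 0$ fixed, two simplifications assumed here and not stated on the slide.

```latex
% Step 1: given \Xi_1, each other word starts with \xi_1 \ldots \xi_k
% with probability 2^{-k}, independently of the remaining words.
P(D_n \le k) = \prod_{i=2}^{n} P(\Xi_i \text{ does not start with } \xi_1 \ldots \xi_k)
             = \left(1 - 2^{-k}\right)^{n-1}.

% Step 2: with k = \alpha \log_2(n) we have 2^{-k} = n^{-\alpha}, and
% \log(1 - x) = -x\,(1 + o(1)) as x \to 0 gives
\left(1 - n^{-\alpha}\right)^{n-1}
  = \exp\bigl((n-1)\log(1 - n^{-\alpha})\bigr)
  = \exp\bigl(-n^{1-\alpha}(1 + o(1))\bigr).

% n^{1-\alpha} \to 0 for \alpha > 1, so the limit is 1;
% n^{1-\alpha} \to \infty for \alpha < 1, so the limit is 0.
```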

SLIDE 55

Analysis The Depth

Results on Dn

Shown on the previous slide:

$$\frac{D_n}{\log_2(n)} \xrightarrow{P} 1 \quad (n \to \infty)$$

Considering the previous slide more carefully:

$$P(D_n - \log_2(n) < x) \approx \left(1 - \frac{2^{-x}}{n}\right)^{n-1} \xrightarrow{n\to\infty} e^{-2^{-x}}$$

(Limit is a Gumbel distribution known from extreme value theory)

Thm (Knuth ’72): $E[D_n] = \log_2(n) + \Psi(\log_2(n)) + o(1)$ with periodic function $\Psi$

Thm (Szpankowski ’86): $\mathrm{Var}(D_n) \sim \Phi(\log_2(n))$ with periodic function $\Phi$
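The expectation can also be evaluated numerically from the distribution function $P(D_n \le k) = (1 - 2^{-k})^{n-1}$, using $E[D_n] = \sum_{k \ge 0} P(D_n > k)$. The sketch below is only an illustration; the function name `expected_depth` and the truncation point `max_k` are assumptions made here.

```python
import math

def expected_depth(n, max_k=200):
    """E[D_n] as the tail sum of 1 - (1 - 2^-k)^(n-1) over k = 0, 1, 2, ...

    The sum is truncated at max_k; in floating point the terms beyond
    k = 53 are already exactly zero.
    """
    return sum(1.0 - (1.0 - 2.0 ** -k) ** (n - 1) for k in range(max_k))

for n in (2 ** 6, 2 ** 10, 2 ** 14):
    # The difference to log2(n) stays bounded, as the theorem predicts.
    print(n, expected_depth(n) - math.log2(n))
```

For two words the formula gives $E[D_2] = 2$, the mean of the position of the first differing bit.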

SLIDE 63

Analysis The Height

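Consider n words Ξ1, ..., Ξn. What is the height of the resulting Trie?

Def: Height $H_n = \max\{D_n(\Xi_i) : i = 1, \ldots, n\}$.

The result $P(D_n \le k) = (1 - 2^{-k})^{n-1}$ implies:

$$\begin{aligned} P(H_n > \alpha \log_2(n)) &= P(D_n(\Xi_i) > \alpha \log_2(n) \text{ for some } i \in \{1, \ldots, n\}) \\ &\le n \cdot P(D_n > \alpha \log_2(n)) \\ &\le n \cdot \left(1 - \left(1 - n^{-\alpha}\right)^{n}\right) \\ &\le n^{2-\alpha} \end{aligned}$$

Consequence: $P(H_n > \alpha \log_2(n)) \to 0$ for $\alpha > 2$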
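The last two inequalities in the chain can be justified as follows. This is a sketch that assumes $\alpha > 0$ and again treats $\alpha\log_2(n)$ as an integer.

```latex
\begin{align*}
P(D_n > \alpha\log_2(n))
  &= 1 - \left(1 - n^{-\alpha}\right)^{n-1}
   \le 1 - \left(1 - n^{-\alpha}\right)^{n}, \\
1 - (1 - x)^{n} &\le n x
  \quad \text{for } 0 \le x \le 1 \text{ (Bernoulli's inequality)}, \\
n \cdot \left(1 - \left(1 - n^{-\alpha}\right)^{n}\right)
  &\le n \cdot n \cdot n^{-\alpha} = n^{2-\alpha}.
\end{align*}
```

The first line uses $(1 - x)^{n-1} \ge (1 - x)^{n}$, and the last line applies Bernoulli's inequality with $x = n^{-\alpha}$.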

SLIDE 66

Analysis The Height

Results on Hn

Partly proven on the previous slide:

$$\frac{H_n}{2 \log_2(n)} \xrightarrow{P} 1$$

Thm (Devroye ’84):

$$\lim_{n\to\infty} P(H_n - 2 \log_2(n) - 1 \le x) = \exp(-2^{-x}), \quad x \in \mathbb{R}$$

Thm (Regnier ’82): $E[H_n] \sim 2 \log_2(n)$ $(n \to \infty)$ (Flajolet, Steyaert ’82 → periodic second order term)

SLIDE 68

Analysis The Height

Summary: Typical depth: log2(n), height: 2 log2(n).

Profile (Park, Hwang, Nicodème, Szpankowski):

[Figure: two profile plots, one for external nodes (leaves) and one for internal nodes, each with the levels $\log_2\frac{n}{\log n} + O(1)$, $\log_2 n + O(1)$ and $2\log_2 n + O(1)$ marked]

SLIDE 71

Analysis The External Path Length

Consider n words Ξ1, ..., Ξn.

External Path Length:

$$L_n := \sum_{i=1}^{n} D_{n,i}, \quad D_{n,i} = D_n(\Xi_i).$$

[Figure: the example trie; leaves from left to right: Ξ4, Ξ2, Ξ3, Ξ1, Ξ6, Ξ5]

Example: $L_6 = 2 + 3 + 4 \cdot 4 = 21$

SLIDE 78

Analysis The External Path Length

A Recursion for Ln

[Figure: the example trie; leaves from left to right: Ξ4, Ξ2, Ξ3, Ξ1, Ξ6, Ξ5]

$K_n$ = # words starting with 0

$$L_n \stackrel{d}{=} L_{K_n} + \tilde{L}_{n-K_n} + n$$
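The recursion can be turned directly into a sampler for the path length. The sketch below rests on assumptions not spelled out on the slide: the fair-coin input model, so that $K_n$ is Binomial(n, 1/2), the recursion being applied for n ≥ 2, and the base cases $L_0 = L_1 = 0$. The function name `sample_path_length` is illustrative.

```python
import math
import random

def sample_path_length(n, rng):
    """Draw one sample of L_n via L_n = L_K + L~_(n-K) + n."""
    if n <= 1:
        return 0  # assumed base cases: no word or a single word at the root
    # K_n = number of words starting with 0: count the 1-bits among
    # n fair random bits, which is Binomial(n, 1/2).
    k = bin(rng.getrandbits(n)).count("1")
    # The two subtrees are built from disjoint, independent sets of words.
    return sample_path_length(k, rng) + sample_path_length(n - k, rng) + n

rng = random.Random(1)
n = 256
mean = sum(sample_path_length(n, rng) for _ in range(200)) / 200
print(mean, n * math.log2(n))
```

The empirical mean exceeds n log2(n) by a term of order n, which matches the expansion of $E[D_n]$ stated earlier.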

SLIDE 87

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for $L_n$ (after rescaling properly)

$$L_n \stackrel{d}{=} L_{K_n} + \tilde{L}_{n-K_n} + n$$

1. Rescaling: $X_n = (L_n - E[L_n]) / \sqrt{\mathrm{Var}(L_n)}$

$$X_n \stackrel{d}{=} A_{n,1} X_{K_n} + A_{n,2} \tilde{X}_{n-K_n} + b_n$$

2. Find the Limits: $(A_{n,1}, A_{n,2}, b_n) \to ((\sqrt{2})^{-1}, (\sqrt{2})^{-1}, 0)$

$$X \stackrel{d}{=} \frac{1}{\sqrt{2}} X + \frac{1}{\sqrt{2}} \tilde{X} \qquad (1)$$

3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.

4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.

5. Convergence: Prove convergence with respect to that metric.
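Step 3 can be verified by a direct computation. The following is a sketch, assuming that the two copies on the right-hand side of (1) are independent and identically distributed; the variance parameter $\sigma^2$ is notation introduced here.

```latex
% Let X and \tilde{X} be independent with distribution N(0, \sigma^2).
% A linear combination of independent normal variables is normal, with
E\left[\tfrac{1}{\sqrt{2}} X + \tfrac{1}{\sqrt{2}} \tilde{X}\right] = 0,
\qquad
\mathrm{Var}\left(\tfrac{1}{\sqrt{2}} X + \tfrac{1}{\sqrt{2}} \tilde{X}\right)
  = \tfrac{1}{2}\sigma^2 + \tfrac{1}{2}\sigma^2 = \sigma^2 .

% Hence every N(0, \sigma^2) solves (1). The rescaling in step 1 forces
% \mathrm{Var}(X_n) = 1, which singles out \sigma^2 = 1.
```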

SLIDE 92

Analysis The External Path Length

Results on Ln

Thm (Jacquet, Regnier ’88; Neininger, Rüschendorf 2004):

$$\frac{L_n - E[L_n]}{\sqrt{\mathrm{Var}(L_n)}} \xrightarrow{d} N(0, 1)$$

From the analysis of $D_n$:

$$E[L_n] = E\left[\sum_{i=1}^{n} D_n(\Xi_i)\right] = n E[D_n] = n \log_2(n) + n \Psi(\log_2(n)) + o(n)$$

Thm (Kirschenhofer, Prodinger ’86): $\mathrm{Var}(L_n) = n \widetilde{\Psi}(\log_2(n)) + O(\log_2(n))$

SLIDE 99

Summary

Trie: tree-like data structure to store words

position of a word in the tree ↔ path given by shortest unique prefix

Performance: Consider input: n independent words, each word is a sequence of ’coin tosses’

Typical search/insert time (depth): around log2(n)
Worst search/insert time (height): around 2 log2(n)
Construction cost (path length): around n log2(n)

Input model not very realistic, what about more general input models?

SLIDE 106

The Markov Source Model

Markov Model

Generate n words Ξ1, ..., Ξn such that

the words Ξ1, ..., Ξn are independent and identically distributed

Each word $\Xi_k = \xi_1\xi_2\xi_3\ldots$ has letters $(\xi_j)_{j \ge 1}$ which are a Markov chain on {0, 1}, i.e. for some $\mu = (\mu_0, \mu_1)$ and $P = (p_{ij})_{i,j \in \{0,1\}}$

$P(\xi_1 = a) = \mu_a$,
$P(\xi_{j+1} = a \mid \xi_1, \ldots, \xi_j) = p_{\xi_j a}$

More general (Markov Model with k-dependency): distribution of $\xi_j$ depends only on the previous k letters for some fixed k

Even more general: Dynamical Sources Model by Vallée
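A word from the first-order Markov model can be sampled as follows. This is a minimal sketch: the function name `markov_word` and the truncation to `length` letters are assumptions made for illustration, `mu` stands for the initial distribution (µ0, µ1) and `P[i][j]` for the transition probability p_ij.

```python
import random

def markov_word(length, mu, P, rng):
    """Sample xi_1 ... xi_length from the Markov chain with start mu, matrix P."""
    bits = []
    for j in range(length):
        # Probability that the next letter is 0: mu_0 for the first letter,
        # otherwise p_{previous letter, 0}.
        p_zero = mu[0] if j == 0 else P[bits[-1]][0]
        bits.append(0 if rng.random() < p_zero else 1)
    return "".join(str(b) for b in bits)

rng = random.Random(3)
sticky = [[0.9, 0.1], [0.1, 0.9]]  # letters tend to repeat
print(markov_word(30, (0.5, 0.5), sticky, rng))
```

With `mu = (0.5, 0.5)` and all entries of `P` equal to 0.5 this reduces to the fair-coin model.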

SLIDE 110

The Markov Source Model

Effect on Depth and related parameters: Are there very ’typical’ long prefixes for the source (e.g. because $p_{aa}$ is very large)? → Depth/Height gets very large

Entropy in the Markov Source Model:

$$H = \pi_0 \left(-p_{00} \log(p_{00}) - p_{01} \log(p_{01})\right) + \pi_1 \left(-p_{10} \log(p_{10}) - p_{11} \log(p_{11})\right)$$

with stationary distribution

$$(\pi_0, \pi_1) = \left(\frac{p_{10}}{p_{10} + p_{01}}, \frac{p_{01}}{p_{10} + p_{01}}\right)$$

Depth for Markov Sources:

$$E[D_n] \sim \frac{1}{H} \log(n)$$
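The entropy and the resulting first-order depth estimate are straightforward to compute. The sketch below assumes natural logarithms in both H and log(n), the convention 0 · log(0) = 0, and p01 + p10 > 0; the function names are illustrative.

```python
import math

def stationary(P):
    """Stationary distribution (pi_0, pi_1) of a two-state chain."""
    p01, p10 = P[0][1], P[1][0]
    return (p10 / (p10 + p01), p01 / (p10 + p01))

def entropy(P):
    """H = sum_i pi_i * (-sum_j p_ij log p_ij), with 0 * log(0) read as 0."""
    pi = stationary(P)
    def row_entropy(row):
        return -sum(p * math.log(p) for p in row if p > 0)
    return pi[0] * row_entropy(P[0]) + pi[1] * row_entropy(P[1])

def leading_order_depth(n, P):
    """Leading term (1/H) log(n) of E[D_n]; lower order terms are ignored."""
    return math.log(n) / entropy(P)

fair = [[0.5, 0.5], [0.5, 0.5]]
sticky = [[0.9, 0.1], [0.1, 0.9]]
print(leading_order_depth(1024, fair))    # equals log2(1024) up to rounding
print(leading_order_depth(1024, sticky))  # larger: long repeated prefixes
```

For the fair-coin model H = log 2, so the estimate reduces to log2(n), in line with the earlier result for the depth.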

SLIDE 115

The Markov Source Model

Results for the Markov Source Model

Depth: Jacquet, Szpankowski ’89
Height: Szpankowski ’91
External Pathlength: L., Neininger, Szpankowski (SODA 2013)
Dynamical Sources: Clément, Flajolet, Vallée 2001

Some related problems:
PATRICIA Tries and Digital Search Trees (Thesis L. → Pathlength)
Radix-Sort and -Select (Thesis L.)
Lempel-Ziv Parsing Scheme (data compression)
