An Introduction to Tries, Kevin Leckey, Monash University, 21.09.2015

slide-1
SLIDE 1

An Introduction to Tries

Kevin Leckey

Monash University

21.09.2015

slide-5
SLIDE 5

Introduction CS Background

Given: Words, e.g. in binary code
Ξ1 = 11010..., Ξ2 = 00011..., Ξ3 = 01101..., Ξ4 = 00000..., Ξ5 = 11111..., Ξ6 = 11100...

Task: Storage that allows fast search and insert/delete operations
→ Use tree-like data structures such as a Trie (Information retrieval)

Kevin Leckey (Monash University) An Introduction to Tries 21.09.2015 2 / 19

slide-10
SLIDE 10

Introduction CS Background

Constructing a Trie

Ξ1 = 11010..., Ξ2 = 00011..., Ξ3 = 01101..., Ξ4 = 00000..., Ξ5 = 11111..., Ξ6 = 11100...

[Figure: the resulting binary trie; leaves from left to right: Ξ4, Ξ2, Ξ3, Ξ1, Ξ6, Ξ5]
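
The construction can be sketched in Python as follows. This is a minimal illustration, not code from the talk: the names `Node`, `insert` and `leaves` are made up here, the words are cut to their first five bits (the slide elides the rest), and the words are assumed to be distinct within those bits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A trie node: a leaf stores one word, an internal node only has children."""
    word: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def insert(node: Node, word: str, depth: int = 0) -> None:
    """Insert `word` below `node`, which sits at level `depth` of the trie."""
    if node.word is None and not node.children:
        node.word = word  # empty node: the word becomes a leaf here
        return
    if node.word is not None:
        # Occupied leaf: push the resident word one level down first.
        resident, node.word = node.word, None
        _descend(node, resident, depth)
    _descend(node, word, depth)


def _descend(node: Node, word: str, depth: int) -> None:
    """Follow the bit at position `depth` into the matching child."""
    child = node.children.setdefault(word[depth], Node())
    insert(child, word, depth + 1)


def leaves(node: Node, depth: int = 0) -> List[Tuple[str, int]]:
    """Return (word, depth) for all leaves, from left (bit 0) to right (bit 1)."""
    if node.word is not None:
        return [(node.word, depth)]
    result: List[Tuple[str, int]] = []
    for bit in sorted(node.children):
        result.extend(leaves(node.children[bit], depth + 1))
    return result


# Index i stands for the word Xi_i; only the first five bits are known (assumption).
WORDS = {1: "11010", 2: "00011", 3: "01101", 4: "00000", 5: "11111", 6: "11100"}
INDEX = {bits: i for i, bits in WORDS.items()}

root = Node()
for bits in WORDS.values():
    insert(root, bits)

# (word index, depth) pairs in left-to-right leaf order.
LEAVES = [(INDEX[w], d) for w, d in leaves(root)]
print(LEAVES)
```

The leaf order produced by this sketch can be compared with the labels in the figure.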

slide-20
SLIDE 20

Introduction CS Background

Searching

Search for Ξ1 = 11010...

[Figure: the example trie; leaves from left to right: Ξ4, Ξ2, Ξ3, Ξ1, Ξ6, Ξ5]

Searching cost = Depth of Ξ1 = 3 = Length of the shortest prefix of Ξ1 not shared by Ξ2, ..., Ξ6
Worst case = Height of the Trie = 4
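
The prefix characterisation of the searching cost can be checked directly, without building the tree. The sketch below is illustrative only: the function name `depth` is an assumption, the words are cut to five bits, and at least one other word is assumed to be present.

```python
def depth(word: str, others: list) -> int:
    """Length of the shortest prefix of `word` that no word in `others` starts with."""
    for k in range(1, len(word) + 1):
        if not any(other.startswith(word[:k]) for other in others):
            return k
    raise ValueError("word cannot be separated from the others within its length")


# Index i stands for the word Xi_i; only the first five bits are known (assumption).
WORDS = {1: "11010", 2: "00011", 3: "01101", 4: "00000", 5: "11111", 6: "11100"}

DEPTHS = {
    i: depth(w, [o for j, o in WORDS.items() if j != i])
    for i, w in WORDS.items()
}
HEIGHT = max(DEPTHS.values())
print(DEPTHS, HEIGHT)
```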

slide-26
SLIDE 26

Introduction The Probabilistic Model

Input Model

Generate the words Ξ1, Ξ2, ... to be stored → Probabilistic Model

The words Ξ1, Ξ2, ... are independent and identically distributed
Each word Ξi = ξ1ξ2ξ3ξ4... consists of letters ξ1, ξ2, ... that are:

  • independent,
  • P(ξj = 0) = 1/2 = P(ξj = 1)

More general models allow ξ1, ξ2, ... to be dependent (e.g. evolving as a Markov chain)
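
For experiments, input from this model can be sampled as in the sketch below. The function names, the fixed word length and the seed are assumptions made for illustration; the model itself uses infinite words.

```python
import random


def random_word(rng: random.Random, length: int) -> str:
    """One word of independent fair bits, truncated to `length` letters."""
    return "".join(rng.choice("01") for _ in range(length))


def random_words(n: int, length: int, seed: int = 0) -> list:
    """n independent, identically distributed words from a seeded generator."""
    rng = random.Random(seed)
    return [random_word(rng, length) for _ in range(n)]


SAMPLE = random_words(6, 32, seed=2015)
print(SAMPLE[0])
```

A length of a few multiples of log2(n) is usually enough to separate n words, but with a fixed length two sampled words can coincide, so a real experiment would need to handle that case.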

slide-27
SLIDE 27

Introduction The resulting random Tree

A recursive construction of the Trie

[Figure, animated over several builds: the trie is constructed by inserting Ξ1, Ξ2, ..., Ξ5 one after another. Words that reach the same node (shown as "Ξ1, Ξ2", "Ξ1, Ξ4" and "Ξ3, Ξ5" in the builds) are pushed further down until they separate. Labels in the final build: Ξ2 Ξ1 Ξ4 Ξ3 Ξ5]

slide-51
SLIDE 51

Analysis The Depth

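Consider n words Ξ1, ..., Ξn. What is the depth of the vertex Ξ1?

Recall: Depth Dn = Length of the shortest unique prefix of Ξ1 = ξ1ξ2ξ3...

$$P(D_n \le k) = P(\Xi_2, \dots, \Xi_n \text{ do not start with } \xi_1 \dots \xi_k) = \left(1 - \left(\tfrac{1}{2}\right)^{k}\right)^{n-1}$$

Consequence:

$$P(D_n \le \alpha \log_2(n)) = \left(1 - n^{-\alpha}\right)^{n-1} \xrightarrow{n \to \infty} \begin{cases} 1, & \text{if } \alpha > 1, \\ 0, & \text{if } \alpha < 1. \end{cases}$$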
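
The threshold at α = 1 is easy to see numerically. The following sketch simply evaluates the two formulas above; the function names are illustrative and the rounding of α log2(n) to an integer is ignored.

```python
def depth_cdf(n: int, k: int) -> float:
    """P(D_n <= k) = (1 - 2**-k)**(n - 1) in the fair-coin model."""
    return (1.0 - 2.0 ** (-k)) ** (n - 1)


def depth_cdf_at_alpha(n: int, alpha: float) -> float:
    """(1 - n**-alpha)**(n - 1), i.e. the formula at k = alpha * log2(n)."""
    return (1.0 - n ** (-alpha)) ** (n - 1)


for n in (10**2, 10**3, 10**4):
    # Below the threshold the value drops to 0, above it the value approaches 1.
    print(n, depth_cdf_at_alpha(n, 0.5), depth_cdf_at_alpha(n, 1.5))
```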

slide-55
SLIDE 55

Analysis The Depth

Results on Dn

Shown on the previous slide:

$$\frac{D_n}{\log_2(n)} \xrightarrow{P} 1 \quad (n \to \infty)$$

Considering the previous slide more carefully:

$$P(D_n - \log_2(n) < x) \approx \left(1 - \frac{2^{-x}}{n}\right)^{n-1} \xrightarrow{n \to \infty} e^{-2^{-x}}$$

(Limit is a Gumbel distribution known from extreme value theory)

Thm (Knuth ’72): E[Dn] = log2(n) + Ψ(log2(n)) + o(1) with periodic function Ψ

Thm (Szpankowski ’86): Var(Dn) ∼ Φ(log2(n)) with periodic function Φ
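
A quick numerical comparison of the finite-n expression with the Gumbel-type limit is sketched below. The function names are illustrative, and the sketch only evaluates the two closed forms; it does not simulate tries.

```python
import math


def shifted_depth_cdf(n: int, x: float) -> float:
    """(1 - 2**-x / n)**(n - 1), the approximation of P(D_n - log2(n) < x)."""
    return (1.0 - 2.0 ** (-x) / n) ** (n - 1)


def gumbel_limit(x: float) -> float:
    """The limit exp(-2**-x)."""
    return math.exp(-(2.0 ** (-x)))


for x in (-1.0, 0.0, 2.0):
    print(x, shifted_depth_cdf(10**6, x), gumbel_limit(x))
```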

slide-63
SLIDE 63

Analysis The Height

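Consider n words Ξ1, ..., Ξn. What is the height of the resulting Trie?

Def: Height Hn = max{Dn(Ξi) : i = 1, ..., n}.

The result P(Dn ≤ k) = (1 − 2^(−k))^(n−1) implies:

$$\begin{aligned}
P(H_n > \alpha \log_2(n)) &= P(D_n(\Xi_i) > \alpha \log_2(n) \text{ for some } i \in \{1, \dots, n\}) \\
&\le n \cdot P(D_n > \alpha \log_2(n)) \\
&\le n \cdot \left(1 - \left(1 - n^{-\alpha}\right)^{n}\right) \\
&\le n^{2-\alpha}
\end{aligned}$$

(The last step uses Bernoulli's inequality (1 − x)^n ≥ 1 − nx.)

Consequence: P(Hn > α log2(n)) → 0 for α > 2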
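
The chain of bounds can be sanity-checked numerically, as in the sketch below. The function names are illustrative; `expm1` and `log1p` are used only to avoid floating-point cancellation when n^(−α) is tiny.

```python
import math


def union_bound(n: int, alpha: float) -> float:
    """n * (1 - (1 - n**-alpha)**n), the next-to-last bound in the chain."""
    x = n ** (-alpha)
    return n * -math.expm1(n * math.log1p(-x))


def simple_bound(n: int, alpha: float) -> float:
    """n**(2 - alpha), the final bound in the chain."""
    return n ** (2.0 - alpha)


for n in (10, 100, 1000):
    print(n, union_bound(n, 2.5), simple_bound(n, 2.5))
```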

slide-66
SLIDE 66

Analysis The Height

Results on Hn

Partly proven on the previous slide:

$$\frac{H_n}{2 \log_2(n)} \xrightarrow{P} 1$$

Thm (Devroye ’84):

$$\lim_{n \to \infty} P(H_n - 2 \log_2(n) - 1 \le x) = \exp(-2^{-x}), \quad x \in \mathbb{R}$$

Thm (Regnier ’82): E[Hn] ∼ 2 log2(n) (n → ∞)
(Flajolet, Steyaert ’82 → periodic second order term)

slide-68
SLIDE 68

Analysis The Height

Summary: Typical depth: log2(n), height: 2 log2(n).

Profile (Park, Hwang, Nicodème, Szpankowski):

[Figure: two profile plots, labelled (External nodes/Leaves) and (Internal nodes), each marking the levels log2(n / log n) + O(1), log2 n + O(1) and 2 log2 n + O(1)]

slide-71
SLIDE 71

Analysis The External Path Length

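Consider n words Ξ1, ..., Ξn. External Path Length:

$$L_n := \sum_{i=1}^{n} D_{n,i}, \qquad D_{n,i} = D_n(\Xi_i).$$

[Figure: the example trie; leaves from left to right: Ξ4, Ξ2, Ξ3, Ξ1, Ξ6, Ξ5]

Example: L6 = 2 + 3 + 4 · 4 = 21 (Ξ3 has depth 2, Ξ1 has depth 3, the other four words have depth 4)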

slide-78
SLIDE 78

Analysis The External Path Length

A Recursion for Ln

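[Figure: the example trie, split at the root into the subtree of the words starting with 0 (Ξ4, Ξ2, Ξ3) and the subtree of the words starting with 1 (Ξ1, Ξ6, Ξ5)]

Kn = # words starting with 0

$$L_n \overset{d}{=} L_{K_n} + \tilde{L}_{n-K_n} + n$$

Here the tilde marks the path length of the second subtree, an independent copy of the sequence (Lj); the additive term n accounts for every word moving one level below the root (n ≥ 2).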
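
For a fixed set of words the same decomposition gives a recursive way to compute the path length. The sketch below is an illustration with an assumed function name; it takes L0 = L1 = 0 as base cases and uses the six example words cut to five bits.

```python
def external_path_length(words: list, depth: int = 0) -> int:
    """Path length via L_n = L_{K_n} + L_{n-K_n} + n for n >= 2, with L_0 = L_1 = 0."""
    n = len(words)
    if n <= 1:
        return 0  # an empty subtree or a single leaf contributes nothing further
    zeros = [w for w in words if w[depth] == "0"]  # K_n words continue with 0
    ones = [w for w in words if w[depth] == "1"]   # n - K_n words continue with 1
    return (external_path_length(zeros, depth + 1)
            + external_path_length(ones, depth + 1)
            + n)  # each of the n words sits one level below this node


# First five bits of Xi_1, ..., Xi_6 (assumption: the slide elides the rest).
EXAMPLE = ["11010", "00011", "01101", "00000", "11111", "11100"]
L6 = external_path_length(EXAMPLE)
print(L6)
```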

slide-87
SLIDE 87

Analysis The External Path Length

The Contraction Method in a Nutshell

Aim: Find a limit law for Ln (after rescaling properly)

$$L_n \overset{d}{=} L_{K_n} + \tilde{L}_{n-K_n} + n$$

  1. Rescaling: $X_n = (L_n - E[L_n]) / \sqrt{\mathrm{Var}(L_n)}$

     $$X_n \overset{d}{=} A_{n,1} X_{K_n} + A_{n,2} \tilde{X}_{n-K_n} + b_n$$

  2. Find the Limits: $(A_{n,1}, A_{n,2}, b_n) \to ((\sqrt{2})^{-1}, (\sqrt{2})^{-1}, 0)$

     $$X \overset{d}{=} \frac{1}{\sqrt{2}} X + \frac{1}{\sqrt{2}} \tilde{X} \qquad (1)$$

  3. Solution to (1): Existence of a solution to (1). Here: Normal distribution with mean 0.
  4. Contraction: Find a metric such that (1) corresponds to the fixed point of a contracting map.
  5. Convergence: Prove convergence with respect to that metric.
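
Step 3 can be made concrete with characteristic functions. The check below is a sketch under the assumption that the second summand in (1) is an independent copy of X; the symbols φ and σ² are introduced here for illustration.

```latex
% Check that X ~ N(0, \sigma^2) solves (1), assuming \tilde{X} is an
% independent copy of X with characteristic function
% \varphi_X(t) = e^{-\sigma^2 t^2 / 2}.
\begin{aligned}
\varphi_{\frac{1}{\sqrt{2}} X + \frac{1}{\sqrt{2}} \tilde{X}}(t)
  &= \varphi_X\!\left(\tfrac{t}{\sqrt{2}}\right)^{2}
   = \left(e^{-\sigma^2 t^2 / 4}\right)^{2}
   = e^{-\sigma^2 t^2 / 2}
   = \varphi_X(t).
\end{aligned}
```

Every centered normal law therefore solves (1); the normalization Var(Xn) = 1 in step 1 suggests σ² = 1 as the relevant solution.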

slide-92
SLIDE 92

Analysis The External Path Length

Results on Ln

Thm (Jacquet, Regnier ’88; Neininger, Rüschendorf 2004):

$$\frac{L_n - E[L_n]}{\sqrt{\mathrm{Var}(L_n)}} \xrightarrow{d} N(0, 1)$$

From the analysis of Dn:

$$E[L_n] = E\left[\sum_{i=1}^{n} D_n(\Xi_i)\right] = n E[D_n] = n \log_2(n) + n \Psi(\log_2(n)) + o(n)$$

Thm (Kirschenhofer, Prodinger ’86): Var(Ln) = n Ψ(log2(n)) + O(log2(n))

slide-99
SLIDE 99

Summary

Trie: tree-like data structure to store words
Position of a word in the tree ↔ path given by its shortest unique prefix
Performance: Consider input: n independent words, each word is a sequence of ’coin tosses’

  • Typical search/insert time (depth): around log2(n)
  • Worst search/insert time (height): around 2 log2(n)
  • Construction cost (path length): around n log2(n)

The input model is not very realistic. What about more general input models?

slide-106
SLIDE 106

The Markov Source Model

Markov Model

Generate n words Ξ1, ..., Ξn such that

the words Ξ1, ..., Ξn are independent and identically distributed
Each word Ξk = ξ1ξ2ξ3... has letters (ξj)j≥1 which are a Markov chain on {0, 1}, i.e. for some initial distribution µ = (µ0, µ1) and transition matrix P = (pij)i,j∈{0,1}

  • P(ξ1 = a) = µa,
  • P(ξj+1 = a | ξ1, ..., ξj) = p_{ξj a} (the transition probability from ξj to a)

More general (Markov Model with k-dependency): distribution of ξj depends only on the previous k letters for some fixed k
Even more general: Dynamical Sources Model by Vallée

slide-110
SLIDE 110

The Markov Source Model

Effect on Depth and related parameters: Are there very ’typical’ long prefixes for the source (e.g. because paa is very large)?
→ Depth/Height gets very large

Entropy in the Markov Source Model:

$$H = \pi_0 \left(-p_{00} \log(p_{00}) - p_{01} \log(p_{01})\right) + \pi_1 \left(-p_{10} \log(p_{10}) - p_{11} \log(p_{11})\right)$$

with stationary distribution

$$(\pi_0, \pi_1) = \left(\frac{p_{10}}{p_{10} + p_{01}}, \frac{p_{01}}{p_{10} + p_{01}}\right)$$

Depth for Markov Sources:

$$E[D_n] \sim \frac{1}{H} \log(n)$$
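
The entropy formula is straightforward to evaluate. The sketch below uses illustrative function names, natural logarithms, and assumes p01 + p10 > 0 so that the stationary distribution is well defined.

```python
import math


def stationary(p01: float, p10: float) -> tuple:
    """Stationary distribution (pi0, pi1) of the two-state chain."""
    total = p01 + p10
    return (p10 / total, p01 / total)


def entropy(p01: float, p10: float) -> float:
    """H = pi0 * h(row 0) + pi1 * h(row 1), with h the entropy of a transition row."""
    p00, p11 = 1.0 - p01, 1.0 - p10
    pi0, pi1 = stationary(p01, p10)

    def row_entropy(*probs: float) -> float:
        # Terms with probability 0 contribute nothing (0 * log 0 is read as 0).
        return -sum(p * math.log(p) for p in probs if p > 0)

    return pi0 * row_entropy(p00, p01) + pi1 * row_entropy(p10, p11)


# Fair coin tosses as a special case: H = log 2, so log(n) / H = log2(n).
print(entropy(0.5, 0.5), math.log(2))
# A "sticky" chain (large p00 and p11) has smaller entropy, hence larger depth.
print(entropy(0.1, 0.1))
```

In the symmetric case the asymptotic log(n)/H reduces to log2(n), which agrees with the typical depth found earlier for independent fair bits.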

slide-115
SLIDE 115

The Markov Source Model

Results for the Markov Source Model

Depth: Jacquet, Szpankowski ’89
Height: Szpankowski ’91
External Pathlength: L., Neininger, Szpankowski (SODA 2013)
Dynamical Sources: Clément, Flajolet, Vallée 2001

Some related problems:

  • PATRICIA Tries and Digital Search Trees (Thesis L. → Pathlength)
  • Radix-Sort and -Select (Thesis L.)
  • Lempel-Ziv Parsing Scheme (data compression)
