Faster Pattern Matching with Mismatches in Compressed Texts

SLIDE 1

Few Matches or Almost Periodicity:

Faster Pattern Matching with Mismatches in Compressed Texts

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz

Max Planck Institute for Informatics, Saarland Informatics Campus (SIC), Saarbrücken, Germany

April 25, 2020

SLIDE 2

Basic Definitions and General Overview New Structural Insights Faster Algorithm

Pattern Matching with Mismatches

Pattern Matching

Given a text t and a pattern p, is p a substring of t?

Example (finding CAKE): t = PANCAKE, p = CAKE.

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz Faster Pattern Matching with Mismatches in Compressed Texts

SLIDE 3

Pattern Matching with Mismatches

Given a text t, a pattern p, and an integer k, does t have a length-|p| substring with Hamming distance at most k to p?

Example (finding ANPAN, k = 2): t = PANCAKE, p = ANPAN.
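The definition can be checked directly with a brute-force scan. The following is a minimal Python sketch, not the algorithm discussed in the talk; the names `hamming` and `k_matches` are chosen here for illustration. It reports every 0-indexed start position whose text window has Hamming distance at most k to the pattern.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


def k_matches(t: str, p: str, k: int) -> list[int]:
    """0-indexed starts of all length-|p| windows of t within Hamming distance k of p."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if hamming(t[i:i + m], p) <= k]


# Slide examples: ANPAN aligns with the window ANCAK (2 mismatches) at offset 1,
# and CAKE occurs exactly at offset 3.
print(k_matches("PANCAKE", "ANPAN", 2))  # [1]
print(k_matches("PANCAKE", "CAKE", 0))   # [3]
```

This scan takes O(|t| · |p|) time, which is the baseline the bounds on the following slides improve on.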

SLIDE 5

Pattern Matching with Mismatches

Given a text t, a pattern p, and an integer k, does t have a length-|p| substring with Hamming distance at most k to p?

  • Theorem [Gawrychowski, Uznański ’18]: Pattern matching with k mismatches on a text of length n and a pattern of length m can be solved in time O((m + k√m) · n/m).

  • Matching (conditional) lower bound [GU’18].
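Multiplying out the stated bound shows where the dependence on k enters. This is plain arithmetic on the expression from the slide, not an additional claim from the talk:

```latex
\[
  (m + k\sqrt{m}) \cdot \frac{n}{m} \;=\; n + \frac{n k}{\sqrt{m}}
\]
```

In particular, the second term is at most n whenever k ≤ √m, so in that range the bound is linear in the text length.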

SLIDE 7

What if the text is much larger than the pattern?

ANPANISAJAPANESESWEETROLLMOSTCOMMONLYFILLEDWITHREDBEANPASTEANPANCANALSOBEPREPAREDWITHOTHERFILLINGSINCLUDINGWHITEBEANSGREENBEANSSESAMEANDCHESTNUT

ANPAN

SLIDE 8

What if the text is much larger than the pattern and given in a compressed representation?

ANPANISAJAPANESESWEETROLLMOSTCOMMONLYFILLEDWITHREDBEANPASTEANPANCANALSOBEPREPAREDWITHOTHERFILLINGSINCLUDINGWHITEBEANSGREENBEANSSESAMEANDCHESTNUT

ANPAN

SLIDE 9

Grammar Compression

Straight-Line Program (SLP)

A Straight-Line Program (SLP) T is a context-free grammar that generates exactly one string eval(T).

SLIDE 10

Grammar Compression

Straight-Line Program (SLP)

An SLP T is a set of non-terminals {T_1, ..., T_n} and productions of the form T_i → σ or T_i → T_ℓ T_r, where ℓ, r < i. We write eval(T) = eval(T_n) for the generated string.

SLIDE 14

Grammar Compression

Straight-Line Program (SLP)

An SLP T is a set of non-terminals {T_1, ..., T_n} and productions of the form T_i → σ or T_i → T_ℓ T_r, where ℓ, r < i. We write eval(T) = eval(T_n) for the generated string.

Example:

T_1 → A; T_2 → N; T_3 → P

T_4 → T_1 T_2; T_5 → T_4 T_3

T_6 → T_5 T_4; T_7 → T_6 T_4

[Figure: derivation tree of T_7 with leaves labelled A, N, P]
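To make the example grammar concrete, here is a small Python sketch that stores the seven productions from the slide and evaluates them. The dictionary layout and the helper names `expand` and `lengths` are assumptions made for this illustration. `expand` decompresses fully, while `lengths` shows that quantities such as |eval(T_i)| can be computed bottom-up without decompressing.

```python
# Productions from the slide: a terminal character, or a pair (left, right)
# of indices of earlier non-terminals.
SLP = {
    1: "A", 2: "N", 3: "P",
    4: (1, 2), 5: (4, 3),
    6: (5, 4), 7: (6, 4),
}


def expand(slp: dict, i: int) -> str:
    """Fully decompress non-terminal T_i (the output can be exponentially long)."""
    rule = slp[i]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(slp, left) + expand(slp, right)


def lengths(slp: dict) -> dict:
    """|eval(T_i)| for every i, computed bottom-up without decompressing."""
    size = {}
    for i in sorted(slp):  # children always have smaller indices
        rule = slp[i]
        size[i] = 1 if isinstance(rule, str) else size[rule[0]] + size[rule[1]]
    return size


print(expand(SLP, 6))   # ANPAN
print(expand(SLP, 7))   # ANPANAN
print(lengths(SLP)[7])  # 7
```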

SLIDE 15

Known Results

Problem                Uncompressed text            LZW/LZ78 text, n = Ω(√N)    SLP text, n = Ω(log N)
Pattern Matching       O(N + m) [KMP’77]            O(n + m) ※ [G’12]           O(n + m) ※ [J’15]
PM with k Mismatches   O((N/m)(m + k√m)) [GU’18]    O(n√m k^2) [GS’13]          O(nm poly(k)) [T’14, BLRS’15]

N: length of uncompressed text; m: length of pattern; n: length of compressed text; k: number of mismatches; ※: allows compressed pattern.

SLIDE 18

Known Results

Problem                Uncompressed text            LZW/LZ78 text, n = Ω(√N)    SLP text, n = Ω(log N)
Pattern Matching       O(N + m) [KMP’77]            O(n + m) ※ [G’12]           O(n + m) ※ [J’15]
PM with k Mismatches   O((N/m)(m + k√m)) [GU’18]    O(n√m k^2)                  O(nm poly(k))

New bound for PM with k mismatches on compressed text: O(n k^4 + m k).

N: length of uncompressed text; m: length of pattern; n: length of compressed text; k: number of mismatches; ※: allows compressed pattern.

The improvement is obtained via a new structural insight into the solution structure.

SLIDE 19

Solution Structure of Pattern Matching

Fact (Folklore)

Let text t and pattern p with |t| ≤ (3/2)|p| be given such that there are at least 2 matches of p in t that together match t completely. Then both p and t are periodic with some period x, and every match of p in t starts at a position 1 + i · |x|.
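The folklore fact can be observed on a tiny instance. The sketch below is only an illustration with hand-picked strings; `occurrences` and `smallest_period` are helper names introduced here. It lists the exact matches, computes the smallest period of p, and checks that all match positions are multiples of the period length (0-indexed, so position 1 + i · |x| in the slide's 1-indexed convention).

```python
def occurrences(t: str, p: str) -> list[int]:
    """0-indexed start positions of exact occurrences of p in t."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if t[i:i + m] == p]


def smallest_period(s: str) -> int:
    """Smallest q >= 1 such that s[i] == s[i + q] for all valid i."""
    for q in range(1, len(s)):
        if all(s[i] == s[i + q] for i in range(len(s) - q)):
            return q
    return len(s)


t, p = "abababab", "ababab"      # |t| = 8 <= 1.5 * |p| = 9
occ = occurrences(t, p)          # two matches that together cover t
q = smallest_period(p)           # length of the period x = "ab"
print(occ, q)                    # [0, 2] 2
print(smallest_period(t) == q)   # True: t has the same period
print(all(o % q == 0 for o in occ))  # True
```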

SLIDE 24

Solution Structure of Pattern Matching

[Figure: the matches of p drawn beneath t; both p and t are tiled by copies of the period x]

SLIDE 25

What is the solution structure of Pattern Matching with Mismatches?

SLIDE 26

Solution Structure of Pattern Matching with Mismatches

If there are at least 2 k-matches of p in t, then p and t are periodic and every k-match of p starts at a position 1 + i|x|?

SLIDE 28

Solution Structure of Pattern Matching with Mismatches

If there are at least two k-matches of p in t, then p and t are periodic and every k-match of p starts at a position 1 + i|x|?

Counterexample: t = A^m B^m and p = A^(m/2) B^(m/2). Here p and t are not periodic, but there are 2k k-matches of p in t.
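The counterexample is easy to reproduce. The following Python sketch uses concrete values m = 20 and k = 3, chosen here only for illustration, and a brute-force `k_matches` helper that is not part of the talk. Shifting p by d positions away from its exact occurrence costs d mismatches, so the count comes out as 2k + 1 (the exact match plus k shifts to either side), in line with the roughly 2k matches stated on the slide.

```python
def k_matches(t: str, p: str, k: int) -> list[int]:
    """0-indexed starts of windows of t within Hamming distance k of p (brute force)."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if sum(a != b for a, b in zip(t[i:i + m], p)) <= k]


m, k = 20, 3
t = "A" * m + "B" * m                    # A^m B^m
p = "A" * (m // 2) + "B" * (m // 2)      # A^(m/2) B^(m/2)

starts = k_matches(t, p, k)
print(starts)       # [7, 8, 9, 10, 11, 12, 13]
print(len(starts))  # 7, i.e. 2 * k + 1
```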

SLIDE 29

Solution Structure of Pattern Matching with Mismatches

If there are at least Ω(poly(k)) k-matches of p in t, then p and t are periodic and every k-match of p starts at a position 1 + i|x|?

Insight 1

Periodicity only if the number of k-matches of p in t is Ω(poly(k)).

SLIDE 32

Solution Structure of Pattern Matching with Mismatches

If there are at least Ω(poly(k)) k-matches of p in t, then p and t are periodic and every k-match of p starts at a position 1 + i|x|?

Counterexample: t = A^(2m) and p = A^m, each with B placed at k/2 random positions. There are O(m) k-matches of p in t, but p and t are not perfectly periodic.
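A short simulation makes this counterexample tangible. This is a sketch under assumed parameters (m = 40, k = 6, a fixed random seed), with helper names `sprinkle` and `k_matches` invented for the illustration. Since p and every window of t each contain at most k/2 copies of B, any alignment has at most k mismatches, so all m + 1 alignments are k-matches regardless of where the B's land.

```python
import random


def k_matches(t: str, p: str, k: int) -> list[int]:
    """0-indexed starts of windows of t within Hamming distance k of p (brute force)."""
    m = len(p)
    return [i for i in range(len(t) - m + 1)
            if sum(a != b for a, b in zip(t[i:i + m], p)) <= k]


def sprinkle(length: int, count: int, rng: random.Random) -> str:
    """The string A^length with B written at `count` distinct random positions."""
    s = ["A"] * length
    for i in rng.sample(range(length), count):
        s[i] = "B"
    return "".join(s)


rng = random.Random(0)   # fixed seed, only for reproducibility
m, k = 40, 6
t = sprinkle(2 * m, k // 2, rng)
p = sprinkle(m, k // 2, rng)

starts = k_matches(t, p, k)
print(len(starts) == m + 1)  # True: every alignment is a k-match
```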

SLIDE 33

Solution Structure of Pattern Matching with Mismatches

If there are at least Ω(poly(k)) k-matches of p in t, then p and t are periodic up to O(k) mismatches and every k-match of p starts at a position 1 + i|x|?

Insight 2

Periodicity only up to O(k) mismatches.

SLIDE 35

Main Result

Theorem (Structural Insight)

For pattern p and text t with |t| ≤ 2|p|, at least one of the following holds:

  • The number of k-matches of p in t is at most O(k^2), or

  • Both t′ and p have Hamming distance O(k) to the same periodic string x, and all k-matches of p in t′ start at a position 1 + i · |x|.

Here t′ is the shortest substring of t such that any k-match of p in t is also a k-match in t′.
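One way to get a feel for "Hamming distance O(k) to a periodic string" is to measure how far a single string is from having a given period length. The helper below is an illustrative sketch, not a routine from the talk: `distance_to_period` is a name introduced here, and it treats one string at a time, whereas the theorem requires the same x for both t′ and p. For a candidate period length q it takes a majority vote within each residue class modulo q and counts the characters that disagree.

```python
from collections import Counter


def distance_to_period(s: str, q: int) -> int:
    """Fewest character changes that give s the period q
    (majority vote within each residue class modulo q)."""
    changes = 0
    for r in range(min(q, len(s))):
        column = s[r::q]
        changes += len(column) - Counter(column).most_common(1)[0][1]
    return changes


print(distance_to_period("ABABABAB", 2))  # 0
print(distance_to_period("ABABABAC", 2))  # 1
print(distance_to_period("AAAABAAA", 1))  # 1
```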

SLIDE 37

Main Result, Proof Overview

Theorem (Structural Insight)

For pattern p and text t with |t| ≤ 2|p|, at least one of the following holds:

  • The number of k-matches of p in t is at most 1000k^2, or

  • Both t′ and p have Hamming distance < 20k to a periodic x; all k-matches start at a position 1 + i · |x|.

[Figure: pattern p drawn above text t]

SLIDE 38

Main Result, Proof Overview

Consider t′: the shortest substring of t that contains all k-matches.

SLIDE 39

Main Result, Proof Overview

Split p into 16k parts p_1, p_2, ..., p_16k of equal length.

SLIDE 40

Main Result, Proof Overview

Fix a part p_i.

SLIDE 41

Main Result, Proof Overview

Consider a prefix x_i of p_i that is also a period of p_i.
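A prefix that is also a period can be found with the classical prefix function (KMP failure function). The sketch below assumes the shortest such prefix is wanted, which the slide does not state explicitly; `prefix_function` and `shortest_period_prefix` are names introduced for this illustration, and the input is assumed to be non-empty.

```python
def prefix_function(s: str) -> list[int]:
    """pi[j] = length of the longest proper prefix of s[:j+1] that is also its suffix."""
    pi = [0] * len(s)
    for j in range(1, len(s)):
        b = pi[j - 1]
        while b and s[j] != s[b]:
            b = pi[b - 1]
        if s[j] == s[b]:
            b += 1
        pi[j] = b
    return pi


def shortest_period_prefix(s: str) -> str:
    """Shortest prefix x of the non-empty string s such that s is a prefix of x x x ..."""
    q = len(s) - prefix_function(s)[-1]
    return s[:q]


print(shortest_period_prefix("ANANANA"))  # AN
print(shortest_period_prefix("ANPAN"))    # ANP
print(shortest_period_prefix("CAKE"))     # CAKE
```

A string without any shorter period, such as CAKE, is its own period prefix.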

SLIDE 43

Main Result, Proof Overview

Find the first 3k mismatches between p and x_i^* before and after p_i.

[Figure: x_i^* drawn as x_i x_i x_i ... aligned with p; at most 3k mismatches are marked on each side of p_i]
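The scan for mismatches on either side of a part can be sketched as follows. This is an illustrative reading of the step, under the assumption that x_i^* means the repetition of x_i aligned so that a copy of x_i starts where p_i starts; the function name `mismatches_around` and its parameters are hypothetical. The caller would pass `limit = 3 * k`.

```python
def mismatches_around(p: str, start: int, length: int, x: str, limit: int):
    """Compare p with the repetition of x aligned at `start`, outside the part
    p[start:start + length]. Returns (left, right): positions of the first
    `limit` mismatches found when scanning leftwards and rightwards."""
    q = len(x)
    right = []
    for j in range(start + length, len(p)):
        if p[j] != x[(j - start) % q]:
            right.append(j)
            if len(right) == limit:
                break
    left = []
    for j in range(start - 1, -1, -1):
        if p[j] != x[(j - start) % q]:  # Python's % is non-negative here
            left.append(j)
            if len(left) == limit:
                break
    return left, right


p = "ABCBABABABAD"   # the part p[4:8] = "ABAB" has period x = "AB"
print(mismatches_around(p, 4, 4, "AB", 3))  # ([2], [11])
```

If fewer than `limit` mismatches exist on a side, the scan simply runs to the end of p on that side.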

SLIDE 45

Main Result, Proof Overview

[Figure: p has fewer than 6k mismatches to x_i^*; t′ has fewer than 2 · (6 + 1)k = 14k mismatches to x_i^*]

SLIDE 47

Main Result, Proof Overview

[Figure: p, t′ and x_i^* with p_i highlighted; the label reads "= 3k mismatches"]

SLIDE 48

Main Result, Proof Overview

Insight

Any k-match of p in t′ must match at least one p_i exactly.
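This insight is a pigeonhole argument: k mismatches can touch at most k of the 16k parts, so at least 15k parts agree with the text exactly. The Python sketch below checks this on a toy instance; the strings, the choice k = 1, and the helper names `split_equal` and `exactly_matched_parts` are assumptions made for illustration.

```python
def split_equal(p: str, parts: int) -> list[tuple[int, int]]:
    """(start, end) ranges cutting p into `parts` pieces of (nearly) equal length."""
    n = len(p)
    cuts = [n * j // parts for j in range(parts + 1)]
    return list(zip(cuts, cuts[1:]))


def exactly_matched_parts(window: str, p: str, parts: int) -> list[int]:
    """Indices of the parts of p that agree everywhere with the aligned text window."""
    return [j for j, (a, b) in enumerate(split_equal(p, parts))
            if window[a:b] == p[a:b]]


k = 1
p = "ANPANANPANANPANA"        # length 16 = 16 * k, so each part is one character
window = "ANPANANPXNANPANA"   # a k-match of p: one mismatch, at index 8

hits = exactly_matched_parts(window, p, 16 * k)
print(len(hits))   # 15
print(8 in hits)   # False
```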

SLIDE 50

Main Result, Proof Overview

Fix a p_i; count the k-matches in which p_i is matched exactly.

SLIDE 51

Main Result, Proof Overview

Consider occurrences of x_i in t′.

SLIDE 52

Main Result, Proof Overview

Problem

Up to O(m) exact matches of x_i in t′.

SLIDE 54

Main Result, Proof Overview

Consider power stretches of x_i in t′ of length ≥ |p_i|: there are at most 150k different power stretches.

SLIDE 56

Main Result, Proof Overview

Fix a power stretch t_j of x_i in t′.

[Figure: the power stretch t_j aligned against p; the label reads "≥ 2k mismatches"]

SLIDE 57

Main Result, Proof Overview

Insight

Must align at least one mismatch.

slide-58
SLIDE 58

Insight

At most O(k^4) matches: O(k) parts in p, O(k) stretches, O(k^2) matches per combination.

slide-59
SLIDE 59

Main Result

Theorem (Structural Insight)

For a pattern p and a text t with |t| ≤ 2|p|, at least one of the following holds:
The number of k-matches of p in t is at most O(k^2), or
both t′ and p have Hamming distance O(k) to the same periodic string x, and all k-matches of p in t′ start at a position of the form 1 + i · |x|.

Here t′ is the shortest substring of t such that any k-match of p in t is also a k-match in t′.

slide-60
SLIDE 60

Faster Algorithm

Theorem (Algorithm)

Pattern matching with k mismatches on a text t given by an SLP of size n and a pattern p of length m can be solved in time O(n k^3 (k log k + log m) + k m).
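
To make the input model concrete, here is a minimal sketch of a straight-line program (SLP). It assumes the common convention that each rule is either a single character or the concatenation of two earlier rules. The rule encoding and the names `slp_lengths` and `slp_expand` are illustrative assumptions, not notation from the talk.

```python
from typing import List, Tuple, Union

# A rule is either a terminal character or a pair (i, j) of indices of
# earlier rules, meaning "concatenate rule i and rule j".
Rule = Union[str, Tuple[int, int]]


def slp_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each rule, computed without expanding it."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def slp_expand(rules: List[Rule]) -> str:
    """Fully expand the last rule (only sensible for small examples)."""
    expanded: List[str] = []
    for rule in rules:
        if isinstance(rule, str):
            expanded.append(rule)
        else:
            left, right = rule
            expanded.append(expanded[left] + expanded[right])
    return expanded[-1]


# Five rules deriving a text of length 5. With more rules of this shape the
# text length grows exponentially in the number of rules.
rules: List[Rule] = ["B", "A", (1, 0), (2, 1), (3, 2)]
text = slp_expand(rules)
```

The point of the theorem is that the running time depends on the number of rules n rather than on the length of the expanded text, so an algorithm in this setting should avoid anything like `slp_expand`.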

slide-62
SLIDE 62

Faster Algorithm for Pattern-Compressed Strings

Pattern-Compressed String [GS’13]

Let p be a string of length m. We call a string f = v_1 . . . v_q with Σ_{i=1}^{q} |v_i| ≤ 2m a p-pattern-compressed string (pc-string) if every v_i is a substring of p. We call the v_i's factors of f.
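
The definition above can be sketched in code as follows. This is only an illustration: representing each factor as a 0-indexed `(start, length)` pair into p, and the function name `expand_pc_string`, are assumptions made here.

```python
from typing import List, Tuple


def expand_pc_string(p: str, factors: List[Tuple[int, int]]) -> str:
    """Expand a pc-string given as (start, length) pairs into p (0-indexed)."""
    m = len(p)
    for start, length in factors:
        # Every factor must be a substring of p.
        if start < 0 or length < 0 or start + length > m:
            raise ValueError("factor is not a substring of p")
    # The factors together may represent at most 2m characters.
    if sum(length for _, length in factors) > 2 * m:
        raise ValueError("factors represent more than 2m characters")
    return "".join(p[start:start + length] for start, length in factors)


p = "PANCAKE"
f = expand_pc_string(p, [(0, 3), (3, 4), (0, 3)])  # PAN + CAKE + PAN
```

Here f has three factors and represents 10 ≤ 2 · 7 characters.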

slide-63
SLIDE 63

[Diagram: reduction from an SLP instance to pc-string instances. The SLP instance I consists of an SLP T = T_1, . . . , T_n, an integer k, and a pattern p of length m. It is split into n pc-string instances J_1, J_2, . . . , J_n, where J_i consists of k, p, and a pc-string f_i with O(k) factors.]

A T(m, k)-time algorithm for pc-string instances yields an O(n (log m + T(m, k)) + k m)-time algorithm for SLP instances.

slide-64
SLIDE 64

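[Diagram: reduction from an SLP instance to pc-string instances. The SLP instance I consists of an SLP T = T_1, . . . , T_n, an integer k, and a pattern p of length m. It is split into n pc-string instances J_1, J_2, . . . , J_n, where J_i consists of k, p, and a pc-string f_i with O(k) factors.]

An O(k^3 (k log k + log m))-time algorithm for pc-string instances yields an O(n k^3 (k log k + log m) + k m)-time algorithm for SLP instances.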

slide-66
SLIDE 66

Faster Algorithm for Pattern-Compressed Strings

Theorem (Algorithm for pc-strings)

Pattern matching with k mismatches on a pattern p of length m and a p-pc-string f of size O(k) representing at most 2m characters can be solved in time O(k^3 (k log k + log m)), with O(k m) preprocessing on p.

Implementation of structural insight

Need e.g. tools for finding the first O(k) mismatches to a periodic string, or for finding all power stretches of a given string in a pc-string.

slide-71
SLIDE 71

Open Problems

Improve insight to O(k) mismatches in the aperiodic case

Theorem (Structural Insight′) [KW’19+]

For a pattern p and a text t with |t| ≤ 2|p|, at least one of the following holds:
The number of k-matches of p in t is at most O(k), or
both t′ and p have Hamming distance O(k) to the same periodic string x, and all k-matches of p in t′ start at a position of the form 1 + i · |x|.

Here t′ is the shortest substring of t such that any k-match of p in t is also a k-match in t′.

slide-72
SLIDE 72

Open Problems

Improve insight to O(k) mismatches in the aperiodic case
Improve dependence on k in the algorithm

Theorem (Algorithm)

Pattern matching with k mismatches on a text t given by an SLP of size n and a pattern p of length m can be solved in time O(n k^3 (k log k + log m) + k m).

slide-73
SLIDE 73

Open Problems

Improve insight to O(k) mismatches in the aperiodic case
Improve dependence on k in the algorithm
Fully-compressed setting (p also given as an SLP)
Pattern Matching with Errors (edit distance instead of Hamming distance)

slide-75
SLIDE 75

Solution Structure of Pattern Matching with Mismatches

[Figure: the text t = A^{2m} and the pattern p = A^m.]

slide-77
SLIDE 77

Solution Structure of Pattern Matching with Mismatches

[Figure: the text t = A^{2m/3−1} B A^{2m/3−1} B A^{2m/3} and the pattern p = A^{(m−k−1)/2} B^{k+1} A^{(m−k−1)/2}. That is, t has a B at positions 2m/3 and 4m/3, and p has B at its middle k + 1 positions.]

All matches start at the union of two intervals.

slide-78
SLIDE 78

Insight 3

Arithmetic progression only approximates all matches
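
The example with B at positions 2m/3 and 4m/3 of t can be checked by brute force. The following sketch picks the concrete values m = 30 and k = 3 (an arbitrary choice made here so that all exponents are integers) and reports 0-indexed starting positions; the helper name `k_match_starts` is hypothetical.

```python
def k_match_starts(p, t, k):
    """0-indexed starts of length-|p| substrings of t within Hamming distance k of p."""
    m = len(p)
    return [s for s in range(len(t) - m + 1)
            if sum(a != b for a, b in zip(p, t[s:s + m])) <= k]


m, k = 30, 3
side = (m - k - 1) // 2          # number of A's on each side of the B block in p
p = "A" * side + "B" * (k + 1) + "A" * side
third = 2 * m // 3
t = "A" * (third - 1) + "B" + "A" * (third - 1) + "B" + "A" * third

starts = k_match_starts(p, t, k)
```

For these values the k-matches form two runs of k + 1 consecutive positions each, so no single arithmetic progression describes them exactly.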

slide-79
SLIDE 79

Main Result

Theorem (Structural Insight)

Given strings p of length m and t of length at most 2m, at least one of the following holds:
The number of k-matches of p in t is at most O(k^2).
There is a substring x of p, with |x| = O(m/k), such that δ_H(p, x^∗[1, m]) ≤ O(k) and δ_H(t′, x^∗[1, |t′|]) ≤ O(k). Moreover, any k-match of p in t′ starts at a position of the form 1 + i · |x| with 0 ≤ i ≤ (|t′| − |p|)/|x| (but not every starting position 1 + i · |x| necessarily yields a k-match).

Here t′ is the shortest substring of t such that any k-match of p in t is also a k-match in t′.
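
The quantity δ_H(s, x^∗[1, |s|]) used in the theorem is the Hamming distance between s and the prefix of x x x . . . of the same length. A minimal sketch, with the function name `hamming_to_periodic` chosen here for illustration:

```python
def hamming_to_periodic(s: str, x: str) -> int:
    """Hamming distance between s and the length-|s| prefix of x x x ..."""
    if not x:
        raise ValueError("x must be non-empty")
    return sum(c != x[i % len(x)] for i, c in enumerate(s))


# Pattern and text from the talk's periodic example, compared against x = PAN.
d_p = hamming_to_periodic("PANPAN", "PAN")
d_t = hamming_to_periodic("PUNRANPAMPAN", "PAN")
```

In that example p is exactly periodic with period PAN, while t differs from the periodic string in three positions (U, R, and M).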

slide-83
SLIDE 83

Main Result

Theorem (Structural Insight)

For a pattern p and a text t with |t| ≤ 2|p|, at least one of the following holds:
The number of k-matches of p in t is at most O(k^2), or
both t and p have Hamming distance O(k) to the same periodic string.

slide-87
SLIDE 87

Main Result

Non-periodic case: finding p = ANPAN with k = 2 in t = PANCAKEPAN (two aligned copies of p are shown).

Periodic case: finding p = PANPAN with k = 2 in t = PUNRANPAMPAN (three aligned copies of p are shown).
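
Both examples are small enough to verify with a naive matcher. The sketch below is a brute-force check and not the algorithm of the talk; the names `hamming` and `k_matches` are chosen here, and positions are 0-indexed (the slides count from 1).

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def k_matches(p: str, t: str, k: int) -> List[int]:
    """All 0-indexed positions s with hamming(p, t[s:s+|p|]) <= k."""
    m = len(p)
    return [s for s in range(len(t) - m + 1) if hamming(p, t[s:s + m]) <= k]


non_periodic = k_matches("ANPAN", "PANCAKEPAN", 2)
periodic = k_matches("PANPAN", "PUNRANPAMPAN", 2)

# In the periodic case, all starting positions differ by multiples of |PAN| = 3.
aligned = all((s - periodic[0]) % 3 == 0 for s in periodic)
```

The non-periodic example has two 2-matches, while the periodic example has three, spaced exactly one period apart.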

slide-90
SLIDE 90

Main Result, Proof Overview

Theorem (Structural Insight)

Fix a pattern p of length m and a text t of length at most 2m. If the number of k-matches of p in t is at least 1000k^2, then both t and p have Hamming distance < 20k to the same periodic string.

Main Steps:
If there are at least 1000k^2 k-matches of p in t and p has Hamming distance < 6k to a specific periodic string x ∈ x(p), then t has Hamming distance < 20k to x.
If p has Hamming distance ≥ 6k to every specific periodic string x ∈ x(p), then there are fewer than 1000k^2 k-matches of p in t.

slide-92
SLIDE 92

Main Result, Proof Overview

Lemma (Step 1)

Fix a pattern p of length m and a text t of length at most 2m. If the number of k-matches of p in t is at least 1000k^2, and p has Hamming distance < 6k to a periodic string x ∈ x(p), then t has Hamming distance < 20k to x.

Split p into 16k parts p_1, p_2, . . . , p_16k of equal length.
Fix a part p_i.
Consider a prefix x_i of p_i that is also a period of p_i.
Find the first 3k mismatches between p and x_i^∗ before and after p_i.

[Figure: the pattern p split into the parts p_1, . . . , p_16k; a fixed part p_i with its period x_i; and p aligned with the periodic extension x_i^∗, with ≤ 3k mismatches marked on each side of p_i.]

slide-100
SLIDE 100

Main Result, Proof Overview

Lemma (Step 1)

Fix a pattern p of length m and a text t of length at most 2m. If the number of k-matches of p in t is at least 1000k^2, and p has Hamming distance < 6k to some x_i^∗, 1 ≤ i ≤ 16k, then t has Hamming distance < 20k to x_i^∗.

[Figure: the pattern p with its part p_i and the period x_i, aligned with the periodic extension x_i^∗; p has < 6k mismatches to x_i^∗.]

slide-101
SLIDE 101

Main Result, Proof Overview

Claim (Proof omitted)

If there are at least 2 + 16k k-matches of p in t, all starting positions of k-matches differ by (integer) multiples of |x_i|.

slide-102
SLIDE 102

Main Result, Proof Overview

Lemma (Step 1)

Fix a pattern p of length m and a text t of length at most 2m. If the number of k-matches of p in t is at least 1000k^2, and p has Hamming distance < 6k to some x_i^∗, 1 ≤ i ≤ 16k, then all starting positions of k-matches differ by multiples of |x_i| and t has Hamming distance < 20k to x_i^∗.

slide-106
SLIDE 106

Main Result, Proof Overview

[Figure: the text t and two aligned copies of the pattern p, each with its part p_i, compared against the periodic extension x_i^∗. Each copy of p is labeled "< 6k mism. to x_i^∗"; two further labels read "< 7k mism. to x_i^∗".]

slide-107
SLIDE 107

Main Result, Proof Overview

Lemma (Step 1)

Fix a pattern p of length m and a text t of length at most 2m. If the number of k-matches of p in t is at least 1000k^2, and p has Hamming distance < 6k to some x_i^∗, 1 ≤ i ≤ 16k, then all starting positions of k-matches differ by multiples of |x_i| and t has Hamming distance < 20k to x_i^∗.

Lemma (Step 2)

Fix a pattern p of length m and a text t of length at most 2m. If the pattern p has Hamming distance ≥ 6k to all strings x_i^∗, 1 ≤ i ≤ 16k, then there are fewer than 1000k^2 k-matches of p in t.

slide-110
SLIDE 110

Main Result, Proof Overview

Recall: Split p into 16k parts p_1, p_2, . . . , p_16k of equal length.

Insight

Any k-match of p in t must match at least 15k of the parts p_i exactly.
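
The counting behind this insight is a pigeonhole argument: k mismatches can touch at most k parts, so with 16k parts at least 15k remain untouched. The sketch below checks the general form (q parts, at least q − k matched exactly) on a toy input with q = 3 parts instead of 16k; the helper name `exactly_matched_parts` is hypothetical.

```python
def exactly_matched_parts(p: str, window: str, num_parts: int) -> int:
    """Count the equal-length parts of p that agree exactly with the window."""
    if len(p) != len(window) or len(p) % num_parts != 0:
        raise ValueError("need |p| == |window| and |p| divisible by num_parts")
    size = len(p) // num_parts
    return sum(p[i:i + size] == window[i:i + size]
               for i in range(0, len(p), size))


p, t, k = "PANPAN", "PUNRANPAMPAN", 2
num_parts = 3
# The 2-matches of p in t start at positions 0, 3, and 6 (0-indexed).
counts = [exactly_matched_parts(p, t[s:s + len(p)], num_parts)
          for s in (0, 3, 6)]
```

Every count is at least num_parts − k = 1, as the pigeonhole bound predicts.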

slide-116
SLIDE 116

Main Result, Proof Overview

Fix a part p_i; count the k-matches in which p_i is matched exactly.

Search for x_i in t.

[Figure: the pattern p with its part p_i and period x_i, and the text t with the occurrences of x_i marked.]

Problem

Up to O(m) exact matches of x_i in t.

Search for power stretches of x_i in t of length ≥ |p_i|.

Insight

Only ≤ 150k different power stretches of x_i in t.
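
The slides do not spell out the definition of a power stretch, so the following sketch uses a simplified notion as an assumption: a maximal run of back-to-back whole copies of x in t. It also assumes x is primitive (not itself a repetition of a shorter string), since otherwise the reported runs can overlap. The function name `power_stretches` is chosen here.

```python
from typing import List, Tuple


def power_stretches(t: str, x: str) -> List[Tuple[int, int]]:
    """Maximal runs of consecutive copies of x in t, as half-open (start, end) pairs."""
    w = len(x)
    if w == 0:
        raise ValueError("x must be non-empty")
    occurrences = {i for i in range(len(t) - w + 1) if t[i:i + w] == x}
    stretches = []
    for i in sorted(occurrences):
        if i - w in occurrences:
            continue  # i continues a run that started earlier
        j = i
        while j + w in occurrences:
            j += w  # extend the run by one more copy of x
        stretches.append((i, j + w))
    return stretches


stretches = power_stretches("PANPANXPANPANPAN", "PAN")
```

Grouping the occurrences this way replaces up to O(m) individual matches of x by a much smaller number of runs, which is the effect the insight relies on.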

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz Faster Pattern Matching with Mismatches in Compressed Texts

slide-117
SLIDE 117

Main Result, Proof Overview Lemma (Step 2)

Fix a pattern p of length m and a text t of length at most 2m. If the pattern p has a HD ≥ 6k to all strings x∗

i , 1 ≤ i ≤ 16k, then there are less

than 1000k2 k-matches of p in t.

p1

p

pi xi xi

t

xi xi xi xi xi xi xi

Fix a power stretch tj of xi in t.

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz Faster Pattern Matching with Mismatches in Compressed Texts

slide-118
SLIDE 118

Main Result, Proof Overview Lemma (Step 2)

Fix a pattern p of length m and a text t of length at most 2m. If the pattern p has a HD ≥ 6k to all strings x∗

i , 1 ≤ i ≤ 16k, then there are less

than 1000k2 k-matches of p in t.

p1

p

pi xi xi

t

xi xi xi xi tj

Fix a power stretch tj of xi in t.

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz Faster Pattern Matching with Mismatches in Compressed Texts

slide-119
SLIDE 119

Main Result, Proof Overview Lemma (Step 2)

Fix a pattern p of length m and a text t of length at most 2m. If the pattern p has a HD ≥ 6k to all strings x∗

i , 1 ≤ i ≤ 16k, then there are less

than 1000k2 k-matches of p in t.

p1

p

pi xi xi

t

xi xi xi xi tj

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz Faster Pattern Matching with Mismatches in Compressed Texts

slide-120
SLIDE 120

Main Result, Proof Overview Lemma (Step 2)

Fix a pattern p of length m and a text t of length at most 2m. If the pattern p has a HD ≥ 6k to all strings x∗

i , 1 ≤ i ≤ 16k, then there are less

than 1000k2 k-matches of p in t.

p1

p

pi xi xi

t

xi xi xi xi tj

x∗

i xi xi xi xi xi xi xi xi xi xi xi xi xi xi xi xi ≥ 3k mism.

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz Faster Pattern Matching with Mismatches in Compressed Texts

slide-121
SLIDE 121

Main Result, Proof Overview

p1

p

pi xi xi

t

xi xi xi xi tj

x∗

i xi xi xi xi xi xi xi xi xi xi xi xi xi xi xi xi ≥ 3k mism. ≥ 2k mism.

Karl Bringmann, Marvin Künnemann, and Philip Wellnitz Faster Pattern Matching with Mismatches in Compressed Texts

slide-122
SLIDE 122

Main Result, Proof Overview


Insight

Must align at least k mismatches.


slide-123
SLIDE 123

Main Result, Proof Overview


Insight

At most $O(k^4)$ matches: $O(k)$ parts in $p$, $O(k)$ stretches, $O(k^2)$ matches per combination.
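Written out, the count is the product of the three factors named in the insight, with all constants hidden in the $O$-notation:

```latex
\[
  \underbrace{O(k)}_{\text{parts in } p}
  \cdot
  \underbrace{O(k)}_{\text{stretches}}
  \cdot
  \underbrace{O(k^2)}_{\text{matches per combination}}
  \;=\; O(k^4).
\]
```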


slide-124
SLIDE 124

Main Result, Proof Overview

Lemma (Step 1)

Fix a pattern $p$ of length $m$ and a text $t$ of length at most $2m$. If the number of $k$-matches of $p$ in $t$ is at least $1000k^2$, and $p$ has HD $< 6k$ to some $x_i^*$, $1 \le i \le 16k$, then all starting positions of $k$-matches differ by multiples of $|x_i|$ and $t$ has HD $< 20k$ to $x_i^*$.

Lemma (Step 2)

Fix a pattern $p$ of length $m$ and a text $t$ of length at most $2m$. If the pattern $p$ has HD $\ge 6k$ to all strings $x_i^*$, $1 \le i \le 16k$, then there are fewer than $1000k^2$ $k$-matches of $p$ in $t$.


slide-125
SLIDE 125

Main Result, Proof Overview

Theorem (Structural Insight)

Given strings $p$ of length $m$ and $t$ of length at most $2m$, at least one of the following holds:

  • The number of $k$-matches of $p$ in $t$ is at most $O(k^2)$.

  • There is a substring $x$ of $p$, with $|x| = O(m/k)$, such that $\delta_H(p, x^*[1, m]) \le O(k)$ and $\delta_H(t', x^*[1, |t'|]) \le O(k)$. Moreover, any $k$-match of $p$ in $t'$ starts at a position of the form $1 + i \cdot |x|$ with $0 \le i \le (|t'| - |p|)/|x|$ (but not every starting position $1 + i \cdot |x|$ necessarily yields a $k$-match).

Here $t'$ is the shortest substring of $t$ such that any $k$-match of $p$ in $t$ is also a $k$-match in $t'$.
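A brute-force sketch can make the two cases concrete on toy inputs. It is not the algorithm from the talk: `hamming` and `k_matches` are hypothetical helper names, the scan takes $O(|t| \cdot |p|)$ time on uncompressed strings, and positions are 0-based whereas the theorem counts from 1.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(c != d for c, d in zip(a, b))


def k_matches(p: str, t: str, k: int) -> list[int]:
    """0-based starting positions i with hamming(p, t[i:i+|p|]) <= k."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if hamming(p, t[i:i + m]) <= k]


# Almost periodic case: p and t are close to a power of x = "abc",
# so every 1-match starts at a multiple of |x| = 3.
periodic_hits = k_matches("abcabxabc", "abc" * 6, 1)

# Few-matches case: p has no small period, so only one alignment works.
sparse_hits = k_matches("abcdefghi", "xxabcdefghixxxxxxx", 1)
```

Here `periodic_hits` is `[0, 3, 6, 9]` and `sparse_hits` is `[2]`; the inputs are chosen by hand so that each one illustrates one branch of the theorem, nothing more.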


slide-126
SLIDE 126

Faster Algorithm for Pattern-Compressed Strings

Pattern-Compressed String [GS’13]

Let $p$ be a string of length $m$. We call a string $f = v_1 \dots v_q$ with $\sum_{i=1}^{q} |v_i| \le 2m$ a $p$-pattern-compressed string (pc-string) if every $v_i$ is a substring of $p$. We call the $v_i$'s factors of $f$.
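One possible in-memory representation, sketched under assumptions: each factor is stored as a `(start, length)` pair pointing into $p$ (0-based), and `expand_pc_string` is a hypothetical helper that checks the conditions of the definition and spells out the represented string. An actual algorithm would avoid expanding $f$; the expansion is only there to show what the factors mean.

```python
def expand_pc_string(p: str, factors: list[tuple[int, int]]) -> str:
    """Validate a p-pc-string given as (start, length) factors and expand it.

    Assumed encoding: factor (s, l) stands for the substring p[s:s+l].
    Raises ValueError if a factor is not a substring of p or if the
    total length exceeds 2 * |p|, as the definition requires.
    """
    m = len(p)
    total = 0
    for start, length in factors:
        if start < 0 or length < 0 or start + length > m:
            raise ValueError(f"factor ({start}, {length}) is not a substring of p")
        total += length
    if total > 2 * m:
        raise ValueError("total factor length exceeds 2m")
    return "".join(p[start:start + length] for start, length in factors)
```

With `p = "pancake"`, the factors `[(3, 4), (0, 3), (3, 4)]` describe the pc-string `"cakepancake"` of length 11, which is within the bound $2m = 14$.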


slide-127
SLIDE 127

Faster Algorithm for Pattern-Compressed Strings


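[Diagram: reduction from one SLP instance to $n$ pc-string instances.]

SLP instance $I$: an SLP $T = T_1, \dots, T_n$, an integer $k$, and a pattern $p$ of length $m$.

PC-string instances $J_1, J_2, \dots, J_n$: instance $J_i$ consists of $k$, $p$, and a pc-string $f_i$ with $O(k)$ factors.

A $T(m, k)$-time algorithm for pc-string instances yields an $O(n (\log m + T(m, k)) + km)$-time algorithm for the SLP instance; an $O(k^3(k \log k + \log m))$-time algorithm yields $O(n k^3 (k \log k + \log m) + km)$.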
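The concrete bound follows by substituting the pc-string running time into the generic bound; the only step beyond substitution is absorbing the additive $\log m$ term, which uses $k \ge 1$:

```latex
\begin{align*}
  O\bigl(n(\log m + T(m,k)) + km\bigr)
    &= O\bigl(n(\log m + k^3(k \log k + \log m)) + km\bigr)
       && \text{with } T(m,k) = O(k^3(k \log k + \log m)) \\
    &= O\bigl(n\,k^3(k \log k + \log m) + km\bigr)
       && \text{since } \log m \le k^3 \log m .
\end{align*}
```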


slide-128
SLIDE 128

Faster Algorithm for Pattern-Compressed Strings


[Diagram: reduction from the SLP instance $I$ (SLP $T = T_1, \dots, T_n$, integer $k$, pattern $p$ of length $m$) to pc-string instances $J_1, J_2, \dots, J_n$ (instance $J_i$ consists of $k$, $p$, and a pc-string $f_i$ with $O(k)$ factors).]

An $O(k^3(k \log k + \log m))$-time algorithm for pc-string instances yields an $O(n k^3 (k \log k + \log m) + km)$-time algorithm for the SLP instance.


slide-130
SLIDE 130

Faster Algorithm for Pattern-Compressed Strings

Theorem (Algorithm for pc-strings)

Pattern matching with $k$ mismatches on a pattern $p$ of length $m$ and a $p$-pc-string $f$ of size $O(k)$ representing at most $2m$ characters can be solved in time $O(k^3(k \log k + \log m))$ (with $O(km)$ preprocessing on $p$).

Implementation of structural insight

Need e.g. tools for

  • finding the first $O(k)$ mismatches to a periodic string, or

  • finding all power stretches of a given string in a pc-string.
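As a point of reference for the first tool, here is a naive version on an uncompressed string. It is only a sketch of what the tool computes: the name `first_mismatches`, the 0-based positions, and the linear scan are assumptions, and the scan does not achieve the running time stated in the theorem, which relies on working with the pc-string factors directly.

```python
def first_mismatches(s: str, x: str, limit: int) -> list[int]:
    """Positions of the first `limit` mismatches between s and x^*[0:|s|].

    x^* denotes the infinite repetition of x, so s[i] is compared with
    x[i % |x|]. The scan stops as soon as `limit` mismatches are found.
    """
    if not x:
        raise ValueError("x must be non-empty")
    if limit < 0:
        raise ValueError("limit must be non-negative")
    mismatches: list[int] = []
    for i, ch in enumerate(s):
        if len(mismatches) >= limit:
            break  # early exit: only the first `limit` mismatches are needed
        if ch != x[i % len(x)]:
            mismatches.append(i)
    return mismatches
```

For instance, `first_mismatches("abcabxabcazc", "abc", 2)` returns `[5, 10]`. Stopping after a fixed number of mismatches is the useful part of the interface: a caller that only needs to decide whether the distance to $x^*$ exceeds some threshold in $O(k)$ can ask for one mismatch more than the threshold.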


slide-133
SLIDE 133
slide-134
SLIDE 134

Navigation

Start Definition SLP Known Results Insight Ex. 1 Insight Ex. 2 Insight Ex. 3 Insight Theorem Insight Step 1 Overview Insight Step 1 Details Insight Step 2 Overview Algorithm General Idea Algorithm Overview Open Problems End