Exact Pattern Matching


SLIDE 1

Exact Pattern Matching

Goal: Find all occurrences of a pattern in a text
Input: Pattern p = p1…pn and text t = t1…tm
Output: All positions 1 ≤ i ≤ m – n + 1 such that the n-letter substring of t starting at i matches p
Motivation: Searching a database for a known pattern

[Figure: text t and pattern p]

SLIDE 2

Pattern Matching: Running Time

  • Naïve runtime: O(nm)
  • How?
  • On average, it should be close to O(m)
  • Why?
  • Can we solve the problem in O(m) time?
  • Yes, we’ll see how (in a later lecture)
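
A minimal Python sketch of the naïve approach may help with the "How?" and "Why?" questions. It tries each of the m – n + 1 start positions and compares up to n characters, which gives the O(nm) worst case; on typical text a mismatch tends to show up after only a few characters, so the average cost is closer to O(m). The function name `naive_find_all` is an assumption made for illustration and does not come from the slides.

```python
def naive_find_all(p, t):
    """Return all 1-based positions i where pattern p occurs in text t."""
    n, m = len(p), len(t)
    positions = []
    for i in range(m - n + 1):           # candidate start positions (0-based)
        j = 0
        # Compare character by character; stop at the first mismatch.
        while j < n and t[i + j] == p[j]:
            j += 1
        if j == n:                       # all n characters matched
            positions.append(i + 1)      # report 1-based, as on the slides
    return positions


print(naive_find_all("he", "hershe"))
```

The worst case is reached on inputs such as p = "aaab" and t = "aaaa…a", where almost every start position needs close to n comparisons before failing.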

SLIDE 3

Naive algorithm is inefficient

SLIDE 4

Multiple Pattern Matching

Goal: Given a set of patterns and a text, find all occurrences of any of the patterns in the text
Input: k patterns p1,…,pk, and text t = t1…tm
Output: Positions 1 ≤ i ≤ m where the substring of t starting at i matches pj for some 1 ≤ j ≤ k
Motivation: Searching a database for multiple known patterns

[Figure: text t with patterns p1 and p2]

SLIDE 5

Multiple Pattern Matching

  • Solution: k separate “pattern matching problems”: O(kmn)
  • Another solution:
  • Using “keyword trees” => O(kn + nm), where n is the maximum length of pi
  • Preprocess all k patterns to construct a “keyword tree”
  • Then, for any given text, all occurrences of all patterns can be found in O(nm) time (O(m) with the Aho-Corasick algorithm)

SLIDE 6

Keyword tree approach

SLIDE 7

Keyword tree approach

SLIDE 8

Keyword tree approach

SLIDE 9

Keyword tree approach

SLIDE 10

Keyword tree approach

SLIDE 11

Keyword tree approach: Properties

SLIDE 12

Keyword tree: Construction

SLIDE 13
SLIDE 14

Keyword tree: Lookup of a string

How to check all occurrences in a text t?

SLIDE 15

Keyword tree approach: Complexity

  • Build the keyword tree in O(kn) time; kn bounds the total length of all patterns
  • Start “threading” at each position in the text; at most n steps tell us whether any pi matches there
  • Total: O(kn + nm)
  • We’re down from O(kmn) to this
  • The next big idea, the Aho-Corasick algorithm: O(kn + m)
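
The build-then-thread procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the keyword tree is stored as a list of dictionaries (one per node, root at index 0), patterns are assumed to be non-empty and distinct, and the names `build_keyword_tree` and `thread_all` are hypothetical rather than taken from the slides.

```python
def build_keyword_tree(patterns):
    """Build a keyword tree (trie) in time proportional to the total pattern length.

    children[v] maps a character to the child node index of node v.
    ends[v] is the index of the pattern ending at node v, or None.
    """
    children = [{}]          # node 0 is the root
    ends = [None]
    for idx, p in enumerate(patterns):
        v = 0
        for ch in p:
            if ch not in children[v]:
                children.append({})
                ends.append(None)
                children[v][ch] = len(children) - 1
            v = children[v][ch]
        ends[v] = idx
    return children, ends


def thread_all(patterns, t):
    """Thread the text through the tree from every start position: O(kn + nm)."""
    children, ends = build_keyword_tree(patterns)
    hits = []
    for i in range(len(t)):              # every start position in the text
        v = 0
        for j in range(i, len(t)):       # at most n steps before falling off
            v = children[v].get(t[j])
            if v is None:                # no edge for this character
                break
            if ends[v] is not None:      # a pattern ends at this node
                hits.append((i + 1, patterns[ends[v]]))
    return hits


print(thread_all(["hers", "she", "he"], "hershe"))
```

Each hit is reported as a (1-based start position, pattern) pair. The inner loop restarts from the root at every text position, which is exactly the factor of n that Aho-Corasick removes.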

SLIDE 16

Aho-Corasick algorithm: Key idea

Exploit the redundancy in the patterns

Text: HERSHE
Patterns: HERS, SHE, HE

SLIDE 17

Aho-Corasick algorithm: Key idea

Exploit the redundancy in the patterns

Text: HERSHE
Patterns: HERS, SHE, HE

SLIDE 18

Aho-Corasick algorithm

With failing edges and node labels

SLIDE 19

Rules

  • Transition among the different nodes by following edges depending on the next character seen (say “h”)
  • If there is an outgoing edge with label “h”, follow it
  • If there is no such edge and we are at the root, stay at the root (and move on to the next character)
  • If there is no such edge and we are at a non-root node, follow the dashed edge (“fail” transition); DO NOT CONSUME THE CHARACTER (say “h”)

Consider text “hershe”
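
The three rules can be traced on “hershe” with a short Python sketch. It is a sketch under assumptions, not the lecture's own code: fail edges are computed with a breadth-first pass over the keyword tree, each node's label list also inherits the labels of the node its fail edge points to (so that “he” is reported when “she” is reached), patterns are assumed non-empty, and the names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Keyword tree plus fail edges and node labels (out lists)."""
    children, fail, out = [{}], [0], [[]]     # node 0 is the root
    for idx, p in enumerate(patterns):
        v = 0
        for ch in p:
            if ch not in children[v]:
                children.append({})
                fail.append(0)
                out.append([])
                children[v][ch] = len(children) - 1
            v = children[v][ch]
        out[v].append(idx)

    # Breadth-first pass: children of the root keep fail = root.
    queue = deque(children[0].values())
    while queue:
        v = queue.popleft()
        for ch, w in children[v].items():
            f = fail[v]
            while f and ch not in children[f]:
                f = fail[f]
            fail[w] = children[f].get(ch, 0)
            out[w] = out[w] + out[fail[w]]    # inherit labels via the fail edge
            queue.append(w)
    return children, fail, out


def find_all(patterns, t):
    """Scan t once, applying the three transition rules from the slide."""
    children, fail, out = build_automaton(patterns)
    hits, v, i = [], 0, 0
    while i < len(t):
        ch = t[i]
        if ch in children[v]:                 # rule 1: follow the edge
            v = children[v][ch]
            i += 1
            for idx in out[v]:
                hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
        elif v == 0:                          # rule 2: at root, stay, next char
            i += 1
        else:                                 # rule 3: fail edge, do not consume
            v = fail[v]
    return hits


print(find_all(["hers", "she", "he"], "hershe"))
```

On “hershe” the scan reads “hers”, fails from the end of “hers” to the node for “s” without consuming the second “h”, and then continues through “she”, reporting “he” along the way through the inherited label. Hits are (1-based start position, pattern) pairs.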