Joint Advanced Student School 2004
Complexity Analysis of String Algorithms
Sequential Pattern Matching:
Analysis of Knuth-Morris-Pratt type algorithms
using the Subadditive Ergodic Theorem
14 July 2017
Tobias Reichl JASS04 - Sequential Pattern Matching
Overview
1. Pattern Matching
   • Sequential Algorithms
   • Knuth-Morris-Pratt Algorithm
2. Probabilistic tools
   • Subadditive Ergodic Theorem
   • Martingales and Azuma's Inequality
3. Analysis of KMP Algorithms
   • Properties of KMP
   • Establishing subadditivity
   • Analysis
Pattern Matching
Pattern-text comparison: $M(l,k) = 1$
Pattern p:  abcde
Text t:     xxxxxabxxxabcxxxabcde
(Figure: the pattern placed at an alignment position AP of the text.)
• Text $t = t_1^n$, pattern $p = p_1^m$
• Comparison: $M(l,k) = 1$ if $t_l$ is compared to $p_k$, and $M(l,k) = 0$ otherwise
• Alignment position: $AP$ such that $M(AP + (k-1), k) = 1$ for some $k$.
Sequential Algorithms - Definition
i. Semi-sequential: APs are non-decreasing.
ii. Strongly semi-sequential: (i), and the comparisons $M(l_i, k_i) = 1$ define non-decreasing text positions $l_i$.
iii. Sequential: (i), and $M(l,k) = 1 \;\Rightarrow\; t_{l-(k-1)}^{l-1} = p_1^{k-1}$.
   A text symbol is compared only if it follows a prefix of the pattern. Example: pattern abcde, text xxxxxabxxxabcxxxabcde
iv. Strongly sequential: (i), (ii) and (iii).
Example: Naive / brute force algorithm
(Figure: the pattern abcde is shifted by +1 after each mismatch along the text xxxxxabxxxabcxxxabcde.)
• Every text position is an alignment position.
• Text is scanned until...
– the pattern is found: then done.
– a mismatch occurs: then shift by one and retry.
• Sequential algorithm.
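A minimal Python sketch of the brute-force matcher, using the slides' 1-indexed conventions. The helper name `naive_search` is hypothetical; it logs every pair (l, k) with M(l, k) = 1, which makes the sequential property easy to check. Unlike the slide, the sketch keeps scanning after a match so that it reports every occurrence.

```python
def naive_search(t, p):
    """Brute-force matching (hypothetical helper, 1-indexed as on the slides).

    Returns (occurrences, comparisons), where comparisons lists every
    pair (l, k) with M(l, k) = 1, i.e. t_l was compared to p_k.
    """
    n, m = len(t), len(p)
    occurrences, comparisons = [], []
    for ap in range(1, n - m + 2):        # every text position is an AP
        k = 1
        while k <= m:
            l = ap + (k - 1)              # text position aligned with p_k
            comparisons.append((l, k))
            if t[l - 1] != p[k - 1]:
                break                     # mismatch: shift by one and retry
            k += 1
        if k > m:
            occurrences.append(ap)        # whole pattern matched at this AP
    return occurrences, comparisons


occ, comps = naive_search("xxxxxabxxxabcxxxabcde", "abcde")
print(occ)         # [17]
print(len(comps))  # 26
```

Every logged pair satisfies $t_{l-(k-1)}^{l-1} = p_1^{k-1}$, as required of a sequential algorithm.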
Knuth-Morris-Pratt type algorithms (1)
(Figure: the pattern ababcde is advanced by a shift S along the text xxxxxabxxxabcxxxabcde.)
• Idea (Morris-Pratt): disregard APs already known not to be followed by a prefix of p.
• Knowledge:
– the already processed part of the pattern
– pre-processing of p.
• Strongly sequential algorithm.
Knuth-Morris-Pratt type algorithms (2)
• Morris-Pratt (mismatch at $p_k$):
(Figure: the pattern ababcde shifted along the text xxxxxabxxxabcxxxabcde.)
$S = \min\{k,\ \min\{s > 0 : p_{s+1}^{k-1} = p_1^{k-(s+1)}\}\}$
• Knuth-Morris-Pratt (KMP also skips mismatching letters):
(Figure: as above.)
$S = \min\{k,\ \min\{s > 0 : p_{s+1}^{k-1} = p_1^{k-(s+1)} \text{ and } p_k \neq p_{k-s}\}\}$
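The two shift rules can be read off directly from the definitions of S. The Python sketch below (hypothetical helpers `mp_shift` and `kmp_shift`, with k the 1-indexed mismatch position in p) simply searches for the smallest admissible s; it is meant for checking small cases, not as an efficient preprocessing step.

```python
def mp_shift(p, k):
    """Morris-Pratt shift S after a mismatch at p_k (1-indexed), from
    S = min{k, min{s > 0 : p_{s+1}^{k-1} = p_1^{k-(s+1)}}}."""
    for s in range(1, k):
        if p[s:k - 1] == p[:k - 1 - s]:          # p_{s+1}^{k-1} == p_1^{k-(s+1)}
            return s
    return k                                     # no admissible s: shift by k


def kmp_shift(p, k):
    """Knuth-Morris-Pratt shift: additionally require p_k != p_{k-s}."""
    for s in range(1, k):
        if p[s:k - 1] == p[:k - 1 - s] and p[k - 1] != p[k - 1 - s]:
            return s
    return k


p = "ababcde"
print([mp_shift(p, k) for k in range(1, len(p) + 1)])   # [1, 1, 2, 2, 2, 5, 6]
print([kmp_shift(p, k) for k in range(1, len(p) + 1)])  # [1, 1, 3, 3, 2, 5, 6]
```

For $k = 3$ and $k = 4$ the KMP shift is larger: the Morris-Pratt shift would align a symbol equal to $p_k$ with the text symbol that has just mismatched $p_k$, so KMP skips it.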
Pattern Matching - Complexity
$c_{r,s}(t,p) = \sum_{l \in [r,s]} \sum_{k \in [1,m]} M(l,k)$
• Overall complexity: $c_{1,n} := c_n$
• Pattern or text is a realization of a random sequence: $C_n$
• Question: complexity of KMP?
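To make $c_{r,s}$ concrete, here is a Python sketch of KMP that logs every comparison pair (l, k) with M(l, k) = 1. It uses Knuth's optimized failure table (the standard construction, where -1 means "skip this text symbol"). The helper names `kmp_table`, `kmp_search` and `c` are hypothetical, and we assume the convention that $c_{r,s}(t,p)$ is the number of comparisons made when the algorithm is run on the substring $t_r^s$.

```python
def kmp_table(p):
    """Knuth's optimized failure table for p (0-indexed).

    nxt[k] for k < m: pattern index to try after a mismatch at p[k];
    -1 means "skip the current text symbol". nxt[m]: length of the
    longest proper border of p, used to continue after a full match.
    """
    m = len(p)
    nxt = [-1] + [0] * m
    cnd = 0
    for pos in range(1, m):
        if p[pos] == p[cnd]:
            nxt[pos] = nxt[cnd]          # same symbol would mismatch again: skip it
        else:
            nxt[pos] = cnd
            while cnd >= 0 and p[pos] != p[cnd]:
                cnd = nxt[cnd]
        cnd += 1
    nxt[m] = cnd
    return nxt


def kmp_search(t, p):
    """Return (occurrences, comparisons) with 1-indexed positions;
    comparisons lists every pair (l, k) with M(l, k) = 1."""
    nxt, m = kmp_table(p), len(p)
    occurrences, comparisons = [], []
    j = k = 0                            # 0-indexed text / pattern positions
    while j < len(t):
        comparisons.append((j + 1, k + 1))
        if t[j] == p[k]:
            j, k = j + 1, k + 1
            if k == m:
                occurrences.append(j - m + 1)
                k = nxt[m]               # continue with the longest border
        else:
            k = nxt[k]
            if k < 0:                    # skip the mismatching text symbol
                j, k = j + 1, 0
    return occurrences, comparisons


def c(t, p, r, s):
    """c_{r,s}(t, p): comparisons when KMP is run on t_r^s (assumed convention)."""
    return len(kmp_search(t[r - 1:s], p)[1])


t, p = "xxxxxabxxxabcxxxabcde", "abcde"
print(kmp_search(t, p)[0])   # [17]
print(c(t, p, 1, len(t)))    # 23
```

The logged text positions never decrease and every pair follows a matched prefix, which is what "strongly sequential" asks for.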
Subadditivity – Deterministic Sequence
Fekete (1923)
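• Subadditivity: $x_{m+n} \le x_m + x_n \;\Rightarrow\; \lim_{n\to\infty} \frac{x_n}{n} = \inf_{m \ge 1} \frac{x_m}{m}$
• Superadditivity: $x_{m+n} \ge x_m + x_n \;\Rightarrow\; \lim_{n\to\infty} \frac{x_n}{n} = \sup_{m \ge 1} \frac{x_m}{m}$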
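A quick numerical illustration of Fekete's lemma, using the sequence $x_n = \lceil n/3 \rceil$ (our choice, not from the slides), which is subadditive because $\lceil a + b \rceil \le \lceil a \rceil + \lceil b \rceil$. The helper names are hypothetical, and the infinite infimum is approximated by a finite minimum.

```python
from fractions import Fraction


def x(n):
    """Example sequence x_n = ceil(n / 3) (subadditive)."""
    return -(-n // 3)


def is_subadditive(seq, upto):
    """Check x_{m+n} <= x_m + x_n for all 1 <= m, n < upto."""
    return all(seq(m + n) <= seq(m) + seq(n)
               for m in range(1, upto) for n in range(1, upto))


def inf_ratio(seq, upto):
    """Finite stand-in for inf_{m >= 1} x_m / m."""
    return min(Fraction(seq(m), m) for m in range(1, upto + 1))


print(is_subadditive(x, 200))   # True
print(inf_ratio(x, 1000))       # 1/3
print(x(10**6) / 10**6)         # close to 1/3, as the lemma predicts
```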
Example: Longest Common Subsequence
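X: ababcafbcdabcde
Y: abcdeabcdfabcab
LCS: "abcabcdabc" (10)
$L_{1,n} = \max\{K : X_{i_k} = Y_{j_k} \text{ for } 1 \le k \le K, \text{ where } 1 \le i_1 < i_2 < \dots < i_K \le n \text{ and } 1 \le j_1 < j_2 < \dots < j_K \le n\}$
Split: ababcafb | cdabcde and abcdeabc | dfabcab
LCS: "abcab" (5), "dabc" (4)
• Superadditive: $L_{1,n} \ge L_{1,m} + L_{m,n}$
• Hence: $\lim_{n\to\infty} \frac{E[L_n]}{n} = \sup_{m \ge 1} \frac{E[L_m]}{m} \stackrel{?}{=} 0.8284$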
(Conjectured by Steele in 1982)
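The superadditivity of the LCS length can be checked on the slide's example with the standard dynamic program. In this Python sketch (hypothetical helper `lcs_length`), the right-hand block is taken to be $X_{m+1}^n$, $Y_{m+1}^n$ with the split point $m = 8$ used on the slide.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence (standard O(|x||y|) dynamic program)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            # extend a common subsequence on a match, else keep the better neighbour
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


X, Y = "ababcafbcdabcde", "abcdeabcdfabcab"
m = 8
left = lcs_length(X[:m], Y[:m])     # e.g. "abcab"
right = lcs_length(X[m:], Y[m:])    # e.g. "dabc"
whole = lcs_length(X, Y)
print(left, right)                  # 5 4
print(whole >= left + right)        # True
```

Superadditivity holds for any split, since concatenating a common subsequence of the two left blocks with one of the two right blocks gives a common subsequence of the whole strings.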
Subadditivity – "Almost subadditive"
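de Bruijn and Erdős (1952)
• $c_n$ positive and non-decreasing sequence with $\sum_{k=1}^{\infty} \frac{c_k}{k^2} < \infty$
• "Almost subadditive": $x_{m+n} \le x_m + x_n + c_{m+n} \;\Rightarrow\; \lim_{n\to\infty} \frac{x_n}{n} = \inf_{m \ge 1} \frac{x_m}{m}$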
Subadditive Ergodic Theorem
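Kingman (1976), Liggett (1985)
i. $X_{0,n} \le X_{0,m} + X_{m,n}$
ii. For all $k$: $\{X_{nk,(n+1)k},\ n \ge 1\}$ is a stationary sequence.
iii. The distribution of $\{X_{m,m+k},\ k \ge 1\}$ does not depend on $m$.
iv. $E[X_{0,1}] < \infty$ and $E[X_{0,n}] \ge -c_0 n$ where $c_0 < \infty$.
Then: $\lim_{n\to\infty} \frac{E[X_{0,n}]}{n} = \inf_{m \ge 1} \frac{E[X_{0,m}]}{m} =: \gamma$, and $\lim_{n\to\infty} \frac{X_{0,n}}{n} = X$ (a.s.) with $E[X] = \gamma$.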
Almost Subadditive Ergodic Theorem
Derriennic (1983)
• Subadditivity can be relaxed to $X_{0,n} \le X_{0,m} + X_{m,n} + A_n$ with $\lim_{n\to\infty} \frac{E[A_n]}{n} = 0$
• Then, too, $\lim_{n\to\infty} \frac{X_{0,n}}{n}$ exists (a.s.).
Martingales
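• A sequence $\{Y_n = f(X_1, \dots, X_n)\}_{n \ge 0}$ is a martingale with respect to the filtration $\mathcal{F}_n = \sigma(X_0, \dots, X_n)$ if for all $n \ge 0$:
$E|Y_n| < \infty$ and $E[Y_{n+1} \mid X_0, X_1, \dots, X_n] = E[Y_{n+1} \mid \mathcal{F}_n] = Y_n$
• $E[Y_{n+1} \mid \mathcal{F}_n]$ defines a random variable depending on the knowledge contained in $X_1, \dots, X_n$.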
Martingale Differences
• The martingale difference is defined as $D_n = Y_n - Y_{n-1}$, so that $Y_n = Y_0 + \sum_{i=1}^{n} D_i$.
• Observe: $E[D_{n+1} \mid \mathcal{F}_n] = E[Y_{n+1} \mid \mathcal{F}_n] - E[Y_n \mid \mathcal{F}_n] = Y_n - Y_n = 0$
Azuma's Inequality (1)
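• Let $Y_n = f_n(X_1, \dots, X_n)$ be a martingale
• Define the martingale difference as $D_i = E[Y_n \mid \mathcal{F}_i] - E[Y_n \mid \mathcal{F}_{i-1}]$
(The mean of the same quantity, conditioned on different knowledge.)
• Observe: $E[Y_n \mid \mathcal{F}_n] = Y_n$ and $E[Y_n \mid \mathcal{F}_0] = E[Y_n]$, hence
$\sum_{i=1}^{n} D_i = E[Y_n \mid \mathcal{F}_n] - E[Y_n \mid \mathcal{F}_0] = Y_n - E[Y_n]$
(Deviation from the mean)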
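For a function of finitely many independent fair bits, the differences $D_i$ can be computed exactly by enumeration. The Python sketch below assumes i.i.d. uniform bits and uses the number of overlapping 11 pairs as an example $f_n$ (both are our choices; helper names are hypothetical). It checks that $\sum_i D_i = Y_n - E[Y_n]$ along one realization.

```python
from fractions import Fraction
from itertools import product


def pair_count(bits):
    """Example f_n: number of overlapping occurrences of the pattern 11."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a == 1 and b == 1)


def cond_exp(f, n, prefix):
    """E[f(X_1..X_n) | X_1..X_i = prefix] for i.i.d. fair bits (exact enumeration)."""
    rest = n - len(prefix)
    total = sum(Fraction(f(prefix + tail)) for tail in product((0, 1), repeat=rest))
    return total / 2 ** rest


def doob_differences(f, xs):
    """D_i = E[Y_n | F_i] - E[Y_n | F_{i-1}], i = 1..n, along the realization xs."""
    n = len(xs)
    levels = [cond_exp(f, n, tuple(xs[:i])) for i in range(n + 1)]
    return [levels[i] - levels[i - 1] for i in range(1, n + 1)]


xs = (1, 1, 0, 1, 1, 1)
ds = doob_differences(pair_count, xs)
print(sum(ds) == pair_count(xs) - cond_exp(pair_count, len(xs), ()))  # True
```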
Hoeffding's Inequality
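• Let $Y_n$ be a martingale
• Let there exist constants $c_n$ such that $|Y_n - Y_{n-1}| = |D_n| \le c_n$
• Then: $\Pr\{|Y_n - Y_0| \ge x\} = \Pr\left\{\left|\sum_{i=1}^{n} D_i\right| \ge x\right\} \le 2\exp\left(-\frac{x^2}{2\sum_{i=1}^{n} c_i^2}\right)$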
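A small simulation comparing the inequality with an empirical tail, assuming the simple $\pm 1$ random walk (a martingale with $|D_i| \le 1$, so $c_i = 1$). The helper names `azuma_bound` and `empirical_tail` are hypothetical.

```python
import math
import random


def azuma_bound(x, cs):
    """2 exp(-x^2 / (2 * sum c_i^2)) for difference bounds cs = [c_1, ..., c_n]."""
    return 2 * math.exp(-x * x / (2 * sum(c * c for c in cs)))


def empirical_tail(n, x, trials, seed=0):
    """Estimate Pr{|Y_n - Y_0| >= x} for the simple +-1 random walk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(y) >= x:
            hits += 1
    return hits / trials


n, x = 100, 25
print(round(azuma_bound(x, [1] * n), 4))   # 0.0879
print(empirical_tail(n, x, trials=5000))
```

The bound holds but is not tight here: the empirical tail comes out several times smaller.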
Azuma's Inequality (2)
• Summary:
– If $D_i$ is bounded, we know how to assess the deviation from the mean.
– So now we need a bound on $D_i$.
• Trick: let $\hat{X}_i$ be an independent copy of $X_i$.
• Then: $E[f_n(X_1, \dots, X_i, \dots, X_n) \mid \mathcal{F}_{i-1}] = E[f_n(X_1, \dots, \hat{X}_i, \dots, X_n) \mid \mathcal{F}_i]$
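The trick can be verified exactly for independent fair bits. In this Python sketch (our example $f_n$ is the longest run of 1s; helper names are hypothetical), `mean_with_copy` replaces the observed $X_i$ by an independent copy $\hat{X}_i$ and averages over it; the result no longer depends on $X_i$ and coincides with conditioning on $X_1, \dots, X_{i-1}$ only.

```python
from fractions import Fraction
from itertools import product


def longest_run(bits):
    """Example f_n: length of the longest run of 1s."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b else 0
        best = max(best, cur)
    return best


def mean_given_prefix(f, n, prefix):
    """E[f(X_1..X_n) | X_1..X_j = prefix] for i.i.d. fair bits."""
    tails = list(product((0, 1), repeat=n - len(prefix)))
    return sum(Fraction(f(prefix + t)) for t in tails) / len(tails)


def mean_with_copy(f, n, prefix):
    """E[f(X_1..Xhat_i..X_n) | X_1..X_i = prefix] with i = len(prefix) >= 1:
    the observed X_i is replaced by an independent fair bit Xhat_i."""
    head = prefix[:-1]
    tails = list(product((0, 1), repeat=n - len(prefix)))
    vals = [Fraction(f(head + (xhat,) + t)) for xhat in (0, 1) for t in tails]
    return sum(vals) / len(vals)


n = 5
ok = all(
    mean_with_copy(longest_run, n, prefix) == mean_given_prefix(longest_run, n, prefix[:-1])
    for i in range(1, n + 1)
    for prefix in product((0, 1), repeat=i)
)
print(ok)  # True
```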
Azuma's Inequality (3)
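• Hence:
$D_i = E[f_n(X_1, \dots, X_i, \dots, X_n) \mid \mathcal{F}_i] - E[f_n(X_1, \dots, X_i, \dots, X_n) \mid \mathcal{F}_{i-1}]$
$= E[f_n(X_1, \dots, X_i, \dots, X_n) \mid \mathcal{F}_i] - E[f_n(X_1, \dots, \hat{X}_i, \dots, X_n) \mid \mathcal{F}_i]$
• And we can postulate: $|D_i| \le c_i$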
Azuma's Inequality (4)
• Let $Y_n = f_n(X_1, \dots, X_n)$ be a martingale
• If there exist constants $c_i$ such that
$|f_n(X_1, \dots, X_i, \dots, X_n) - f_n(X_1, \dots, \hat{X}_i, \dots, X_n)| \le c_i$
where $\hat{X}_i$ is an independent copy of $X_i$,
• Then: $\Pr\{|Y_n - E[Y_n]| \ge x\} = \Pr\{|f_n(X_1, \dots, X_n) - E[f_n(X_1, \dots, X_n)]| \ge x\} \le 2\exp\left(-\frac{x^2}{2\sum_{i=1}^{n} c_i^2}\right)$
KMP: Unavoidable alignment positions
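• A position $i$ in the text is called an unavoidable AP if, for any $r, l$ with $r \le i$ and $l \ge i + m$, it is an AP when the algorithm is run on $t_r^l$.
• KMP-like algorithms have the same set of unavoidable alignment positions
$U = \bigcup_{l=1}^{n} \{U_l\}$, where $U_l = \min\left\{\min_{1 \le k \le l}\{k : t_k^l = p_1^{l-k+1}\},\ l + 1\right\}$
• Example: pattern abcde, text xxxxxabxxxabcxxxabcde (figure: $U_l$ plotted against $l$).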
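A direct transcription of the definition of $U_l$ into Python (hypothetical helper `unavoidable_positions`, 1-indexed as on the slides), applied to the example text. It tries every candidate suffix, so it is meant only for small inputs.

```python
def unavoidable_positions(t, p):
    """Return ([U_1, ..., U_n], sorted set U), 1-indexed, from
    U_l = min{ min_{1<=k<=l} {k : t_k^l = p_1^{l-k+1}}, l + 1 }."""
    n, m = len(t), len(p)
    us = []
    for l in range(1, n + 1):
        u = l + 1                                   # default when no suffix matches
        for k in range(max(1, l - m + 1), l + 1):   # suffixes t_k^l of length <= m
            if t[k - 1:l] == p[:l - k + 1]:
                u = k                               # first hit is the smallest k
                break
        us.append(u)
    return us, sorted(set(us))


us, U = unavoidable_positions("xxxxxabxxxabcxxxabcde", "abcde")
print(U)  # [2, 3, 4, 5, 6, 9, 10, 11, 15, 16, 17]
```

On this example $U_l$ is non-decreasing in $l$ and always lies between $l - m + 1$ and $l + 1$.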
Pattern Matching: l-convergence
• An algorithm is l-convergent if there exists an increasing sequence of unavoidable alignment positions $\{U_i\}_{i=1}^{n}$ satisfying $U_{i+1} - U_i \le l$.
• l-convergence indicates the maximum size of the "jumps" of an algorithm.
KMP: Establishing m-convergence
• Let AP be an alignment position
• Define: $l = AP + m$
• $|p| = m \;\Rightarrow\; l - m + 1 \le U_l \le l$
• Hence: $U_l \le AP + m$, and so KMP-like algorithms are m-convergent.
KMP: Establishing subadditivity (1)
• If $c_n$ (number of comparisons) is subadditive, we can prove linear complexity of KMP-like algorithms.
• We have to show:
$c_n$ is (almost) subadditive: $c_{1,n} \le c_{1,r} + c_{r,n} + a$ for a constant $a$
• Approach:
An l-convergent sequential algorithm satisfies:
$|c_{1,n} - (c_{1,r} + c_{r,n})| \le m^2 + lm$
KMP: Establishing subadditivity (2)
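• Proof:
– $U_r$: the smallest unavoidable AP greater than $r$.
– We split $c_{1,n} - c_{1,r} - c_{r,n}$ into $c_{1,n} - c_{1,r} - c_{U_r,n}$ and $c_{r,n} - c_{U_r,n}$.
(Figure: the text with positions $r$ and $U_r$ marked, and brackets for $c_{1,n}$, for $c_{1,r}$ and $c_{U_r,n}$, and for $c_{r,n}$ and $c_{U_r,n}$.)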
KMP: Establishing subadditivity (3)
(Figure: around positions $r$ and $U_r$, comparisons contributing to $c_{1,n}$ only, to $c_{1,n}$ and $c_{1,r}$, and to $c_{1,n}$ and $c_{U_r,n}$; the regions $S_1$ and $S_2$ are marked.)
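• Comparisons done after $r$ with AP before $r$:
$S_1 = \sum_{1 \le i \le m} \; \sum_{r - (i-1) \le AP < r} M(AP + (i-1), i) \le m^2$
• Comparisons with AP between $r$ and $U_r$:
$S_2 = \sum_{r \le AP < U_r} \; \sum_{1 \le i \le m} M(AP + (i-1), i) \le lm$
• No more than $m$ comparisons can be saved at $U_r$.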
KMP: Establishing subadditivity (4)
(Figure: around positions $r$ and $U_r$, comparisons contributing to $c_{r,n}$ only and to $c_{r,n}$ and $c_{U_r,n}$; the region $S_3$ is marked.)
• Comparisons with AP between $r$ and $U_r$:
$S_3 = \sum_{AP = r}^{U_r - 1} \; \sum_{1 \le i \le m} M(AP + (i-1), i) \le lm$
• No more than $m$ comparisons can be saved at $U_r$.
KMP: Establishing subadditivity (5)
• So we are able to bound:
$|c_{1,n} - c_{1,r} - c_{r,n}| = |S_1 + S_2 - S_3| \le m^2 + lm$
• We have shown:
$c_n$ is (almost) subadditive: $c_{1,n} \le c_{1,r} + c_{r,n} + a$ with $a = m^2 + lm$
• Now we are able to apply the Subadditive
Ergodic Theorem.
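The subadditivity defect $c_{1,n} - c_{1,r} - c_{r,n}$ can also be measured empirically. This Python sketch assumes $c_{r,s}$ counts the comparisons of KMP run on the substring $t_r^s$, uses Knuth's optimized failure table, and checks the bound $m^2 + lm$ with $l = m$ as claimed on the m-convergence slide. The helper names are hypothetical and the test texts are arbitrary deterministic strings.

```python
def kmp_comparisons(t, p):
    """Number of symbol comparisons made by KMP on text t (Knuth's optimized table)."""
    m = len(p)
    nxt, cnd = [-1] + [0] * m, 0
    for pos in range(1, m):                      # build the failure table
        if p[pos] == p[cnd]:
            nxt[pos] = nxt[cnd]
        else:
            nxt[pos] = cnd
            while cnd >= 0 and p[pos] != p[cnd]:
                cnd = nxt[cnd]
        cnd += 1
    nxt[m] = cnd
    count = j = k = 0
    while j < len(t):
        count += 1                               # compare t[j] with p[k]
        if t[j] == p[k]:
            j, k = j + 1, k + 1
            if k == m:
                k = nxt[m]                       # full match: keep the longest border
        else:
            k = nxt[k]
            if k < 0:
                j, k = j + 1, 0                  # skip the mismatching text symbol
    return count


def c(t, p, r, s):
    """c_{r,s}(t, p): comparisons of KMP run on t_r^s (1-indexed, inclusive)."""
    return kmp_comparisons(t[r - 1:s], p)


def defect(t, p, r):
    """c_{1,n} - c_{1,r} - c_{r,n} for the split point r."""
    n = len(t)
    return c(t, p, 1, n) - c(t, p, 1, r) - c(t, p, r, n)


t, p = "xxxxxabxxxabcxxxabcde", "abcde"
m = len(p)
worst = max(abs(defect(t, p, r)) for r in range(1, len(t) + 1))
print(worst <= m * m + m * m)   # True: within m^2 + l*m for l = m
```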
KMP: Different Modeling Assumptions
• Deterministic Model:
Text and pattern are non-random.
• Semi-Random Model:
Text is a realization of a stationary and ergodic
sequence, pattern is given.
• Stationary Model:
Both text and pattern are realizations of a
stationary and ergodic sequence.
KMP: Applying the Subadditive Ergodic Theorem
• We have shown: cn is (almost) subadditive
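• Deterministic Model:
$\lim_{n\to\infty} \frac{\max_t c_n(t,p)}{n} = \alpha_1(p)$
• Semi-Random Model:
$\lim_{n\to\infty} \frac{C_n(p)}{n} = \alpha_2(p)$ (a.s.) and $\lim_{n\to\infty} \frac{E_t[C_n(p)]}{n} = \alpha_2(p)$
• Stationary Model: $\lim_{n\to\infty} \frac{E_{t,p}[C_n]}{n} = \alpha_3$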
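A simulation of the Semi-Random Model, as a sketch: the text is assumed to consist of i.i.d. uniform symbols over {a, b} (a stationary and ergodic source) and the pattern is fixed to abab. The ratio $C_n/n$ should settle near a constant as n grows; the helper names are hypothetical, and nothing here computes the constant itself.

```python
import random


def kmp_comparisons(t, p):
    """Number of symbol comparisons made by KMP on text t (Knuth's optimized table)."""
    m = len(p)
    nxt, cnd = [-1] + [0] * m, 0
    for pos in range(1, m):                      # build the failure table
        if p[pos] == p[cnd]:
            nxt[pos] = nxt[cnd]
        else:
            nxt[pos] = cnd
            while cnd >= 0 and p[pos] != p[cnd]:
                cnd = nxt[cnd]
        cnd += 1
    nxt[m] = cnd
    count = j = k = 0
    while j < len(t):
        count += 1                               # compare t[j] with p[k]
        if t[j] == p[k]:
            j, k = j + 1, k + 1
            if k == m:
                k = nxt[m]
        else:
            k = nxt[k]
            if k < 0:
                j, k = j + 1, 0
    return count


def ratio(n, p, seed):
    """C_n(p) / n for one random text of n i.i.d. uniform symbols over {a, b}."""
    rng = random.Random(seed)
    t = "".join(rng.choice("ab") for _ in range(n))
    return kmp_comparisons(t, p) / n


for n in (1_000, 10_000, 100_000):
    print(n, round(ratio(n, "abab", seed=n), 4))
```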
KMP: Applying Azuma's Inequality
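• $C_n$ satisfies:
$|C_n(T_1, \dots, T_i, \dots, T_n) - C_n(T_1, \dots, \hat{T}_i, \dots, T_n)| \le 2m^2$
where $\hat{T}_i$ is an independent copy of $T_i$.
• So, using Azuma's Inequality:
$\Pr\{|C_n - \alpha n| \ge \varepsilon n\} \le 2\exp\left(-\frac{\varepsilon^2 n^2 (1 + o(1))}{2n(2m^2)^2}\right)$
• $C_n$ is concentrated around its mean:
$E[C_n] = \alpha n (1 + o(1))$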
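Two numerical companions to this slide, as a sketch: `max_single_change` (hypothetical helper) measures the largest change in the KMP comparison count when one text symbol is replaced, to compare with $2m^2$, and `azuma_tail` (also hypothetical) evaluates $2\exp(-\varepsilon^2 n^2 / (2n(2m^2)^2)) = 2\exp(-\varepsilon^2 n/(8m^4))$ with the $(1+o(1))$ factor dropped. The KMP counter uses Knuth's optimized failure table; the test text is an arbitrary deterministic string over {a, b}.

```python
import math


def kmp_comparisons(t, p):
    """Number of symbol comparisons made by KMP on text t (Knuth's optimized table)."""
    m = len(p)
    nxt, cnd = [-1] + [0] * m, 0
    for pos in range(1, m):                      # build the failure table
        if p[pos] == p[cnd]:
            nxt[pos] = nxt[cnd]
        else:
            nxt[pos] = cnd
            while cnd >= 0 and p[pos] != p[cnd]:
                cnd = nxt[cnd]
        cnd += 1
    nxt[m] = cnd
    count = j = k = 0
    while j < len(t):
        count += 1                               # compare t[j] with p[k]
        if t[j] == p[k]:
            j, k = j + 1, k + 1
            if k == m:
                k = nxt[m]
        else:
            k = nxt[k]
            if k < 0:
                j, k = j + 1, 0
    return count


def max_single_change(t, p, alphabet="ab"):
    """max |C_n(.., T_i, ..) - C_n(.., T'_i, ..)| over positions i and replacement symbols."""
    base = kmp_comparisons(t, p)
    worst = 0
    for i, old in enumerate(t):
        for a in alphabet:
            if a != old:
                worst = max(worst, abs(kmp_comparisons(t[:i] + a + t[i + 1:], p) - base))
    return worst


def azuma_tail(n, m, eps):
    """2 exp(-(eps n)^2 / (2 n (2 m^2)^2)), i.e. 2 exp(-eps^2 n / (8 m^4))."""
    return 2 * math.exp(-(eps * n) ** 2 / (2 * n * (2 * m * m) ** 2))


text = "".join("ab"[(i * i + i // 3) % 5 % 2] for i in range(80))
p = "abab"
print(max_single_change(text, p) <= 2 * len(p) ** 2)  # True
print(azuma_tail(10**6, 4, 0.1))
```

Because of the $m^4$ in the exponent, the bound only becomes small for fairly large n once m is not tiny.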
Conclusion
• Using the Subadditive Ergodic Theorem we can show that a linearity constant exists for the worst case and the average case, respectively: KMP has linear complexity.
• The Subadditive Ergodic Theorem proves the existence of this constant but says nothing about how to compute it.
• Using Azuma's Inequality we can show that the number of comparisons is well concentrated around its mean.