1
A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window
Costas Busch
Rensselaer Polytechnic Institute
Srikanta Tirthapura
Iowa State University
2
Outline of Talk
Introduction
Algorithm
Analysis
3
Data stream: $(v_1, t_1), (v_2, t_2), (v_3, t_3), (v_4, t_4), (v_5, t_5)$, where element $i$ has value $v_i$ and timestamp $t_i$
Time runs from $1$ to $C$
For simplicity assume unit-valued elements
4
Current time: $C$
Most recent time window of duration $W$: $[C - W, C]$
Data stream: $(v_1, t_1), (v_2, t_2), \ldots, (v_5, t_5)$
Compute the sum of elements with timestamps in the time window $[C - W, C]$:
$$\sum_{i \,:\, C - W \le t_i \le C} v_i$$
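As a point of reference, the target quantity can be computed exactly by scanning every element. This is a minimal sketch, not the summary structure from the talk: the function name `window_sum` and the `(value, timestamp)` tuple layout are assumptions made for illustration.

```python
def window_sum(stream, current_time, window):
    """Exact sum of values whose timestamps lie in [C - W, C].

    `stream` is an iterable of (value, timestamp) pairs in arrival order;
    the arrival order does not matter for the exact answer.
    """
    lo = current_time - window
    return sum(v for v, t in stream if lo <= t <= current_time)


# Unit-valued elements arriving out of timestamp order (asynchronous).
stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 8)]
print(window_sum(stream, current_time=12, window=5))  # prints 3
```

The scan stores the whole stream, which is exactly what a small-space summary is meant to avoid.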
5
Example I: All packets on a network link; maintain the number of different IP sources in the last
Example II: Large database, continuously maintain averages and frequency moments
6
Data stream: $(v_1, t_1), (v_2, t_2), \ldots, (v_5, t_5)$
Synchronous stream: timestamps $t_i$ arrive in ascending order
Asynchronous stream: no order guaranteed for the $t_i$
7
Why Asynchronous Data Streams?
Network delay & multi-path routing: a synchronous stream sent through the network can arrive as an asynchronous stream
Merge w/o control: merging synchronous streams without coordination can produce an asynchronous stream
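A small sketch of the "merge without control" case, using timestamps only. The helper names `is_synchronous` and `merge_without_control` are hypothetical, and "ascending" is taken here to mean non-decreasing, which is an assumption.

```python
from itertools import chain


def is_synchronous(timestamps):
    """True if the timestamps are observed in non-decreasing order."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


def merge_without_control(a, b):
    """Interleave two equal-length streams round-robin, ignoring timestamps."""
    return list(chain.from_iterable(zip(a, b)))


a = [1, 4, 7]  # synchronous
b = [2, 3, 9]  # synchronous
merged = merge_without_control(a, b)  # [1, 2, 4, 3, 7, 9]
print(is_synchronous(a), is_synchronous(b), is_synchronous(merged))
```

Both inputs are in order, yet the merged stream observes timestamp 4 before timestamp 3.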
8
Processing Requirements:
the size of data
9
Our results: A deterministic data aggregation algorithm
Time: $O(\log B \cdot \log W)$
Space: $O\!\left(\frac{\log W \cdot \log B}{\epsilon}\,(\log W + \log B)\right)$
Relative Error: $\epsilon$
10
Previous Work:
[Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002]: Deterministic, Synchronous. Merging buckets
[Tirthapura, Xu, Busch. PODC, 2006]: Randomized, Asynchronous. Random sampling
11
Outline of Talk
Introduction
Algorithm
Analysis
12
Current time: $C$ (time runs from $1$ to $C$)
Data stream: $t_1, t_2, t_3, t_4, t_5, t_6$
For simplicity assume unit-valued elements
13
Current time: $C$
Most recent time window of duration $W$: $[C - W, C]$
Data stream: $t_1, t_2, t_3, t_4, t_5, t_6$
Compute the sum of elements with timestamps in the time window $[C - W, C]$
14
Divide time into periods of duration $W$
(figure: timeline marked at $1, W, 2W, 3W, 4W$)
15
The sliding window may span at most two time periods
(figure: timeline marked at $1, W, 2W, 3W, 4W$; the sliding window starts at $T$ and ends at the current time $C$)
16
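Sum can be written as two sub-sums in two time periods: $S = S_{\text{left}} + S_{\text{right}}$
(figure: sliding window from $T$ to $C$; $S_{\text{left}}$ lies in the left time period and $S_{\text{right}}$ in the right time period)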
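The decomposition into two sub-sums can be sketched as follows. This is an exact, brute-force illustration rather than the summary structure; it assumes integer timestamps, periods $[1, W], [W+1, 2W], \ldots$, and the hypothetical helper names `split_window` and `sub_sums`.

```python
def split_window(current_time, window):
    """Cut [C - W, C] at the boundary between the two periods it touches.

    Assumes integer timestamps and periods [1, W], [W+1, 2W], ...
    """
    lo, hi = current_time - window, current_time
    # Last timestamp of the period that precedes the one containing C.
    boundary = ((hi - 1) // window) * window
    return (lo, boundary), (boundary + 1, hi)


def sub_sums(stream, current_time, window):
    """Return (S_left, S_right) for a stream of (value, timestamp) pairs."""
    (l_lo, l_hi), (r_lo, r_hi) = split_window(current_time, window)
    s_left = sum(v for v, t in stream if l_lo <= t <= l_hi)
    s_right = sum(v for v, t in stream if r_lo <= t <= r_hi)
    return s_left, s_right


stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 8)]
s_left, s_right = sub_sums(stream, current_time=12, window=5)
print(s_left, s_right, s_left + s_right)  # prints 2 1 3
```

With $W = 5$ and $C = 12$ the window $[7, 12]$ is cut into $[7, 10]$ and $[11, 12]$, so each part lies inside a single period.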
17
Data structure that maintains an estimate of $S_{\text{left}}$ in the left time period
(figure: sliding window from $T$ to $C$, split into $S_{\text{left}}$ and $S_{\text{right}}$)
18
Without loss of generality, consider the data structure in time period $[1, W]$
(figure: time period $[1, W]$ with $T$ marking the start of $S_{\text{left}}$)
19
Data structure consists of various levels $1, 2, \ldots, L$
$2^L$ is an upper bound of the sum in a period
20
Consider level $i$
Bucket at level $i$: covers time period $[1, W]$ and counts up to $2^{i+1}$ elements
21
Stream: $t_1$, with $1 \le t_1 \le W$
Increase counter value: bucket $[1, W]$ has count $1$
22
Stream: $t_1, t_2$, with $1 \le t_2 \le W$
Increase counter value: bucket $[1, W]$ has count $2$
23
Stream: $t_1, t_2, t_3$, with $1 \le t_3 \le W$
Increase counter value: bucket $[1, W]$ has count $3$
24
Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}$, with $1 \le t_{2^{i+1}-1} \le W$
Increase counter value: bucket $[1, W]$ has count $2^{i+1} - 1$
25
Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}, t_{2^{i+1}}$, with $1 \le t_{2^{i+1}} \le W$
Counter threshold of $2^{i+1}$ reached
Split bucket: $[1, W]$ becomes $[1, W/2]$ and $[W/2 + 1, W]$, each with count $2^i$
26
Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}, t_{2^{i+1}}$
Buckets: $[1, W/2]$ and $[W/2 + 1, W]$, each with count $2^i$
New buckets have threshold $2^{i+1}$ also
27
Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}}, t_{2^{i+1}+1}$, with $1 \le t_{2^{i+1}+1} \le W/2$
Increase appropriate bucket: $[1, W/2]$ has count $2^i + 1$, $[W/2 + 1, W]$ has count $2^i$
28
Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}+1}, t_{2^{i+1}+2}$, with $W/2 + 1 \le t_{2^{i+1}+2} \le W$
Increase appropriate bucket: $[1, W/2]$ has count $2^i + 1$, $[W/2 + 1, W]$ has count $2^i + 1$
29
Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}+2}, t_{2^{i+1}+3}$, with $1 \le t_{2^{i+1}+3} \le W/2$
Increase appropriate bucket: $[1, W/2]$ has count $2^i + 2$, $[W/2 + 1, W]$ has count $2^i + 1$
30
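Stream: $t_1, \ldots, t_m$, with $W/2 + 1 \le t_m \le W$
Bucket $[W/2 + 1, W]$ reaches the threshold $2^{i+1}$
Split bucket: $[W/2 + 1, W]$ becomes $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$, each with count $2^i$
Bucket $[1, W/2]$ keeps its count $x_1$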
31
Stream: $t_1, \ldots, t_m$
Buckets: $[1, W/2]$ with count $x_1$; $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$, each with count $2^i$
32
Stream: $t_1, \ldots, t_m, t_{m+1}$, with $W/2 + 1 \le t_{m+1} \le 3W/4$
Increase appropriate bucket: $[1, W/2]$ has count $x_1$, $[W/2 + 1, 3W/4]$ has count $2^i + 1$, $[3W/4 + 1, W]$ has count $2^i$
33
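Stream: $t_1, \ldots, t_m, t_{m+1}, \ldots$
Bucket $[W/2 + 1, 3W/4]$ reaches the threshold $2^{i+1}$
Split bucket: $[W/2 + 1, 3W/4]$ becomes $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$, each with count $2^i$
Other buckets: $[1, W/2]$ with count $x_1$, $[3W/4 + 1, W]$ with count $x_4$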
34
Stream: $t_1, \ldots, t_m, t_{m+1}, \ldots$
Buckets: $[1, W/2]$ with count $x_1$; $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$, each with count $2^i$; $[3W/4 + 1, W]$ with count $x_4$
35
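Splitting Tree
Root: $[1, W]$, split at threshold $2^{i+1}$
Children of $[1, W]$: $[1, W/2]$ and $[W/2 + 1, W]$
Children of $[W/2 + 1, W]$, split at threshold $2^{i+1}$: $[W/2 + 1, 3W/4]$ and $[3W/4 + 1, W]$
Children of $[W/2 + 1, 3W/4]$, split at threshold $2^{i+1}$: $[W/2 + 1, 5W/8]$ and $[5W/8 + 1, 3W/4]$
Leaf buckets and counters: $[1, W/2]$: $x_1$; $[W/2 + 1, 5W/8]$: $x_2$; $[5W/8 + 1, 3W/4]$: $x_3$; $[3W/4 + 1, W]$: $x_4$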
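The insert-and-split steps walked through above can be sketched for a single level $i$. This is a simplified illustration under stated assumptions, not the authors' implementation: the class name `Level` and the list layout are hypothetical, $W$ is assumed to be a power of two, only leaf buckets are stored, a split hands half of the counter to each child, and the rule that limits how many buckets are kept is left out.

```python
class Level:
    """Leaf buckets of one level i over the time period [1, W]."""

    def __init__(self, i, W):
        self.threshold = 2 ** (i + 1)
        # Each leaf bucket is [lo, hi, count]; the list stays sorted by lo.
        self.buckets = [[1, W, 0]]

    def insert(self, t):
        """Count a unit-valued element with timestamp t, splitting if needed."""
        for idx, (lo, hi, _) in enumerate(self.buckets):
            if lo <= t <= hi:
                break
        else:
            raise ValueError("timestamp outside the period")
        bucket = self.buckets[idx]
        bucket[2] += 1
        # Buckets of duration 1 (lo == hi) are never split.
        if bucket[2] >= self.threshold and lo < hi:
            mid = (lo + hi - 1) // 2
            half = bucket[2] // 2
            self.buckets[idx:idx + 1] = [
                [lo, mid, half],
                [mid + 1, hi, bucket[2] - half],
            ]


level = Level(i=1, W=8)        # threshold 2^(1+1) = 4
for t in [2, 7, 3, 6, 5]:      # timestamps arrive out of order
    level.insert(t)
print(level.buckets)           # [[1, 4, 2], [5, 8, 3]]
```

After the fourth element the bucket $[1, 8]$ reaches the threshold and splits into $[1, 4]$ and $[5, 8]$ with count $2$ each; the fifth element then increases the right bucket.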
36
Leaf buckets of duration 1 are not split any further
Max depth = $\log W$
37
The initial bucket may be split into many buckets
(figure: leaf buckets covering the time period $[1, W]$)
38
Due to space limitations we only keep the last buckets
(figure: leaf buckets covering the time period $[1, W]$)
39
Suppose we want to find the sum in $[T, W]$
(figure: time period $[1, W]$ with $T$ marked)
40
Consider various levels
(figure: time period $[1, W]$ with the timeline at $T$, shown against the buckets of several levels)
41
First level with a leaf bucket that intersects timeline
(figure: time period $[1, W]$ with the timeline at $T$, shown against the buckets of several levels)
42
Consider buckets on right of timeline: $x_1, x_2, \ldots, x_z$
Estimate of $S$: $x_1 + x_2 + \cdots + x_z$
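The estimate can be sketched as a sum over the leaf buckets to the right of $T$. This is only an illustration: the function name `estimate` and the `(lo, hi, count)` tuples are hypothetical, the choice of level is not modeled, and leaving out a bucket that straddles $T$ is an assumption of this sketch rather than something stated in the slides.

```python
def estimate(buckets, T):
    """Sum the counters of leaf buckets that start at or after T.

    `buckets` is a list of (lo, hi, count) leaf buckets of one level.
    A bucket with lo < T <= hi straddles T and is left out here.
    """
    return sum(count for lo, hi, count in buckets if lo >= T)


buckets = [(1, 4, 2), (5, 6, 3), (7, 8, 1)]
print(estimate(buckets, T=5))  # prints 4
```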
43
OR
First level with a leaf bucket on right of timeline
(figure: time period $[1, W]$ with the timeline at $T$, shown against the buckets of several levels)
44
Outline of Talk
Introduction
Algorithm
Analysis
45
Suppose that we use level $i$ in order to compute the estimate
(figure: time period $[1, W]$ with $T$ marked)
46
Stream: $t_k$
A data element is counted in the appropriate bucket
Consider splitting threshold $2^{i+1}$ (level $i$)
(figure: neighboring buckets $b_l$ and $b_r$ with counters $x_l$ and $x_r$)
47
Stream: $t_k$
We can assume that the element is placed in the respective bucket $[t_l, t_r]$, where $t_l \le t_k \le t_r$
48
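Stream: $t_k$
We can assume that when bucket splits the element is placed in an arbitrary child bucket
Split at threshold $2^{i+1}$: $[t_l, t_r]$ becomes $\left[t_l, \frac{t_l + t_r - 1}{2}\right]$ and $\left[\frac{t_l + t_r + 1}{2}, t_r\right]$, each with count $2^i$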
49
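Stream: $t_k$
Split: $[t_l, t_r]$ becomes $\left[t_l, \frac{t_l + t_r - 1}{2}\right]$ and $\left[\frac{t_l + t_r + 1}{2}, t_r\right]$, each with count $2^i$
(figure: the element is placed in the left child bucket)
If $t_l \le t_k \le \frac{t_l + t_r - 1}{2}$: GOOD! Element counted in correct bucket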
50
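Stream: $t_k$
Split: $[t_l, t_r]$ becomes $\left[t_l, \frac{t_l + t_r - 1}{2}\right]$ and $\left[\frac{t_l + t_r + 1}{2}, t_r\right]$, each with count $2^i$
(figure: the element is placed in the left child bucket)
If $\frac{t_l + t_r + 1}{2} \le t_k \le t_r$: BAD! Element counted in wrong bucket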
51
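Consider leaf buckets over $[1, W]$, with the timeline at $T$ and an element with timestamp $t_k$
If the element is counted on the same side of $T$ as $t_k$: GOOD!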
52
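Consider leaf buckets over $[1, W]$, with the timeline at $T$ and an element with timestamp $t_k$
If the element is counted on the other side of $T$ from $t_k$: BAD! Element counted in wrong bucket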
53
Consider leaf buckets over $[1, W]$, with the timeline at $T$
$Z_1$: elements of left part counted on right
$Z_2$: elements of right part counted on left
54
$Z_1$: elements of left part counted on right
Must have been initially inserted in one of these buckets
(figure: element $t_k$ to the left of $T$ in $[1, W]$, with the candidate buckets highlighted)
55
Since tree depth $\le \log W$: $|Z_1| = O(2^i \log W)$
56
Since tree depth $\le \log W$: $|Z_1| = O(2^i \log W)$
Similarly, we can prove $|Z_2| = O(2^i \log W)$
Therefore:
$$|X - S| = \bigl|\,|Z_1| - |Z_2|\,\bigr| = O(2^i \log W)$$
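The step from the two misplacement counts to the error bound can be written out as follows. This is a sketch under assumptions that the slides do not spell out: elements are unit valued, every element is counted in exactly one retained bucket, $X$ is the estimate (total counted to the right of $T$), $S$ is the true sum over $[T, W]$, $Z_1$ is the set of left-part elements counted on the right, and $Z_2$ is the set of right-part elements counted on the left.

```latex
\begin{align*}
X &= \bigl(S - |Z_2|\bigr) + |Z_1| \\
|X - S| &= \bigl|\,|Z_1| - |Z_2|\,\bigr|
         \le \max\bigl(|Z_1|, |Z_2|\bigr)
         = O\!\left(2^i \log W\right)
\end{align*}
```

The first line counts what lands on the right of $T$: the right-part elements that stayed on the right, plus the left-part elements that drifted there. The last equality uses a bound of $O(2^i \log W)$ on each of $|Z_1|$ and $|Z_2|$.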
57
It can be proven
Since
i
58
It can be proven
Since Combined with
i
i
We obtain relative error :