Data Stream Processing
Part I
Motivation Data Streams Reservoir Sampling
Homework 1 is due this Friday, the 20th of October.
Data processing so far: input document -> output document
[Figure: input document over time; temperature readings (ºC), one per hour, 96 bytes per day]
[Figure: input document over time; one reading every 100 ms, 3.5 MB per day]
[Figure: input document over time; readings every 100 ms, 3.5 TB per day]
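The per-day volumes on these slides can be checked with a little arithmetic. The sketch below assumes 4 bytes per reading (inferred from 96 bytes per day at one reading per hour) and, to reach the terabyte figure, a hypothetical fleet of one million sensors; neither number is stated on the slides, and `bytes_per_day` is an illustrative helper name.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

def bytes_per_day(interval_ms, sensors=1, bytes_per_reading=4):
    """Daily data volume for sensors that emit one reading every interval_ms.

    bytes_per_reading=4 is an assumption inferred from 96 bytes / 24 readings.
    """
    readings_per_day = MS_PER_DAY // interval_ms
    return readings_per_day * bytes_per_reading * sensors

hourly = bytes_per_day(3_600_000)                # one reading per hour
fast = bytes_per_day(100)                        # one reading every 100 ms
fleet = bytes_per_day(100, sensors=1_000_000)    # assumed one million sensors

print(hourly, fast, fleet)
```

Under these assumptions the three values come to 96 bytes, about 3.5 MB, and about 3.5 TB per day.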
[Figure: input document over time]
1. Scan the text file, counting lines
2. Generate random line numbers in [0, |lines|)
3. Sort the line numbers
4. Scan the text file, outputting selected lines
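The four steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `sample_lines_two_pass` is hypothetical, and it assumes the random line numbers are drawn without replacement, which the slide does not specify.

```python
import random

def sample_lines_two_pass(path, k, rng=random):
    """Return k randomly chosen lines from a text file using two scans."""
    # Step 1: scan the file, counting lines.
    with open(path, encoding="utf-8") as f:
        n = sum(1 for _ in f)

    # Steps 2 and 3: draw distinct line numbers in [0, n) and sort them.
    # Sampling without replacement is an assumption of this sketch.
    chosen = sorted(rng.sample(range(n), min(k, n)))

    # Step 4: scan again, outputting the selected lines in file order.
    selected = []
    targets = iter(chosen)
    target = next(targets, None)
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f):
            if target is None:
                break  # all requested lines found
            if number == target:
                selected.append(line.rstrip("\n"))
                target = next(targets, None)
    return selected
```

Note that the file must be readable twice and its length known before any line is output, which is exactly what an unbounded stream does not allow.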
1. Assign each query a random number
2. Keep the queries with the 1000 highest random numbers
3. Discard the rest
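One way to carry out these three steps in a single pass is to hold the current top tags in a min-heap, so that only k entries are ever in memory. The sketch below is an assumption about how this might be implemented; the name `sample_by_random_tags` and the choice of a heap are illustrative and do not come from the slides.

```python
import heapq
import random

def sample_by_random_tags(stream, k=1000, rng=random):
    """Keep the k items whose random tags are highest, in one pass."""
    heap = []  # min-heap of (tag, sequence number, item); smallest tag on top
    for seq, item in enumerate(stream):
        tag = rng.random()  # step 1: assign each item a random number
        entry = (tag, seq, item)  # seq breaks ties so items are never compared
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif tag > heap[0][0]:
            # Step 2: the new tag beats the weakest kept tag, so swap it in.
            heapq.heapreplace(heap, entry)
        # Step 3: otherwise the item is discarded.
    return [item for _, _, item in heap]

queries = ("query-%d" % i for i in range(100_000))
sample = sample_by_random_tags(queries, k=1000)
print(len(sample))
```

Because the tags are independent and identically distributed, every subset of k items is equally likely to hold the k highest tags, so the result is a uniform sample without knowing the stream length in advance.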
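By the induction hypothesis, after $n$ lines the first $n$ lines are kept with probability $\frac{1}{n}$ each.
Line $n+1$ replaces the kept line with probability $\frac{1}{n+1}$, so the kept line survives with probability $\frac{n}{n+1}$. Thus the first $n$ lines each have probability $\frac{1}{n} \cdot \frac{n}{n+1} = \frac{1}{n+1}$.
Line $n+1$ has probability $\frac{1}{n+1}$ by construction.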
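The argument above corresponds to a reservoir holding a single line. A minimal sketch of that procedure follows; the function name `reservoir_sample_one` is hypothetical, and the code is an illustration under the stated one-line assumption rather than the lecture's own implementation.

```python
import random
from collections import Counter

def reservoir_sample_one(stream, rng=random):
    """Return one uniformly chosen item from a stream of unknown length."""
    kept = None
    for n, line in enumerate(stream, start=1):
        # The n-th line replaces the kept line with probability 1/n,
        # so the first line is always kept initially (1/1).
        if rng.randrange(n) == 0:
            kept = line
    return kept  # None if the stream was empty

# Rough empirical check: each of three lines should win about a third of the time.
rng = random.Random(42)
counts = Counter(reservoir_sample_one(["a", "b", "c"], rng) for _ in range(6000))
print(counts)
```

Only the kept line and the counter n are stored, so memory use does not grow with the length of the stream.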