Sharon: Shared Online Event Sequence Aggregation
Olga Poppe, Allison Rozet, Chuan Lei, Elke A. Rundensteiner, and David Maier April 18, 2018
Worcester Polytechnic Institute
Complex Event Processing
Primitive events -> CEP engine -> Complex events
Input: High-rate, potentially unbounded event stream
Output: Reliable summarized insights about the current situation in real time
Motivation, Optimizer, Evaluation, Conclusion
Event Sequence Aggregation Queries
q1: RETURN COUNT(*) PATTERN OakSt, MainSt, StateSt WHERE [vehicle] WITHIN 10 min SLIDE 1 min
q2: PATTERN OakSt, MainSt, WestSt
q3: PATTERN LindenSt, ParkAve, OakSt, MainSt
q4: PATTERN ParkAve, OakSt, MainSt, WestSt
Event Stream: position report events
Which sub-patterns should have their aggregation shared so that the workload is processed with minimal latency?
Event Processing. In SIGMOD, pages 217-228, 2014.
event monitoring system. In CIDR, pages 412-422, 2007.
A-Seq. Y. Qi, L. Cao, M. Ray, and E. A. Rundensteiner. Complex event analytics: Online aggregation of stream sequence patterns. In SIGMOD, pages 229-240, 2014.
495-510, 2016.
pattern query sharing. In SIGMOD, pages 889-900, 2011.
Online yet shared event sequence aggregation:
─ Sharing requires sequence construction
─ Online aggregation skips sequence construction
Trade-off between sharing and not sharing:
─ Sharing introduces overhead to combine intermediate aggregates
Intractable sharing plan search space:
─ Exponential in the number of sharing candidates
Non-shared:
Pattern from q1: OakSt, MainSt, StateSt
Event stream: m2, m4, s5
Counts:
count(OakSt): 1, 2
count(OakSt, MainSt): 1, 3
count(OakSt, MainSt, StateSt): 3
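The counts above can be reproduced with one counter per pattern prefix, so no event sequence is ever constructed. The following is a minimal sketch, not the authors' implementation: the function name `online_count` is hypothetical, the stream o1, m2, o3, m4, s5 is an assumption (only m2, m4, and s5 appear in the text), and the window and [vehicle] predicate are ignored.

```python
def online_count(pattern, stream):
    """Count matches of `pattern` in `stream` without building sequences.

    counts[i] holds the number of matches of the prefix pattern[:i + 1]
    seen so far; the last entry is the count for the whole pattern.
    """
    counts = [0] * len(pattern)
    for event_type in stream:
        # Walk backwards so an event cannot extend a match it just created.
        for i in range(len(pattern) - 1, -1, -1):
            if pattern[i] == event_type:
                # A first-position event starts one new match; any other
                # event extends every match of the preceding prefix.
                counts[i] += counts[i - 1] if i > 0 else 1
    return counts


# Assumed stream o1, m2, o3, m4, s5 (the two OakSt events are an assumption).
stream = ["OakSt", "MainSt", "OakSt", "MainSt", "StateSt"]
counts = online_count(("OakSt", "MainSt", "StateSt"), stream)
print(counts)  # [2, 3, 3]
```

The three entries match the final values on the slide: count(OakSt) = 2, count(OakSt, MainSt) = 3, and count(OakSt, MainSt, StateSt) = 3.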
Shared:
Pattern from q1: OakSt, MainSt, StateSt
Event stream: m2, m4, s5
Counts:
count(OakSt): 1, 2
count(OakSt, MainSt): 1, 3
count(StateSt): 1
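One way to picture the shared variant: the count of the common sub-pattern (OakSt, MainSt) is maintained once and reused by every query that contains it. The sketch below is a simplification under stated assumptions, not the method from the talk. It only handles a shared sub-pattern that is a prefix followed by a single event type that does not itself occur in the prefix, and it folds the combination step into one update instead of keeping count(StateSt) separately. The name `shared_counts` and the trailing WestSt event are hypothetical.

```python
def shared_counts(shared, suffix_by_query, stream):
    """Aggregate the shared prefix once and reuse it for every query.

    suffix_by_query maps a query name to the single event type that
    follows `shared` in that query's pattern (assumed not in `shared`).
    """
    prefix = [0] * len(shared)
    totals = {query: 0 for query in suffix_by_query}
    for event_type in stream:
        # A suffix event completes every shared-prefix match seen so far.
        for query, suffix in suffix_by_query.items():
            if event_type == suffix:
                totals[query] += prefix[-1]
        # Update the shared prefix counters exactly once per event.
        for i in range(len(shared) - 1, -1, -1):
            if shared[i] == event_type:
                prefix[i] += prefix[i - 1] if i > 0 else 1
    return prefix, totals


# Assumed stream: o1, m2, o3, m4, s5, plus a hypothetical WestSt event.
stream = ["OakSt", "MainSt", "OakSt", "MainSt", "StateSt", "WestSt"]
prefix, totals = shared_counts(
    ("OakSt", "MainSt"), {"q1": "StateSt", "q2": "WestSt"}, stream
)
print(prefix)  # [2, 3]
print(totals)  # {'q1': 3, 'q2': 3}
```

The prefix counters are touched once per event regardless of how many queries reuse them, which is where the saving comes from. The per-query combination step is the overhead that sharing introduces.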
Pattern from q1: OakSt, MainSt, StateSt
Pattern from q2: OakSt, MainSt, WestSt
Pattern from q3: LindenSt, ParkAve, OakSt, MainSt
Pattern from q4: ParkAve, OakSt, MainSt, WestSt
Benefit of sharing = Cost of not sharing - Cost of sharing
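Before any benefit can be weighed, the sharing candidates themselves have to be found. A minimal sketch follows, assuming a candidate is a contiguous sub-pattern of length two or more that occurs in at least two queries. That definition and the name `sharing_candidates` are assumptions for illustration, not taken from the slides.

```python
from collections import defaultdict


def sharing_candidates(workload, min_len=2):
    """Map each contiguous sub-pattern to the set of queries containing it.

    Only sub-patterns of at least `min_len` event types that occur in
    two or more queries are returned.
    """
    owners = defaultdict(set)
    for name, pattern in workload.items():
        for start in range(len(pattern)):
            for end in range(start + min_len, len(pattern) + 1):
                owners[tuple(pattern[start:end])].add(name)
    return {sub: queries for sub, queries in owners.items() if len(queries) >= 2}


workload = {
    "q1": ["OakSt", "MainSt", "StateSt"],
    "q2": ["OakSt", "MainSt", "WestSt"],
    "q3": ["LindenSt", "ParkAve", "OakSt", "MainSt"],
    "q4": ["ParkAve", "OakSt", "MainSt", "WestSt"],
}
candidates = sharing_candidates(workload)
for sub, queries in sorted(candidates.items()):
    print(", ".join(sub), "->", sorted(queries))
```

For the four patterns this yields five candidates. (OakSt, MainSt) occurs in all four queries, while (MainSt, WestSt), (OakSt, MainSt, WestSt), (ParkAve, OakSt), and (ParkAve, OakSt, MainSt) each occur in two.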
Optimal sharing plan = Maximum Weight Independent Set
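To make the reduction concrete, the sketch below solves a tiny maximum weight independent set instance by exhaustive enumeration. Vertices stand for sharing candidates, weights for their benefits, and edges for conflicts between candidates. The vertex names, weights, and edges are made up for illustration, and the brute-force search is a generic textbook approach, not the plan finder from the talk.

```python
from itertools import combinations


def max_weight_independent_set(weights, conflicts):
    """Return (best_set, best_weight) by trying every subset of vertices.

    weights:   dict mapping vertex -> benefit of sharing that candidate
    conflicts: iterable of vertex pairs that cannot be chosen together
    """
    conflict_set = {frozenset(edge) for edge in conflicts}
    vertices = sorted(weights)
    best, best_weight = frozenset(), 0
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            # Skip subsets that contain two conflicting candidates.
            if any(frozenset(pair) in conflict_set
                   for pair in combinations(subset, 2)):
                continue
            weight = sum(weights[v] for v in subset)
            if weight > best_weight:
                best, best_weight = frozenset(subset), weight
    return best, best_weight


# Hypothetical instance: p1 conflicts with both p2 and p3.
weights = {"p1": 10, "p2": 7, "p3": 6}
conflicts = [("p1", "p2"), ("p1", "p3")]
plan, total = max_weight_independent_set(weights, conflicts)
print(sorted(plan), total)  # ['p2', 'p3'] 13
```

The heaviest single candidate, p1, is not part of the best plan here, which is why the choice cannot be made one candidate at a time. The enumeration visits every subset, so its running time doubles with each added vertex.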
Challenge: Finding the optimal sharing plan is exponential in the number of vertices in the Sharon graph
Sharon graph reduction principles:
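As a generic illustration of shrinking such a graph before searching it, two reductions are safe for any maximum weight independent set instance: a vertex with non-positive weight can be dropped, and a positive-weight vertex with no remaining conflicts can be put into the plan immediately. Whether these coincide with the principles listed on the slide is not established by the text here. The function name and the sample graph are hypothetical.

```python
def reduce_graph(weights, conflicts):
    """Shrink a weighted conflict graph before searching it.

    Returns (forced, remaining_weights, remaining_conflicts), where
    `forced` holds vertices that belong to some optimal plan.
    """
    # 1. A vertex with non-positive weight never improves a plan.
    kept = {v: w for v, w in weights.items() if w > 0}
    edges = [(a, b) for a, b in conflicts if a in kept and b in kept]
    # 2. A vertex without remaining conflicts can always be selected.
    touched = {v for edge in edges for v in edge}
    forced = {v for v in kept if v not in touched}
    remaining = {v: w for v, w in kept.items() if v in touched}
    return forced, remaining, edges


# Hypothetical graph: p3 is not beneficial, which frees p4 from conflicts.
weights = {"p1": 10, "p2": 7, "p3": -2, "p4": 5}
conflicts = [("p1", "p2"), ("p2", "p3"), ("p3", "p4")]
forced, remaining, edges = reduce_graph(weights, conflicts)
print(sorted(forced))  # ['p4']
print(remaining)       # {'p1': 10, 'p2': 7}
print(edges)           # [('p1', 'p2')]
```

Only the two vertices that still conflict are left for the exponential search, so the search space shrinks from 2^4 to 2^2 subsets in this example.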
Sharing Plan Selection Algorithm
Optimal sharing plan {(p2, {q3,q4}), (p4, {q2,q4}), (p6, {q1,q5}), (p7, {q6,q7})}: 50
Execution infrastructure: Java 7, 1 Linux machine with 16-core 3.4 GHz CPU and 128 GB of RAM
Data sets:
─ Taxi real data set [1]: Event sequences = Vehicle trajectories
─ Linear Road data set [2]: Event sequences = Vehicle trajectories
─ E-commerce data set: Event sequences = Items added
[1] Unified New York City Taxi and Uber data. https://github.com/toddwschneider/nyc-taxi-data
[2] A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. S. Maskey, E. Ryvkina, M. Stonebraker, and R.
speed-up compared to the two-step approaches
Figures: Linear Road data set; Taxi real data set
─ Sharing of intermediate aggregates
─ Online aggregation
Phases GO: Greedy EO: Exhaustive SO: Sharon Graph construction + + + Graph expansion
+ Graph reduction
Sharing plan finder + + +
search space
exhaustive search (20 queries) but 3 orders of magnitude slower than greedy (70 queries)
Figures: E-commerce data set; Taxi real data set