Massive scale-out of expensive continuous queries, E. Zeitler and T. Risch


SLIDE 1

Massive scale-out of expensive continuous queries

  • E. Zeitler and T. Risch

Presentation by Thomas Pasquier

SLIDE 2

Stream splitting

SLIDE 3

Splitstream

  • splitstream(stream s, int q, function bfn, function rfn)
  • user defines rfn (routing function)
  • int rfn(int q, tuple t)
  • user defines bfn (broadcast function)
  • bool bfn(int q, tuple t)
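
The signatures above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the broadcast function is checked first, that the routing function returns a zero-based output index, and that an out-of-range index means the tuple is omitted. The helper names example_rfn and example_bfn and the (key, is_control) tuple layout are hypothetical.

```python
from typing import Any, Callable, Iterable, List


def splitstream(
    s: Iterable[Any],
    q: int,
    bfn: Callable[[int, Any], bool],
    rfn: Callable[[int, Any], int],
) -> List[List[Any]]:
    """Split stream s into q output streams (modelled here as lists)."""
    outputs: List[List[Any]] = [[] for _ in range(q)]
    for t in s:
        if bfn(q, t):
            # Broadcast: every output stream receives the tuple.
            for out in outputs:
                out.append(t)
            continue
        i = rfn(q, t)
        if 0 <= i < q:
            # Route: exactly one output stream receives the tuple.
            outputs[i].append(t)
        # Assumption: an out-of-range index means the tuple is omitted.
    return outputs


# Hypothetical user-defined functions for tuples of the form (key, is_control).
def example_rfn(q: int, t: Any) -> int:
    return t[0] % q  # route by key modulo q


def example_bfn(q: int, t: Any) -> bool:
    return t[1]  # broadcast control tuples


stream = [(0, False), (1, False), (2, True), (3, False)]
outputs = splitstream(stream, 2, example_bfn, example_rfn)
```

Here the control tuple (2, True) ends up in both output streams, while the other tuples go to exactly one stream each.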
SLIDE 4

Naive implementation

SLIDE 5

Tree-shaped implementation: maxtree

Scalable Splitting of Massive Data Streams

Erik Zeitler, Tore Risch

SLIDE 6

Parasplit

SLIDE 7

Parasplit*

SLIDE 8

Evaluation: network bound

SLIDE 9

Window router stream rate

If w is large enough, the stream rate is bound by the network. However, performance decreases when p is large (the authors state that the reason is unknown).

SLIDE 10

Evaluation: parasplit*

Less degradation when using parasplit*

SLIDE 11

Comparison of different solutions

SLIDE 12

Cost model and heuristic

SLIDE 13

Cost model for Window router

Cpr = cr + cs + ce

  • cr : read cost
  • cs : split cost
  • ce : emit cost
SLIDE 14

Cost model for window splitter

Cps = crw + cs (o+r+q.b) + ce(r+q.b)

  • crw : read cost per window
  • cs : split cost per tuple
  • ce : emit cost per tuple
  • o : omit %
  • r : routing %
  • b : broadcast %
  • o + r + b = 100%
SLIDE 15

Cost model for query processor

Cpq = cr + p(cp+cm) + O

  • cr : read cost per tuple
  • cp : poll cost
  • cm : merge cost
  • O : cost for executing the query and emitting the results
SLIDE 16

Cost model for parasplit

  • Cpr = crw + cs + ce
  • Cps = crw + cs (o+r+q.b) + ce(r+q.b)
  • Cpq = cr + p(cp+cm) + O
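
The three formulas above can be written down directly as functions. This is only a sketch for experimenting with the model: o, r and b are passed as fractions between 0 and 1 rather than percentages, q and p are read here as the number of query processors and window splitters (which the slides imply but do not state), and the function names are made up for illustration.

```python
def window_router_cost(crw: float, cs: float, ce: float) -> float:
    """Cpr = crw + cs + ce."""
    return crw + cs + ce


def window_splitter_cost(
    crw: float, cs: float, ce: float,
    o: float, r: float, b: float, q: int,
) -> float:
    """Cps = crw + cs(o + r + q*b) + ce(r + q*b)."""
    # Omitted tuples are split but never emitted; a broadcast tuple is
    # counted q times for both splitting and emitting, as in the formula.
    return crw + cs * (o + r + q * b) + ce * (r + q * b)


def query_processor_cost(
    cr: float, p: int, cp: float, cm: float, O: float
) -> float:
    """Cpq = cr + p(cp + cm) + O."""
    # Poll and merge costs are paid once per window splitter (p of them).
    return cr + p * (cp + cm) + O
```

Any numbers fed into these functions are placeholders; the slides give no measured values for the individual cost terms.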
SLIDE 17

Heuristic for estimating p

  • We search p such that
  • Assume:

○ 1% broadcast tuples
○ 0% omitted
○ crw = 0

  • Cps = crw + cs (o+r+q.b) + ce(r+q.b)
  • We estimate cs + ce by measuring the maximum stream rate
  • We can then estimate p, given the desired stream rate
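
One possible reading of this heuristic is sketched below. The condition that p must satisfy is not preserved in this transcript, so the sketch makes its own assumptions: cs + ce is taken as the inverse of the measured maximum stream rate, one window splitter is assumed to sustain a rate of 1/Cps, and p is chosen as the smallest integer for which p splitters together reach the desired rate. The function name estimate_p is hypothetical.

```python
import math


def estimate_p(desired_rate: float, max_rate: float, q: int,
               b: float = 0.01) -> int:
    """Estimate the number of window splitters p (assumption-laden sketch)."""
    # Slide assumptions: 0% omitted, crw = 0, so r = 1 - b.
    r = 1.0 - b
    # Assumption: cs + ce is the per-tuple time at the measured maximum rate.
    cs_plus_ce = 1.0 / max_rate
    # With o = 0 and crw = 0: Cps = (cs + ce) * (r + q*b).
    cps = cs_plus_ce * (r + q * b)
    # Assumption: p splitters together must reach the desired rate,
    # i.e. p / Cps >= desired_rate.
    return max(1, math.ceil(desired_rate * cps))
```

For example, with a measured maximum of 500,000 tuples/s, a desired rate of 1,000,000 tuples/s and q = 100, Cps is 1.99 times the per-tuple cost and the sketch suggests p = 4.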
SLIDE 18

Efficiency

  • Measurement of the additional work incurred by executing parasplit in comparison to executing a window splitter in a single process

  • Useful work:

○ p.Cps

  • Overhead:

○ Cpr
○ q.Cpq with O=0
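
The slide lists the useful work and the overhead but not how they are combined. A plausible sketch, assuming efficiency is the fraction of total work that is useful, is shown below; the ratio itself and the function name are assumptions, not taken from the slides.

```python
def efficiency(p: int, cps: float, cpr: float, q: int,
               cpq_without_query: float) -> float:
    """Fraction of total work that is useful (assumed definition)."""
    # Useful work: p window splitters, each costing Cps.
    useful = p * cps
    # Overhead: the window router plus q query processors, with the
    # query execution cost O set to 0 in Cpq.
    overhead = cpr + q * cpq_without_query
    return useful / (useful + overhead)
```

With p = 4, Cps = 3, Cpr = 2, q = 2 and Cpq = 3 (placeholder values), the useful work is 12, the overhead is 8, and the ratio is 0.6.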

SLIDE 19

Evaluation efficiency

SLIDE 20

Related publications

  • Event-based Systems: Opportunities and Challenges at Exascale, Brenna et al., 2009

○ stream splitting shown to be a bottleneck

  • MapReduce Online, Condie et al., 2010

○ does not handle scalable stream splitting

SLIDE 21

Thank you

Questions?