SLIDE 1

Spark and Friends

Presented by: Jeff Rasley & John Meehan

SLIDE 2

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

UC Berkeley, AMP Lab, NSDI 2012

Presented by: Jeff Rasley

SLIDE 3

Outline

Motivation
Resilient Distributed Datasets
Implementation
Examples
Performance
Discussion
Summary
Demo

SLIDE 4

Motivation

Slow due to replication; however, replication is required for fault tolerance

SLIDE 5

Resilient Distributed Datasets (RDDs)

Significantly faster, but what about fault tolerance?

SLIDE 6

RDDs: Fault Tolerance

We could replicate data and/or logs across the cluster
Expensive!
These systems exist for fine-grained updates
§ RAMCloud, distributed memory, Piccolo, databases, etc.
Instead, only allow coarse-grained updates
Log deterministic transformation operations
§ map, join, filter, etc.
Fault recovery by replaying update lineage
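
The idea of logging deterministic transformations and replaying them can be sketched in plain Scala, without any Spark dependency. This is only an illustration under stated assumptions: the names Lineage, Source, Mapped, Filtered and loseAndRecover are hypothetical and are not Spark's API.

```scala
// Minimal sketch of lineage-based recovery (hypothetical names, not Spark's API).
object LineageSketch {
  // A dataset is described by how to derive it, not only by its contents.
  sealed trait Lineage[A] { def compute(): Vector[A] }

  // Stable input that can always be re-read (assumed, e.g. a file).
  final case class Source[A](read: () => Vector[A]) extends Lineage[A] {
    def compute(): Vector[A] = read()
  }

  // Coarse-grained, deterministic transformations recorded as lineage nodes.
  final case class Mapped[A, B](parent: Lineage[A], f: A => B) extends Lineage[B] {
    def compute(): Vector[B] = parent.compute().map(f)
  }
  final case class Filtered[A](parent: Lineage[A], p: A => Boolean) extends Lineage[A] {
    def compute(): Vector[A] = parent.compute().filter(p)
  }

  val input: Lineage[Int] = Source(() => Vector(1, 2, 3, 4, 5, 6))
  val evensSquared: Lineage[Int] =
    Mapped(Filtered(input, (x: Int) => x % 2 == 0), (x: Int) => x * x)

  // First materialization, kept "in memory".
  var cached: Option[Vector[Int]] = Some(evensSquared.compute())

  // Simulate losing the in-memory copy, then rebuild it by replaying the lineage.
  def loseAndRecover(): Vector[Int] = {
    cached = None
    val rebuilt = evensSquared.compute()
    cached = Some(rebuilt)
    rebuilt
  }
}
```

Because every recorded step is deterministic, replaying the lineage yields the same data as the lost copy, so nothing beyond the stable input has to be replicated.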

SLIDE 7

Tradeoffs: RDDs v. HDFS v. K-V stores

SLIDE 8

Implementation - Apache Spark

Spark is an actual implementation of RDDs
Works with the Scala interpreter
Great for interactive queries!
Open source: spark.incubator.apache.org
Reads data from HDFS or AWS S3
Uses: Spam Classification, DNA Sequencing, Interactive Data Mining

SLIDE 9

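Example - Console Log Mining

// Base RDD backed by a file in HDFS
val lines = spark.textFile("hdfs://...")
// Transformation: keep only the error lines (evaluated lazily)
val errors = lines.filter(_.startsWith("ERROR"))
// Ask Spark to keep the errors in memory for reuse across queries
errors.persist()
// Action: collect field 3 (tab-separated) of the errors that mention HDFS
errors.filter(_.contains("HDFS"))
  .map(_.split('\t')(3))
  .collect()

Color key: Transformation, Action, Closure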

SLIDE 10

Spark Operations

SLIDE 11

Spark: Job Stages

Key:
Shaded boxes - RDDs
Shaded outlines - Partitions
Arrows - Data transfer between RDDs

The job scheduler automatically schedules each stage as a task in a pipeline to produce the final results.

SLIDE 12

Failure Graph

Iteration times for k-means in the presence of a failure. One machine was killed at the start of the 6th iteration, resulting in partial reconstruction of an RDD using lineage.

SLIDE 13

Performance vs Hadoop

HadoopBinMem: A Hadoop deployment that converts the input data into a low-overhead binary format in the first iteration to eliminate text parsing in later ones, and stores it in an in-memory HDFS instance.

SLIDE 14

Performance vs RAM Size

Iteration times for logistic regression using 100 GB of data on 25 machines with varying amounts of data in memory.
Spark spills data to disk or re-computes the partitions that don't fit in RAM each time they are requested.

Chart annotation: Entirely on disk

SLIDE 15

Discussion

RDDs can express numerous systems:
MapReduce
DryadLINQ
Hive/SQL (Shark)
Pregel (200 LOC)
Iterative MapReduce (200 LOC)
§ e.g. HaLoop

SLIDE 16

Pros

Expressive
Good for batch queries
Minimizes disk I/O
Fast, good for...
§ iterative applications
§ interactive queries
Fault-tolerant
Open-source

Cons

Works best when total RAM size > RDD sizes
Unclear how performance scales over 1 TB data sets
Nondeterministic functions are not supported
Doesn't work with asynchronous fine-grained updates
§ e.g. an incremental web crawler

SLIDE 17

Take-away: Hadoop vs Spark

Hadoop
(+) Good for batch jobs of arbitrary map/reduce functions (supports non-determinism)
(-) Very coarse data transformation model
(+) Highly supported, numerous resources available
§ Probably the reason it has so much momentum

Spark
(+) Good for iterative jobs with deterministic transformations
(+) Supports more transformations than M/R
(-) Relatively new, less support; gaining traction

SLIDE 18

Demo

5-minute demo of Matei doing some queries on the Wikipedia dataset on an EC2 cluster, from NSDI '12

SLIDE 19

Discretized Streams

An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters

Presented by: John Meehan

UC Berkeley, AMP Lab, HotCloud 2012

SLIDE 20

Stream Processing

Continuous queries on changing dataset
§ High-velocity datasets
§ Push-based system
Streaming datasets
§ Stock tickers
§ Social media data (Twitter)
§ Sensor data
Modern distributed stream systems
§ Yahoo!'s S4
§ Twitter's Storm

SLIDE 21

Streaming Example

Window (3 tuples)
Query: SELECT MIN(VALUE) FROM WINDOW(TICKER, 3 TUPLES)
Diagram labels: Data Flow

SLIDE 22

Streaming Example

Window (3 tuples)
Query: SELECT MIN(VALUE) FROM WINDOW(TICKER, 3 TUPLES)
Diagram labels: Data Flow, MINIMUM, Query Output

SLIDE 23

Streaming Example

Window (3 tuples)
Query: SELECT MIN(VALUE) FROM WINDOW(TICKER, 3 TUPLES)
Diagram labels: Data Flow, MINIMUM, Query Output

Incoming tuples:
TICKER  VALUE
MSFT    $70.28
MSFT    $70.84
MSFT    $70.55

SLIDE 24

Streaming Example

Window (3 tuples)
Diagram labels: Data Flow, MINIMUM, Query Output

Window contents:
TICKER  VALUE
MSFT    $70.28
MSFT    $70.84
MSFT    $70.55

Query output: $70.28

SLIDE 25

Streaming Example

Window (3 tuples)
Diagram labels: Data Flow, MINIMUM, Query Output

Incoming tuple:
TICKER  VALUE
MSFT    $70.43

Window contents:
TICKER  VALUE
MSFT    $70.28
MSFT    $70.84
MSFT    $70.55

Query output: $70.28

SLIDE 26

Streaming Example

Window (3 tuples)
Diagram labels: Data Flow, MINIMUM, Query Output

Window contents:
TICKER  VALUE
MSFT    $70.84
MSFT    $70.55
MSFT    $70.43

Query output: $70.43
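
The window query walked through on the preceding slides can be reproduced with a short plain-Scala sketch. It assumes a count-based window that slides by one tuple and emits a result only for full windows; the names Tick and windowMin are hypothetical.

```scala
// Sketch of SELECT MIN(VALUE) over a 3-tuple sliding window (hypothetical names).
object WindowMinSketch {
  final case class Tick(ticker: String, value: BigDecimal)

  // Emit the minimum value of every full window of `size` consecutive tuples.
  def windowMin(ticks: Seq[Tick], size: Int): Vector[BigDecimal] =
    ticks
      .sliding(size)             // consecutive windows, advancing one tuple at a time
      .filter(_.size == size)    // skip a trailing partial window, if any
      .map(_.map(_.value).min)   // the MINIMUM operator applied to each window
      .toVector

  // The four MSFT tuples shown on the slides, in arrival order.
  val ticks: Seq[Tick] = Seq(
    Tick("MSFT", BigDecimal("70.28")),
    Tick("MSFT", BigDecimal("70.84")),
    Tick("MSFT", BigDecimal("70.55")),
    Tick("MSFT", BigDecimal("70.43"))
  )

  // One output per full window.
  val minima: Vector[BigDecimal] = windowMin(ticks, 3)
}
```

With the slide data, the first full window yields 70.28 and the window after the fourth tuple arrives yields 70.43, matching the query outputs shown.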

SLIDE 27

Cloud Distribution Challenges

Consistency
§ Global state difficult to achieve
Fault tolerance
§ Replication and upstream backup
§ Slow and expensive
Unification of batch processing
§ Event-driven systems require separate API
§ Difficult to combine streaming with historical data

SLIDE 28

D-Streams: Discretized Streams

Built on Spark (aka Spark Streaming)
Treats streaming computations as a series of deterministic batch computations
§ Tuples are divided into small time intervals
§ Parallelizable operations transform input data
Major advantages
§ Consistency is well-defined
§ Processing model is easy to unify with batch systems

SLIDE 29

Diagram labels: INPUT DATA, PARALLELIZABLE TRANSFORMATIONS, OUTPUT DATA

Waits for time interval, collecting tuples

TICKER  VALUE
MSFT    $70.28
APPL    $104.38
GOOG    $89.33

SLIDE 30

Diagram labels: INPUT DATA, PARALLELIZABLE TRANSFORMATIONS, OUTPUT DATA

Sends all tuples as a batch

TICKER  VALUE
MSFT    $70.28
APPL    $104.38
GOOG    $89.33
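
The collect-then-send behaviour can be sketched in plain Scala as grouping tuples by the time interval they fall into. The timestamps, the 1000 ms interval, and the names Event, discretize and maxPerBatch are assumptions made for illustration; the slides give no timestamps.

```scala
// Sketch of discretizing a stream into per-interval batches (hypothetical names).
object DiscretizeSketch {
  // Timestamps are assumed to be non-negative milliseconds.
  final case class Event(timestampMs: Long, ticker: String, value: BigDecimal)

  // Assign each tuple to the interval containing its timestamp, then emit
  // one batch per interval, ordered by interval start time.
  def discretize(events: Seq[Event], intervalMs: Long): Vector[(Long, Vector[Event])] =
    events
      .groupBy(e => e.timestampMs / intervalMs)
      .toVector
      .sortBy(_._1)
      .map { case (index, batch) => (index * intervalMs, batch.toVector) }

  // A deterministic per-batch transformation: the highest value in each batch.
  def maxPerBatch(batches: Vector[(Long, Vector[Event])]): Vector[(Long, BigDecimal)] =
    batches.map { case (start, batch) => (start, batch.map(_.value).max) }

  val events: Seq[Event] = Seq(
    Event(100L, "MSFT", BigDecimal("70.28")),
    Event(900L, "GOOG", BigDecimal("89.33")),
    Event(1200L, "MSFT", BigDecimal("70.84"))
  )
}
```

Each batch is an ordinary immutable collection, so the per-batch step is an ordinary batch computation that could run in parallel across partitions.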

SLIDE 31

Low latency in a batch system

Traditional batch systems (Hadoop): store intermediate state on disk
§ Tens of seconds latency...
§ Too slow for streaming
Key-value store expensive due to replication
Solution: RDDs
§ Keeps state in memory
§ Allows for inexpensive parallel recovery

SLIDE 32

Streaming operators on RDDs

Stateless operators
§ Act independently on each time interval
§ Map, reduce, groupBy, join
Stateful operators
§ Operate on multiple intervals
§ May produce intermediate RDDs as state
§ Window, incremental aggregation, time-skewed joins
Output operators (write to external file systems)
Transformation operators (produce new D-Stream)
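
The difference between stateless and stateful operators can be sketched in plain Scala using word counts. This is an illustrative model only: countInterval, add, subtract and windowedCounts are hypothetical names, and the incremental update (add the newest interval, subtract the one that left the window) is one possible way to realize incremental aggregation.

```scala
// Sketch of a stateless operator and a stateful windowed aggregation (hypothetical names).
object OperatorSketch {
  type Counts = Map[String, Int]

  // Stateless: the result depends only on the tuples of a single interval.
  def countInterval(words: Seq[String]): Counts =
    words.groupBy(identity).map { case (w, ws) => w -> ws.size }

  // Merge the counts of an interval entering the window.
  def add(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (w, n)) => acc.updated(w, acc.getOrElse(w, 0) + n) }

  // Remove the counts of an interval leaving the window.
  def subtract(a: Counts, b: Counts): Counts =
    b.foldLeft(a) { case (acc, (w, n)) =>
      val left = acc.getOrElse(w, 0) - n
      if (left <= 0) acc - w else acc.updated(w, left)
    }

  // Stateful: counts over the last `size` intervals, carried from one interval
  // to the next as state and updated incrementally.
  def windowedCounts(intervals: Seq[Counts], size: Int): Vector[Counts] = {
    val init = (Map.empty[String, Int], Vector.empty[Counts])
    intervals.zipWithIndex.foldLeft(init) { case ((window, out), (current, i)) =>
      val entered = add(window, current)
      val next = if (i >= size) subtract(entered, intervals(i - size)) else entered
      (next, out :+ next)
    }._2
  }
}
```

For example, applying windowedCounts with a window of 2 to the per-interval counts of Seq("a", "b", "a"), Seq("b") and Seq("a", "c") gives a third result equal to the combined counts of the last two intervals only.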


SLIDE 33

Stateful Operator Example

SLIDE 34

SLIDE 35

Fault Recovery: Normal Methods

Replication
§ Keep a copy of all data on a separate node
§ Expensive
§ Susceptible to double node failure
Upstream backup
§ Each upstream node buffers data sent downstream until all computations have finished
§ Slow
§ All data must be re-sent to standby node

SLIDE 36

Better Method: Parallel Recovery

Periodically checkpoints state RDDs
§ Asynchronous replication
§ Very lightweight due to coarse granularity
Recovers from last checkpoint on failure
§ Detects missing RDD partitions automatically
Able to handle "stragglers"
§ Tuples from older timestamps that arrive late

SLIDE 37

Lineage Tracking

D-Streams and RDDs both track lineage
§ Dependency graph of deterministic operations
RDD partitions recompute all missing operations on failure
§ Timely due to parallelization of operations

SLIDE 38

Unification with Batch / Interactive Processing

Can be combined with static RDDs
Can be run as a batch job on previous historical data
Ad-hoc queries using Scala console and Spark Streaming

Examples:
§ Most popular words in a 5-second time range
§ Print all counts that have been computed

SLIDE 39

Scalability

SLIDE 40

Shark: SQL and Rich Analytics at Scale

UC Berkeley, AMP Lab, SIGMOD 2013

SLIDE 41

What is Shark?

Port of Apache Hive on Spark
§ Compatible with existing Hive queries and data
Greatly improved performance
§ Efficient in-memory storage
§ Uses arrays of primitive types rather than storing rows
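
The contrast between storing rows and storing arrays of primitive types can be sketched in plain Scala. This is an assumption-laden illustration, not Shark's actual storage format: Row, Columns, toColumns, totalVolume and the volume column are all hypothetical.

```scala
// Sketch of row-oriented vs column-oriented in-memory layout (hypothetical names).
object ColumnarSketch {
  // Row-oriented: one object per record.
  final case class Row(ticker: String, value: Double, volume: Int)

  // Column-oriented: one array per column; numeric columns are primitive arrays.
  final class Columns(
      val tickers: Array[String],
      val values: Array[Double],
      val volumes: Array[Int]) {
    def length: Int = values.length
    // Reassemble a single record when a whole row is needed.
    def row(i: Int): Row = Row(tickers(i), values(i), volumes(i))
  }

  def toColumns(rows: Seq[Row]): Columns =
    new Columns(
      rows.map(_.ticker).toArray,
      rows.map(_.value).toArray,
      rows.map(_.volume).toArray)

  // An aggregate over one column scans only that column's array.
  def totalVolume(c: Columns): Long = {
    var sum = 0L
    var i = 0
    while (i < c.volumes.length) {
      sum += c.volumes(i)
      i += 1
    }
    sum
  }
}
```

On the JVM, Array[Double] and Array[Int] hold unboxed values contiguously, so a column scan avoids creating or dereferencing one object per row.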

SLIDE 42

Spark and Shark Presentation, Matei Zaharia

SLIDE 43

Spark and Shark Presentation, Matei Zaharia

SLIDE 44

Research Contributions?

D-Streams?
§ Lineage has been done before, just not with RDDs
§ General programming interface (ok...)
Shark?
§ Very similar to Hadoop, except cutting out the disk
Contributions are more practical than research
§ High demand for low latency/high throughput
§ Performance gains highly impressive
§ Well-packaged, efficient systems

SLIDE 45

Questions?
