SLIDE 1

Architecture of Flink's Streaming Runtime

Robert Metzger @rmetzger_ rmetzger@apache.org

SLIDE 2

What is stream processing?

  • Real-world data is unbounded and is pushed to systems
  • Right now, people are using the batch paradigm for stream analysis (there was no good stream processor available)
  • New systems (Flink, Kafka) embrace the streaming nature of data


[Diagram: web server → Kafka topic → stream processing]
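To make the pictured pipeline concrete, here is a hedged sketch of consuming the Kafka topic from Flink. Connector class names changed across Flink versions; FlinkKafkaConsumer08 below is from a slightly later release than this talk, and the topic, hosts, and group id are illustrative.

import java.util.Properties
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

// illustrative connection settings for the 0.8-style Kafka consumer
val props = new Properties()
props.setProperty("zookeeper.connect", "localhost:2181")
props.setProperty("bootstrap.servers", "localhost:9092")
props.setProperty("group.id", "weblog-consumers")

val env = StreamExecutionEnvironment.getExecutionEnvironment

// the web server's events arrive through the Kafka topic as an unbounded stream
val weblog: DataStream[String] =
  env.addSource(new FlinkKafkaConsumer08[String]("weblog", new SimpleStringSchema(), props))

weblog.print()
env.execute("kafka-pipeline")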

SLIDE 3

Flink is a stream processor with many faces

[Diagram: the streaming dataflow runtime at the core of Flink's different faces]

SLIDE 4

Flink's streaming runtime

SLIDE 5

Requirements for a stream processor

  • Low latency: fast results (milliseconds)
  • High throughput: handle large amounts of data (millions of events per second)
  • Exactly-once guarantees: correct results, also in failure cases
  • Programmability: intuitive APIs

SLIDE 6

Pipelining


Basic building block to “keep the data moving”

  • Low latency
  • Operators push data forward
  • Data shipping as buffers, not tuple-wise (see the configuration sketch below)
  • Natural handling of back-pressure
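A minimal sketch of the buffer-based shipping described above, assuming nothing beyond the DataStream API: a buffer leaves an operator when it is full or when the buffer timeout fires, so the timeout is the knob that trades latency against throughput. Host, port, and timeout value are illustrative.

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// ship records in buffers, but flush outgoing buffers at least every 10 ms
env.setBufferTimeout(10)

env.socketTextStream("localhost", 9999)
  .map(_.toUpperCase)   // operators push buffers forward as they fill
  .print()

env.execute("pipelined-example")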
SLIDE 7

Fault Tolerance in streaming

  • At least once: ensure all operators see all events
    • Storm: replay the stream in the failure case
  • Exactly once: ensure that operators do not perform duplicate updates to their state
    • Flink: distributed snapshots (see the sketch below)
    • Spark: micro-batches on a batch runtime
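As a minimal sketch of how Flink's variant is switched on: distributed snapshots are enabled with a single call on the execution environment, and the argument is the checkpoint interval in milliseconds (the 5-second value is illustrative).

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
// snapshot the state of all operators every 5 seconds for exactly-once recovery
env.enableCheckpointing(5000)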

SLIDE 8

Flink’s Distributed Snapshots

  • Lightweight approach for storing the state of all operators without pausing the execution, which yields high throughput and low latency
  • Implemented using barriers flowing through the topology


[Diagram: a barrier flows with the data stream through the topology. Snapshotted operator state: a Kafka consumer (offset = 162) and an element counter (value = 152). Records before the barrier are part of the snapshot; records after the barrier are not (they are buffered until the next snapshot).]
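A hedged sketch of operator state that participates in these snapshots, using the Checkpointed interface of the Flink 0.9/0.10 era (later Flink versions replaced it with managed state). The counter mirrors the element counter in the figure; the class itself is illustrative.

import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.checkpoint.Checkpointed

class CountingMap extends RichMapFunction[String, (String, Long)]
    with Checkpointed[java.lang.Long] {

  private var count: Long = 0   // operator state, e.g. "value = 152"

  override def map(value: String): (String, Long) = {
    count += 1
    (value, count)
  }

  // called when a barrier passes: state before the barrier enters the snapshot
  override def snapshotState(checkpointId: Long, timestamp: Long): java.lang.Long =
    count

  // called on recovery: reset to the last completed snapshot
  override def restoreState(state: java.lang.Long): Unit = {
    count = state
  }
}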

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

Best of all worlds for streaming

  • Low latency, thanks to the pipelined engine
  • Exactly-once guarantees via distributed snapshots
  • High throughput with controllable checkpointing overhead
  • Separation of application logic from recovery: the checkpointing interval is just a configuration parameter

SLIDE 14

Throughput of distributed grep


Setup: data generator → "grep" operator, running on 30 machines with 120 cores.

[Bar chart: aggregate throughput, up to 200 million elements/s, for Flink without fault tolerance, Flink with exactly-once (5 s checkpoints), Storm without fault tolerance, and Storm with micro-batches.]

  • Flink: aggregate throughput of 175 million elements per second
  • Storm: aggregate throughput of 9 million elements per second
  • Flink achieves 20x higher throughput
  • Flink's throughput is almost the same with and without exactly-once

SLIDE 15

Aggregate throughput for stream record grouping


[Bar chart: aggregate throughput, up to 100 million elements/s, for Flink without fault tolerance, Flink with exactly-once, Storm without fault tolerance, and Storm with at-least-once.]

  • Flink, no fault tolerance: aggregate throughput of 83 million elements per second
  • Storm, no fault tolerance: 8.6 million elements/s
  • Storm, at least once: 309k elements/s
  • Flink achieves 260x higher throughput with fault tolerance

Setup: 30 machines, 120 cores; the grouping requires network transfer.

SLIDE 16

Latency in stream record grouping


Setup: data generator → receiver that measures throughput and latency; measured is the time for a record to travel from source to sink.

[Bar chart: median latency, 0 to 30 ms, for Flink without fault tolerance, Flink with exactly-once, and Storm with at-least-once; annotated values: 1 ms and 25 ms.]

[Bar chart: 99th percentile latency, 0 to 60 ms, for the same three configurations; annotated value: 50 ms.]

SLIDE 17

SLIDE 18

Exactly-Once with YARN Chaos Monkey

  • Validate exactly-once guarantees with a state machine (see the sketch below)
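A minimal sketch of the validation idea (the alphabet and transition table are illustrative, not the actual test): every event sequence must follow a known state machine, so after killing and recovering containers, any illegal transition reveals lost or duplicated events.

// illustrative transition table: a -> b -> c -> a -> ...
val transitions: Map[Char, Char] = Map('a' -> 'b', 'b' -> 'c', 'c' -> 'a')

// replay an observed sequence through the state machine
def validate(events: Seq[Char]): Boolean =
  events.sliding(2).forall {
    case Seq(prev, next) => transitions(prev) == next
    case _               => true   // sequences shorter than 2 are trivially valid
  }

assert(validate("abcabc".toSeq))    // exactly-once: the chain is unbroken
assert(!validate("abcbc".toSeq))    // a lost event breaks the chain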

SLIDE 19

“Faces” of Flink

SLIDE 20

Faces of a stream processor


[Diagram: stream processing, batch processing, machine learning at scale, and graph analysis, all running on the streaming dataflow runtime.]

SLIDE 21

The Flink Stack


[Diagram: the Flink stack, bottom to top: deployment, Flink core runtime (streaming dataflow runtime), core APIs, and specialized abstractions / APIs.]

SLIDE 22

APIs for stream and batch


case class Word(word: String, frequency: Int)

DataStream API (streaming):

val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .window(Time.of(5, SECONDS)).every(Time.of(1, SECONDS))
  .groupBy("word").sum("frequency")
  .print()

DataSet API (batch):

val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()

SLIDE 23

The Flink Stack


[Diagram: DataSet (Java/Scala) and DataStream (Java/Scala) APIs on top of the streaming dataflow runtime.] An experimental Python API is also available.

Both APIs are translated into an API-independent dataflow graph representation, by the batch optimizer and the graph builder respectively. [Example plan: DataSource (orders.tbl) → Filter → Map and DataSource (lineitem.tbl), combined by a hybrid hash join (buildHT / probe, hash-partitioned on [0]), followed by GroupRed (sort, forward).]
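As a small illustration of this shared representation (a sketch, not from the slides): the execution environment can print the plan it builds, and getExecutionPlan() returns the dataflow graph as JSON. The output path is illustrative.

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val doubled = env.fromElements(1, 2, 3).map(_ * 2)
doubled.writeAsText("/tmp/plan-demo")   // a sink completes the plan

// JSON dump of the API-independent dataflow graph for this program
println(env.getExecutionPlan())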

SLIDE 24

Batch is a special case of streaming

  • Batch: run a bounded stream (a data set) on a stream processor
  • Form a global window over the entire data set for join or grouping operations

SLIDE 25

Batch-specific optimizations

  • Managed memory, on- and off-heap
  • Operators (join, sort, …) with out-of-core support
  • Optimized serialization stack for user types
  • Cost-based optimizer: the chosen execution plan depends on the data sizes

SLIDE 26

The Flink Stack


[Diagram: the Flink stack with the core APIs filled in: DataSet (Java/Scala) and DataStream on top of the streaming dataflow runtime.]

SLIDE 27

FlinkML: Machine Learning

  • API for ML pipelines, inspired by scikit-learn
  • Collection of packaged algorithms: SVM, multiple linear regression, optimization, ALS, ...


val trainingData: DataSet[LabeledVector] = ...
val testingData: DataSet[Vector] = ...

val scaler = StandardScaler()
val polyFeatures = PolynomialFeatures().setDegree(3)
val mlr = MultipleLinearRegression()

val pipeline = scaler.chainTransformer(polyFeatures).chainPredictor(mlr)

pipeline.fit(trainingData)
val predictions: DataSet[LabeledVector] = pipeline.predict(testingData)

SLIDE 28

Gelly: Graph Processing

  • Graph API and library
  • Packaged algorithms: PageRank, SSSP, label propagation, community detection, connected components


ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Graph<Long, Long, NullValue> graph = ...

DataSet<Vertex<Long, Long>> verticesWithCommunity =
    graph.run(new LabelPropagation<Long>(30)).getVertices();

verticesWithCommunity.print();
env.execute();

SLIDE 29

Flink Stack += Gelly, ML


[Diagram: the stack extended with Gelly and ML on top of the DataSet (Java/Scala) API, next to the DataStream API, all on the streaming dataflow runtime.]

SLIDE 30

Integration with other systems


[Diagram: systems integrated around the DataSet and DataStream APIs: SAMOA, Hadoop M/R, Google Dataflow, Cascading, Storm, and Zeppelin.]

  • Hadoop MapReduce:
    • Use Hadoop Input/Output Formats
    • Mapper / Reducer implementations
    • Hadoop's FileSystem implementations
  • Google Dataflow: run applications implemented against Google's Dataflow API on premise with Flink
  • Cascading: run Cascading jobs on Flink with almost no code change, and benefit from Flink's vastly better performance than MapReduce
  • Zeppelin: interactive, web-based data exploration
  • SAMOA: machine learning on data streams
  • Storm: compatibility layer for running Storm code
    • FlinkTopologyBuilder: a one-line replacement for existing jobs
    • Wrappers for Storm Spouts and Bolts
    • Coming soon: exactly-once with Storm
SLIDE 31

Deployment options

[Diagram: the full stack with the deployment layer highlighted: Local, Cluster, YARN, Tez, Embedded.]

  • Local: start Flink in your IDE / on your machine; local debugging and development using the same code as on the cluster
  • Cluster: "bare metal" standalone installation of Flink on a cluster
  • YARN: Flink on Hadoop YARN (Hadoop 2.2.0+); restarts failed containers; supports Kerberos-secured YARN/HDFS setups

SLIDE 32

The full stack


[Diagram: the full stack. Libraries: Gelly, Table, ML, SAMOA. Core APIs: DataSet (Java/Scala) and DataStream on the streaming dataflow runtime. Compatibility layers: Hadoop M/R, Cascading, Storm (WiP), Dataflow (WiP), MRQL, Zeppelin. Deployment: Local, Cluster, YARN, Tez, Embedded.]

SLIDE 33

Closing

SLIDE 34

tl;dr Summary

Flink is a software stack comprising:

  • A streaming runtime with low latency, high throughput, and fault-tolerant, exactly-once data processing
  • Rich APIs for batch and stream processing
  • A library ecosystem
  • Integration with many systems
  • A great community of developers and users
  • Used in production

SLIDE 35

What is currently happening?

  • Features in progress:
    • Master high availability
    • Vastly improved monitoring GUI
    • Watermarks / event-time processing / windowing rework
    • Graduating the Streaming API out of beta
  • 0.10.0-milestone-1 is currently being voted on

SLIDE 36

How do I get started?


Mailing lists: (news | user | dev)@flink.apache.org
Twitter: @ApacheFlink
Blogs: flink.apache.org/blog, data-artisans.com/blog/
IRC channel: irc.freenode.net#flink

Start Flink on YARN in 4 commands:

# get the hadoop2 package from the Flink download page at
# http://flink.apache.org/downloads.html
wget <download url>
tar xvzf flink-0.9.1-bin-hadoop2.tgz
cd flink-0.9.1/
./bin/flink run -m yarn-cluster -yn 4 ./examples/flink-java-examples-0.9.1-WordCount.jar

SLIDE 37

flink.apache.org

Flink Forward: a two-day conference with free training in Berlin, Germany

  • Schedule: http://flink-forward.org/?post_type=day
SLIDE 38

Appendix

SLIDE 39

Managed (off-heap) memory and out-of-core support


[Chart annotation: memory runs out.]

SLIDE 40

Cost-based Optimizer


[Diagram: two alternative execution plans for the same program. Plan A: DataSource (orders.tbl) → Filter → Map and DataSource (lineitem.tbl), joined by a hybrid hash join (buildHT / probe) using broadcast and forward shipping, followed by Combine and GroupRed (sort). Plan B: the same join with both inputs hash-partitioned on [0], followed by GroupRed (sort, forward) with hash-partitioning on [0,1].]

The best plan depends on the relative sizes of the input files.
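A hedged sketch of how a user can interact with this choice: the optimizer normally picks a shipping strategy from size estimates, and a JoinHint can force one when the relative sizes are known up front. The datasets below are illustrative stand-ins for orders.tbl and lineitem.tbl.

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val orders: DataSet[(Long, String)] = env.fromElements((1L, "o1"), (2L, "o2"))
val lineitems: DataSet[(Long, Double)] = env.fromElements((1L, 9.99), (2L, 19.99))

// broadcast the (small) orders side instead of hash-partitioning both inputs
val joined = orders
  .join(lineitems, JoinHint.BROADCAST_HASH_FIRST)
  .where(0).equalTo(0)

joined.print()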
SLIDE 41

case class Path(from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges)
    .where("to")
    .equalTo("from") { (path, edge) => Path(path.from, edge.to) }
    .union(paths)
    .distinct()
  next
}

[Diagram: program lifecycle. Pre-flight, on the client: the optimizer and the type extraction stack turn the program into a dataflow graph. The JobManager handles task scheduling and dataflow metadata, deploys operators, and tracks intermediate results. The TaskManagers execute the deployed operators, locally or on a cluster (YARN, standalone).]

SLIDE 42

Iterative processing in Flink

Flink offers built-in iterations and delta iterations to execute ML and graph algorithms efficiently
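A hedged sketch of a delta iteration (a toy component propagation, not one of the packaged algorithms): only the elements that changed in a round, the workset, are re-processed in later rounds. Data and field choices are illustrative.

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// (vertexId, componentId) pairs and directed edges of a tiny graph
val initial: DataSet[(Long, Long)] = env.fromElements((1L, 1L), (2L, 2L), (3L, 3L))
val edges: DataSet[(Long, Long)] = env.fromElements((1L, 2L), (2L, 3L))

val components = initial.iterateDelta(initial, 10, Array(0)) { (solution, workset) =>
  // propagate the component ids of changed vertices to their neighbors
  val candidates = workset
    .join(edges).where(0).equalTo(0) { (vertex, edge) => (edge._2, vertex._2) }
    .groupBy(0).min(1)
  // keep only real improvements; they update the solution set and form the next workset
  val updates = candidates
    .join(solution).where(0).equalTo(0)
    .flatMap { pair => if (pair._1._2 < pair._2._2) Some(pair._1) else None }
  (updates, updates)
}

components.print()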


[Diagram: an iterative dataflow with map, join, and sum operators and intermediate iteration results ID1, ID2, ID3.]

SLIDE 43

Example: Matrix Factorization


Factorizing a matrix with 28 billion ratings for recommendations

More at: http://data-artisans.com/computing-recommendations-with-flink.html

SLIDE 44


Batch aggregation

[Diagram: the JobManager holds the ExecutionGraph; TaskManager 1 and TaskManager 2 execute mappers M1/M2, result partitions RP1/RP2, and reducers R1/R2 in steps 1 through 5b. The result partitions are "blocked": they are fully produced before being consumed.]

SLIDE 45


Streaming window aggregation

[Diagram: the same setup for a streaming window aggregation; here the result partitions are "pipelined": receivers consume them while they are being produced.]