SLIDE 1

Better Stream Processing with Python

Taking the Hipster out of Streaming

Andreas Heider, Robert Wall 12.07.2017 EuroPython

SLIDE 2

Who are we?

  • Developers at Winton
  • Winton is a global investment management and data science company, founded in 1997
  • We believe the scientific method can be profitably applied to the field of investing

SLIDE 3

What do we mean by Stream processing?

[Figure: Batch vs. Stream]

SLIDE 4

Example: Real Time Financial Market Data

Time      Symbol  Price  Qty
10:15:01  AAPL    $144   10
10:15:02  GOOG    $940   5
10:15:03  AAPL    $145   11
…

[Diagram: Exchange and its stream of Trades, e.g. 10:15:01 AAPL 10 @ $144, 10:15:02 GOOG 5 @ $940]

SLIDE 5

Stream processing: Binning

Time      Symbol  Price  Qty
10:15:01  AAPL    $144   10
10:15:02  GOOG    $940   5
10:15:03  AAPL    $145   11
…

Binning Process

Time   Symbol  Avg. Price  Volume
10:15  AAPL    $144.5      1300
10:15  GOOG    $943        1250
10:16  AAPL    $145.3      1450
…
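
The binning step above can be sketched in a few lines of plain Python. This is only an illustration, not the code from the talk: the function name `bin_trades` and the tuple layout are made up here, and "Avg. Price" is assumed to be the unweighted mean of trade prices (a volume-weighted average would be an equally plausible reading). The sample uses only the three trades shown on the slide, so its volumes are smaller than those in the output table.

```python
from collections import defaultdict


def bin_trades(trades):
    """Aggregate (time, symbol, price, qty) trades into one-minute bins."""
    # (minute, symbol) -> [sum of prices, number of trades, total quantity]
    acc = defaultdict(lambda: [0.0, 0, 0])
    for time, symbol, price, qty in trades:
        minute = time[:5]  # "10:15:01" -> "10:15"
        entry = acc[(minute, symbol)]
        entry[0] += price
        entry[1] += 1
        entry[2] += qty
    # Assumption: "Avg. Price" is the unweighted mean of trade prices.
    return {
        key: (total / count, volume)
        for key, (total, count, volume) in acc.items()
    }


trades = [
    ("10:15:01", "AAPL", 144.0, 10),
    ("10:15:02", "GOOG", 940.0, 5),
    ("10:15:03", "AAPL", 145.0, 11),
]
bins = bin_trades(trades)
```

A real streaming version would also have to decide when a bin is complete and can be emitted, which is where state handling and late events come in.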

SLIDE 6

Streaming Data at Winton

[Diagram: Event Streams at the centre, with Transformations. Labels on one side: Market Data, Alternative Data, Internal/Business Events. Labels on the other: Monitoring, Databases, Risk Management, Investment Management, Analytics, Research]

SLIDE 7

Apache Kafka

[Diagram: Producer, Topic (Partition 1, Partition 2, Partition 3), Consumer]
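
To make the topic/partition picture concrete, here is a toy model of how keyed messages could be spread over three partitions. It is a sketch only: `partition_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash (real Kafka clients use their own hash function). The point it illustrates is that messages with the same key always land in the same partition, which preserves per-key ordering.

```python
import zlib

NUM_PARTITIONS = 3  # matches the three partitions in the diagram


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a message key to a partition index (stand-in hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is modelled as an append-only list of (key, value) pairs.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in [("AAPL", 144), ("GOOG", 940), ("AAPL", 145)]:
    partitions[partition_for(key)].append((key, value))
```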

SLIDE 8

Sprawl of Stream Processing systems

SLIDE 9

Kafka Streams

  • Simple library, not a framework
  • Event at a time stream processing
  • Stateful processing, joins and aggregations
  • Distributed processing and fault tolerance
  • Part of main Apache Kafka project
  • Java only so far :(
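
As a rough picture of what "event at a time" and "stateful" mean in the list above, the following toy processor keeps a running count per key. It is deliberately not the Kafka Streams API: `CountingProcessor` and `run` are hypothetical names, and the plain dict only stands in for a fault-tolerant state store.

```python
class CountingProcessor:
    """Toy stateful processor: counts events per key, one event at a time."""

    def __init__(self):
        # A plain dict standing in for a persistent, fault-tolerant state store.
        self.store = {}

    def process(self, key, value):
        # Update the state for this key and forward the new count downstream.
        self.store[key] = self.store.get(key, 0) + 1
        return key, self.store[key]


def run(events, processor):
    """Feed events through the processor in arrival order."""
    return [processor.process(key, value) for key, value in events]


events = [("AAPL", 144), ("GOOG", 940), ("AAPL", 145)]
output = run(events, CountingProcessor())
```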
SLIDE 10

Python at Winton

Many users, with different skillsets:

  • Developers
  • Researchers
  • Operations

SLIDE 11

Talking to Kafka using kafka-python

Hipster Stream Processing
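
A minimal consumer loop with kafka-python might look like the sketch below. The topic name, broker address and group id are placeholders, the messages are assumed to be JSON-encoded, and `handle_records` is a hypothetical helper that is kept separate from the Kafka connection so it can be exercised without a broker.

```python
import json


def handle_records(records, on_trade):
    """Decode each record's JSON value and pass it to on_trade."""
    count = 0
    for record in records:
        on_trade(json.loads(record.value))
        count += 1
    return count


def main():
    # Requires the kafka-python package and a reachable broker.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "trades",                            # hypothetical topic name
        bootstrap_servers="localhost:9092",  # assumed broker address
        group_id="binning-demo",             # hypothetical consumer group
    )
    # Iterating the consumer yields records whose .value holds the raw bytes.
    handle_records(consumer, print)
```

Calling `main()` starts the loop. Scripts like this are quick to write, which is how they tend to grow into something larger.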

SLIDE 12

Python Kafka Clients

https://github.com/dpkp/kafka-python

  • Pure Python implementation
  • Friendly, pythonic interface

https://github.com/confluentinc/confluent-kafka-python

  • Wrapper around C library
  • Amazingly high performance and robustness
SLIDE 13

Experiences using low-level client

  • What starts out as a 10-line script ends up as yet another homegrown streaming framework
  • The devil is in the details:
      • Guaranteeing at-least-once (or even exactly-once) processing
      • Handling stateful processing
      • Distributing load over various machines
      • Microbatching
      • Handling rebalances nicely
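
Two of these details, at-least-once delivery and microbatching, can be sketched together. The idea is to commit offsets only after a batch has been processed, so a crash leads to reprocessing rather than data loss. `consume_in_batches` is a hypothetical helper; it assumes only that the consumer is iterable and has a `commit()` method, as kafka-python's `KafkaConsumer` does when auto-commit is disabled.

```python
from itertools import islice


def consume_in_batches(consumer, handle_batch, batch_size=100):
    """Process records in micro-batches, committing after each success."""
    records = iter(consumer)
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            break  # the consumer's iterator is exhausted
        # If handle_batch raises, commit() is never reached, so the batch
        # will be delivered again after a restart (at-least-once).
        handle_batch(batch)
        consumer.commit()
```

This sketch waits until a full batch has arrived, so a real loop would also need to flush on a timeout. It does nothing about rebalances or state either, which is exactly where a homegrown framework starts to grow.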
SLIDE 14

Kafka Streams for Python

https://github.com/wintoncode/winton-kafka-streams

SLIDE 15

Demo

SLIDE 16

Goals / Roadmap

  1. Clean implementation of Kafka’s core streams API in Python
  2. Experiment with more pythonic API/DSL
  3. Optimise performance via batching/numpy/Arrow
  4. Implement more advanced features of Kafka’s streams API (exactly once, …)

SLIDE 17

Get in touch!

  • Project on GitHub: https://github.com/wintoncode/winton-kafka-streams
  • Roadmap: https://github.com/wintoncode/winton-kafka-streams/blob/master/ROADMAP.md

  • Announcement on kafka-dev
  • Come to our stand and talk to us
  • Thanks to Confluent

SLIDE 18

Questions?

SLIDE 19

Backup

SLIDE 20

Some words of experience

  • Not everything fits the streaming model
  • Manually changing data is tricky
      • Be careful what you put in, and have a recovery method
  • Stable deployment can be challenging
      • Especially ZooKeeper and buggy clients
  • Set up monitoring from the start
      • We use Prometheus and Grafana
      • https://github.com/yahoo/kafka-manager
