Distributed Workflows with Flowy, EuroPython 2015, Sever Banesiu

SLIDE 1

EuroPython 2015

Sever Banesiu @severb

Distributed Workflows with Flowy

SLIDE 2

Overview

  1. Distributed Workflows
  2. Code + Demo
  3. Workflow Engine
  4. Execution Model
  5. More Examples
  6. Scaling

SLIDE 3

What is a distributed workflow?

Hint: A process composed of a mix of independent and interdependent units of work called tasks.

SLIDE 4

Workflows are usually modeled with DAGs or ad-hoc code

Note: Neither provides a satisfactory solution.

SLIDE 5

Flowy

A Workflow Modeling Library

It uses single-threaded-looking Python code and gradual concurrency inference.

SLIDE 6

An Example

[Diagram: a video-processing workflow. Tasks shown: embed subtitle, MPEG-4, WebM, find chapters, extract thumbnail, target ads, update DB. Data labels: video URL, subtitle URL, embedded URL, CDN URL (three times), ad tags.]
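The example can be written out as an explicit DAG, which is one of the two conventional modeling approaches the deck mentions. This is only a sketch and not part of Flowy: the task names come from the diagram labels, the edges are an assumed reading of which outputs feed which tasks, and `parallel_batches` is a hypothetical helper built on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Each key maps a task to the set of tasks it depends on (assumed edges).
VIDEO_DAG = {
    "encode_webm": {"embed_subtitle"},
    "encode_mpeg4": {"embed_subtitle"},
    "extract_thumbnail": {"find_chapters"},
    "target_ads": set(),
    "update_db": {"encode_webm", "encode_mpeg4", "extract_thumbnail", "target_ads"},
}


def parallel_batches(graph):
    """Group tasks into batches whose members can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()   # every task whose dependencies are done
        batches.append(set(ready))
        sorter.done(*ready)          # pretend the whole batch has finished
    return batches


batches = parallel_batches(VIDEO_DAG)
for number, batch in enumerate(batches, start=1):
    print(number, sorted(batch))
```

Note that a static graph like this cannot express that the number of thumbnail tasks depends on how many chapters are found, which is one reason a plain DAG can be an awkward fit.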

SLIDE 7

An Ad-hoc Solution, using task queues

[Diagram: a task queue, three workers and a storage component. Tasks shown: embed subtitle, find chapters, target ads.]

SLIDE 9

An Ad-hoc Solution, using task queues

[Diagram: a task queue, three workers and a storage component. Tasks shown: embed subtitle, find chapters, target ads, and two extract thumbnail tasks.]

SLIDE 10

An Ad-hoc Solution, using task queues

[Diagram: a task queue, three workers and a storage component. Tasks shown: embed subtitle, find chapters, target ads, and a decision task.]

SLIDE 11

An Ad-hoc Solution, using task queues

[Diagram: a task queue, three workers and a storage component. Tasks shown: embed subtitle, target ads, a decision task, and two extract thumbnail tasks.]

SLIDE 12

The Workflow Engine

[Diagram: workers, an activity queue, a decision queue, storage and an API.]

  • automatically schedule the corresponding decision type when an activity is finished
  • ensure all decisions for the same workflow execution are sequential
  • merge multiple queued decisions for the same workflow execution into one
  • provide fault tolerance with timers
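Two of these engine responsibilities, sequential decisions per workflow execution and merging of queued decisions, can be sketched with a small in-memory queue. This is an illustrative toy under stated assumptions, not how any real engine is implemented: `DecisionQueue`, its methods and the event strings are all hypothetical names.

```python
from collections import OrderedDict


class DecisionQueue:
    """Toy decision queue: one merged decision per workflow execution."""

    def __init__(self):
        self._pending = OrderedDict()   # workflow_id -> events awaiting a decision
        self._in_flight = set()         # workflow_ids currently being decided

    def put(self, workflow_id, event):
        # Events for the same execution are merged into a single entry.
        self._pending.setdefault(workflow_id, []).append(event)

    def get(self):
        # Hand out the oldest execution that has no decision in progress.
        for workflow_id in self._pending:
            if workflow_id not in self._in_flight:
                self._in_flight.add(workflow_id)
                return workflow_id, self._pending.pop(workflow_id)
        return None

    def done(self, workflow_id):
        # The decision worker reports completion, unblocking the execution.
        self._in_flight.discard(workflow_id)


queue = DecisionQueue()
queue.put("w1", "embed_subtitle finished")
queue.put("w2", "find_chapters finished")
queue.put("w1", "target_ads finished")

first = queue.get()      # both w1 events arrive as one decision
second = queue.get()
queue.put("w1", "encode_video finished")
blocked = queue.get()    # w1 is still being decided, so nothing is handed out
queue.done("w1")
resumed = queue.get()
```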

SLIDE 13

The Workflow Engine

[Same diagram as on the previous slide.]

Not something new

SLIDE 14

Execution Model

    def process_video(embed_subtitle, find_chapters, ...):
        def workflow(video_URL, subtitle_URL):
            new_URL = embed_subtitle(video_URL, subtitle_URL)
            webm_URL = encode_video(new_URL, 'webm')
            mpeg4_URL = encode_video(new_URL, 'mpeg4')
            ad_tags = target_ads(subtitle_URL)
            chapters = find_chapters(video_URL)
            thumbnails = [extract_thumbnail(video_URL, c) for c in chapters]
            return video_URL, webm_URL, mpeg4_URL, thumbnails, ad_tags
        return workflow
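To make the idea of inferring concurrency from sequential-looking code more concrete, here is a toy replay loop. It is a sketch under loud assumptions and not Flowy's implementation: `Pending`, `Suspend`, `run_decision` and `run` are hypothetical names, the workflow is a trimmed version with a fully spelled-out parameter list (the slide elides its own), and tasks run locally instead of on distributed workers. The workflow function is re-executed after every batch of finished tasks; calls whose arguments are all known get scheduled, and using a missing value suspends the run.

```python
import itertools


class Suspend(Exception):
    """The workflow needs a value that is not available yet."""


class Pending:
    """Placeholder for a task result that is not in the history yet."""

    def __iter__(self):
        raise Suspend()   # actually using the value forces a wait


def contains_pending(value):
    if isinstance(value, (list, tuple)):
        return any(contains_pending(v) for v in value)
    return isinstance(value, Pending)


def run_decision(factory, tasks, history, args):
    """Replay the workflow once; return (done, result, schedulable)."""
    schedulable = {}
    call_ids = itertools.count()   # the n-th call always gets id n

    def proxy_for(name):
        def proxy(*task_args):
            call_id = next(call_ids)
            if call_id in history:
                return history[call_id]
            if not contains_pending(task_args):
                schedulable[call_id] = (name, task_args)
            return Pending()
        return proxy

    workflow = factory(**{name: proxy_for(name) for name in tasks})
    try:
        result = workflow(*args)
    except Suspend:
        return False, None, schedulable
    return not contains_pending(result), result, schedulable


def run(factory, tasks, args):
    history, rounds = {}, []
    while True:
        done, result, schedulable = run_decision(factory, tasks, history, args)
        if done:
            return result, rounds
        if not schedulable:
            raise RuntimeError("workflow is stuck")
        rounds.append(sorted(name for name, _ in schedulable.values()))
        for call_id, (name, task_args) in schedulable.items():
            history[call_id] = tasks[name](*task_args)   # a real engine would dispatch to workers


def process_video(embed_subtitle, encode_video, find_chapters, extract_thumbnail):
    def workflow(video_url, subtitle_url):
        new_url = embed_subtitle(video_url, subtitle_url)
        webm_url = encode_video(new_url, "webm")
        chapters = find_chapters(video_url)
        thumbnails = [extract_thumbnail(video_url, c) for c in chapters]
        return webm_url, thumbnails
    return workflow


TASKS = {
    "embed_subtitle": lambda video, sub: video + "+sub",
    "encode_video": lambda url, fmt: url + "." + fmt,
    "find_chapters": lambda video: [0, 60],
    "extract_thumbnail": lambda video, c: "%s@%d.png" % (video, c),
}

result, rounds = run(process_video, TASKS, ("v", "s"))
```

Each entry in `rounds` lists the tasks that became schedulable together, so it shows which calls the replay treats as concurrent. Because call ids are assigned by position, the sketch only works if every replay takes the same path through the workflow code.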

SLIDE 20

Side Effects

  • The execution path must not change between invocations.
  • Use only pure functions inside the workflow code.
  • Use input data or dedicated activities for random values, the current date, external reads, etc.
  • Avoid complex computations in the workflow code.
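A small illustration of the rule about random values, assuming that the workflow code may be invoked several times for one workflow execution. The function names and the `coin` input are hypothetical and only serve the example.

```python
import random


def nondeterministic_workflow():
    # Anti-pattern: the branch taken can differ on every invocation.
    if random.random() < 0.5:
        return "path A"
    return "path B"


def deterministic_workflow(coin):
    # The random value arrives as input data, so every invocation
    # with the same input follows the same execution path.
    if coin < 0.5:
        return "path A"
    return "path B"


coin = random.random()   # drawn once, when the workflow is started
paths = {deterministic_workflow(coin) for _ in range(100)}
print(paths)             # a single path, no matter how often it is re-run
```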

SLIDE 21

Using Task Results

    import math

    def example(square):
        def workflow(a, b):
            a_squared = square(a)
            b_squared = square(b)
            if a_squared + b_squared > 100:
                return math.copysign(a_squared, a)
            return math.copysign(b_squared, b)
        return workflow
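One way to picture how a task result can be used like a plain value is a placeholder that waits only when the value is actually needed. The sketch below is an assumption-heavy local imitation built on `concurrent.futures`, not Flowy's mechanism: `Lazy` and `as_task` are hypothetical names, and only the operations this workflow uses (addition and conversion to float) are supported.

```python
import math
from concurrent.futures import ThreadPoolExecutor


class Lazy:
    """Placeholder that blocks on its future only when the value is used."""

    def __init__(self, future):
        self._future = future

    def _value(self):
        return self._future.result()   # waits until the task has finished

    def __add__(self, other):
        return self._value() + (other._value() if isinstance(other, Lazy) else other)

    __radd__ = __add__

    def __float__(self):
        return float(self._value())    # lets math.copysign accept a Lazy


def as_task(executor, func):
    def call(*args):
        return Lazy(executor.submit(func, *args))   # returns immediately
    return call


def example(square):
    def workflow(a, b):
        a_squared = square(a)          # both tasks are already running ...
        b_squared = square(b)
        if a_squared + b_squared > 100:   # ... and are awaited only here
            return math.copysign(a_squared, a)
        return math.copysign(b_squared, b)
    return workflow


with ThreadPoolExecutor(max_workers=2) as pool:
    workflow = example(as_task(pool, lambda x: x * x))
    big = workflow(3, -20)
    small = workflow(3, -4)
```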

SLIDE 24

Using Task Results

    def example(sum, square):
        def workflow(a, b):
            a_squared = square(a)
            b_squared = square(b)
            if a_squared < 100:
                a_squared = sum(a_squared, 100)
            if b_squared > 100:
                b_squared = sum(b_squared, 100)
            return sum(a_squared, b_squared)
        return workflow

SLIDE 25

Subworkflows

    def subworkflow(sum, square):
        def workflow(n):
            n_squared = square(n)
            if n_squared < 100:
                n_squared = sum(n_squared, 100)
            # hand the (possibly adjusted) square back to the parent workflow
            return n_squared
        return workflow

    def example(sum, example_sub):
        def workflow(a, b):
            # each subworkflow call squares its own argument
            return sum(example_sub(a), example_sub(b))
        return workflow

SLIDE 26

Error Handling

    def example(square):
        def workflow(a):
            try:
                a_squared = square(a)
            except:
                return 0
            else:
                return a_squared + 100
        return workflow

SLIDE 27

Error Handling

    def example(square):
        def workflow(a):
            a_squared = square(a)
            try:
                return a_squared + 100
            except TaskError:
                return 0
        return workflow

SLIDE 28

Error Handling

    def example(square):
        def workflow(a):
            a_squared = square(a)
            try:
                wait(a_squared)
            except TaskError:
                return 0
            else:
                return a_squared + 100
        return workflow
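The pattern of a task call that never raises, with the error surfacing only once the result is waited for, has a close analogue in the standard library. The following is a local sketch with `concurrent.futures`, not Flowy code: `TaskError` and `wait` here are hypothetical stand-ins defined in the snippet, and unlike the slide this `wait` returns the value instead of leaving it in the placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


class TaskError(Exception):
    """Hypothetical stand-in for a task failure."""


def wait(future):
    """Block until the task is done; re-raise its failure as TaskError."""
    try:
        return future.result()
    except Exception as exc:
        raise TaskError(str(exc)) from exc


def square(x):
    if not isinstance(x, (int, float)):
        raise ValueError("cannot square %r" % (x,))
    return x * x


def workflow(pool, a):
    a_squared = pool.submit(square, a)   # scheduling never raises
    try:
        value = wait(a_squared)          # the failure shows up here
    except TaskError:
        return 0
    else:
        return value + 100


with ThreadPoolExecutor(max_workers=1) as pool:
    ok = workflow(pool, 3)
    failed = workflow(pool, "three")
```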

SLIDE 29

Error Handling

    def example(sum, square):
        def workflow(a, b):
            a_squared = square(a)
            b_squared = square(b)
            return sum(a_squared, b_squared)
        return workflow

SLIDE 30

Scaling

  • only configuration changes (+ heartbeat callable)
  • execution timers for fault tolerance
  • a new error type, TimeoutError
  • automatic retries on timeout
  • heartbeats
  • idempotent activities
  • activities in other languages
  • results and input data size restrictions
  • each worker is single threaded/process (use process managers)
  • use subworkflows if history gets too large
  • can scale up and down with ease (overall progress is not lost)
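The link between automatic retries on timeout and idempotent activities can be shown with a short sketch. Everything here is hypothetical and local: `run_with_retries` and `save_thumbnail` are made-up names, the timeout is simulated, and Python's built-in `TimeoutError` stands in for whatever error type the engine raises.

```python
def run_with_retries(activity, *args, retries=3):
    """Re-run an activity when it times out, up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return activity(*args)
        except TimeoutError:
            if attempt == retries:
                raise   # give up and let the workflow see the timeout


store = {}
calls = {"count": 0}


def save_thumbnail(key, data):
    # Idempotent: writing the same key and data again changes nothing,
    # so a retry after a partial run cannot create duplicates.
    calls["count"] += 1
    store[key] = data
    if calls["count"] < 3:
        raise TimeoutError("simulated missed deadline")
    return key


result = run_with_retries(save_thumbnail, "thumb-1", b"png")
```

The activity did its write three times, yet the store holds a single entry. An activity that appended to a list instead would have left three copies behind.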

SLIDE 31

Thank you, Questions?

github.com/severb/flowy/

docs soon!