SLIDE 1

Concurrent & Multicore OCaml: A deep dive

KC Sivaramakrishnan¹ & Stephen Dolan¹, Leo White², Jeremy Yallop¹,³, Armaël Guéneau⁴, Anil Madhavapeddy¹,³

SLIDE 4

Concurrency ≠ Parallelism

  • Concurrency
      • Programming technique
      • Overlapped execution of processes
  • Parallelism
      • (Extreme) Performance hack
      • Simultaneous execution of computations

Concurrency (Fibers) ∩ Parallelism (Domains) ➔ Scalable Concurrency
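To make the parallelism half concrete, here is a sketch assuming OCaml 5.0 or later, where domains are exposed through the standard library's `Domain` module (`Domain.spawn`, `Domain.join`); the API in this talk's prototype may have differed. Two domains sum disjoint halves of a range at the same time. `sum_range` and `parallel_sum` are hypothetical helpers.

```ocaml
(* Assumes OCaml >= 5.0, where Domain.spawn / Domain.join are in the stdlib. *)

(* Sequentially sum the integers in [lo, hi). *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + i
  done;
  !acc

(* Hypothetical helper: sum [0, n) using two domains in parallel. *)
let parallel_sum n =
  let mid = n / 2 in
  (* The upper half runs on a freshly spawned domain... *)
  let upper = Domain.spawn (fun () -> sum_range mid n) in
  (* ...while the current domain sums the lower half. *)
  let lower = sum_range 0 mid in
  lower + Domain.join upper

let () = Printf.printf "%d\n" (parallel_sum 1_000)  (* prints 499500 *)
```

Fibers, the concurrency half, are then multiplexed over such domains by a scheduler.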

SLIDE 7

Schedulers

  • Multiplexing fibers over domain(s)
  • Bake scheduler into the runtime system (GHC)
  • Allow programmers to describe schedulers!
      • Parallel search → LIFO work-stealing
      • Web-server → FIFO runqueue
      • Data parallel → Gang scheduling
  • Algebraic Effects and Handlers
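A programmer-defined scheduler needs only two effects. The following is a minimal sketch, assuming OCaml 5.0 or later and its `Effect.Deep` API rather than the experimental syntax used in this talk: a FIFO run queue (the web-server policy above) multiplexing fibers on a single domain. The names `Fork`, `Yield`, `fork`, `yield` and `run` are hypothetical.

```ocaml
(* Assumes OCaml >= 5.0 (Effect / Effect.Deep in the stdlib). *)
open Effect
open Effect.Deep

(* Hypothetical effects: start a new fiber, or give up the CPU. *)
type _ Effect.t += Fork : (unit -> unit) -> unit Effect.t
type _ Effect.t += Yield : unit Effect.t

let fork f = perform (Fork f)
let yield () = perform Yield

(* Round-robin scheduler: suspended fibers wait in a FIFO queue. *)
let run main =
  let q : (unit -> unit) Queue.t = Queue.create () in
  let enqueue k = Queue.push (fun () -> continue k ()) q in
  let dequeue () = if Queue.is_empty q then () else (Queue.pop q) () in
  let rec spawn f =
    match_with f ()
      { retc = (fun () -> dequeue ());  (* fiber finished: run the next one *)
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
              (* Park the current fiber at the back of the queue. *)
              Some (fun (k : (a, unit) continuation) -> enqueue k; dequeue ())
          | Fork f ->
              (* Park the parent and run the child immediately. *)
              Some (fun (k : (a, unit) continuation) -> enqueue k; spawn f)
          | _ -> None) }
  in
  spawn main

let () =
  run (fun () ->
    fork (fun () -> print_endline "a1"; yield (); print_endline "a2");
    fork (fun () -> print_endline "b1"; yield (); print_endline "b2");
    print_endline "main")
(* prints a1, b1, a2, main, b2 (one per line) *)
```

The scheduling policy lives entirely in ordinary library code: replacing the `Queue` with a stack would give LIFO order (though not work-stealing, which needs several domains).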
SLIDE 8

Algebraic effects & handlers

SLIDE 10

Algebraic effects & handlers

  • Programming and reasoning about computational effects in a pure setting.
      • Cf. Monads
  • Eff: http://www.eff-lang.org/

SLIDE 19

Algebraic Effects: Example

```ocaml
exception Foo of int

let f () = 1 + (raise (Foo 3))

let r =
  try f () with
  | Foo i -> i + 1
```

val r : int = 4

```ocaml
(* Effect syntax of the experimental 4.02.2+effects compiler *)
effect Foo : int -> int

let f () = 1 + (perform (Foo 3))   (* perform (Foo 3) evaluates to 4 *)

let r =
  try f () with
  | effect (Foo i) k -> continue k (i + 1)
```

val r : int = 5

Fiber: lightweight stack
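For comparison, here is the same example written against the effect-handler API that later shipped in OCaml 5 (a sketch, assuming OCaml 5.0 or later; the slides use the older experimental syntax). Effects become extensions of the `Effect.t` type, and a handler is a record whose `effc` field returns `Some` handler function for the effects it handles and `None` for the rest.

```ocaml
(* Assumes OCaml >= 5.0. *)
open Effect
open Effect.Deep

(* Performing Foo with an int yields an int back to the performer. *)
type _ Effect.t += Foo : int -> int Effect.t

let f () = 1 + perform (Foo 3)

let r =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Foo i ->
            (* Resume the suspended computation with i + 1 = 4. *)
            Some (fun (k : (a, _) continuation) -> continue k (i + 1))
        | _ -> None) }

let () = Printf.printf "r = %d\n" r  (* prints r = 5 *)
```

Unlike the exception version, the handler resumes `f` at the point of `perform`, so the `1 +` still runs.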

SLIDE 20

Scheduler Demo [1]

[1] https://github.com/kayceesrk/ocaml15-eff/tree/master/chameneos-redux

SLIDE 26

Implementation

  • Fibers: Heap allocated, dynamically resized stacks
      • ~10s of bytes
      • No unnecessary closure allocation costs, unlike CPS
  • One-shot delimited continuations
      • Simplifies reasoning about resources: sockets, locks, etc.
  • Handlers → Linked list of fibers

[Figure: fiber stacks linked into a call chain by handler references; build steps: handle / continue, handle / continue, perform; labels: sp, handler, call chain reference]
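The one-shot restriction can be observed directly. Below is a minimal sketch, assuming OCaml 5.0 or later, where resuming a continuation a second time raises `Effect.Continuation_already_resumed`; the `Ask` effect, `demo` and `second_resume_rejected` are hypothetical names.

```ocaml
(* Assumes OCaml >= 5.0. *)
open Effect
open Effect.Deep

(* Hypothetical effect: ask the handler for an int. *)
type _ Effect.t += Ask : int Effect.t

(* Set to true if resuming the same continuation twice is rejected. *)
let second_resume_rejected = ref false

let demo () =
  try_with (fun () -> perform Ask + 1) ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask ->
            Some (fun (k : (a, int) continuation) ->
              (* First resume: the rest of the computation runs, giving 41 + 1. *)
              let result = continue k 41 in
              (* Second resume of the same continuation is not allowed. *)
              (try ignore (continue k 0)
               with Continuation_already_resumed -> second_resume_rejected := true);
              result)
        | _ -> None) }

let () =
  let result = demo () in
  Printf.printf "result = %d, second resume rejected = %b\n"
    result !second_resume_rejected
```

Because each continuation runs at most once, a resource acquired inside a fiber cannot be released twice by a duplicated resumption.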
SLIDE 32

Native-code fibers: Vanilla

[Figure: a single system stack growing through C → OCaml (start program) → C (C call) → OCaml (OCaml callback) → C (C call) → OCaml (OCaml callback)]

SLIDE 40

Native-code fibers: Effects

[Figure: C frames run on the system stack, while OCaml frames run on fibers allocated in the OCaml heap; build steps: OCaml start program, handle, C call, OCaml callback, C call]

  1. Stack overflow checks for OCaml functions
      • Simple static analysis eliminates many checks
  2. FFI calls are more expensive due to stack switching
      • Specialise for calls which {allocate / pass arguments on stack / do neither}
SLIDE 42

Performance: Vanilla OCaml

[Figure: normalised time (lower is better) of 4.02.2+effects vs. 4.02.2+vanilla on a benchmark suite including almabench, alt-ergo, bdd, chameneos, cohttp-lwt, core_micro, cpdf, frama-c, js_of_ocaml, jsontrip, kb, lexifi-g2pp, menhir, minilight, numal, patdiff, sauvola, sequence, setrip, thread-ring, thread-sleep, valet and ydump]

4.02.2+effects: ~5.4% slower

SLIDE 43

Performance: Chameneos-Redux

[Figure: time (s) vs. iterations (×100,000, from 1 to 10) for Lwt, a concurrency monad, GHC, and fibers]

SLIDE 45

Generator from Iterator [1]

[1] https://github.com/kayceesrk/ocaml15-eff/blob/master/generator.ml

```ocaml
(* Effect syntax of the experimental 4.02.2+effects compiler *)
type 'a t =
  | Leaf
  | Node of 'a t * 'a * 'a t

let rec iter f = function
  | Leaf -> ()
  | Node (l, x, r) -> iter f l; f x; iter f r

(* val to_gen : 'a t -> (unit -> 'a option) *)
let to_gen (type a) (t : a t) =
  let module M = struct effect Next : a -> unit end in
  let open M in
  let step = ref (fun () -> assert false) in
  let first_step () =
    try
      iter (fun x -> perform (Next x)) t;
      None
    with effect (Next v) k ->
      step := continue k;
      Some v
  in
  step := first_step;
  fun () -> !step ()
```
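The same generator can be sketched against the OCaml 5 `Effect.Deep` API (assuming OCaml 5.0 or later). One small change from the slide's version is labelled in the code: once the traversal finishes, `step` is reset so further calls return `None` rather than trying to resume an already-used continuation. The tree type is renamed `tree` here only to avoid a clash with `Effect.t` after `open Effect`.

```ocaml
(* Assumes OCaml >= 5.0. *)
open Effect
open Effect.Deep

(* Renamed from [t] to avoid clashing with [Effect.t]. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* In-order internal iterator. *)
let rec iter f = function
  | Leaf -> ()
  | Node (l, x, r) -> iter f l; f x; iter f r

(* Turn the internal iterator into an external generator. *)
let to_gen (type a) (t : a tree) : unit -> a option =
  (* A fresh effect per generator, carrying one element. *)
  let module M = struct
    type _ Effect.t += Next : a -> unit Effect.t
  end in
  let step = ref (fun () -> assert false) in
  let first_step () =
    try_with
      (fun () ->
        iter (fun x -> perform (M.Next x)) t;
        (* Change from the slide: once exhausted, keep answering None. *)
        step := (fun () -> None);
        None)
      ()
      { effc = (fun (type b) (eff : b Effect.t) ->
          match eff with
          | M.Next v ->
              Some (fun (k : (b, a option) continuation) ->
                (* Remember where the traversal stopped, then hand out v. *)
                step := (fun () -> continue k ());
                Some v)
          | _ -> None) }
  in
  step := first_step;
  fun () -> !step ()

let () =
  let next = to_gen (Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))) in
  let rec drain () =
    match next () with
    | Some x -> Printf.printf "%d\n" x; drain ()
    | None -> ()
  in
  drain ()  (* prints 1, 2, 3 *)
```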

SLIDE 46

Performance: Generator

[Figure: time (s) vs. binary tree depth (15 to 25) for Iterator, Fiber Generator and H/W Generator]

SLIDE 48

Async I/O in direct style [1]

[Image: "Callback Hell"]

[1] https://github.com/kayceesrk/ocaml15-eff/tree/master/async-io

SLIDE 51

Javascript backend

  • js_of_ocaml
      • OCaml bytecode → Javascript
  • js_of_ocaml compiler pass
      • Whole-program selective CPS transformation
  • Work-in-progress!
      • Runs “hello-effects-world”!
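To illustrate what a CPS transformation produces, and the closure allocation that native fibers avoid, here is a hand-written CPS version of the tree iterator and generator from the generator slide. It is only an illustration of the technique, not js_of_ocaml's actual output; `iter_cps`, `step` and `to_gen_cps` are hypothetical names. Code in this style needs no runtime support for capturing stacks, so it can be expressed in plain Javascript.

```ocaml
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* CPS in-order traversal: instead of returning, every step calls the
   continuation k; each (fun () -> ...) below allocates a closure. *)
let rec iter_cps f t k =
  match t with
  | Leaf -> k ()
  | Node (l, x, r) ->
      iter_cps f l (fun () -> f x (fun () -> iter_cps f r k))

(* Result of running the traversal until the next element. *)
type 'a step =
  | Done
  | Yielded of 'a * (unit -> 'a step)

(* Because f receives its continuation explicitly, it can pause the
   traversal by returning the continuation instead of calling it. *)
let to_gen_cps (t : 'a tree) : unit -> 'a option =
  let state =
    ref (fun () -> iter_cps (fun x k -> Yielded (x, k)) t (fun () -> Done))
  in
  fun () ->
    match !state () with
    | Done -> state := (fun () -> Done); None
    | Yielded (x, k) -> state := k; Some x
```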
SLIDE 52

fin.

https://github.com/kayceesrk/ocaml-eff-example