Efficient Parallel Functional Programming with Hierarchical Memory Management



slide-1
SLIDE 1

Efficient Parallel Functional Programming with Hierarchical Memory Management

Sam Westrick Carnegie Mellon University

Joint work with: Ram Raghunathan, Adrien Guatto, Stefan Muller, Rohan Yadav, Umut Acar, Guy Blelloch, Matthew Fluet

slide-2
SLIDE 2

Setting the Stage

  • functional programming is good for expressing parallelism (no side-effects, no concurrency, no race conditions)
  • the point of parallelism is to make things faster…
  • absolute efficiency is paramount (speedup w.r.t. fastest sequential solution)
  • is parallel functional programming efficient?
  • existing implementations achieve good scalability but not absolute efficiency
  • standard challenges: high rate of allocation, heavy reliance upon garbage collection

slide-3
SLIDE 3

The Problem

we need more efficient memory management for parallel programs

(not just functional)

slide-4
SLIDE 4

Example: Mergesort

fun msort A =
  if length A < 2 then A
  else
    let
      val (L, R) = splitMid A
      val (L', R') = par (fn () => msort L, fn () => msort R)
      val B = merge L' R'
    in
      B
    end
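
The slide leaves splitMid, merge, and par undefined. A minimal runnable sketch follows, assuming int lists and a sequential stand-in for par. These helper definitions are illustrative assumptions, not the authors' library: the real par runs its two arguments in parallel.

```sml
(* Sequential stand-in for the parallel primitive (assumption for illustration):
   runs f, then g, and returns both results. *)
fun par (f, g) = (f (), g ())

(* Hypothetical helper: split a list into a front half and a back half. *)
fun splitMid xs =
  let
    val n = length xs div 2
  in
    (List.take (xs, n), List.drop (xs, n))
  end

(* Hypothetical helper: merge two sorted int lists into one sorted list. *)
fun merge [] ys = ys
  | merge xs [] = xs
  | merge (x :: xs) (y :: ys) =
      if x <= y then x :: merge xs (y :: ys)
      else y :: merge (x :: xs) ys

(* msort exactly as on the slide. *)
fun msort A =
  if length A < 2 then A
  else
    let
      val (L, R) = splitMid A
      val (L', R') = par (fn () => msort L, fn () => msort R)
      val B = merge L' R'
    in
      B
    end

val sorted = msort [2, 4, 3, 1]   (* [1,2,3,4] *)
```

Because msort only talks to par through its type, swapping in a parallel par should not require changing msort itself.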

slide-5
SLIDE 5

Example: Mergesort

msort [2,4,3,1]

slide-6
SLIDE 6

Example: Mergesort

par (fn () => msort [2,4], fn () => msort [3,1])

slide-7
SLIDE 7

Example: Mergesort

msort [2,4]   msort [3,1]

slide-8
SLIDE 8

Example: Mergesort

msort [2]   msort [4]   msort [3]   msort [1]

slide-9
SLIDE 9

Example: Mergesort

[2]   [4]   [3]   [1]

slide-10
SLIDE 10

Example: Mergesort

merge [2] [4]   merge [3] [1]

slide-11
SLIDE 11

Example: Mergesort

[2,4]   [1,3]

slide-12
SLIDE 12

Example: Mergesort

merge [2,4] [1,3]

slide-13
SLIDE 13

Example: Mergesort

[1,2,3,4]

slide-14
SLIDE 14

Hierarchical Memory Management

fun msort A =
  if length A < 2 then A
  else
    let
      val (L, R) = splitMid A
      val (L', R') = par (fn () => msort L, fn () => msort R)
      val B = merge L' R'
    in
      B
    end

slide-15
SLIDE 15

Hierarchical Memory Management

X Y Z

join

slide-16
SLIDE 16

Hierarchical Memory Management

X Y Z

join

slide-17
SLIDE 17

Hierarchical Memory Management

fork (spawn)

X

slide-18
SLIDE 18

Hierarchical Memory Management

fork (spawn)

fresh empty heaps

X

slide-19
SLIDE 19

Hierarchical Memory Management

A L R

slide-20
SLIDE 20

Hierarchical Memory Management

A L R

fork (spawn)

slide-21
SLIDE 21

Hierarchical Memory Management

A L R L1 R1 L2 R2 B1 B2

join

slide-22
SLIDE 22

Hierarchical Memory Management

A L R L1 R1 L2 R2 B1 B2

slide-23
SLIDE 23

Hierarchical Memory Management

A L R L1 R1 L2 R2 B B1 B2

slide-24
SLIDE 24

Hierarchical Memory Management

  • give each task its own heap
  • tasks allocate new data inside their own heaps
  • organize heaps to mirror the nesting structure of tasks
  • fork (spawn, async, etc): fresh heaps for children
  • join (sync, finish, etc): merge heaps into parent
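
The bullets above can be sketched as a toy model. This is not the MLton runtime: a heap here is just a list of object names, and the type heaps and the helpers fork, alloc, inLeft, inRight, and join are hypothetical names introduced for illustration. Which heap holds which object is an assumption read off the A, L, R, L1, R1, L2, R2, B1, B2, B labels on the preceding slides.

```sml
(* Toy model: the heap tree mirrors the nesting of tasks. *)
datatype heaps =
    Leaf of string list                    (* heap of a running task *)
  | Fork of string list * heaps * heaps    (* suspended parent and two children *)

(* fork: the parent keeps its objects; each child gets a fresh empty heap. *)
fun fork (Leaf objs) = Fork (objs, Leaf [], Leaf [])
  | fork _ = raise Fail "fork: task is already suspended"

(* A running task allocates new data inside its own heap. *)
fun alloc x (Leaf objs) = Leaf (objs @ [x])
  | alloc _ _ = raise Fail "alloc: only running tasks allocate"

(* Apply a step to the left or right child of a forked task. *)
fun inLeft f (Fork (p, l, r)) = Fork (p, f l, r)
  | inLeft _ h = h
fun inRight f (Fork (p, l, r)) = Fork (p, l, f r)
  | inRight _ h = h

(* join: merge both children's heaps into the parent. *)
fun join (Fork (objs, Leaf l, Leaf r)) = Leaf (objs @ l @ r)
  | join _ = raise Fail "join: children must be finished tasks"

val h0 = Leaf ["A", "L", "R"]
val h1 = fork h0
val h2 = inLeft (alloc "B1" o alloc "R1" o alloc "L1") h1
val h3 = inRight (alloc "B2" o alloc "R2" o alloc "L2") h2
val h4 = join h3
val h5 = alloc "B" h4
```

The list append in join is linear time in this model; it only shows where objects end up, not how cheaply heaps can be merged.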
slide-25
SLIDE 25

Disentanglement:

in strict purely functional programs, all pointers either point up (to the heap of an ancestor task) or are internal (within the same heap)

[Raghunathan et al, ICFP’16]
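
One way to make the property concrete, as a sketch under assumptions: identify each heap by its path from the root of the heap tree, and classify a pointer by comparing the source and target paths. The names dir, heapId, kind, isPrefix, and classify are hypothetical and not taken from the cited paper.

```sml
datatype dir = Left | Right
type heapId = dir list            (* path from the root heap; [] is the root *)

(* isPrefix (p, q) is true when path p is a prefix of path q. *)
fun isPrefix ([], _) = true
  | isPrefix (_, []) = false
  | isPrefix (x :: xs, y :: ys) = x = y andalso isPrefix (xs, ys)

datatype kind = Internal | Up | Violation

(* Classify a pointer from an object in heap src to an object in heap dst. *)
fun classify (src : heapId, dst : heapId) =
  if src = dst then Internal
  else if isPrefix (dst, src) then Up      (* dst belongs to an ancestor *)
  else Violation                           (* down or cross pointer *)

val k1 = classify ([Left, Right], [Left])  (* Up *)
val k2 = classify ([Left], [Left])         (* Internal *)
val k3 = classify ([Left], [Right])        (* Violation: sibling heap *)
val k4 = classify ([], [Left])             (* Violation: parent to child *)
```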

slide-29
SLIDE 29

Local Garbage Collection

  • pick a subtree
  • reorganize, compact, etc. inside subtree

slide-32
SLIDE 32

Local Garbage Collection

Disentanglement is necessary: otherwise a local collection could leave a dangling pointer from outside the subtree

slide-33
SLIDE 33

Local Garbage Collection

  • localized within a subtree of heaps
  • independent of
      • tasks whose heaps are outside the subtree
      • other local collections (on disjoint subtrees)
  • can easily apply any existing GC algorithm
      • just ignore pointers that exit the subtree
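
A toy tracing pass can illustrate "just ignore pointers that exit the subtree". It is a sketch under assumptions, not the collector described in the talk: objects are records in a flat list, heaps are identified by their path from the root, and obj, inside, find, and mark are hypothetical names.

```sml
datatype dir = Left | Right
type heapId = dir list                         (* path from the root heap *)
type obj = {id : int, heap : heapId, ptrs : int list}

fun isPrefix ([], _) = true
  | isPrefix (_, []) = false
  | isPrefix (x :: xs, y :: ys) = x = y andalso isPrefix (xs, ys)

(* An object is inside the subtree rooted at top when top is a prefix
   of the path of the heap that holds it. *)
fun inside (top : heapId) (ob : obj) = isPrefix (top, #heap ob)

fun find (objs : obj list) i = List.find (fn (ob : obj) => #id ob = i) objs

(* Trace from the roots, following only pointers that stay in the subtree. *)
fun mark (objs : obj list) top roots =
  let
    fun go ([], seen) = seen
      | go (i :: rest, seen) =
          if List.exists (fn j => j = i) seen then go (rest, seen)
          else
            case find objs i of
              SOME ob =>
                if inside top ob then go (#ptrs ob @ rest, i :: seen)
                else go (rest, seen)           (* pointer exits the subtree *)
            | NONE => go (rest, seen)
  in
    go (roots, [])
  end

val objs : obj list =
  [ {id = 1, heap = [], ptrs = []},
    {id = 2, heap = [Left], ptrs = [1, 3]},
    {id = 3, heap = [Left], ptrs = []},
    {id = 4, heap = [Left], ptrs = [3]},       (* unreachable from the roots *)
    {id = 5, heap = [Left, Right], ptrs = [2]} ]

(* Collect the subtree rooted at heap [Left], starting from object 5. *)
val live = mark objs [Left] [5]
```

Object 1 lives above the subtree, so the trace never visits it; object 4 is inside the subtree but unreachable, so a collector could reclaim it.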
slide-34
SLIDE 34

In-place Updates

[]

r

let
  val r = ref []
  fun f () = (r := 0 :: !r)
  fun g () = (r := 1 :: !r)
in
  par (f, g)
end

  • often crucial for efficiency, especially under the hood
  • but can break disentanglement (not always)
slide-37
SLIDE 37

In-place Updates

[]

r

1

  • often crucial for efficiency, especially under the hood
  • but can break disentanglement (not always)
slide-39
SLIDE 39

In-place Updates

  • often crucial for efficiency, especially under the hood
  • but can break disentanglement (not always)
  • options:
      • enforce disentanglement dynamically with promotion [Guatto et al, PPoPP’18]
      • weaken disentanglement to permit important classes of effects [Westrick et al, work in progress]

slide-40
SLIDE 40

Implementation

  • extend MLton compiler with fork-join library

        val par : (unit -> 'a) * (unit -> 'b) -> 'a * 'b

  • block-structured heaps
      • heaps are lists of blocks: merge heaps in O(1) time
  • no read barrier; write barrier only on mutable pointer data
  • local collections: sequential Cheney-style copying/compacting
  • work-stealing scheduler
  • GC policy influenced by scheduler decisions
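
Why a list of blocks gives an O(1) merge can be sketched as follows, assuming each heap keeps a pointer to its first and last block. The types block and heap and the helpers newBlock, singleton, merge, and contents are hypothetical; the slides do not show the actual runtime layout.

```sml
(* A block holds some objects and a mutable link to the next block. *)
datatype block = Block of {objs : string list, next : block option ref}

(* A heap is empty, or a chain of blocks known by its two ends. *)
datatype heap = Empty | Chain of {first : block, last : block}

fun newBlock objs = Block {objs = objs, next = ref NONE}

fun singleton objs =
  let
    val b = newBlock objs
  in
    Chain {first = b, last = b}
  end

(* O(1) merge: link the last block of one heap to the first block of the
   other. No object is copied and no block is traversed.
   Assumes the two heaps are distinct. *)
fun merge (Empty, h) = h
  | merge (h, Empty) = h
  | merge (Chain {first = f1, last = Block {next, ...}},
           Chain {first = f2, last = l2}) =
      (next := SOME f2; Chain {first = f1, last = l2})

(* For inspection only: walk the chain and list every object. *)
fun contents Empty = []
  | contents (Chain {first, ...}) =
      let
        fun walk (Block {objs, next}) =
          objs @ (case !next of NONE => [] | SOME b => walk b)
      in
        walk first
      end

val joined = merge (singleton ["L1", "R1"], singleton ["L2", "R2"])
val all = contents (merge (joined, singleton ["B"]))
```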
slide-41
SLIDE 41

Runtime Overhead

[bar chart: Ours / MLton, 1 core; vertical axis from 0.00 to 2.00; benchmarks: fib, tabulate, map, map-in-place, scan, reduce, filter, samplesort, mergesort, dmm, dedup, histogram, barnes-hut, all-nearest]

slide-42
SLIDE 42

Speedups

[bar chart: MLton / Ours, 72 cores; vertical axis ticks at 18, 36, 54, 72, 90; benchmarks: fib, tabulate, map, map-in-place, scan, reduce, filter, samplesort, mergesort, dmm, dedup, histogram, barnes-hut, all-nearest]