SLIDE 1

Concurrency and Parallelism in ML

John Reppy

University of Chicago

MacQueen Fest — May 12, 2012

SLIDE 2

History

Personal history

- ML on Unix (Cardelli ML)
- ML + Amber ⇒ Pegasus ML
- Standard ML of New Jersey (Version 0.15 on tape)
- Pegasus ML + SML/NJ ⇒ Concurrent ML
- ⇒ Ph.D.!!!
- ⇒ Department 11261 at Bell Labs

SLIDE 3

Why ML?

What makes parallelism and concurrency hard?

The sequential core matters!

- The combination of shared mutable state and concurrency leads to data races and non-determinism.
- Adding synchronization to avoid these problems leads to deadlock.
- Shared memory does not scale well to NUMA and distributed-memory architectures.
- Scaling is hard.

Claim: traditional imperative programming languages are a bad fit for concurrent and parallel programming.

SLIDE 4

Why ML?

Alternatives

- Java, C#, etc.
- Haskell
- X10

SLIDE 5

Why ML?

Standard ML

Claim: what we want is a strict, statically typed, functional language, i.e., Standard ML.

- Strict CBV semantics.
- Type system distinguishes between mutable and immutable values.
- Programming style is value-oriented.

SLIDE 6

Why ML?

Challenges

SML does not come without challenges.

- Polymorphism
- Higher-order functions
- Garbage collection
- Exceptions

SLIDE 7

Parallel ML

The Manticore Project

- The Manticore project is our effort to address the programming needs of commodity applications running on multicore SMP systems.
- No shared memory.
- Preserve determinism where possible.
- Declarative mechanisms for fine-grain parallelism.

SLIDE 8

Parallel ML

The Manticore Project (continued ...)

Our initial language is called Parallel ML (PML).

- Sequential core language based on a subset of SML: strict, with no mutable storage.
- A variety of lightweight implicitly-threaded constructs for fine-grain parallelism.
- Explicitly-threaded parallelism based on CML: message passing with first-class synchronization.
- Prototype implementation with good scaling on 48-way parallel hardware for a range of applications.
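The CML style of explicit concurrency can be sketched as follows. The `CML.channel`, `spawn`, `recvEvt`, `sendEvt`, `wrap`, and `select` operations are CML's actual SML/NJ interface; the counter service itself is an invented example:

```sml
(* A counter service: a thread that owns its state and communicates
   only by synchronous message passing on typed channels. *)
val incCh : int CML.chan = CML.channel ()
val getCh : int CML.chan = CML.channel ()

fun server n =
      CML.select [
          (* accept an increment request ... *)
          CML.wrap (CML.recvEvt incCh, fn k => server (n + k)),
          (* ... or offer the current count to a reader *)
          CML.wrap (CML.sendEvt (getCh, n), fn () => server n)
        ]

val _ = CML.spawn (fn () => server 0)

(* Client operations built on top of the protocol. *)
fun inc k  = CML.send (incCh, k)
fun get () = CML.recv getCh
```

The key point is that synchronization is first-class: `recvEvt` and `sendEvt` yield `event` values that can be combined with choice (`select`/`choose`) and post-processing (`wrap`) before any communication commits.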

SLIDE 9

Parallel ML

Implicit threading

PML provides several lightweight syntactic forms for introducing parallel computation.

- Parallel tuples provide basic fork-join parallel computation.
- Nested data-parallel arrays provide fine-grain data-parallel computations over sequences.
- Parallel bindings provide data-flow parallelism with cancellation of unused subcomputations.
- Parallel cases provide non-deterministic speculative parallelism.

These forms are annotations that mark a computation as a good candidate for parallel execution, but the details are left to the implementation.
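The concrete syntax of these forms (as written in the Manticore papers; treat this as a sketch rather than a complete program) looks like:

```sml
(* Parallel tuple: the components may be evaluated in parallel,
   with an implicit join before the tuple is built. *)
fun fib n = if n <= 1 then n
            else let val (a, b) = (| fib (n - 1), fib (n - 2) |)
                 in a + b end

(* Nested data-parallel array comprehension over a sequence. *)
val squares = [| x * x | x in [| 1 to 1000 |] |]

(* Parallel binding: evaluation may proceed in parallel with the
   body; if y turns out to be unused, its computation is cancelled. *)
(* pval y = expensive () *)

(* Parallel case: evaluates the discriminants speculatively and
   commits to a branch as soon as one can match ('?' = not yet). *)
(* pcase e1 & e2 of x & ? => ... | ? & y => ... *)
```

In each case the annotation only licenses parallel evaluation; a sequential reading of the program gives the same answer (modulo the explicitly non-deterministic `pcase`).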

SLIDE 10

Parallel ML

Challenges revisited

SML does not come without challenges.

- Polymorphism: whole-program monomorphization using MLton's front end.
- Higher-order functions: advanced CFA (control-flow analysis) techniques.
- Garbage collection: DGL (Doligez-Leroy-Gonthier) split-heap GC and parallel global GC.
- Exceptions: reduce the use of arithmetic exceptions.

SLIDE 11

Parallel ML

PML performance

[Figure: Speedup vs. number of processors (up to 48) for RayTracer, QuickSort, Black-Scholes, and Barnes-Hut, with a perfect-speedup reference line. Speedup is over sequential PML.]

SLIDE 12

The future

The need for shared mutable state

- Mutable storage is a very powerful communication mechanism: essentially a broadcast mechanism supported by the memory hardware.
- Sequential algorithms and data structures gain significant (asymptotic) performance benefits from shared memory (e.g., union-find with path compression).
- Some algorithms seem hard or impossible to parallelize without shared state (e.g., mesh refinement).
- But shared memory makes parallel programming hard, so we want to be cautious in adding it to PML.
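The union-find point can be made concrete in plain SML (a minimal array-based sketch; the near-constant amortized bound depends on the in-place update inside `find`):

```sml
(* Union-find with path compression: find rewrites parent links in
   place, which is what makes the amortized cost near-constant.  A
   purely functional version gives up this asymptotic benefit. *)
val parent = Array.tabulate (8, fn i => i)  (* each element starts as its own root *)

fun find i =
      let val p = Array.sub (parent, i)
      in  if p = i then i
          else let val root = find p
               in  Array.update (parent, i, root);  (* path compression *)
                   root
               end
      end

fun union (i, j) =
      let val (ri, rj) = (find i, find j)
      in  if ri <> rj then Array.update (parent, ri, rj) else ()
      end
```

Every call to `find` mutates shared state as a side effect of a logically read-only query, which is exactly the kind of idiom that is hard to recover in a language with no mutable storage.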

SLIDE 13

The future

The design challenge

- How do we add shared memory while preserving PML's declarative programming model for fine-grain parallelism?
- Some races are okay in an implicitly threaded setting.
- Deadlock is not okay in an implicitly threaded setting.

SLIDE 14

The future

Limits on parallel performance: Amdahl’s Law

[Figure: Parallel efficiency vs. number of processors (1 to 48) as predicted by Amdahl's Law, for parallel fractions of 80%, 90%, 95%, 99%, and 100%.]
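The curves follow the usual statement of Amdahl's Law: if a fraction $p$ of the work parallelizes perfectly over $n$ processors, then

```latex
\text{speedup}(n) \;=\; \frac{1}{(1 - p) + p/n},
\qquad
\text{efficiency}(n) \;=\; \frac{\text{speedup}(n)}{n}.
```

Even at $p = 95\%$, efficiency on 48 processors falls below 30%, which is what the lower curves show.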

SLIDE 15

The future

Speculation

- Amdahl's Law tells us that as the number of cores increases, execution time will be dominated by sequential code.
- Speculation is an important tool for introducing parallelism into otherwise-sequential code.
- PML supports both deterministic and nondeterministic speculation.
- For many applications, we can relax determinism and still get a correct answer.

SLIDE 16

Conclusion

Credits

- Matthew Fluet (RIT)
- Claudio Russo (MSR Cambridge)
- Sven Auhagen, Lars Bergstrom, Mike Rainey, Adam Shaw, and Yingqi Xiao (U. of Chicago graduate students)
- Carsen Berger, Stephen Rosen, and Nora Sandler (U. of Chicago undergraduates)
- Chelsea Bingiel, Nic Ford, Korie Klein, Joshua Knox, Jordan Lewis, and Damon Wang (past U. of Chicago undergraduates)
- National Science Foundation

SLIDE 17

Conclusion

Questions?

http://manticore.cs.uchicago.edu
