Concurrency and Parallelism in ML
John Reppy
University of Chicago
MacQueen Fest, May 12, 2012
History
Personal history
• ML on Unix (Cardelli ML)
• ML + Amber ⇒ Pegasus ML
• Standard ML of New Jersey (Version 0.15 on tape)
• Pegasus ML + SML/NJ ⇒ Concurrent ML
• ⇒ Ph.D.!!!
• ⇒ Department 11261 at Bell Labs
Why ML?
What makes parallelism and concurrency hard?
The sequential core matters!
• The combination of shared mutable state and concurrency leads to data races and non-determinism (a sketch of the problem follows below).
• Adding synchronization to avoid these problems leads to deadlock.
• Shared memory does not scale well to NUMA and distributed-memory architectures.
• Scaling is hard.
Claim: traditional imperative programming languages are a bad fit for concurrent and parallel programming.
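A minimal sketch (mine, not from the slides) of the first point above, written against the SML/NJ CML library: two threads perform an unsynchronized read-modify-write on a shared ref cell, so increments can be lost and the final count is non-deterministic. It is meant to run under RunCML.doit.

    (* Two threads race on a shared ref cell; the read-modify-write in
     * bump is not atomic, so updates can be lost under preemption. *)
    fun racyCounter () = let
          val cnt = ref 0
          fun bump 0 = ()
            | bump n = (cnt := !cnt + 1; bump (n - 1))
          val t1 = CML.spawn (fn () => bump 100000)
          val t2 = CML.spawn (fn () => bump 100000)
          in
            CML.sync (CML.joinEvt t1);
            CML.sync (CML.joinEvt t2);
            !cnt   (* often less than the expected 200000 *)
          end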
Why ML?
Alternatives
• Java, C#, etc.
• Haskell
• X10
Why ML?
Standard ML
Claim: what we want is a strict, statically typed, functional language, i.e., Standard ML.
• Strict call-by-value (CBV) semantics
• The type system distinguishes between mutable and immutable values (a small example follows below).
• The programming style is value-oriented.
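A small SML illustration (mine, not from the slides) of the last two points: mutability is visible in the types, and the default style builds new values instead of updating old ones.

    (* Immutable value vs. explicit mutable cell: the ref type records
     * the fact that c can be updated. *)
    val x : int = 1
    val c : int ref = ref 1
    val _ = c := !c + 1            (* explicit dereference and assignment *)

    (* Value-oriented style: insert returns a new list rather than
     * mutating the input. *)
    fun insert (y, [])      = [y]
      | insert (y, x :: xs) = if y <= x then y :: x :: xs
                              else x :: insert (y, xs)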
Why ML?
Challenges
SML does not come without challenges.
• Polymorphism
• Higher-order functions
• Garbage collection
• Exceptions
Parallel ML
The Manticore Project
• The Manticore project is our effort to address the programming needs of commodity applications running on multicore SMP systems.
• No shared memory
• Preserve determinism where possible
• Declarative mechanisms for fine-grain parallelism
Parallel ML
The Manticore Project (continued ...)
Our initial language is called Parallel ML (PML).
• Sequential core language based on a subset of SML: strict, with no mutable storage.
• A variety of lightweight implicitly-threaded constructs for fine-grain parallelism.
• Explicitly-threaded parallelism based on CML: message passing with first-class synchronization (a sketch follows below).
• Prototype implementation with good scaling on 48-way parallel hardware for a range of applications.
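A minimal sketch of the explicitly-threaded layer, written against the SML/NJ CML library API (PML's interface is similar in spirit; the doubler example itself is mine): a server thread owns a pair of typed channels, and the client treats synchronization as a first-class value, choosing between the reply and a timeout.

    (* A tiny request/reply server over typed channels. *)
    fun doubler (reqCh : int CML.chan, replyCh : int CML.chan) = let
          fun loop () = (CML.send (replyCh, 2 * CML.recv reqCh); loop ())
          in
            ignore (CML.spawn loop)
          end

    (* First-class synchronization: build an event value that is either
     * "the reply arrived" or "one second elapsed", then sync on it. *)
    fun ask (reqCh, replyCh) = (
          CML.send (reqCh, 21);
          CML.sync (CML.choose [
              CML.wrap (CML.recvEvt replyCh, SOME),
              CML.wrap (CML.timeOutEvt (Time.fromSeconds 1), fn () => NONE)
            ]))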
Parallel ML
Implicit threading
PML provides several light-weight syntactic forms for introducing parallel computation.
• Parallel tuples provide a basic fork-join parallel computation.
• Nested data-parallel arrays provide fine-grain data-parallel computations over sequences.
• Parallel bindings provide data-flow parallelism with cancellation of unused subcomputations.
• Parallel cases provide non-deterministic speculative parallelism.
These forms are annotations that mark a computation as a good candidate for parallel execution, but the details are left to the implementation.
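A sketch of how these forms read, using the surface syntax from the Manticore papers (the exact syntax, and the tree-product example, are illustrative and may not match any particular release):

    (* Parallel tuple: a hint that the two recursive calls may run in
     * parallel, joined when both finish. *)
    fun fib n = if n <= 1 then n
                else let val (a, b) = (| fib (n - 1), fib (n - 2) |)
                     in a + b end

    (* Parallel array comprehension: fine-grain data parallelism over a
     * sequence. *)
    fun squares n = [| i * i | i in [| 1 to n |] |]

    (* Parallel binding (pval): data-flow parallelism with cancellation.
     * If prod r evaluates to 0, the still-running prod l may be
     * cancelled, because its result is no longer needed. *)
    datatype tree = LF of int | ND of tree * tree
    fun prod (LF n)      = n
      | prod (ND (l, r)) = let
          pval pl = prod l
          val  pr = prod r
          in
            if pr = 0 then 0 else pl * pr
          end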
Parallel ML
Challenges revisited
SML does not come without challenges.
• Polymorphism: whole-program monomorphization using MLton's front end
• Higher-order functions: advanced CFA (control-flow analysis) techniques
• Garbage collection: DLG-style split-heap GC and parallel global GC
• Exceptions: reduce the use of arithmetic exceptions
Parallel ML
PML performance
[Figure: speedup over sequential PML versus number of processors (up to 48) for RayTracer, QuickSort, Black-Scholes, and Barnes-Hut, plotted against perfect (linear) speedup.]
The future
The need for shared mutable state
• Mutable storage is a very powerful communication mechanism: essentially a broadcast mechanism supported by the memory hardware.
• Sequential algorithms and data structures gain significant (asymptotic) performance benefits from shared memory, e.g., union-find with path compression (sketched below).
• Some algorithms seem hard or impossible to parallelize without shared state (e.g., mesh refinement).
• But shared memory makes parallel programming hard, so we want to be cautious in adding it to PML.
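For concreteness, a sketch (mine, not from the slides) of the union-find point in plain SML: path compression relies on in-place update of a parent array, and a purely functional version loses the amortized near-constant-time bound.

    (* Union-find with path compression over a mutable parent array.
     * parent[i] = i marks a root. *)
    fun newUF n = Array.tabulate (n, fn i => i)

    fun find (parent, i) = let
          val p = Array.sub (parent, i)
          in
            if p = i then i
            else let val root = find (parent, p)
                 in
                   Array.update (parent, i, root);  (* path compression *)
                   root
                 end
          end

    fun union (parent, i, j) =
          Array.update (parent, find (parent, i), find (parent, j))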
The future
The design challenge
• How do we add shared memory while preserving PML's declarative programming model for fine-grain parallelism?
• Some races are okay in an implicitly-threaded setting.
• Deadlock is not okay in an implicitly-threaded setting.
The future
Limits on parallel performance: Amdahl’s Law
[Figure: parallel efficiency versus number of processors (1 to 48), as predicted by Amdahl's Law for parallel fractions of 80%, 90%, 95%, 99%, and 100%.]
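The curves follow directly from Amdahl's Law. For a program in which a fraction p of the work can be parallelized across n processors, the speedup and efficiency are (standard formulas, not from the slides):

    S(n) = \frac{1}{(1 - p) + p/n}, \qquad
    E(n) = \frac{S(n)}{n} = \frac{1}{n(1 - p) + p}

For example, with p = 0.95 and n = 48, S(48) ≈ 14.3, so efficiency is only about 30%; even a 99% parallel program drops to roughly 68% efficiency at 48 processors.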
The future
Speculation
• Amdahl's Law tells us that as the number of cores increases, execution time will be dominated by sequential code.
• Speculation is an important tool for introducing parallelism into otherwise sequential code.
• PML supports both deterministic and nondeterministic speculation (a pcase sketch follows below).
• For many applications, we can relax determinism and still get a correct answer.
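A sketch of nondeterministic speculation using PML's parallel case, following the pcase syntax described in the Manticore papers (the exact syntax and the tree-search example are illustrative): both subtrees are searched in parallel, whichever match arrives first wins, and the implementation may cancel the losing branch.

    datatype tree = LF of int | ND of tree * tree

    (* Return SOME leaf satisfying p, searching both subtrees
     * speculatively; '?' matches a branch that has not yet finished. *)
    fun find (p, LF n)      = if p n then SOME n else NONE
      | find (p, ND (l, r)) =
          (pcase find (p, l) & find (p, r)
            of SOME n & ?      => SOME n
             | ?      & SOME n => SOME n
             | NONE   & NONE   => NONE)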
Conclusion
Credits
• Matthew Fluet (RIT)
• Claudio Russo (MSR Cambridge)
• Sven Auhagen, Lars Bergstrom, Mike Rainey, Adam Shaw, and Yingqi Xiao (U. of Chicago graduate students)
• Carsen Berger, Stephen Rosen, and Nora Sandler (U. of Chicago undergraduates)
• Chelsea Bingiel, Nic Ford, Korie Klein, Joshua Knox, Jordan Lewis, and Damon Wang (past U. of Chicago undergraduates)
• National Science Foundation
Conclusion
Questions?
http://manticore.cs.uchicago.edu