A Generic Adaptive Runtime Autotuning Framework (Isaac Dooley)



SLIDE 1

A Generic Adaptive Runtime Autotuning Framework

Isaac Dooley
7th Annual Workshop on Charm++ and its Applications
Thursday, April 16th, 2009

SLIDE 2

Existing Parallel Programming Models

MPI Model

  • Application runs on top of a parallel runtime system
  • One thread per processor

Charm++ Model

  • Application runs on top of a parallel runtime system
  • Overdecomposition
  • Dynamic load balancing of chare objects to processors

SLIDE 3

Runtime System Controls the Application

[Diagram: Parallel Runtime System and Application]

Adaptive Control System (inside the parallel runtime system) draws on:

  • Experiment history
  • Knowledge of control points
  • Instrumented performance characteristics

Application:

  • Provides instrumented performance to the runtime system
  • Exposes control points

SLIDE 4

Intelligent Tuning

Measured Performance Metrics (Input to Controller):

  • Processor utilization
  • Processor overhead
  • Memory utilization
  • Cache performance
  • Application decomposition granularity
  • Communication volume
  • Critical path profiling

Descriptive Categorizations for Application Behavior as Control Point Values are Increased:

  • Task decomposition granularity
  • Task scheduling priorities
  • Degree of pipeline streaming
  • Memory usage
  • Prefetch / lookahead distance

SLIDE 5

Control Point API

Application exposes control point values:

    int controlPointValue = controlPoint("Control Point Name", 1, 50);

Application-specified performance:

    registerControlPointTiming(time);

Control point framework instructs the application to adapt:

    CkCallback myCallback(CkIndex_Main::controlPointChange(NULL), proxy);
    registerControlPointChangeCallback(myCallback);

Describe knowledge:

    controlPointPriorityArray("Control Point Name", ArrayProxy);
    controlPointPriorityEntry("Control Point Name", EntryMethod);
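To illustrate how an application might interact with the first two calls on this slide, here is a minimal, self-contained sketch. It does not use Charm++ and is not the framework's implementation: `SweepTuner` and `tuneSyntheticWorkload` are hypothetical names, the member functions only mimic the shape of `controlPoint` and `registerControlPointTiming`, and the tuning policy (try every value once, then keep the fastest) is an assumption chosen for simplicity.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <string>

// Hypothetical stand-in for a control point framework (NOT Charm++ code).
// Policy assumed here: sweep each control point from lo to hi once,
// then settle on the value whose reported time was lowest.
class SweepTuner {
public:
    // Shaped like the slide's controlPoint(name, lo, hi):
    // returns the value the application should use for the next phase.
    int controlPoint(const std::string& name, int lo, int hi) {
        assert(lo <= hi);
        auto it = points_.find(name);
        if (it == points_.end()) {
            Point fresh{hi, lo, lo,
                        std::numeric_limits<double>::infinity(), false};
            it = points_.emplace(name, fresh).first;
        }
        active_ = &it->second;  // std::map element addresses stay valid
        return active_->current;
    }

    // Shaped like the slide's registerControlPointTiming(time):
    // reports how long the phase took with the value most recently returned.
    void registerControlPointTiming(double seconds) {
        assert(active_ != nullptr);
        Point& p = *active_;
        if (seconds < p.bestTime) {
            p.bestTime = seconds;
            p.best = p.current;
        }
        if (!p.done && p.current < p.hi) {
            ++p.current;         // still sweeping: try the next value
        } else {
            p.done = true;       // sweep finished: stay on the best value
            p.current = p.best;
        }
    }

private:
    struct Point {
        int hi;
        int current;
        int best;
        double bestTime;
        bool done;
    };
    std::map<std::string, Point> points_;
    Point* active_ = nullptr;
};

// Hypothetical driver: runs `phases` steps against a synthetic cost that is
// cheapest at value 3, then returns the value the tuner would hand out next.
int tuneSyntheticWorkload(int lo, int hi, int phases) {
    SweepTuner tuner;
    for (int i = 0; i < phases; ++i) {
        int value = tuner.controlPoint("grain size", lo, hi);
        double d = static_cast<double>(value - 3);
        tuner.registerControlPointTiming(d * d + 1.0);  // fake phase time
    }
    return tuner.controlPoint("grain size", lo, hi);
}
```

In a real Charm++ program the timing would come from measuring an actual phase of the application, and the framework could use the measured metrics and declared knowledge listed on the earlier slides rather than an exhaustive sweep.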

SLIDE 6

Use Cases

  • Adjust task/data granularity
  • Adjust scheduling priorities
  • Adjust load balancing parameters
  • Choose algorithmic alternatives
  • Apply various communication optimizations

SLIDE 7

Tuning Critical Path Priorities

SLIDE 8

Control Point Configuration Space: Pipelined Filtering

[Figure: performance across the control point configuration space]

Axes:

  • Number of Worker Chares (Pipeline Stages): 1 to 64
  • Input Slice Size: 1 to 1024

Legend:

  • Performance within 1.0% of best
  • Performance within 2.0% of best
  • Performance less than 98.0% of best
  • Smaller squares represent lower performance
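The legend's three bands can be computed from a set of measured configurations roughly as follows. This is a sketch under stated assumptions, not code from the talk: it assumes performance is a positive "higher is better" number (for example, throughput), and the names `Band`, `classify`, and `classifyAll` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The three legend categories from the slide.
enum class Band { Within1Percent, Within2Percent, Below98Percent };

// Classify one configuration against the best observed performance.
// Assumes higher values mean better performance and best > 0.
Band classify(double performance, double best) {
    assert(best > 0.0);
    double ratio = performance / best;
    if (ratio >= 0.99) return Band::Within1Percent;
    if (ratio >= 0.98) return Band::Within2Percent;
    return Band::Below98Percent;
}

// Classify every sampled configuration relative to the best one in the set.
std::vector<Band> classifyAll(const std::vector<double>& performance) {
    assert(!performance.empty());
    double best = *std::max_element(performance.begin(), performance.end());
    std::vector<Band> bands;
    bands.reserve(performance.size());
    for (double p : performance) {
        bands.push_back(classify(p, best));
    }
    return bands;
}
```

If the measurements are times per step rather than a rate, they would first need to be inverted (for example, steps per second) for these thresholds to match the legend.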

SLIDE 9

Control Point Configuration Space: 2D Jacobi

[Figure: performance across the control point configuration space]

Axes:

  • Number of Worker Chares (partitions) in X Dimension: 1 to 50
  • Number of Worker Chares (partitions) in Y Dimension: 1 to 50

Legend:

  • Performance within 1.0% of best
  • Performance within 2.0% of best
  • Performance less than 98.0% of best
  • Smaller squares represent lower performance

SLIDE 10

Future Work

  • Improve critical path profiles.
  • Detect & fix more patterns of known performance problems.
  • Use with complicated applications & algorithms such as MD and LU.
  • Find appropriate ways to expose application knowledge.
  • Build an expert system combining all the patterns we discover.

SLIDE 11

The End

Questions? Suggestions?

Isaac Dooley
idooley2@uiuc.edu
