SLIDE 1

A Generic Adaptive Runtime Autotuning Framework

Isaac Dooley 7th Annual Workshop on Charm++ and its Applications Thursday, April 16th, 2009

SLIDE 2

Existing Parallel Programming Models

[Diagram: two panels, each labeled "Application" and "Parallel Runtime System".]

MPI Model: One Thread Per Processor

Charm++ Model: Overdecomposition; Dynamic Load Balancing; [...] of Chare Objects to Processors

SLIDE 3

Runtime System Controls the Application

[Diagram labels]

Parallel Runtime System
    Adaptive Control System
        Experiment History
        Knowledge of Control Points
        Instrumented Performance Characteristics

Application
    Control Points
    Instrumented Performance

SLIDE 4

Intelligent Tuning

Measured Performance Metrics (Input to Controller):
    Processor Utilization
    Processor Overhead
    Memory Utilization
    Cache Performance
    Application Decomposition Granularity
    Communication Volume
    Critical Path Profiling

Descriptive Categorizations for Application Behavior as Control Point Values are Increased:
    Task Decomposition Granularity
    Task Scheduling Priorities
    Degree of Pipeline Streaming
    Memory Usage
    Prefetch / Lookahead Distance
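One way to read this slide: the controller combines a measured metric with a description of what raising a control point does, and uses the pair to pick a direction. The sketch below is only an illustration of that idea. The type names, the `nextValue` function, the two effect categories, and the thresholds (20% overhead, 80% utilization, 90% memory) are all assumptions made for the example, not part of the framework.

```cpp
#include <algorithm>

// Hypothetical categorization: what happens as the control point value grows.
enum class Effect { IncreasesGrainSize, IncreasesMemoryUsage };

// Hypothetical snapshot of measured metrics, each expressed as a fraction in [0, 1].
struct Metrics {
    double utilization;       // fraction of time processors are busy
    double overheadFraction;  // fraction of time spent in per-task overhead
    double memoryFraction;    // fraction of available memory in use
};

// Hypothetical control point: its known effect, legal range, and current value.
struct ControlPoint {
    Effect effect;
    int lower;
    int upper;
    int value;
};

// Pick the next value by moving one step in the direction the rules suggest.
// The thresholds are illustrative assumptions, not tuned constants.
int nextValue(const ControlPoint& cp, const Metrics& m) {
    int step = 0;
    switch (cp.effect) {
    case Effect::IncreasesGrainSize:
        if (m.overheadFraction > 0.20) {
            step = +1;  // tasks look too fine-grained: coarsen
        } else if (m.utilization < 0.80) {
            step = -1;  // processors idle with low overhead: refine
        }
        break;
    case Effect::IncreasesMemoryUsage:
        if (m.memoryFraction > 0.90) {
            step = -1;  // close to the memory limit: back off
        }
        break;
    }
    // Keep the result inside the range the application declared.
    return std::max(cp.lower, std::min(cp.upper, cp.value + step));
}
```

A rule table like this is the simplest form such knowledge could take; a real controller would presumably also consult the experiment history before moving.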

SLIDE 5

Control Point API

Application Exposes Control Point Values:
    int controlPointValue = controlPoint("Control Point Name", 1, 50);

Application Specified Performance:
    registerControlPointTiming(time);

Control Point Framework Instructs Application to adapt:
    CkCallback myCallback(CkIndex_Main::controlPointChange(NULL), proxy);
    registerControlPointChangeCallback(myCallback);

Describe Knowledge:
    controlPointPriorityArray("Control Point Name", ArrayProxy);
    controlPointPriorityEntry("Control Point Name", EntryMethod);
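To make the request/report cycle concrete, here is a minimal standalone sketch. It does not use Charm++ and is not the framework's implementation: `SweepTuner`, `syntheticStepTime`, and `tuneSyntheticApp` are hypothetical stand-ins. Only the argument shapes of `controlPoint(name, lower, upper)` and `registerControlPointTiming(time)` are taken from the slide. The tuner tries each value in the range once and then settles on the fastest one, which is a deliberately naive exhaustive sweep.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical stand-in for the framework, handling a single control point.
class SweepTuner {
public:
    // Same argument shape as controlPoint("name", lower, upper) on the slide.
    // Returns the value the application should use for its next step.
    int controlPoint(const std::string& name, int lower, int upper) {
        (void)name;  // this sketch tracks only one control point
        assert(lower <= upper);
        if (!initialized_) {
            upper_ = upper;
            current_ = lower;
            best_ = lower;
            initialized_ = true;
        }
        return current_;
    }

    // Same role as registerControlPointTiming(time): the application reports
    // how long the last step took under the current value.
    void registerControlPointTiming(double seconds) {
        if (!initialized_) return;
        if (seconds < bestTime_) {
            bestTime_ = seconds;
            best_ = current_;
        }
        if (!sweepDone_ && current_ < upper_) {
            ++current_;        // still sweeping: try the next value
        } else {
            sweepDone_ = true;
            current_ = best_;  // sweep finished: stay on the best value seen
        }
    }

    int bestValue() const { return best_; }

private:
    bool initialized_ = false;
    bool sweepDone_ = false;
    int upper_ = 0;
    int current_ = 0;
    int best_ = 0;
    double bestTime_ = std::numeric_limits<double>::infinity();
};

// Hypothetical cost model standing in for a real timed step:
// fastest at a grain size of 8, slower on either side.
double syntheticStepTime(int grain) {
    const double d = static_cast<double>(grain - 8);
    return 1.0 + d * d;
}

// Application loop: ask for a value, "run" a step, report the timing.
int tuneSyntheticApp(int steps) {
    SweepTuner tuner;
    for (int i = 0; i < steps; ++i) {
        const int grain = tuner.controlPoint("grain size", 1, 50);
        tuner.registerControlPointTiming(syntheticStepTime(grain));
    }
    return tuner.bestValue();
}
```

With the synthetic cost model, `tuneSyntheticApp(60)` covers the whole range 1 to 50 and settles on 8. The callback and priority calls from the slide are not modeled here.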

SLIDE 6

Use Cases

Adjust task/data granularity
Adjust scheduling priorities
Adjust load balancing parameters
Choose algorithmic alternatives
Apply various communication optimizations

SLIDE 7

Tuning Critical Path Priorities

SLIDE 8

Control Point Configuration Space Pipelined Filtering

[Figure: configuration space plot. Axes: Number of Worker Chares (Pipeline Stages), 1 to 64; Input Slice Size, 1 to 1024.]

Legend:
    Performance within 1.0% of best
    Performance within 2.0% of best
    Performance less than 98.0% of best
    Smaller Squares Represent Lower Performance
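The legend's three bands can be computed from measured results with a few lines of code. The sketch below assumes each configuration has a single performance number where higher is better and all values are positive; the `Band` enum and the `classify` function are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <vector>

// The three legend categories, relative to the best configuration measured.
enum class Band { Within1Percent, Within2Percent, Below98Percent };

// Classify each configuration's performance against the best one.
// Assumes higher is better and every value is positive.
std::vector<Band> classify(const std::vector<double>& performance) {
    std::vector<Band> bands;
    if (performance.empty()) return bands;
    const double best = *std::max_element(performance.begin(), performance.end());
    bands.reserve(performance.size());
    for (double p : performance) {
        const double ratio = p / best;
        if (ratio >= 0.99) {
            bands.push_back(Band::Within1Percent);
        } else if (ratio >= 0.98) {
            bands.push_back(Band::Within2Percent);
        } else {
            bands.push_back(Band::Below98Percent);
        }
    }
    return bands;
}
```

If the raw measurements are times rather than rates, one option is to pass their reciprocals so that the fastest configuration still has the largest value.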

SLIDE 9

Control Point Configuration Space 2D Jacobi

[Figure: configuration space plot. Axes: Number of Worker Chares (partitions) in X Dimension, 1 to 50; Number of Worker Chares (partitions) in Y Dimension, 1 to 50.]

Legend:
    Performance within 1.0% of best
    Performance within 2.0% of best
    Performance less than 98.0% of best
    Smaller Squares Represent Lower Performance

SLIDE 10

Future Work

Improve critical path profiles.
Detect & fix more patterns of known performance problems.
Use with complicated applications & algorithms such as MD and LU.
Find appropriate ways to expose application knowledge.
Build an expert system combining all the patterns we discover.

SLIDE 11

The End

Questions? Suggestions? Isaac Dooley idooley2@uiuc.edu