
SLIDE 1

Arash Deshmeh, Jacob Machina, and Angela C. Sodan

University of Windsor, Canada

ADEPT Scalability Predictor in Support of Adaptive Resource Allocation

IPDPS 2010

SLIDE 2

Outline

Background: Adaptive Resource Allocation
Related Work
Downey Runtime/Speedup Model
The ADEPT Predictor
Experimental Results
Anomaly Detection
Automated Reliability Judgment
Summary and Conclusion

SLIDE 3

Background: Adaptive Resource Allocation

Adaptive resource allocation:

Up to 70% improvement in avg. response times by

Reducing fragmentation
Adapting to current load (low/high)

98% of applications said to be moldable

Requires knowing jobs’ scalability / efficiency

but not practically available yet

In fact, what is needed is response time as a function of the allocated CPU/core resources (Burton Smith)

SLIDE 4

Illustration of Adaptive Resource Allocation

Fragmentation reduction
Adaptation to current load

Job 2 with original size 10

Run at higher efficiency with smaller sizes if high load
Run at lower efficiency with larger sizes if low load

[Figure: speedup versus job size, ideal and real curves, with N_min, N_opt, and N_max marked on the size axis]

SLIDE 5

More Background

Benefits for user:

Help in choosing job sizes tactically
Determine maximum meaningful job sizes (our data about real applications)

Relevance for resource allocation in:

Clusters (MPI jobs)
SMPs (OpenMP or MPI jobs)
Virtual-machine resource provisioning

SLIDE 6

Related Work

Most approaches are white-box (detailed model)

Require tools: code instrumentation, compiler/OS support, analysis of memory-access behavior, etc.
Complex and computationally expensive
Unsuitable for large-scale use in HPC centers
Valuable for cross-site or new-platform performance projection

Black-box approaches (few observation points, simple model)

Easy to use and cheap
Suffer from anomalies or non-uniform scalability patterns

SLIDE 7

Goals of ADEPT Scalability Predictor

Goals of ADEPT

Achieve high prediction accuracy
Provide computationally efficient approach
Detect and automatically correct individual anomalies
Detect and model non-uniform patterns (multi-phase)
Perform reliability judgment with potential advice for outcome improvement

Apply black-box prediction
Based on Downey runtime/speedup model

SLIDE 8

Downey Model

Simple (only A and σ to be learned)
Needs few observation points

[Figure: three speedup plots: "Speedup Curves, A varies"; "Speedup Curves, σ varies"; "Speedup curves for Downey model and a typical application" (legend: Typical application, Downey model; curve sections: Flat, Linear, Transitional, Declining)]

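Mode            n range               T(n)                      S(n)
Low variance    1 ≤ n ≤ A             (A - σ/2)/n + σ/2         An / (A + (σ/2)(n - 1))
Low variance    A ≤ n ≤ 2A - 1        σ(A - 1/2)/n + 1 - σ/2    An / (σ(A - 1/2) + n(1 - σ/2))
Low variance    2A - 1 ≤ n            1                         A
High variance   1 ≤ n ≤ A + Aσ - σ    σ + (A + Aσ - σ)/n        nA(σ + 1) / (σ(n + A - 1) + A)
High variance   A + Aσ - σ ≤ n        σ + 1                     A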
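
The piecewise Downey model on this slide can be written as two small functions. This is a minimal sketch, not the authors' code: the function names are illustrative, T(n) is the normalized runtime from the slide (no scaling to wall-clock units), and the switch between modes at σ = 1 follows Downey's published model rather than anything stated on the slide.

```python
def downey_runtime(n, A, sigma):
    """Normalized runtime T(n) of the Downey model on n processors."""
    if sigma <= 1.0:  # low-variance mode (assumed threshold: sigma <= 1)
        if n <= A:
            return (A - sigma / 2.0) / n + sigma / 2.0
        if n <= 2 * A - 1:
            return sigma * (A - 0.5) / n + 1.0 - sigma / 2.0
        return 1.0
    # high-variance mode
    if n <= A + A * sigma - sigma:
        return sigma + (A + A * sigma - sigma) / n
    return sigma + 1.0


def downey_speedup(n, A, sigma):
    """Speedup S(n) of the Downey model on n processors."""
    if sigma <= 1.0:  # low-variance mode
        if n <= A:
            return A * n / (A + (sigma / 2.0) * (n - 1))
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1.0 - sigma / 2.0))
        return float(A)
    # high-variance mode
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1.0) / (sigma * (n + A - 1) + A)
    return float(A)
```

In both modes S(n) = T(1)/T(n), and the speedup saturates at A, the average parallelism.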

SLIDE 9

ADEPT Predictor

1. Anomaly detection and scalability-pattern identification

2. Envelope derivation
3. Curve fitting

4. Reliability judgment

Core of ADEPT

SLIDE 10

Core: Envelope Derivation

Derives constraints from observations
Calculates closed-form solutions (within certain percentage of deviation) from pairs of observations
Uses lowest and highest bounds as overall envelope

[Figure: "Forming the Envelope": speedup S versus N, showing Range Pair 1, Range Pair 2, and Range Pair 3]
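
The envelope idea can be illustrated with a simplified stand-in. The slides do not give ADEPT's actual closed-form constraints, so this sketch makes several assumptions: all observations are taken to lie in the first low-variance segment, where the Downey runtime has the form T(n) = a/n + b; pairs are formed from consecutive observations; and the deviation is a fixed fraction. The names `pair_solution`, `envelope`, and `deviation` are hypothetical.

```python
def pair_solution(p, q):
    """Closed-form (a, b) such that T(n) = a/n + b passes through both points."""
    (n1, t1), (n2, t2) = p, q
    a = (t1 - t2) / (1.0 / n1 - 1.0 / n2)
    b = t1 - a / n1
    return a, b


def envelope(observations, deviation=0.1):
    """Overall (low, high) bounds on (a, b) over all consecutive pairs.

    Each pair yields one closed-form solution, widened by the given
    fractional deviation; the envelope keeps the lowest low bound and
    the highest high bound per parameter.
    """
    lows, highs = [], []
    for p, q in zip(observations, observations[1:]):
        solution = pair_solution(p, q)
        lows.append(tuple(v - abs(v) * deviation for v in solution))
        highs.append(tuple(v + abs(v) * deviation for v in solution))
    low = tuple(min(column) for column in zip(*lows))
    high = tuple(max(column) for column in zip(*highs))
    return low, high


# Four (n, runtime) observations give three pairs, as in the figure.
observations = [(1, 105.0), (2, 55.0), (4, 30.0), (8, 17.5)]
low, high = envelope(observations)
```

A later fitting step would then search only inside `low` and `high`, which is what keeps the search space small.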

SLIDE 11

Core: Curve Fitting

Prediction per target point, biased to closest observations
Weighted least-squared relative errors
Two-step

1. Closest point fixed
2. Extending variation by certain percentage within envelope

Constraints from envelope and two-step curve fitting make ADEPT both accurate and fast

[Figure: "Speedup Prediction Using 4 Methods": speedup S versus N for Levmar and ADEPT / Exhaustive / Genetic]
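
The weighted least-squares objective on this slide can be sketched as follows. This is only an illustration of the objective, not ADEPT's two-step procedure: it uses a brute-force grid search inside given bounds (standing in for the envelope), and the weight 1 / (1 + |n - target_n|) is a hypothetical choice of bias toward the closest observations. All function and parameter names are assumptions.

```python
def downey_speedup(n, A, sigma):
    """Downey speedup S(n); sigma <= 1 is taken as the low-variance mode."""
    if sigma <= 1.0:
        if n <= A:
            return A * n / (A + (sigma / 2.0) * (n - 1))
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1.0 - sigma / 2.0))
        return float(A)
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1.0) / (sigma * (n + A - 1) + A)
    return float(A)


def weighted_error(obs, A, sigma, target_n):
    """Sum of weighted squared relative errors, biased toward target_n."""
    total = 0.0
    for n, s in obs:
        weight = 1.0 / (1.0 + abs(n - target_n))  # hypothetical weighting
        rel = (downey_speedup(n, A, sigma) - s) / s
        total += weight * rel * rel
    return total


def predict(obs, target_n, a_bounds, sigma_bounds, steps=40):
    """Grid-search (A, sigma) within the bounds; return (S(target_n), A, sigma)."""
    best = None
    for i in range(steps + 1):
        A = a_bounds[0] + (a_bounds[1] - a_bounds[0]) * i / steps
        for j in range(steps + 1):
            sigma = sigma_bounds[0] + (sigma_bounds[1] - sigma_bounds[0]) * j / steps
            err = weighted_error(obs, A, sigma, target_n)
            if best is None or err < best[0]:
                best = (err, A, sigma)
    _, A, sigma = best
    return downey_speedup(target_n, A, sigma), A, sigma
```

Because the weights depend on the target point, a separate fit is run for each prediction, matching the "prediction per target point" item above.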

SLIDE 12

Experimental Set-Up

Experiments with MPI and OpenMP
NAS benchmarks BT, CG, FT, LU, SP
7 real anonymous applications (from administrator scalability tests)
Both interpolation and extrapolation
3 to 4 input observation points
Prediction of T(n) and S(n)
T(1) not always available

SLIDE 13

Experimental Results: Speedup

[Figure: speedup plots for NAS_FT, App_A, NAS_OMP_BT, NAS_OMP_CG, App_E, and App_F (legend: Standard, Biased Weighting Predictions, Uniform Weighting Predictions)]

Applied fitting approach better than non-weighted
Both interpolation and extrapolation work well
Most extrapolation still good on twice the number of nodes
Accuracy higher for closer extrapolation

SLIDE 14

Experimental Results: Runtime

[Figure: runtime plots (logarithmic runtime axis) for NAS_BT, NAS_CG, NAS_FT, App_B, App_D, and App_E]

Both interpolation and extrapolation work well
Whether T(1) available or not did not make any difference
Some predictions perfect match (App_A, App_C, App_G)
Accuracy higher for closer extrapolation

SLIDE 15

ADEPT Predictor

1. Anomaly detection and scalability-pattern identification
2. Envelope derivation
3. Curve fitting
4. Reliability judgment

Core of ADEPT

SLIDE 16

Anomaly Detection

Serious deviations from model can be detected

(Application never fully conforms to model)

Approach: fluctuation metric R

R_i = ((t_i * n_i / n_{i+1}) / t_{i+1}) * (1 + (n_{i+1} - n_i) / n_{i+1})
(idea is relative speedup, normalized to distance)
Check whether R_{i+1} exceeds R_i by more than a sensitivity factor, i.e. R_{i+1} > (1 + factor) * R_i
If so, both R_{i+1} and R_i are anomaly candidates
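
A direct transcription of the fluctuation metric into code may help. This is a minimal sketch: `ns` and `ts` are the observed node counts and runtimes in increasing order of n, and the default `sensitivity` of 0.2 is an arbitrary illustrative value, not one taken from the slides.

```python
def fluctuation_metric(ns, ts):
    """R_i for each consecutive pair of observations (n_i, t_i), (n_{i+1}, t_{i+1})."""
    R = []
    for i in range(len(ns) - 1):
        relative_speedup = (ts[i] * ns[i] / ns[i + 1]) / ts[i + 1]
        distance_factor = 1.0 + (ns[i + 1] - ns[i]) / ns[i + 1]
        R.append(relative_speedup * distance_factor)
    return R


def anomaly_candidates(R, sensitivity=0.2):
    """Indices i of R values flagged by the test R[i+1] > (1 + sensitivity) * R[i]."""
    candidates = set()
    for i in range(len(R) - 1):
        if R[i + 1] > (1.0 + sensitivity) * R[i]:
            candidates.update((i, i + 1))
    return sorted(candidates)


ns = [1, 2, 4, 8]
ts = [100.0, 50.0, 40.0, 12.5]   # the runtime at n = 4 is too slow
R = fluctuation_metric(ns, ts)   # [1.5, 0.9375, 2.4]
print(anomaly_candidates(R))     # [1, 2]
```

For perfectly scaling runtimes with node counts doubling, every R value is 1.5, so no candidate is flagged.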

SLIDE 17

Individual Anomalous Points

[Figure: speedup curve with anomalous point and the corresponding R metric curves; anomaly examples for NAS_SP, NAS_OMP_SP, and a synthetic case]

Minimum of 4 input points required
Check R curve after removal of anomaly candidate
If improvement, classify as anomaly point and reduce its weight for curve fitting
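
The removal check above might look like the following sketch. The slides do not define "improvement" or say how an R candidate maps to an observation, so two assumptions are made here: the suspect observation is the one shared by the flagged pair R_i and R_{i+1}, and improvement means fewer flagged jumps in the recomputed R curve. The names, the default sensitivity, and the reduced weight of 0.1 are all hypothetical.

```python
def fluctuation_metric(ns, ts):
    """R_i for each consecutive pair of observations."""
    return [
        ((ts[i] * ns[i] / ns[i + 1]) / ts[i + 1])
        * (1.0 + (ns[i + 1] - ns[i]) / ns[i + 1])
        for i in range(len(ns) - 1)
    ]


def count_jumps(R, sensitivity):
    """Number of positions where R[i+1] > (1 + sensitivity) * R[i]."""
    return sum(1 for i in range(len(R) - 1) if R[i + 1] > (1.0 + sensitivity) * R[i])


def anomaly_weights(ns, ts, sensitivity=0.2, reduced_weight=0.1):
    """Per-observation fitting weights; confirmed anomalies get reduced_weight."""
    if len(ns) < 4:
        raise ValueError("at least 4 input points are required")
    weights = [1.0] * len(ns)
    R = fluctuation_metric(ns, ts)
    baseline = count_jumps(R, sensitivity)
    for i in range(len(R) - 1):
        if R[i + 1] > (1.0 + sensitivity) * R[i]:
            p = i + 1  # observation shared by R[i] and R[i+1] (assumption)
            ns_without = ns[:p] + ns[p + 1:]
            ts_without = ts[:p] + ts[p + 1:]
            if count_jumps(fluctuation_metric(ns_without, ts_without), sensitivity) < baseline:
                weights[p] = reduced_weight
    return weights
```

The resulting weights could be multiplied into the per-point weights of the curve-fitting objective, so an anomalous point still contributes but no longer dominates the fit.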

SLIDE 18

Anomaly Patterns

[Figure: anomaly pattern plots: "Stepwise NAS_OMP_FT"; "Stepwise NAS_OMP_FT, Fitted"; "Stepwise Synthetic, Fitted"; "Specially Optimized for 2^n Nodes, Fitted"]

Currently considered:

Stepwise scalability (minimum of 5 points required)
Model instance per phase

Specially optimized for certain numbers of nodes, e.g. powers of two (minimum of 9 points required), regular anomalous points
Omit other points from curve fitting
Report suitable allocations
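
For the powers-of-two case, the "omit other points" and "report suitable allocations" steps can be sketched as below. How ADEPT recognizes the pattern is not described on the slide, so this sketch only covers the handling once the pattern is known; the function names are hypothetical and node counts are assumed to be positive integers.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (n must be an integer)."""
    return n > 0 and (n & (n - 1)) == 0


def split_observations(observations, suits=is_power_of_two):
    """Separate (n, t) observations into points kept for fitting and omitted points."""
    kept = [(n, t) for n, t in observations if suits(n)]
    omitted = [(n, t) for n, t in observations if not suits(n)]
    return kept, omitted


def suitable_allocations(max_nodes, suits=is_power_of_two):
    """Node counts up to max_nodes that match the optimized pattern."""
    return [n for n in range(1, max_nodes + 1) if suits(n)]


observations = [(2, 60.0), (3, 55.0), (4, 31.0), (6, 29.0), (8, 16.0)]
kept, omitted = split_observations(observations)
print(kept)                      # [(2, 60.0), (4, 31.0), (8, 16.0)]
print(suitable_allocations(20))  # [1, 2, 4, 8, 16]
```

Passing a different `suits` predicate would cover other regular patterns, such as multiples of the number of cores per node.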

SLIDE 19

ADEPT Predictor

1. Anomaly detection and scalability-pattern identification
2. Envelope derivation
3. Curve fitting

4. Reliability judgment

Core of ADEPT

SLIDE 20

Automated Reliability Judgment

All input points in linear section

More input points needed (n > A)

High fitting error, not explainable as anomaly

Report problem

Runner-up problem (two or more model instances with greatly different A match)

More input points needed (beyond current range)
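
The three checks on this slide can be expressed as a small rule function. This is a sketch under stated assumptions: "linear section" is read as all observed n not exceeding the fitted A, and the thresholds `max_error` and `max_a_ratio` as well as all names are hypothetical, since the slides give no numeric criteria.

```python
def reliability_warnings(ns, fitted_a, fit_error, matching_as,
                         max_error=0.1, max_a_ratio=1.5):
    """Return a list of warnings for a fitted model.

    ns          -- node counts of the input observations
    fitted_a    -- A of the best-fitting model instance
    fit_error   -- fitting error of that instance
    matching_as -- A values of all model instances that match similarly well
    """
    warnings = []
    if max(ns) <= fitted_a:
        warnings.append("all input points in linear section: more input points needed")
    if fit_error > max_error:
        warnings.append("high fitting error, not explainable as anomaly: report problem")
    if matching_as and max(matching_as) / min(matching_as) > max_a_ratio:
        warnings.append("runner-up problem: more input points needed beyond current range")
    return warnings


print(reliability_warnings([4, 8, 16], fitted_a=64.0, fit_error=0.02,
                           matching_as=[64.0, 200.0]))
```

An empty list would indicate that none of the three conditions was triggered.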

SLIDE 21

Automated Reliability Judgment (2)

[Figure: three example plots: "High Fitting Error, NAS_LU"; "All Linear Speedup, App_C"; "Runner-Up Model Instance, NAS_SP"]

All 3 cases (all linear, high fitting error, runner-up) successfully detected

SLIDE 22

Summary and Conclusion

ADEPT is accurate and efficient

For both interpolation and extrapolation (if not too far away)
Works well without serial time T(1)
Performance similar to that reported in literature for white-box approaches

Employs envelope derivation technique to constrain search during model fitting
Biased model fitting with efficient two-step approach
Anomaly detection based on fluctuation metric and automatic correction
Warnings by reliability judgment if prediction uncertain

Suitable for production environments

Extrapolative scalability prediction as feedback to users
Adaptive resource allocation