SLIDE 1

Near-Optimal Adaptive Control of a Large Grid Application

Det Buaklee, Greg Tracy, Mary Vernon, Steve Wright

Computer Science Department, University of Wisconsin - Madison

SLIDE 2

ICS’02 New York City June 26, 2002 [2]

Talk Outline

Condor
Stochastic Optimization, ATR
ATR Execution Time Analysis
Model for Minimum Execution Time
Results: Optimized ATR Performance

SLIDE 3

Condor

Provides high throughput computation
Manages a heterogeneous & dynamic pool
MW layer supports Master-Worker applications

– Submitting node is the “master” node
– Condor dynamically allocates “worker” nodes
– Worker nodes can drop out during computation (min,max)

[Figure: layers Application, MW Layer, Condor, PVM/TCP; Communication Link]

SLIDE 4

Stochastic Optimization

Non-trivial ~ 10,000 lines + LP codes
Optimization of a model with uncertain data

– Large number of possible scenarios for the data
– Arises in planning-under-uncertainty applications

x: vector of variables (unknowns)

– aim to find the x that optimizes expected model performance over all the scenarios

Objective function is an expectation Q(x)

$\min_x \; c^T x + Q(x)$ subject to $Ax = b,\ x \ge 0$

SLIDE 5

Properties of Expectation Q(x)

Probabilistic weighted sum over the objective for each individual scenario $\omega_i$, $i = 1, 2, \ldots, N$:

$$Q(x) = \sum_{i=1}^{N} p_i \, Q(x; \omega_i)$$

[Figure: plot of Q(x) against x]

N is the number of scenarios evaluated

– May be sampled from the full set of scenarios
– Increase N to improve the accuracy
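A minimal sketch of the sampled expectation, under stated assumptions: the per-scenario objective `recourse` and the uniform scenario distribution are hypothetical stand-ins, not the models used in ATR. Each sampled scenario gets equal weight p_i = 1/N, and the estimate can be compared against the known exact value for this toy case.

```python
import random

def recourse(x, omega):
    # Hypothetical per-scenario objective Q(x; omega): cost of unmet demand.
    return max(omega - x, 0.0)

def sampled_expectation(x, n_scenarios, seed=0):
    # Draw N scenarios from an assumed U(0, 1) distribution and weight
    # each one equally, p_i = 1/N.
    rng = random.Random(seed)
    scenarios = [rng.random() for _ in range(n_scenarios)]
    return sum(recourse(x, w) for w in scenarios) / n_scenarios

# For this toy case the exact expectation at x = 0.5 is the integral
# of (w - 0.5) over [0.5, 1], which is 0.125.
EXACT = 0.125
for n in (100, 10_000, 100_000):
    estimate = sampled_expectation(0.5, n)
    print(n, round(estimate, 4), "abs error", round(abs(estimate - EXACT), 4))
```

The cost of the better estimate is that every iteration must evaluate more scenarios, which is the work that gets distributed to the workers.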

SLIDE 6

ATR Parallelism

N = 16 = number of scenarios evaluated
G = 4 = number of task groups
T = 8 = number of tasks per iteration

[Figure: master and workers for each iteration, with N, G, and T labeled]

SLIDE 7

Goals

Given N and a set of workers:

Compute (near-)optimal adaptive values of B, G, T

– Automated process
– Fast/simple runtime computation

Compare adaptive and non-adaptive B, G, and grouping/scheduling of tasks

Approach: LogP/LogGP/LoPC model

SLIDE 8

ATR in parallel

Each task returns the value of its partial sum $\sum_i Q(x;\omega_i)$ over the scenarios assigned to it, and a subgradient (slope) for this partial sum

Sum over tasks to obtain the complete function Q(x) and its subgradient

At the end of each iteration, set the new x to be the minimizer of the latest approximation to Q(x)

[Figure: master and workers timeline with iterates x1, x2, x3; workers evaluate Q(x1;ω1), Q(x1;ω2), Q(x1;ω3), then Q(x2;ω1), Q(x2;ω2), Q(x2;ω3)]
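The worker/master split described here can be sketched as follows. This is an illustration only: the scalar x, the per-scenario function |x − ω|, the probability weighting inside each task, and the names `evaluate_task` and `combine` are all assumptions made for the example, not taken from the ATR code.

```python
def evaluate_task(x, scenarios):
    """Worker side: partial sum of p * Q(x; omega) and its subgradient."""
    value = 0.0
    slope = 0.0
    for prob, omega in scenarios:
        # Hypothetical Q(x; omega) = |x - omega|; its slope is the sign of x - omega.
        value += prob * abs(x - omega)
        slope += prob * (1.0 if x > omega else -1.0)
    return value, slope

def combine(results):
    """Master side: sum partial values and subgradients over all tasks."""
    total_value = sum(value for value, _ in results)
    total_slope = sum(slope for _, slope in results)
    return total_value, total_slope

# Two tasks, each holding two equally likely scenarios as (probability, omega).
tasks = [
    [(0.25, 1.0), (0.25, 2.0)],
    [(0.25, 3.0), (0.25, 4.0)],
]
q_value, q_slope = combine([evaluate_task(2.5, task) for task in tasks])
print(q_value, q_slope)
```

Because the sum is associative, the master can add each task's pair as it arrives, in any order.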

SLIDE 9

Execution Time Analysis

Measure LogP/LogGP/LoPC model parameters

– L (network latency)
– o (message processing overhead)
– G (gap per byte; inverse of bandwidth)
– P (number of processors)

[Figure: master execution time, worker execution time, communication time]

SLIDE 10

Execution Time Measurement

One master and one worker experiment
High variability

| G | T | Master time to update model function m(x) (msec): avg | min | max | Num. it. | Master time to compute a new iterate x (sec): avg | min | max | Worker execution time (sec): avg |
| 25 | 25 | 6.51 | 3.36 | 915 | 82 | 0.38 | 0.01 | 2.06 | 20.54 |
| 50 | 50 | 6.04 | 3.56 | 1405 | 47 | 1.32 | 0.01 | 3.05 | 10.56 |
| 50 | 100 | 6.83 | 3.64 | 1936 | 32 | 2.42 | 0.05 | 7.60 | 10.36 |
| 100 | 100 | 5.94 | 3.40 | 1162 | 31 | 1.57 | 0.05 | 3.41 | 5.19 |
| 200 | 200 | 6.12 | 3.84 | 2092 | 25 | 2.25 | 0.03 | 6.25 | 2.69 |
| 400 | 400 | 6.74 | 3.30 | 2411 | 21 | 3.33 | 0.05 | 13.27 | 1.35 |

SLIDE 11

Worker Execution Times

For a given planning problem, tw is linear in

– Number of scenarios evaluated
– Processor speed

[Figure: Worker Execution Time (sec) vs. Number of Scenarios Evaluated (N/G), one series each for MIPS 600, MIPS 780, MIPS 1100, MIPS 1700]

Total worker time = n(tw)max

SLIDE 12

Master Execution Times

Updating m(x) after each task group (G) returns

Variability in execution time due to:

– Excessive default debug I/O
– Interference from Condor administrative tasks

Eliminating both makes this execution time < 1 ms, i.e., negligible

[Figure: Time to Update m(x) (msec) vs. Worker Completion Event Count, for three cases: lightly loaded master, default debug level; lightly loaded master, reduced debug level; isolated master, reduced debug level]

SLIDE 13

Master Execution Times

Time to compute new x

Hard to predict for the next iterate
Same characteristic for all planning problems

[Figure: Time to Compute New x (sec) vs. Iteration Number, two panels: SSN network design problem and 20term problem; series T = 200 and T = 100]

SLIDE 14

Master Execution Times

Generating new x at the end of each iteration:

Number of iterations (n) and time to compute x for each iteration depend on N, T

Given N, total master processing time (tM) is fixed!

Optimize: T is large, but not too large

[Figures: Total Master Processing Time (sec), Avg. Time to Compute New x (sec), and Number of Iterations (n), each plotted against Number of Tasks (T)]

SLIDE 15

Communication Costs

Round trip time measurement
Critical path contains one round trip time per iterate
Round trip time << worker execution time for message sizes used in ATR (250–1200 bytes)

[Figure: round trip time vs. Size of Data Sent (KB). Experiment 1: between local nodes, Time (usec). Experiment 2: between Wisconsin and Bologna, Italy, Time (sec)]

SLIDE 16

Effect of Basket Size

More iterations (n) needed for larger B: approximately linear relationship between B and n

Optimal B = 1

[Figure: Number of Iterations (n) vs. Basket Size (B); maximum, average, and minimum]

SLIDE 17

Model Vocabulary

N   number of scenarios in model
T   number of tasks per iteration
G   number of groups of scenarios (units of work)
B   number of vectors x evaluated in parallel
tM  total master execution time
tw  individual worker execution time
n   total number of iterations

SLIDE 18

Building the Model

Master, Worker, Communication Times

Total master execution time

– Varies with N, T, B
– Include only time to generate new x

Worker execution time per iteration:

– Very low variation
– Consistent from one iteration to another

Insignificant contributions from:

– Communication time
– Master updating Q(x) (if T not too large)

Model: tM + n(tw)max

SLIDE 19

Model Validation for Homogeneous Worker Pool

Model: tM + ntw

| Planning Problem | N | T | Compute new x: num it. (n) | Total tM (sec) | Benchmark average tw (sec) | Total execution time (min): Model | Measured | Note |
| ssn | 20,000 | 400 | 44 | 441 | 20.96 | 22.9 | 24.9 | WI-Argonne Flock |
| ssn | 10,000 | 100 | 44 | 64 | 6.32 | 10.3 | 12.1 | WI pool |
| ssn | 20,000 | 200 | 61 | 295 | 20.88 | 26.4 | 29.3 | WI-Argonne Flock |
| ssn | 20,000 | 100 | 84 | 244 | 20.89 | 33.5 | 36.3 | WI-Argonne Flock |
| ssn | 20,000 | 50 | 108 | 180 | 20.91 | 40.8 | 44.7 | WI-Argonne Flock |
| ssn | 40,000 | 100 | 84 | 297 | 30.97 | 48.8 | 52.2 | WI-NM Flock |
| 20-terms | 5,000 | 200 | 597 | 2762 | 2.35 | 69.4 | 70.5 | WI pool |

SLIDE 20

Model Validation for Heterogeneous Worker Pool

Model: tM + n(tw)max

| Number of Workers Requested | Non-adaptive execution time (min): Measured | Model | Worker time (sec): tw max | tw min | avg. | Computing new x (sec): tM | n |
| 200 | 9.67 | 7.15 | 10.21 | 1.68 | 2.11 | 61.3 | 36 |
| 150 | 9.63 | 6.65 | 9.78 | 1.37 | 2.86 | 46.77 | 36 |
| 150 | 9.07 | 6.85 | 9.42 | 1.38 | 2.76 | 53.18 | 38 |
| 150 | 13.75 | 10.73 | 13.88 | 1.36 | 2.86 | 60.71 | 42 |
| 50 | 14.35 | 13.96 | 13.82 | 4.18 | 6.62 | 35.8 | 58 |
| 50 | 35.07 | 34.22 | 28.62 | 4.19 | 7.03 | 50.02 | 70 |
| 50 | 34.65 | 34.23 | 28.62 | 4.21 | 7.04 | 50.37 | 70 |
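As a hedged check of the model against this slide's numbers, the sketch below evaluates tM + n·tw for each row of the heterogeneous validation data, once with the maximum worker time and once with the average. The function name `model_minutes` and the tuple layout are illustrative choices; the values are copied from the slide.

```python
def model_minutes(t_master, n_iter, t_worker):
    # Predicted total time tM + n * tw, converted from seconds to minutes.
    return (t_master + n_iter * t_worker) / 60.0

# (measured min, model min, tw max sec, tw avg sec, tM sec, n) from this slide.
rows = [
    (9.67, 7.15, 10.21, 2.11, 61.3, 36),
    (9.63, 6.65, 9.78, 2.86, 46.77, 36),
    (9.07, 6.85, 9.42, 2.76, 53.18, 38),
    (13.75, 10.73, 13.88, 2.86, 60.71, 42),
    (14.35, 13.96, 13.82, 6.62, 35.8, 58),
    (35.07, 34.22, 28.62, 7.03, 50.02, 70),
    (34.65, 34.23, 28.62, 7.04, 50.37, 70),
]

for measured, model, tw_max, tw_avg, t_master, n_iter in rows:
    with_max = model_minutes(t_master, n_iter, tw_max)
    with_avg = model_minutes(t_master, n_iter, tw_avg)
    print(f"measured {measured:6.2f}  slide model {model:6.2f}  "
          f"using max {with_max:6.2f}  using avg {with_avg:6.2f}")
```

Using the maximum reproduces the slide's Model column, while the average falls well below the measured times. That fits a fork-join iteration, where the master cannot proceed until the slowest worker returns.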

SLIDE 21

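Optimal Configuration for Homogeneous Worker Pool

G should be equal to number of available processors
T should be large up to a point
B should be set to 1

| Configuration | Debug level | B | Execution time |
| Original ATR (T = 100, G = 25) | Reduced debug | 3 | 61 min |
| Original ATR (T = 100, G = 25) | Reduced debug | 6 | 92 min |
| Original ATR (T = 100, G = 25) | Default debug | 3 | 68 min |
| Original ATR (T = 100, G = 25) | Default debug | 6 | 149 min |
| Near-optimized ATR | | | 18 min |

3x – 6x faster!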

SLIDE 22

Heterogeneous task assignment

[Figure: master node's job queue per iteration and master node's worker queue, with each worker's benchmark and Ew values]

SLIDE 23

Adaptive task assignment

Heterogeneous & dynamic worker pool
Better utilization of worker nodes

| Worker time tw (sec): avg | min | max | Original task assignment execution time (min): Model | Number of workers | Adaptive task assignment execution time (min): Measured | Model | Number of workers used | Estimated speedup (%) |
| 20.33 | 8.06 | 98.6 | 82.94 | 100 | 39.56 | 35.59 | 91 | 52% |
| 2.84 | 0.83 | 9.6 | 9.52 | 100 | 6.78 | 3.24 | 67 | 29% |
| 2.25 | 1.20 | 9.5 | 8.67 | 100 | 4.78 | 3.41 | 86 | 45% |
| 7.76 | 2.58 | 19.4 | 22.32 | 50 | 11.43 | 9.86 | 26 | 49% |
| 4.02 | 1.69 | 8.9 | 12.66 | 50 | 10.50 | 8.03 | 45 | 17% |
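The slide does not state how the estimated speedup is computed. The sketch below assumes it is the relative reduction from the original-assignment model time to the adaptive measured time; this is an inference from the numbers, and it reproduces every percentage in the table. The function name and tuple layout are illustrative.

```python
def estimated_speedup_percent(original_model_min, adaptive_measured_min):
    # ASSUMED formula: relative time saved versus the original policy.
    saved = original_model_min - adaptive_measured_min
    return 100.0 * saved / original_model_min

# (original model min, adaptive measured min, speedup % reported on the slide)
table = [
    (82.94, 39.56, 52),
    (9.52, 6.78, 29),
    (8.67, 4.78, 45),
    (22.32, 11.43, 49),
    (12.66, 10.50, 17),
]

for original, adaptive, reported in table:
    computed = estimated_speedup_percent(original, adaptive)
    print(f"computed {computed:5.1f}%  reported {reported}%")
```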

SLIDE 24

Conclusion

Analysis of Grid Application Execution Time
Construct, Validate a Simple Performance Model
Create an Adaptive Control scheme guided by our Performance Model
Optimal adaptive parameters give large speedup (3x-6x) over original ATR code
Adaptive task assignment gives 15-55% speedup over original policy, for optimal parameter values

SLIDE 25

Future Work

Apply the model to larger data sets
Apply the model to more complex objectives, such as controlling processor utilization

Apply this model to other grid applications

SLIDE 26

Acknowledgments

Jeff Linderoth (ATR)
Jichuan Chang (MW)
condor-admin@cs.wisc.edu

SLIDE 27

Questions?

SLIDE 28

Stochastic Optimization Example

First month data: Demand 10 units, Price $1.00/unit, Storage cost $0.05/unit

Possible second month scenarios:

| Scenario | Prob. | Demand | Price |
| Normal | 0.50 | 10.0 | 1.00 |
| Warm | 0.30 | 8.0 | 0.85 |
| Cold | 0.15 | 14.0 | 1.50 |
| Very cold | 0.05 | 27.0 | 1.80 |
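To show how these numbers could feed the objective cᵀx + Q(x), here is a small sketch under a loudly stated assumption: the slide gives only the data, so the decision model below is hypothetical. It takes x to be the number of extra units bought and stored in the first month, and assumes any second-month shortfall is bought at that scenario's price.

```python
PRICE_NOW = 1.00   # first-month price per unit (from the slide)
STORAGE = 0.05     # storage cost per unit (from the slide)

# (probability, demand, price) for the second month, from the slide.
SCENARIOS = [
    (0.50, 10.0, 1.00),  # Normal
    (0.30, 8.0, 0.85),   # Warm
    (0.15, 14.0, 1.50),  # Cold
    (0.05, 27.0, 1.80),  # Very cold
]

def second_month_cost(x, demand, price):
    # ASSUMED recourse Q(x; omega): buy the shortfall at the scenario price.
    return price * max(demand - x, 0.0)

def expected_total_cost(x):
    # First-stage cost of storing x units plus the expectation Q(x).
    first_stage = (PRICE_NOW + STORAGE) * x
    q = sum(p * second_month_cost(x, d, pr) for p, d, pr in SCENARIOS)
    return first_stage + q

candidates = [0.0, 8.0, 10.0, 14.0, 27.0]
best = min(candidates, key=expected_total_cost)
for x in candidates:
    print(x, round(expected_total_cost(x), 2))
print("best candidate:", best)
```

Under this assumed model, storing enough for the very cold scenario is expensive insurance: the rare high-price case does not outweigh the cost of carrying 27 units in every scenario.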

SLIDE 29

ATR

“Asynchronous Trust-Region” algorithm for minimizing Q(x) subject to the constraints
Iterative fork-join synchronization structure
Unpredictable number of iterations to converge
Adjustable task parameter
15,000 lines of code

SLIDE 30

Even More Parallelism!

Possibly generate new x before all Q(x;ωi) return!

Now only have partial info about Q(x), so expect lower quality estimates of x

Example:

[Figure: master and workers timeline with iterates x1, x2 and worker evaluations Q(x1;ω1), Q(x1;ω2), Q(x1;ω3)]

SLIDE 31

ATR Vocabulary

N number of scenarios in the model (possible values for the uncertain data)

e.g., N = 5,000 or N = 40,000

G number of groups of scenarios (units of work)

e.g., G = 50 or G = 100

T number of tasks in iteration

e.g., T = 200 or T = 1,000

B number of variables x evaluated in parallel

e.g., B = 5

SLIDE 32

Adaptive Control Algorithm

Sort the worker list based on benchmarks

– Benchmark = execution time of a sample task group on this worker
– Indicates the expected time needed for this worker to complete one task group

For each worker w, define

$E_w = (\#\,\text{currently assigned tasks}_w + 1) \times \text{benchmark}_w$

New task will be assigned to the worker with
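The slide text ends before naming the selection criterion. The sketch below assumes the natural reading, that each new task goes to the worker with the smallest Ew, and should be taken as an illustration only. The function name `assign_tasks`, the tie-breaking rule, and the benchmark values are hypothetical.

```python
def assign_tasks(benchmarks, num_tasks):
    """Greedy assignment; returns the number of tasks given to each worker."""
    assigned = [0] * len(benchmarks)
    for _ in range(num_tasks):
        # E_w = (# currently assigned tasks_w + 1) * benchmark_w
        expected = [(assigned[w] + 1) * benchmarks[w]
                    for w in range(len(benchmarks))]
        # ASSUMPTION: choose the smallest E_w; ties go to the lowest index.
        chosen = min(range(len(benchmarks)), key=lambda w: expected[w])
        assigned[chosen] += 1
    return assigned

# Hypothetical benchmarks: seconds per task group on three workers.
benchmarks = [9, 10, 20]
counts = assign_tasks(benchmarks, 5)
finish_times = [count * bench for count, bench in zip(counts, benchmarks)]
print("tasks per worker:", counts)
print("expected finish times:", finish_times)
```

With these values the two fast workers each take two tasks and the slow worker takes one, so all three are expected to finish at about the same time instead of the iteration waiting on the slowest machine.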