SLIDE 1

ALERT: Accurate Learning for Energy and Timeliness

Chengcheng Wan, Muhammad Husni Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire and Shan Lu

SLIDE 2

DNN is Deployed Everywhere

Weather forecast, QA robot, auto driving, smart city, trading, text generation

SLIDE 3

DNN Deployment is Challenging

[Diagram: a DNN system classifying a road image]

Challenges

  • Configuration space is huge
  • Environment may change dynamically
  • Must be low overhead
SLIDE 4

Previous Work

Previous works: DNN design; resource management [1-4]
Challenges: low overhead; dynamic environment; huge configuration space

[1] H. Hoffmann et al. JouleGuard: energy guarantees for approximate applications. SOSP, 2015.
[2] C. Imes et al. POET: a portable approach to minimizing energy under soft real-time constraints. RTAS, 2015.
[3] N. Mishra et al. CALOREE: learning control for predictable latency and low energy. ASPLOS, 2018.
[4] A. Rahmani et al. SPECTR: formal supervisory control and coordination for many-core systems resource management. ASPLOS, 2018.
…

SLIDE 5

Our ALERT System

[Diagram: ALERT feedback loop: measurement, then feedback-based estimation, then DNN & power-cap selection, feeding the DNN system]

Challenges

  • Configuration space is huge
  • Environment may change dynamically
  • Must be low overhead
SLIDE 6

Our ALERT System

[Diagram: ALERT feedback loop with the global slow-down factor ξ: measurement, then feedback-based estimation, then DNN & power-cap selection, feeding the DNN system]

Challenges

  • Configuration space is huge
  • Environment may change dynamically
  • Must be low overhead
SLIDE 7

Evaluation Highlights

✔ ALERT satisfies LAE constraints: 99.9% of cases for vision; 98.5% of cases for NLP
✔ Probabilistic design overcomes dynamic variability efficiently: ALERT achieves 93-99% of the Oracle's performance
✔ Coordinating app- and sys-level control improves performance: reduces energy by 13% and error by 27% over the prior approach

SLIDE 8

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 9

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 10

Experiment Settings

Tasks (3): image classification (ImageNet), sentence prediction (PTB), question answering (SQuAD)
Platforms (4): ODroid, CPUs, GPU
DNNs (4): ResNet50, VGG16, RNN, BERT

SLIDE 11

Tradeoffs from DNNs

[Figure: top-5 error rate (%) vs. inference time of one image (s) for 42 DNNs on ImageNet classification, including MobileNet-v1 (α=1), MobileNet-v2 (α=1.3), ResNet50, NasNet-large, and PnasNet-large]

High accuracy comes with long latency.

SLIDE 12

Tradeoffs from System Settings

[Figure: average energy (J) vs. inference time of one image (s) across power-limit settings (W); the least-energy setting and the fastest setting differ]

No setting is optimal for both energy and latency.

SLIDE 13

Run-time Variability

[Figure: inference latency distributions without and with a co-located job]

SLIDE 14

Run-time Variability

Latency variation is increased by co-located jobs.

[Figure: inference latency distributions without and with a co-located job]

SLIDE 15

Potential Solutions

[Figure: average energy (J) across constraint settings (deadline × accuracy_goal, deadlines 0.1 s to 0.7 s) for sys-level, app-level, and combined approaches]

Combining both levels achieves the best performance.

SLIDE 16

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 17

Three Dimensions & Two Tasks

Three dimensions: inference deadline (L), accuracy goal (A), energy consumption goal (E)

Two tasks:

  • Minimize Energy: with an accuracy goal and an inference deadline
  • Maximize Accuracy: with an energy consumption goal and an inference deadline

SLIDE 18

Maximize Accuracy task

[Diagram: configuration grid of DNN × power-cap pairs]

Optimization: over configurations (DNN, power cap), maximize accuracy subject to constraints latency < X and energy < Y
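The selection step above can be sketched as a small constrained search. This is a hypothetical illustration, not ALERT's actual implementation; the configuration values, field names, and fallback rule are all made up:

```python
# Hypothetical sketch: pick the (DNN, power-cap) pair that maximizes
# estimated accuracy subject to latency and energy constraints.
# All numbers and names below are illustrative, not ALERT's real profiles.

def select_config(configs, deadline, energy_budget):
    """configs: list of dicts with estimated latency, energy, accuracy."""
    feasible = [c for c in configs
                if c["latency"] <= deadline and c["energy"] <= energy_budget]
    if not feasible:                      # no option fits: fall back to fastest
        return min(configs, key=lambda c: c["latency"])
    return max(feasible, key=lambda c: c["accuracy"])

configs = [
    {"dnn": "small",  "power_cap": 20, "latency": 0.05, "energy": 1.0, "accuracy": 0.71},
    {"dnn": "large",  "power_cap": 40, "latency": 0.20, "energy": 4.0, "accuracy": 0.92},
    {"dnn": "medium", "power_cap": 30, "latency": 0.09, "energy": 2.0, "accuracy": 0.85},
]
best = select_config(configs, deadline=0.1, energy_budget=3.0)
```

With these illustrative numbers, the "large" DNN misses the deadline, so the search returns the most accurate feasible option.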

SLIDE 19

How to estimate the inference latency?

  • Two key challenges

○ Runtime variation: the inference time may differ even for the same configuration

[Slide example: a configuration profiled at 51 shows runtime samples of 52, 46, 58, 53, 50, 70, 75, 99, 94]

SLIDE 20

How to estimate the inference latency?

  • Two key challenges

○ Runtime variation
○ Too many combinations of DNNs and resources

[Diagram: table of DNNs d1, d2, …, dl against power caps p1, p2, …, pk; only a few cells can be measured]

SLIDE 21

Potential Solution

  • Kalman filter

○ Estimate latency for each configuration
○ Use recent execution history

[Example: each configuration, e.g. (DNN2, P1) and (DNN1, P2), keeps its own execution history and latency prediction]
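A minimal sketch of this per-configuration idea, using a 1-D Kalman filter with illustrative noise parameters (the initial estimate, variances, and sample values are assumptions, not ALERT's actual code):

```python
# Minimal 1-D Kalman filter for per-configuration latency estimation,
# a sketch of the "potential solution" on this slide. All parameters
# (initial estimate, process/measurement variance) are illustrative.

class LatencyKalman:
    def __init__(self, init_est, init_var=25.0, process_var=4.0, meas_var=16.0):
        self.est, self.var = init_est, init_var
        self.process_var, self.meas_var = process_var, meas_var

    def update(self, measured):
        self.var += self.process_var                  # predict: uncertainty grows
        gain = self.var / (self.var + self.meas_var)  # Kalman gain
        self.est += gain * (measured - self.est)      # correct toward measurement
        self.var *= (1.0 - gain)                      # uncertainty shrinks
        return self.est

kf = LatencyKalman(init_est=50.0)
for sample in [52, 43, 58, 49]:    # history for one (DNN, power-cap) pair
    est = kf.update(sample)
```

Each configuration would need its own filter instance, which is exactly the drawback the next slide points out: most configurations have no history to feed it.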

SLIDE 22

Potential Solution: drawback

  • Cannot solve the problem

○ Not enough history for each configuration

[Example: (DNN2, P1) and (DNN1, P2) have execution history, but (DNN1, P1) and (DNN2, P2) have none, so their latency cannot be predicted]

SLIDE 23

How to estimate the inference latency?

  • Global Slow-down factor ξ

○ Use recent execution history under any DNN or resources

[Example: configurations profiled at 34, 20, 40, 30 with observed runtimes 51 and 30 give ξ = 150%; the two configurations with no history are then predicted as 60 and 45]
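The slide's example can be sketched as follows. The profiled latencies and the exponential-smoothing update rule are assumptions for illustration; ALERT's actual feedback-based estimator differs:

```python
# Sketch of the global slow-down factor xi: one scalar, updated from the
# latest execution under ANY configuration, scales every profiled latency.
# Profile numbers and the smoothing rule are illustrative assumptions.

profile = {("DNN1", "P1"): 20, ("DNN1", "P2"): 30,
           ("DNN2", "P1"): 34, ("DNN2", "P2"): 40}   # profiled latencies (ms)

def update_xi(xi, config, observed, alpha=0.5):
    """Blend the newest observed/profiled ratio into xi (exponential smoothing)."""
    return (1 - alpha) * xi + alpha * observed / profile[config]

def estimate_latency(xi, config):
    """Predict latency for ANY configuration, even one never executed."""
    return xi * profile[config]

xi = 1.0
xi = update_xi(xi, ("DNN2", "P2"), 60)   # observed 60 ms vs. profiled 40 ms
pred = estimate_latency(xi, ("DNN1", "P1"))
```

Because ξ is shared, one observation under (DNN2, P2) immediately updates the prediction for (DNN1, P1), sidestepping the per-configuration history problem.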

SLIDE 24

How to estimate the inference latency?

  • Mean estimation is not sufficient

○ The variation might be too big to provide a good prediction.

  • Different amounts of variation have different implications for DNN selection

[Example: three runtime histories with roughly the same mean (50) but different variation:
Sequence 1: 52, 43, 58, 49, 50 (variation 5)
Sequence 2: 51, 50, 49, 49, 50 (variation 1)
Sequence 3: 15, 10, 99, 70, 50 (variation 40)]

SLIDE 25

How to estimate the inference latency?

  • Global Slow-down factor ξ

○ Use recent execution history under any DNN or resources
○ Estimate its distribution: mean and variance

[Example: history 52, 43, 58, 49, 50 yields an estimated ξ distribution with mean 50 and variation 5]
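The mean-and-variance idea can be sketched as below. The samples are illustrative, and the k-sigma pessimistic bound is an assumption standing in for ALERT's probabilistic formulation:

```python
# Sketch: estimate xi's mean and variance from recent ratio samples, then
# use a pessimistic (mean + k*std) latency bound rather than the mean alone,
# so high-variation environments get conservative predictions.

import statistics

def xi_distribution(ratios):
    mu = statistics.mean(ratios)
    sigma = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return mu, sigma

def latency_bound(profiled, mu, sigma, k=2.0):
    """Profiled latency scaled by a high-probability slow-down estimate."""
    return profiled * (mu + k * sigma)

# Illustrative observed/profiled ratios from recent executions:
mu, sigma = xi_distribution([1.04, 0.86, 1.16, 0.98, 1.00])
bound = latency_bound(profiled=50.0, mu=mu, sigma=sigma)
```

With low variation the bound stays close to the mean prediction; with the wide swings of Sequence 3 on the previous slide it would grow much larger, pushing selection toward safer configurations.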

SLIDE 26

How to estimate accuracy under a deadline?

  • Can inference be finished before deadline?

○ If yes: the training accuracy of the selected DNN
○ If not: random-guess accuracy
  ■ Unless it is an Anytime DNN

[Figure: inference accuracy vs. time relative to the deadline]

SLIDE 27

What is an Anytime DNN?

[Figure: timeline with a deadline: a traditional DNN produces a single output ("Road"), which may arrive only after the deadline; an Anytime DNN produces increasingly accurate outputs over time ("Chocolate", then "Ground", then "Road")]

[1] C. Wan et al. Orthogonalized SGD and Nested Architectures for Anytime Neural Networks. ICML, 2020.

SLIDE 28

How to estimate accuracy under a deadline?

  • Can inference be finished before the deadline?

○ If yes: the training accuracy of the selected DNN
○ If not:
  ■ Traditional DNN: random-guess accuracy
  ■ Anytime DNN: accuracy of the last output produced before the deadline

[Figure: accuracy vs. time: a traditional DNN yields one output at its completion time, while an Anytime DNN yields a staircase of outputs with rising accuracy before the deadline]

SLIDE 29

How to estimate accuracy under a deadline?

Expected accuracy: combine the accuracy-latency relation with the estimated latency distribution, taking the expectation over latency.
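This expectation can be sketched for the traditional-DNN case. The normal latency model and every number below are assumptions for illustration, not ALERT's exact formulation:

```python
# Sketch: expected accuracy under a latency distribution. P(finish <= ddl)
# is the chance the DNN's output is ready by the deadline; otherwise a
# traditional DNN falls back to random-guess accuracy.

import math

def finish_prob(mu, sigma, deadline):
    """P(latency <= deadline) under a normal latency model (a simplification)."""
    if sigma == 0:
        return 1.0 if mu <= deadline else 0.0
    z = (deadline - mu) / (sigma * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))

def expected_accuracy(mu, sigma, deadline, dnn_acc, guess_acc):
    p = finish_prob(mu, sigma, deadline)
    return p * dnn_acc + (1 - p) * guess_acc

# Illustrative: a DNN with 92% accuracy, mean latency 80 ms +/- 10 ms,
# a 100 ms deadline, and 0.1% random-guess accuracy (1000-class task):
e_acc = expected_accuracy(mu=80, sigma=10, deadline=100,
                          dnn_acc=0.92, guess_acc=0.001)
```

For an Anytime DNN the same expectation would sum over its staircase of intermediate outputs instead of a single all-or-nothing term.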
SLIDE 30

How to manage energy?

  • Power cap: a knob to configure system resources
  • Idle power: other processes may still consume energy after DNN inference has finished

[Figure: power over time: DNN-active phases for successive inputs, then a DNN-idle phase until the next input arrives within the latency target]

SLIDE 31

How to estimate the energy consumption?

  • Estimate energy from power (energy = power × time for each phase)

○ DNN active power is the power-cap setting
○ DNN idle power is estimated by a Kalman filter

[Figure: power over time: a DNN-active phase at the power cap, then a DNN-idle phase until the next input, within the latency target]
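The per-input energy estimate above can be sketched as follows; all values are illustrative and the single active/idle split is a simplification of the figure:

```python
# Sketch of the energy estimate: active power is the power-cap setting,
# idle power is a (Kalman-filtered) estimate; energy for one input combines
# the active phase and the idle phase before the next input arrives.
# All values below are illustrative.

def estimate_energy(power_cap, idle_power, inference_time, latency_target):
    """Energy for one input: active phase at the cap, then idle until next input."""
    active = power_cap * inference_time
    idle = idle_power * max(0.0, latency_target - inference_time)
    return active + idle

# 40 W cap, 5 W idle, 80 ms inference, one input every 100 ms:
e = estimate_energy(power_cap=40.0, idle_power=5.0,
                    inference_time=0.08, latency_target=0.1)
```

This is why latency and energy estimation are coupled: the predicted inference time determines how the window splits between the two power levels.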

SLIDE 32

Our ALERT System

[Diagram: ALERT feedback loop: measurement, then feedback-based estimation, then DNN & power-cap selection, feeding the DNN system]

SLIDE 33

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 34

Experiment Settings

Platforms (3): CPUs, GPU
Scenarios (5): default, compute-intensive (2), memory-intensive (2)
DNNs (2): Sparse ResNet50, RNN
Tasks (2): 1. minimize energy; 2. maximize accuracy
SLIDE 35

Schemes

Oracles

  • Oracle: Changes configuration for every input, assuming perfect knowledge of the future. Emulated from profiling results.

  • Oracle-static: Same configuration for all inputs.

Baselines

  • Sys-only: Only adjust power-cap
  • App-only: Use an Anytime DNN
  • No-coord: Anytime DNN without coordination with power-cap
SLIDE 36

Evaluation: Scheduler Performance

[Chart: Minimize Energy task: average performance normalized to Oracle_Static (smaller is better) and constraint violations (%) for App-only, Sys-only, No-coord, Sys+App (ALERT), and Oracle]

SLIDE 37

Evaluation: Scheduler Performance

[Chart: Minimize Error task: average performance normalized to Oracle_Static (smaller is better) and constraint violations (%) for App-only, Sys-only, No-coord, Sys+App (ALERT), and Oracle]

SLIDE 38

How ALERT Works with a Traditional DNN

  • Meets requirements in most cases
  • Quickly detects contention changes
  • Uses an Anytime DNN under an unstable environment

SLIDE 39

How ALERT Works with a Traditional DNN

SLIDE 40

How ALERT Works with an Anytime + Traditional DNN

  • Uses an Anytime DNN under an unstable environment

SLIDE 41

Conclusion

  • Understanding DNN inference deployment challenges
  • The ALERT run-time inference management system
  • High performance and energy efficiency