SLIDE 1

ALERT: Accurate Learning for Energy and Timeliness

Chengcheng Wan, Muhammad Husni Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire and Shan Lu

SLIDE 2

DNN is Deployed Everywhere

Weather forecast, QA robot, auto driving, smart city, trading, text generation

SLIDE 3

DNN Deployment is Challenging

[Diagram: a DNN system classifying a road image]

Challenges

  • Configuration space is huge
  • Environment may change dynamically
  • Must be low overhead
SLIDE 4

Previous Work

Previous works: DNN design; resource management [1-4]
Challenges: low overhead; dynamic environment; huge configuration space

[1] H. Hoffmann et al. JouleGuard: energy guarantees for approximate applications. SOSP, 2015.
[2] C. Imes et al. POET: a portable approach to minimizing energy under soft real-time constraints. RTAS, 2015.
[3] N. Mishra et al. CALOREE: learning control for predictable latency and low energy. ASPLOS, 2018.
[4] A. Rahmani et al. SPECTR: formal supervisory control and coordination for many-core systems resource management. ASPLOS, 2018.
…

SLIDE 5

Our ALERT System

[Diagram: ALERT feedback loop: measurement, then feedback-based estimation, then DNN & power-cap selection, feeding the DNN system]

Challenges

  • Configuration space is huge
  • Environment may change dynamically
  • Must be low overhead
SLIDE 6

Our ALERT System

[Diagram: ALERT feedback loop with the global slow-down factor ξ: measurement, then feedback-based estimation, then DNN & power-cap selection, feeding the DNN system]

Challenges

  • Configuration space is huge
  • Environment may change dynamically
  • Must be low overhead
SLIDE 7

Evaluation Highlights

✔ ALERT satisfies LAE constraints: 99.9% of cases for vision; 98.5% of cases for NLP
✔ Probabilistic design overcomes dynamic variability efficiently: ALERT achieves 93-99% of the Oracle's performance
✔ Coordinating app- and sys-level control improves performance: reduces energy by 13% and error by 27% over the prior approach

SLIDE 8

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 9

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 10

Experiment Settings

Tasks (3): image classification (ImageNet), sentence prediction (PTB), question answering (SQuAD)
Platforms (4): ODroid, CPUs, GPU
DNNs (4): ResNet50, VGG16, RNN, BERT

SLIDE 11

Tradeoffs from DNNs

[Figure: top-5 error rate (%) vs. inference time of one image (s) for 42 DNNs on ImageNet classification, including MobileNet-v1 (α=1), MobileNet-v2 (α=1.3), ResNet50, NasNet-large, and PnasNet-large]

High accuracy comes with long latency.

SLIDE 12

Tradeoffs from System Settings

[Figure: average energy (J) vs. inference time of one image (s) across power-limit settings (W); the least-energy setting and the fastest setting differ]

No setting is optimal for both energy and latency.

SLIDE 13

Run-time Variability

[Figure: inference latency distributions without and with a co-located job]

SLIDE 14

Run-time Variability

Latency variation is increased by co-located jobs.

[Figure: inference latency distributions without and with a co-located job]

SLIDE 15

Potential Solutions

[Figure: average energy (J) across constraint settings (deadline × accuracy_goal, deadlines 0.1 s to 0.7 s) for sys-level, app-level, and combined approaches]

Combining both levels achieves the best performance.

SLIDE 16

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 17

Three Dimensions & Two Tasks

Three dimensions: inference deadline (L), accuracy goal (A), energy consumption goal (E)

Two tasks:

  • Minimize Energy: with an accuracy goal and an inference deadline
  • Maximize Accuracy: with an energy consumption goal and an inference deadline

SLIDE 18

Maximize Accuracy task

[Diagram: configuration grid of DNN × power-cap pairs]

Optimization: over configurations (DNN, power cap), maximize accuracy subject to constraints latency < X and energy < Y
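The selection step above can be sketched as a small constrained search. This is a hypothetical illustration, not ALERT's actual implementation; the configuration values, field names, and fallback rule are all made up:

```python
# Hypothetical sketch: pick the (DNN, power-cap) pair that maximizes
# estimated accuracy subject to latency and energy constraints.
# All numbers and names below are illustrative, not ALERT's real profiles.

def select_config(configs, deadline, energy_budget):
    """configs: list of dicts with estimated latency, energy, accuracy."""
    feasible = [c for c in configs
                if c["latency"] <= deadline and c["energy"] <= energy_budget]
    if not feasible:                      # no option fits: fall back to fastest
        return min(configs, key=lambda c: c["latency"])
    return max(feasible, key=lambda c: c["accuracy"])

configs = [
    {"dnn": "small",  "power_cap": 20, "latency": 0.05, "energy": 1.0, "accuracy": 0.71},
    {"dnn": "large",  "power_cap": 40, "latency": 0.20, "energy": 4.0, "accuracy": 0.92},
    {"dnn": "medium", "power_cap": 30, "latency": 0.09, "energy": 2.0, "accuracy": 0.85},
]
best = select_config(configs, deadline=0.1, energy_budget=3.0)
```

With these illustrative numbers, the "large" DNN misses the deadline, so the search returns the most accurate feasible option.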

SLIDE 19

How to estimate the inference latency?

  • Two key challenges

○ Runtime variation: the inference time may differ even for the same configuration

[Slide example: a configuration profiled at 51 shows runtime samples of 52, 46, 58, 53, 50, 70, 75, 99, 94]

SLIDE 20

How to estimate the inference latency?

  • Two key challenges

○ Runtime variation
○ Too many combinations of DNNs and resources

[Diagram: table of DNNs d1, d2, …, dl against power caps p1, p2, …, pk; only a few cells can be measured]

SLIDE 21

Potential Solution

  • Kalman filter

○ Estimate latency for each configuration
○ Use recent execution history

[Example: each configuration, e.g. (DNN2, P1) and (DNN1, P2), keeps its own execution history and latency prediction]
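A minimal sketch of this per-configuration idea, using a 1-D Kalman filter with illustrative noise parameters (the initial estimate, variances, and sample values are assumptions, not ALERT's actual code):

```python
# Minimal 1-D Kalman filter for per-configuration latency estimation,
# a sketch of the "potential solution" on this slide. All parameters
# (initial estimate, process/measurement variance) are illustrative.

class LatencyKalman:
    def __init__(self, init_est, init_var=25.0, process_var=4.0, meas_var=16.0):
        self.est, self.var = init_est, init_var
        self.process_var, self.meas_var = process_var, meas_var

    def update(self, measured):
        self.var += self.process_var                  # predict: uncertainty grows
        gain = self.var / (self.var + self.meas_var)  # Kalman gain
        self.est += gain * (measured - self.est)      # correct toward measurement
        self.var *= (1.0 - gain)                      # uncertainty shrinks
        return self.est

kf = LatencyKalman(init_est=50.0)
for sample in [52, 43, 58, 49]:    # history for one (DNN, power-cap) pair
    est = kf.update(sample)
```

Each configuration would need its own filter instance, which is exactly the drawback the next slide points out: most configurations have no history to feed it.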

SLIDE 22

Potential Solution: drawback

  • Cannot solve the problem

○ Not enough history for each configuration

[Example: (DNN2, P1) and (DNN1, P2) have execution history, but (DNN1, P1) and (DNN2, P2) have none, so their latency cannot be predicted]

SLIDE 23

How to estimate the inference latency?

  • Global Slow-down factor ξ

○ Use recent execution history under any DNN or resources

[Example: configurations profiled at 34, 20, 40, 30 with observed runtimes 51 and 30 give ξ = 150%; the two configurations with no history are then predicted as 60 and 45]
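The slide's example can be sketched as follows. The profiled latencies and the exponential-smoothing update rule are assumptions for illustration; ALERT's actual feedback-based estimator differs:

```python
# Sketch of the global slow-down factor xi: one scalar, updated from the
# latest execution under ANY configuration, scales every profiled latency.
# Profile numbers and the smoothing rule are illustrative assumptions.

profile = {("DNN1", "P1"): 20, ("DNN1", "P2"): 30,
           ("DNN2", "P1"): 34, ("DNN2", "P2"): 40}   # profiled latencies (ms)

def update_xi(xi, config, observed, alpha=0.5):
    """Blend the newest observed/profiled ratio into xi (exponential smoothing)."""
    return (1 - alpha) * xi + alpha * observed / profile[config]

def estimate_latency(xi, config):
    """Predict latency for ANY configuration, even one never executed."""
    return xi * profile[config]

xi = 1.0
xi = update_xi(xi, ("DNN2", "P2"), 60)   # observed 60 ms vs. profiled 40 ms
pred = estimate_latency(xi, ("DNN1", "P1"))
```

Because ξ is shared, one observation under (DNN2, P2) immediately updates the prediction for (DNN1, P1), sidestepping the per-configuration history problem.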

SLIDE 24

How to estimate the inference latency?

  • Mean estimation is not sufficient

○ The variation might be too big to provide a good prediction.

  • Different amounts of variation have different implications for DNN selection

[Example: three runtime histories with roughly the same mean (50) but different variation:
Sequence 1: 52, 43, 58, 49, 50 (variation 5)
Sequence 2: 51, 50, 49, 49, 50 (variation 1)
Sequence 3: 15, 10, 99, 70, 50 (variation 40)]

SLIDE 25

How to estimate the inference latency?

  • Global Slow-down factor ξ

○ Use recent execution history under any DNN or resources
○ Estimate its distribution: mean and variance

[Example: history 52, 43, 58, 49, 50 yields an estimated ξ distribution with mean 50 and variation 5]
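The mean-and-variance idea can be sketched as below. The samples are illustrative, and the k-sigma pessimistic bound is an assumption standing in for ALERT's probabilistic formulation:

```python
# Sketch: estimate xi's mean and variance from recent ratio samples, then
# use a pessimistic (mean + k*std) latency bound rather than the mean alone,
# so high-variation environments get conservative predictions.

import statistics

def xi_distribution(ratios):
    mu = statistics.mean(ratios)
    sigma = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return mu, sigma

def latency_bound(profiled, mu, sigma, k=2.0):
    """Profiled latency scaled by a high-probability slow-down estimate."""
    return profiled * (mu + k * sigma)

# Illustrative observed/profiled ratios from recent executions:
mu, sigma = xi_distribution([1.04, 0.86, 1.16, 0.98, 1.00])
bound = latency_bound(profiled=50.0, mu=mu, sigma=sigma)
```

With low variation the bound stays close to the mean prediction; with the wide swings of Sequence 3 on the previous slide it would grow much larger, pushing selection toward safer configurations.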

SLIDE 26

How to estimate accuracy under a deadline?

  • Can inference be finished before deadline?

○ If yes: the training accuracy of the selected DNN
○ If not: random-guess accuracy
  ■ Unless it is an Anytime DNN

[Figure: inference accuracy vs. time relative to the deadline]

SLIDE 27

What is an Anytime DNN?

[Figure: timeline with a deadline: a traditional DNN produces a single output ("Road"), which may arrive only after the deadline; an Anytime DNN produces increasingly accurate outputs over time ("Chocolate", then "Ground", then "Road")]

[1] C. Wan et al. Orthogonalized SGD and Nested Architectures for Anytime Neural Networks. ICML, 2020.

SLIDE 28

How to estimate accuracy under a deadline?

  • Can inference be finished before the deadline?

○ If yes: the training accuracy of the selected DNN
○ If not:
  ■ Traditional DNN: random-guess accuracy
  ■ Anytime DNN: accuracy of the last output produced before the deadline

[Figure: accuracy vs. time: a traditional DNN yields one output at its completion time, while an Anytime DNN yields a staircase of outputs with rising accuracy before the deadline]

SLIDE 29

How to estimate accuracy under a deadline?

Expected accuracy: combine the accuracy-latency relation with the estimated latency distribution, taking the expectation over latency.
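This expectation can be sketched for the traditional-DNN case. The normal latency model and every number below are assumptions for illustration, not ALERT's exact formulation:

```python
# Sketch: expected accuracy under a latency distribution. P(finish <= ddl)
# is the chance the DNN's output is ready by the deadline; otherwise a
# traditional DNN falls back to random-guess accuracy.

import math

def finish_prob(mu, sigma, deadline):
    """P(latency <= deadline) under a normal latency model (a simplification)."""
    if sigma == 0:
        return 1.0 if mu <= deadline else 0.0
    z = (deadline - mu) / (sigma * math.sqrt(2))
    return 0.5 * (1 + math.erf(z))

def expected_accuracy(mu, sigma, deadline, dnn_acc, guess_acc):
    p = finish_prob(mu, sigma, deadline)
    return p * dnn_acc + (1 - p) * guess_acc

# Illustrative: a DNN with 92% accuracy, mean latency 80 ms +/- 10 ms,
# a 100 ms deadline, and 0.1% random-guess accuracy (1000-class task):
e_acc = expected_accuracy(mu=80, sigma=10, deadline=100,
                          dnn_acc=0.92, guess_acc=0.001)
```

For an Anytime DNN the same expectation would sum over its staircase of intermediate outputs instead of a single all-or-nothing term.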
SLIDE 30

How to manage energy?

  • Power cap: a knob to configure system resources
  • Idle power: other processes may still consume energy after DNN inference has finished

[Figure: power over time: DNN-active phases for successive inputs, then a DNN-idle phase until the next input arrives within the latency target]

SLIDE 31

How to estimate the energy consumption?

  • Estimate energy from power (energy = power × time for each phase)

○ DNN active power is the power-cap setting
○ DNN idle power is estimated by a Kalman filter

[Figure: power over time: a DNN-active phase at the power cap, then a DNN-idle phase until the next input, within the latency target]
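The per-input energy estimate above can be sketched as follows; all values are illustrative and the single active/idle split is a simplification of the figure:

```python
# Sketch of the energy estimate: active power is the power-cap setting,
# idle power is a (Kalman-filtered) estimate; energy for one input combines
# the active phase and the idle phase before the next input arrives.
# All values below are illustrative.

def estimate_energy(power_cap, idle_power, inference_time, latency_target):
    """Energy for one input: active phase at the cap, then idle until next input."""
    active = power_cap * inference_time
    idle = idle_power * max(0.0, latency_target - inference_time)
    return active + idle

# 40 W cap, 5 W idle, 80 ms inference, one input every 100 ms:
e = estimate_energy(power_cap=40.0, idle_power=5.0,
                    inference_time=0.08, latency_target=0.1)
```

This is why latency and energy estimation are coupled: the predicted inference time determines how the window splits between the two power levels.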

SLIDE 32

Our ALERT System

[Diagram: ALERT feedback loop: measurement, then feedback-based estimation, then DNN & power-cap selection, feeding the DNN system]

SLIDE 33

Outline

  • Understanding DNN Deployment Challenges
  • ALERT Run-time Inference Management
  • Experiments and Results

SLIDE 34

Experiment Settings

Platforms (3): CPUs, GPU
Scenarios (5): default, compute-intensive (2), memory-intensive (2)
DNNs (2): Sparse ResNet50, RNN
Tasks (2): 1. minimize energy; 2. maximize accuracy
SLIDE 35

Schemes

Oracles

  • Oracle: Changes configuration for every input, assuming perfect knowledge of the future. Emulated from profiling results.

  • Oracle-static: Same configuration for all inputs.

Baselines

  • Sys-only: Only adjust power-cap
  • App-only: Use an Anytime DNN
  • No-coord: Anytime DNN without coordination with power-cap
SLIDE 36

Evaluation: Scheduler Performance

[Chart: Minimize Energy task: average performance normalized to Oracle_Static (smaller is better) and constraint violations (%) for App-only, Sys-only, No-coord, Sys+App (ALERT), and Oracle]

SLIDE 37

Evaluation: Scheduler Performance

[Chart: Minimize Error task: average performance normalized to Oracle_Static (smaller is better) and constraint violations (%) for App-only, Sys-only, No-coord, Sys+App (ALERT), and Oracle]

SLIDE 38

How ALERT Works with a Traditional DNN

  • Meets requirements in most cases
  • Quickly detects contention changes
  • Uses an Anytime DNN under an unstable environment

SLIDE 39

How ALERT Works with a Traditional DNN

SLIDE 40

How ALERT Works with an Anytime + Traditional DNN

  • Uses an Anytime DNN under an unstable environment

SLIDE 41

Conclusion

  • Understanding DNN inference deployment challenges
  • The ALERT run-time inference management system
  • High performance and energy efficiency