ALERT: Accurate Learning for Energy and Timeliness
Chengcheng Wan, Muhammad Husni Santriaji, Eri Rogers, Henry Hoffmann, Michael Maire and Shan Lu
DNN is Deployed Everywhere
2
Weather forecast, QA robot, auto driving, smart city, trading, text generation
3
DNN System
[Diagram: the DNN system receives an input and must produce a label, e.g., "Road"]
4
Previous works: DNN design, resource management
Challenges: low overhead, dynamic environment, huge configuration space

[1] H. Hoffmann et al. JouleGuard: energy guarantees for approximate applications. SOSP, 2015.
[2] C. Imes et al. POET: a portable approach to minimizing energy under soft real-time constraints. RTAS, 2015.
[3] N. Mishra et al. CALOREE: learning control for predictable latency and low energy. ASPLOS, 2018.
[4] A. Rahmani et al. SPECTR: formal supervisory control and coordination for many-core systems resource management. ASPLOS, 2018.
…
5
Measurement → Feedback-based estimation → DNN & power cap selection
[Diagram: ALERT's control loop around the DNN system; input → output, e.g., "Road"]
Challenges addressed by feedback-based estimation
6
Measurement → Feedback-based estimation → DNN & power cap selection
[Diagram: ALERT's control loop around the DNN system; the estimator maintains a global factor ξ]
Challenges
7
✔ ALERT satisfies LAE (latency, accuracy, energy) constraints: in 99.9% of cases for vision and 98.5% of cases for NLP.
✔ Its probabilistic design overcomes dynamic variability efficiently: ALERT achieves 93–99% of the Oracle's performance.
✔ Coordinating the app and system levels improves performance: 13% less energy and 27% less error than prior approaches.
8
Understanding DNN Deployment Challenges ALERT Run-time Inference Management Experiments and Results
9
Understanding DNN Deployment Challenges ALERT Run-time Inference Management Experiments and Results
10
Tasks: image classification (ImageNet), sentence prediction (PTB), question answering (SQuAD)
Hardware: ODroid, CPUs, GPU
Models: ResNet50, VGG16, RNN, BERT
[Scatter plot: top-5 error rate (%) vs. per-image inference time (s) for 42 DNNs on ImageNet classification]
11
[Plot: average energy (J) vs. per-image inference time (s) for MobileNet-v1 (α=1), MobileNet-v2 (α=1.3), ResNet50, NASNet-large, PNASNet-large]
12
[Plot: latency and energy vs. power limit setting (W); the least-energy setting differs from the fastest setting]
13
[Plot: inference latency without vs. with a co-located job]
14
[Plot: inference latency without vs. with a co-located job, continued]
15
[Bar chart: average energy (J) across constraint settings (deadline × accuracy_goal), deadlines 0.1–0.7 s, comparing system-level only, app-level only, and combined adaptation]
16
Understanding DNN Deployment Challenges ALERT Run-time Inference Management Experiments and Results
17
Constraints: inference deadline, accuracy goal, energy consumption goal
Two modes: minimize energy, given an accuracy goal and inference deadline; or maximize accuracy, given an energy consumption goal and inference deadline
18
Constraints → Optimization: maximize the objective over all configurations
[Diagram: the configuration space is a grid of DNN candidates × power caps, e.g., (1,1) … (4,3)]
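The selection over the DNN × power-cap grid can be sketched as follows. All profile numbers and names here are illustrative assumptions, not values from the talk; the objective shown is the energy-minimizing mode under latency and accuracy constraints.

```python
# Hypothetical profile: (dnn, power cap W) -> (latency s, accuracy, energy J)
PROFILE = {
    ("d1", 20): (0.040, 0.88, 0.8), ("d1", 40): (0.020, 0.88, 0.9),
    ("d1", 60): (0.015, 0.88, 1.1), ("d2", 20): (0.090, 0.93, 1.8),
    ("d2", 40): (0.050, 0.93, 2.0), ("d2", 60): (0.040, 0.93, 2.4),
    ("d3", 20): (0.200, 0.96, 4.0), ("d3", 40): (0.110, 0.96, 4.4),
    ("d3", 60): (0.080, 0.96, 4.8),
}

def select(deadline, accuracy_goal):
    """Minimize energy among configurations meeting both constraints."""
    feasible = [
        (energy, dnn, cap)
        for (dnn, cap), (lat, acc, energy) in PROFILE.items()
        if lat <= deadline and acc >= accuracy_goal
    ]
    # Cheapest feasible configuration as (energy, dnn, cap)
    return min(feasible)
```

For example, with a 0.1 s deadline and a 0.9 accuracy goal, the sketch picks the smallest qualifying DNN at the lowest adequate power cap.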
○ Runtime variation: inference time may differ even for the same configuration
19
[Example: profiled runtimes for one configuration vary across runs, e.g., 52, 46, 58, 53, 50, then 70, 75, 99, 94, 51, …]
○ Runtime variation ○ Too many combinations of DNNs and resources
20
[Table: DNNs d1 … dl × power caps p1 … pk; only a few combinations (marked X) have been measured]
○ Estimate latency for each configuration ○ Use recent execution history
21
[Example: latency history for (DNN2, P1): 43, 58, 49, 51 → prediction 52; history for (DNN1, P2): 30, 31, 29]
○ Not enough history for each configuration
22
[Example: (DNN2, P1) and (DNN1, P2) have recent history (43, 58, 49, 51 → 52; 30, 31, 29), but (DNN1, P1) and (DNN2, P2) have no history, so their latencies cannot be predicted]
○ Use recent execution history under any DNN or resources
23
[Example: profiling runtimes pooled across (DNN2, P1), (DNN1, P2), (DNN1, P1), (DNN2, P2): 51, 30, 34, 20, 40, 30, 60, 45; the unmeasured configurations are predicted from the pooled history]
○ The variation might be too big to provide a good prediction.
24
[Example: three history sequences with roughly the same mean but very different variation:
Sequence 1: 52, 43, 58, 49, 50 → mean ≈50, variation 5
Sequence 2: 51, 50, 49, 49, 50 → mean ≈50, variation 1
Sequence 3: 50, 15, 10, 99, 70 → mean ≈50, variation 40]
○ Use recent execution history under any DNN or resources ○ Estimate its distribution: mean and variance
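One way to realize "use recent history under any DNN or resources" is to pool all observations into a single slow-down ratio against each configuration's profiled latency, then scale any configuration's profile by that ratio's distribution. This is a sketch under that assumption; the window size and profile values are hypothetical.

```python
from collections import deque
from statistics import mean, stdev

# Hypothetical profiled latencies (seconds) per (dnn, power cap)
PROFILED = {("d1", 40): 0.02, ("d2", 40): 0.05}

class LatencyEstimator:
    """Pool observed/profiled ratios from ANY configuration into one window."""
    def __init__(self, window=10):
        self.ratios = deque(maxlen=window)

    def observe(self, config, observed_latency):
        # Each observation contributes a slow-down ratio regardless of config
        self.ratios.append(observed_latency / PROFILED[config])

    def predict(self, config):
        """Return (mean, stdev) of predicted latency for any configuration."""
        base = PROFILED[config]
        m = mean(self.ratios) if self.ratios else 1.0
        s = stdev(self.ratios) if len(self.ratios) > 1 else 0.0
        return base * m, base * s
```

Observations made while running one DNN thus inform predictions for configurations that have no history of their own.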
25
[Example: history 52, 43, 58, 49, 50 → estimated mean ≈50, variation 5]
26
Did the inference finish by the deadline?
○ If yes: training accuracy of the selected DNN
○ If not: random-guess accuracy
■ Unless it is an Anytime DNN
[Figure: inference accuracy over time, with the selected DNN's accuracy reached at its completion time and the deadline marked]
27
[Timeline: by the deadline, a traditional DNN has produced no usable output; an anytime DNN has refined its prediction over time, e.g., "Chocolate" → "Ground" → "Road"]
[1] C. Wan et al. Orthogonalized SGD and Nested Architectures for Anytime Neural Networks. ICML, 2020.
28
○ If yes: training accuracy of the selected DNN
○ If not:
■ Traditional DNN: random-guess accuracy
■ Anytime DNN: accuracy of the last completed output
[Figure: inference accuracy vs. time. A traditional DNN reaches its accuracy only at its single completion time; an anytime DNN produces outputs at t1 < t2 < t3 with increasing accuracies a1 < a2 < a3, relative to the deadline]
29
Expectation: combine the latency distribution with the accuracy–latency relation
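The expectation can be sketched by treating latency as (hypothetically) normally distributed and weighting the two accuracy outcomes by the probability of meeting the deadline. `acc_miss` would be random-guess accuracy for a traditional DNN, or the last output's accuracy for an anytime DNN; function names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(latency <= x) under a normal latency model."""
    if sigma == 0:
        return 1.0 if x >= mu else 0.0
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def expected_accuracy(acc_on_time, acc_miss, deadline, lat_mean, lat_std):
    """Weight the two outcomes by the chance of finishing in time."""
    p_meet = normal_cdf(deadline, lat_mean, lat_std)
    return acc_on_time * p_meet + acc_miss * (1.0 - p_meet)
```

When the deadline equals the predicted mean latency, the inference meets the deadline half the time, so the expectation sits midway between the two accuracies.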
30
[Figure: power over time for one input; the inference may finish before the latency target, leaving an idle tail until the new input arrives. Energy = (power setting × active time) + (idle power × idle time)]
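A minimal sketch of this energy accounting, assuming power sits at the cap while the DNN is active and at an estimated idle power until the latency target; the function name and values are illustrative.

```python
def predicted_energy(power_cap, active_time, idle_power, latency_target):
    """Energy until the next input: capped power while active, idle after."""
    idle_time = max(0.0, latency_target - active_time)
    return power_cap * active_time + idle_power * idle_time
```

For instance, a 40 W cap with 0.05 s of inference, 5 W idle power, and a 0.1 s target yields 40 × 0.05 + 5 × 0.05 = 2.25 J.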
31
○ DNN active power: the power setting (cap)
○ DNN idle power: estimated by a Kalman filter
[Figure: power trace with DNN-active and DNN-idle phases, from a new input up to the latency target]
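The slide only states that idle power is tracked with a Kalman filter; below is a minimal scalar Kalman filter of the kind that could serve here. The initial state and noise parameters are illustrative assumptions.

```python
class ScalarKalman:
    """1-D Kalman filter: track a slowly drifting scalar (e.g., idle power)."""
    def __init__(self, x0=5.0, p0=1.0, q=0.01, r=0.25):
        self.x = x0   # state estimate (watts)
        self.p = p0   # estimate variance
        self.q = q    # process noise (how fast idle power may drift)
        self.r = r    # measurement noise of the power sensor

    def update(self, z):
        self.p += self.q                  # predict: variance grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x
```

Feeding it noisy idle-power readings converges the estimate toward the true value while smoothing sensor noise.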
32
Putting it together: Measurement → Feedback-based estimation → DNN & power cap selection
[Diagram: ALERT's control loop around the DNN system; input → output, e.g., "Road"]
33
Understanding DNN Deployment Challenges ALERT Run-time Inference Management Experiments and Results
34
Hardware: CPUs, GPU
Co-located workloads: default, compute-intensive (2), memory-intensive (2)
Models: Sparse ResNet50, RNN
35
Oracle: has knowledge of the future; emulated from profiling results.
[Bar chart: Minimize Energy — average performance normalized to Oracle_Static (smaller is better) for App-only, Sys-only, No-coord, Sys+App (ALERT), Oracle]
36
Average performance normalized to Oracle_Static (smaller is better); deadline violations (%)
[Bar chart: Minimize Error — average performance normalized to Oracle_Static (smaller is better) for App-only, Sys-only, No-coord, Sys+App (ALERT), Oracle]
37
Average performance normalized to Oracle_Static (smaller is better); deadline violations (%)
38
○ Meets requirements in most cases
○ Quickly detects contention changes
○ Uses an anytime DNN under unstable environments