Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System
515030910586 F1503024
Last Edited: May 17, 2018 Report Date: Apr. 20, 2018
Catalog: Introduction, Related Works, Proposed …
1. Enormous data is generated by end devices, exceeding their computational capacity.
2. Power consumption on end devices is limited.
3. Internal storage for both the program and the model is limited.
On-device AI is a promising way to deliver high-performance DL applications, and many companies (Qualcomm, Apple, etc.) are working on it.
(S. Teerapittayanon et al.)
(S. Teerapittayanon et al.)
Proposed: a highly heterogeneous distributed system (end device + edge device + cloud device) to perform DL applications, based on and improving upon DDNN.
1. Each device has locally optimal performance: DL performance, power consumption, model size.
2. The overall system has globally optimal performance: response time, load balancing.
3. The system achieves scalability, robustness, and failure recovery.
Distribute workload to each node
Optimized NN for each type of device: DL performance, power consumption, ……
e.g., end devices use MobileNet [9] (resource-oriented); the cloud device uses ResNet [4] (performance-oriented).
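A minimal sketch of this per-tier model choice in Python, assuming torchvision is available on every tier; the tier names, the 20-label default, and using wide_resnet50_2 as a stand-in for WRN [11] are illustrative assumptions rather than the deck's exact design:

```python
# Sketch: pick a backbone that matches the resources of the current tier.
# The mapping mirrors the evaluation later in the slides (MobileNet on end
# devices, ResNet on the edge, WRN on the cloud); everything else here is
# an illustrative assumption.
import torchvision.models as models

def build_backbone(device_tier: str, num_labels: int = 20):
    """Return a backbone sized for the device tier (randomly initialized here;
    in practice the trained weights for that tier would be loaded)."""
    if device_tier == "end":      # resource-oriented: small and fast
        return models.mobilenet_v2(num_classes=num_labels)
    if device_tier == "edge":     # middle ground between speed and accuracy
        return models.resnet50(num_classes=num_labels)
    if device_tier == "cloud":    # performance-oriented: large and accurate
        return models.wide_resnet50_2(num_classes=num_labels)
    raise ValueError(f"unknown device tier: {device_tier}")

model = build_backbone("end").eval()
```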
Different subtask for each node
e.g., in some cases, giving the user a coarse classification of 'cat' with very short delay is better than returning 'American bobtail cat' after a longer time.
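A small sketch of how a node could exploit that trade-off, assuming the end device produces a coarse label immediately while the fine-grained answer is requested from the cloud; local_coarse_predict, cloud_fine_predict, and the 50 ms deadline are hypothetical names and values, not part of the original proposal:

```python
# Sketch: answer with the fast coarse label unless the fine-grained cloud
# result arrives before the response deadline.  The predictor callables and
# the deadline are illustrative placeholders.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

_pool = ThreadPoolExecutor(max_workers=1)

def classify(image, local_coarse_predict, cloud_fine_predict, deadline_s=0.05):
    """Return the fine-grained label if it arrives in time, otherwise the coarse one."""
    coarse = local_coarse_predict(image)              # fast local answer, e.g. "cat"
    future = _pool.submit(cloud_fine_predict, image)  # slower remote answer
    try:
        return future.result(timeout=deadline_s)      # e.g. "American bobtail cat"
    except TimeoutError:
        return coarse                                 # short delay beats fine granularity here
```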
[Figure: frames distributed across Dev1, Dev2, and Dev3 in Round 1 and Round 2]
[Figure: dispatching over Rounds 1–4; Dev1, Dev2, and Dev3 are online in Round 1, while from Round 2 onward only Dev1 is online and receives all of the work]
In this way, the hardware on each type of device is fully utilized, and the performance is both globally and locally optimal.
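A minimal sketch of such round-based dispatching with devices dropping out, assuming a plain round-robin policy over whichever devices are currently online; the Device class and the policy are illustrative, not the exact scheduler of the proposed system:

```python
# Sketch: distribute the frames of each round over the devices that are
# currently online; devices that go offline simply stop receiving work and
# the remaining ones absorb their share (illustrative round-robin policy).
from dataclasses import dataclass, field
from itertools import cycle

@dataclass
class Device:
    name: str
    online: bool = True
    assigned: list = field(default_factory=list)

def dispatch_round(frames, devices):
    """Assign the frames of one round to the online devices in turn."""
    online = [d for d in devices if d.online]
    if not online:
        raise RuntimeError("no device online")
    for frame, dev in zip(frames, cycle(online)):
        dev.assigned.append(frame)

devs = [Device("Dev1"), Device("Dev2"), Device("Dev3")]
dispatch_round(range(6), devs)              # Round 1: all three devices share the work
devs[1].online = devs[2].online = False     # Dev2 and Dev3 go offline
dispatch_round(range(6), devs)              # Round 2: Dev1 takes everything
print([(d.name, len(d.assigned)) for d in devs])   # [('Dev1', 8), ('Dev2', 2), ('Dev3', 2)]
```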
privacy protection
• Limited computing & storage resources
• Speed first
• No limit on computing & storage resources
• Balanced
• Best performance
Network | MobileNet [9] | ResNet [4] | WRN [11]
Inference speed (fr/ms)* | 4.00 | 1.27 | 0.14
Accuracy** | 0.50 | 0.73 | 0.83
Model parameters (M) | 0.84 | 21.4 | 40.6
Model size (MB) | 10.3 | 85.7 | 324.9
* Inference speed was measured in the same environment for all networks.
** Top-1 accuracy for 20 labels.
1. Fit for different devices
Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11])
Inference speed on Cloud (ms/fr) | 0.25 | 0.79 | 6.99
Inference speed on Edge device (ms/fr) | 2.03 | 5.53 | 227
Inference speed on End device (ms/fr) | 5.20 | 15.7 | 1.17K
Model parameters (M) | 0.84 | 21.4 | 40.6
Model size (MB) | 10.3 | 85.7 | 324.9
2. Consider network latency
Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11])
System process speed, no latency (ms/fr) | 7.41 | 5.99 | 7.36
Communication latency (s) | – | 0.5 / 1 | 0.5 / 1
System process speed (ms/fr) | 10.4 | 8.4 / 18.3 | 10.8 / 31.0
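One way to read these numbers is that the per-frame system time is roughly the inference time plus the communication latency amortized over a batch of frames. The sketch below computes such an estimate; the amortization formula and the example batch size are assumptions for illustration, not taken from the measurements above:

```python
# Sketch: estimate per-frame processing time when frames are shipped to a
# remote device in batches.  The amortization model (latency spread evenly
# over one batch) and the example batch size are illustrative assumptions.
def per_frame_time_ms(inference_ms_per_frame: float,
                      latency_s: float,
                      frames_per_batch: int) -> float:
    """Inference time plus round-trip latency amortized over one batch."""
    return inference_ms_per_frame + (latency_s * 1000.0) / frames_per_batch

# e.g. ResNet on the edge device: 5.53 ms/frame inference, 0.5 s latency,
# batches of 200 frames -> roughly 8.0 ms per frame end to end.
print(per_frame_time_ms(5.53, 0.5, 200))
```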
Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11])
Accuracy* | 0.50 | 0.73 | 0.83
Model parameters (M) | 0.84 | 21.4 | 40.6
Model size (MB) | 10.3 | 85.7 | 324.9
* The accuracy is top-1 accuracy for 20 labels
Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11])
Accuracy – 20 labels | 0.50 | 0.73 | 0.83
Accuracy – 100 labels | / | 0.61 | 0.73
References
[1]. AlphaGo, DeepMind, https://deepmind.com/research/alphago/
[2]. 201735
[3]. 2018/1/18, http://news.sciencenet.cn/htmlnews/2018/1/400490.shtm
[4]. Deep Residual Learning for Image Recognition. K. He, et al. CVPR, 2016.
[5]. "We are making on-device AI ubiquitous", Qualcomm Blog, https://www.qualcomm.com/news/onq/2017/08/16/we-are-making-device-ai-ubiquitous
[6]. Distributed Deep Neural Networks over the Cloud, the Edge and End Devices. S. Teerapittayanon, et al. ICDCS, 2017.
[7]. Scalable Distributed Computing Hierarchy: Cloud, Fog and Dew Computing. K. Skala, et al. OJCC, 2015.
[8]. Internet of Things (IoT): A vision, architectural elements, and future directions. J. Gubbi, et al. FGCS, 2013.
[9]. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. A. G. Howard, et al. arXiv:1704.04861.
[10]. Learning Multiple Layers of Features from Tiny Images. A. Krizhevsky, et al. Technical report, 2009.
[11]. Wide Residual Networks. S. Zagoruyko, et al. BMVC, 2016.