
SLIDE 1

Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System

515030910586 F1503024


Report Date: Apr. 20, 2018
Last Edited: May. 17, 2018

SLIDE 2

Catalog

  • Introduction
  • Related Works
  • Proposed Framework
  • Experiments
  • Next Step
  • Reference

SLIDE 3

Introduction

  • Fast development of AI
  • Social Hotspot: AlphaGo Series [1]
  • Government Support [2]: SJTU AI Research [3]
  • Business Concern: HomePod (Apple), Amazon Echo

SLIDE 4

Introduction

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • The importance of Deep Learning in AI
  • Computer Vision
  • NLP
  • Computer Architecture
  • Network
  • ……
  • Requires a huge amount of computation
  • Requires a huge amount of data

SLIDE 5

Introduction

Traditional DL Computation Platform

  • Researcher: Desktop + high-performance GPU
  • Company: Server cluster / Data center
  • Advantages: High performance density
  • Disadvantages: No portability; High usage cost; High maintenance cost

SLIDE 6

Introduction

Traditional DL Computation Platform

  • Commercial products: DL applications based on Cloud Computing
  • Advantages: Low hardware requirement on client side; Advantages of Cloud Computing

  • Disadvantages: Poor user experience (Network Latency); Privacy issue

SLIDE 7

Introduction

Traditional DL Data Source

  • Publicly available datasets
  • Advantages: Convenient for comparing different algorithms; Easy to obtain
  • Disadvantages: Out of date; Far away from end users
  • Private datasets (in companies, hospitals, etc.)
  • Advantages: Large scale; Close to production workflow
  • Disadvantages: Not publicly available; Limited research value

SLIDE 8

Introduction

New computational platform and data source: Smart Phone and IoT devices

  • Increasing processing power of smart phones and smart devices.
  • A giant data source formed by enormous numbers of IoT devices (smart phones).
  • Low network latency in a LAN network structure.

SLIDE 9

Introduction

New computational platform and data source: Smart Phone and IoT devices

SLIDE 10

Introduction

New Challenges

  • As data producers, mobile and IoT devices are called End Devices

1. Enormous data generated by end devices, exceeding their computational capacity (see the back-of-envelope sketch below).
2. Limits on power consumption on end devices.
3. Limited internal storage for both program and model.
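
To make point 1 concrete, a back-of-envelope sketch with purely illustrative numbers (a hypothetical 30 fps camera against a device that sustains 5 inferences per second; neither figure is from this report):

```python
# Illustrative numbers only: one camera at 30 fps versus a mobile
# accelerator that classifies roughly 5 frames per second.
frames_per_day = 30 * 60 * 60 * 24          # 2,592,000 frames captured
inferences_per_day = 5 * 60 * 60 * 24       # 432,000 frames processed
print(frames_per_day / inferences_per_day)  # ~6x more data than compute
```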

SLIDE 11

Introduction

Summary

  • End devices (mobile phones and IoT devices) have the potential to perform DL applications, and many companies (Qualcomm [5], Apple, etc.) are working on it.

SLIDE 12

Related Works

  • Cloud Computing, Edge Computing, Fog Computing
  • Internet of Things (IoT)
  • Highly Heterogeneous Distributed System
  • S. Teerapittayanon, et al. [6]

SLIDE 13

Related Works

Cloud Computing; Edge and Fog Computing [7]

  • Features of Cloud Computing:
  • High-speed interconnection between workers
  • Virtualization and high scalability
  • Service abstraction (IaaS, PaaS, SaaS)
  • ……
  • Motivation of Edge, Fog Computing
  • Apply to smart devices and IoT devices
  • Improve processing efficiency
  • Improve Quality of Service (QoS)

SLIDE 14

Related Works

Internet of Things (IoT) [8]

  • Many scenarios for IoT:
  • Smart Home, Smart Retail, Smart City,
  • Smart Agriculture, Smart Transportation,
  • ……
  • Typical IoT devices:
  • Sensors
  • Embedded communication devices
  • Storage / computation middleware

SLIDE 15

Related Works

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • Distributed system with Cloud, Edge, and End devices [7]
  • 1. Highly heterogeneous:
  • Optimization space both locally and globally
  • 2. Controllable communication:
  • Leverage the communication latency
  • 3. Towards ubiquitous computing:
  • Smart collection of computational resources

SLIDE 16

Related Works

Distributed Deep Neural Network (DDNN) [6]

(S. Teerapittayanon et al.)

  • Advantages
  • A novel framework to discuss DL applications on heterogeneous distributed systems
  • Relieves cloud workload via a local exit (sketched after this list)
  • Disadvantages
  • Lack of experiments on multiple devices
  • Lack of discussion on communication latency
  • Lack of a general DL application test case (only multi-view tracking)
  • Lack of discussion on computing capabilities (all devices use BNN)
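
A minimal sketch of the local-exit idea in PyTorch, under stated assumptions: confidence is measured by normalized softmax entropy, the 0.5 threshold is arbitrary, and `send_to_cloud` is a placeholder for whatever RPC the deployment uses.

```python
import torch
import torch.nn.functional as F

def send_to_cloud(x):
    """Placeholder for the RPC that forwards the sample upstream."""
    raise NotImplementedError

def classify_with_local_exit(local_model, x, threshold=0.5):
    """Answer locally when the on-device model is confident; otherwise
    escalate to the cloud (the DDNN-style local exit)."""
    logits = local_model(x)                      # x: batch of size 1
    probs = F.softmax(logits, dim=-1)
    # Normalized entropy in [0, 1]: 0 = one-hot confident, 1 = uniform.
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    ent = ent / torch.log(torch.tensor(float(probs.shape[-1])))
    if ent.item() < threshold:
        return probs.argmax(dim=-1)              # exit locally
    return send_to_cloud(x)                      # fall back to the cloud
```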

SLIDE 17

Related Works

Distributed Deep Neural Network (DDNN) [6]

(S. Teerapittayanon et al.)


[Figure: DDNN overview, from [6]]
SLIDE 18

Related Works

Summary

  • Utilize the heterogeneous distributed framework (end device + edge device + cloud device) to perform DL applications, building on and improving DDNN.

SLIDE 19

Proposed Framework

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • Propose a framework with 3 types of heterogeneity:
  • 1. Computing Node Heterogeneity
  • 2. Neural Network Heterogeneity
  • 3. Deep Learning Task Heterogeneity

SLIDE 20

Proposed Framework

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • Optimization Target

1. Each device has locally optimal performance: DL performance, power consumption, model size.
2. The overall system has globally optimal performance: response time, load balancing.
3. Obtain scalability, robustness, and failure recovery.

SLIDE 21

Proposed Framework

Computing Node Heterogeneity

  • Cloud computing → Distribute workload to each node

  • Benefit:
  • Avoid long response delay between end user and the cloud
  • Avoid potential privacy leakage
  • Make full use of nearby computing resources

SLIDE 22

Proposed Framework

Neural Network Heterogeneity

  • Same NN structure for all nodes (like DDNN) → Optimized NN for each type

  • Benefit:
  • Choose different NN structure for each node
  • Make full use of available hardware resources
  • Able to reach a local optimum for performance, speed, model size, power consumption, ……


e.g., End devices use MobileNet [9] (resource-oriented); Cloud devices use ResNet [4] (performance-oriented)
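
A sketch of how per-node model selection might look, assuming PyTorch/torchvision; `wide_resnet50_2` stands in for the WRN variant cited in the slides, and the node-type keys are hypothetical.

```python
import torchvision.models as models

# Hypothetical node-type -> architecture table mirroring the slide:
# resource-oriented nets on end devices, performance-oriented in the cloud.
MODEL_FOR_NODE = {
    "end":   lambda: models.mobilenet_v2(num_classes=100),     # small, fast
    "edge":  lambda: models.resnet34(num_classes=100),         # balanced
    "cloud": lambda: models.wide_resnet50_2(num_classes=100),  # most accurate
}

model = MODEL_FOR_NODE["end"]()  # each node instantiates only its own network
```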

SLIDE 23

Proposed Framework

Deep Learning Task Heterogeneity

  • Same task for all nodes → Different subtask for each node

  • Benefit:
  • Choose different DL tasks according to hardware and the DNN loaded
  • Consider the network latency between nodes
  • Reach a global optimum for response time, overall performance, ……


e.g., In some cases, giving the user the coarse label ‘cat’ with very short delay is better than returning ‘American bobtail cat’ after a longer time.
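
One way to act on this trade-off, sketched with hypothetical label tables (CIFAR-100 defines the real 100-fine-to-20-coarse mapping; the names and cutoff below are made up for illustration):

```python
import numpy as np

# Stub fine->coarse grouping in the spirit of CIFAR-100's superclasses.
COARSE_OF = {0: "cat", 1: "cat", 2: "dog", 3: "dog"}                      # hypothetical
FINE_NAME = {0: "American bobtail", 1: "tabby", 2: "beagle", 3: "husky"}  # hypothetical

def label_for_budget(fine_probs, deadline_ms, coarse_cutoff_ms=50):
    """Return the coarse label under a tight deadline, the fine one otherwise."""
    k = int(np.argmax(fine_probs))
    return COARSE_OF[k] if deadline_ms < coarse_cutoff_ms else FINE_NAME[k]

print(label_for_budget(np.array([0.7, 0.1, 0.1, 0.1]), deadline_ms=20))  # 'cat'
```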

SLIDE 24

Proposed Framework

Scheduling Algorithm

  • Baseline 1 – ‘End’ scheme
  • All data are sent to end devices in sequential order
  • Lowest precision but highest speed
  • Does not utilize the distributed system

SLIDE 25

Proposed Framework

Scheduling Algorithm

  • Baseline 1 – ‘End’ scheme
  • Baseline 2 – ‘End-Cloud’ scheme
  • Slower, but higher performance
  • ‘Jam’ on the cloud device

SLIDE 26

Proposed Framework

Scheduling Algorithm

  • Baseline 1 – ‘End’ scheme
  • Baseline 2 – ‘End-Cloud’ scheme
  • Proposed – ‘Mapping’ scheme

SLIDE 27

Proposed Framework

Scheduling Algorithm

  • Proposed – ‘Mapping’ scheme
  • Find the ‘balance’ point of the distributed system
  • High accuracy
  • ‘Jam’ on the cloud less likely to happen
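
A minimal sketch of what a ‘Mapping’ scheduler could look like, under stated assumptions: each node is summarized by a measured per-frame time and a one-way latency, and every frame goes to the node with the earliest estimated completion time. The figures below loosely echo the measurements reported in the Experiments section and are illustrative only.

```python
import heapq

def dispatch(frames, nodes):
    """Greedy mapping: each frame goes to the node whose estimated
    completion time (latency + queued processing) is earliest."""
    # nodes: (per_frame_ms, one_way_latency_ms, name)
    heap = [(lat + per, per, name) for per, lat, name in nodes]
    heapq.heapify(heap)
    plan = []
    for frame in frames:
        finish, per, name = heapq.heappop(heap)
        plan.append((frame, name))
        heapq.heappush(heap, (finish + per, per, name))  # its queue grows
    return plan

# Illustrative figures: the end device is fast to reach but weak,
# the cloud is strong but far away.
nodes = [(5.2, 0.0, "end"), (5.5, 500.0, "edge"), (7.0, 1000.0, "cloud")]
print(dispatch(range(5), nodes))
```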

SLIDE 28

Proposed Framework

Privacy Protection

  • Encryption module
  • Encrypt on End/Edge devices
  • Only send processed data
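
A minimal sketch of the encryption module using the Python `cryptography` package (an assumption; the report does not name a cipher). Only processed features, already encrypted, leave the device.

```python
from cryptography.fernet import Fernet  # assumes the 'cryptography' package

key = Fernet.generate_key()   # in practice, provisioned securely per device
cipher = Fernet(key)

def send_processed(features: bytes) -> bytes:
    """Encrypt processed features on the end/edge device before transmission."""
    return cipher.encrypt(features)

def receive_processed(token: bytes) -> bytes:
    """Decrypt on a node that shares the key (e.g., the cloud)."""
    return cipher.decrypt(token)
```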

SLIDE 29

Proposed Framework

Fault Tolerance

  • Without / With device state monitoring


[Figure: without monitoring, Round 1 and Round 2 both dispatch work to Dev1, Dev2, Dev3]

SLIDE 30

Proposed Framework

Fault Tolerance

  • Without / With device state monitoring


[Figure: with monitoring, Round 1 dispatches to online devices Dev1, Dev2, Dev3; in Rounds 2-4 only Dev1 is online, so all work is reassigned to Dev1]
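
A sketch of the monitoring idea behind the figure, under assumed mechanics: devices send heartbeats, a device missing heartbeats is dropped from the online set, and each round's work is split only among online devices. The timeout value is an assumption.

```python
import time

HEARTBEAT_TIMEOUT = 3.0   # assumed: seconds of silence before a device is offline
last_seen = {}            # device name -> time of last heartbeat

def on_heartbeat(device):
    last_seen[device] = time.monotonic()

def online_devices():
    now = time.monotonic()
    return [d for d, t in last_seen.items() if now - t < HEARTBEAT_TIMEOUT]

def assign_round(frames):
    """Split this round's frames among the devices currently online, so work
    queued for a failed device is redistributed rather than lost."""
    devs = online_devices()
    return {d: frames[i::len(devs)] for i, d in enumerate(devs)} if devs else {}
```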

SLIDE 31

Proposed Framework

Summary

  • Propose a framework with three types of heterogeneity such that the hardware on each type of device is fully utilized, and the performance is both globally and locally optimal.

SLIDE 32

Experiments

  • Experiment environment
  • Testing methods and workflow
  • Effect of three types of heterogeneity
  • Scalability
  • Privacy protection

SLIDE 33

Experiments

Experiment environment

  • Use virtual machines to control the processing power of different devices
  • Comparison: End Devices < Edge Devices < Cloud Devices
  • Use a virtual network adapter to control the network latency between devices
  • Comparison: End Devices < Edge Devices < Cloud Devices

SLIDE 34

Experiments

Experiment environment

SLIDE 35

Experiments

Testing methods and workflow

  • Deep Learning application: Image Classification
  • Testing dataset: CIFAR-100 [10]
  • Widely used, openly accessible dataset in the Computer Vision research field
  • Fits ‘Task Heterogeneity’ through its hierarchical image labels
  • Medium-sized dataset
  • Rich labels, good extensibility
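
For reference, the standard torchvision loading path (the normalization constants are commonly used CIFAR-100 statistics; torchvision exposes the 100 fine labels directly, while the 20 coarse superclass labels live in the raw pickle files):

```python
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409),   # commonly used CIFAR-100 stats
                (0.2673, 0.2564, 0.2762)),
])
train = torchvision.datasets.CIFAR100(root="./data", train=True,
                                      download=True, transform=transform)
test = torchvision.datasets.CIFAR100(root="./data", train=False,
                                     download=True, transform=transform)
```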

SLIDE 36

Experiments

Testing methods and workflow

  • Baseline
  • Simulate Cloud-only situations. Record the overall latency and performance
  • Ablation
  • Analyze the effect of three types of heterogeneity
  • Overall
  • The overall performance and processing speed with the scheduling algorithm and privacy protection

SLIDE 37

Experiments

Baseline

  • End nodes: MobileNet [9]
    - Limited computing & storage resources
    - Speed first
  • Fog nodes: ResNet-34 [4]
    - No limit on computing & storage resources
    - Balanced
  • Cloud nodes: Wide ResNet [11]
    - Best performance

SLIDE 38

Experiments

Baseline


| Network | MobileNet [9] | ResNet [4] | WRN [11] |
|---|---|---|---|
| Inference speed (fr/ms)* | 4.00 | 1.27 | 0.14 |
| Accuracy** | 0.50 | 0.73 | 0.83 |
| Model parameters (M) | 0.84 | 21.4 | 40.6 |
| Model size (MB) | 10.3 | 85.7 | 324.9 |

* Inference speed measured in the same environment for all networks
** Top-1 accuracy over 20 labels
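
A sketch of how such fr/ms figures can be obtained (an assumed methodology; the report only states that all networks were timed in the same environment):

```python
import time
import torch

def frames_per_ms(model, input_size=(1, 3, 32, 32), iters=100):
    """Rough throughput probe in the table's fr/ms unit; absolute values
    depend entirely on the hardware (or VM) the probe runs on."""
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        for _ in range(10):                      # warm-up
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    return iters / elapsed_ms
```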

SLIDE 39

Experiments

Ablation - Computing Node Heterogeneity

1. Fit for different devices


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| Inference speed on Cloud (ms/fr) | 0.25 | 0.79 | 6.99 |
| Inference speed on Edge device (ms/fr) | 2.03 | 5.53 | 227 |
| Inference speed on End device (ms/fr) | 5.20 | 15.7 | 1.17K |
| Model parameters (M) | 0.84 | 21.4 | 40.6 |
| Model size (MB) | 10.3 | 85.7 | 324.9 |

SLIDE 40

Experiments

Ablation - Computing Node Heterogeneity

2. Consider network latency


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| System process speed, no latency (ms/fr) | 7.41 | 5.99 | 7.36 |
| System process speed, 0.5 s latency (ms/fr) | 10.4 | 8.4 | 18.3 |
| System process speed, 1 s latency (ms/fr) | / | 10.8 | 31.0 |

SLIDE 41

Experiments

Ablation – Neural Network Heterogeneity

  • Proper performance, model size and speed


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| Accuracy* | 0.50 | 0.73 | 0.83 |
| Model parameters (M) | 0.84 | 21.4 | 40.6 |
| Model size (MB) | 10.3 | 85.7 | 324.9 |

* Top-1 accuracy over 20 labels

SLIDE 42

Experiments

Ablation – Deep Learning Task Heterogeneity

  • Trade-off between state space and speed / accuracy


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| Accuracy, 20 labels | 0.50 | 0.73 | 0.83 |
| Accuracy, 100 labels | / | 0.61 | 0.73 |

SLIDE 43

Experiments

Overall – Scheduling and Scalability

  • ‘Mapping’ scheme: high performance + high scalability

SLIDE 44

Experiments

Overall – Privacy Protection

  • High protection level with low computation overhead

SLIDE 45

Experiments

Overall – Fault Tolerance

  • Failure recovery with low computation overhead

SLIDE 46

Next Step

  • Put the test system on real devices
  • Consider other collaborative computing schemes between different types of devices
  • Pack the code as a module for other distributed systems, e.g., Spark
  • Refine modules, e.g., add differential privacy protection to the encryption module
  • ……

SLIDE 47

Reference

[1]. AlphaGo. DeepMind. https://deepmind.com/research/alphago/
[2]. Government Work Report of China, Mar. 5, 2017.
[3]. ScienceNet News, Jan. 18, 2018. http://news.sciencenet.cn/htmlnews/2018/1/400490.shtm
[4]. K. He, et al. Deep Residual Learning for Image Recognition. CVPR, 2016.
[5]. "We are making on-device AI ubiquitous". Qualcomm Blog. https://www.qualcomm.com/news/onq/2017/08/16/we-are-making-device-ai-ubiquitous
[6]. S. Teerapittayanon, et al. Distributed Deep Neural Networks over the Cloud, the Edge and End Devices. ICDCS, 2017.
[7]. K. Skala, et al. Scalable Distributed Computing Hierarchy: Cloud, Fog and Dew Computing. OJCC, 2015.
[8]. J. Gubbi, et al. Internet of Things (IoT): A vision, architectural elements, and future directions. FGCS, 2013.
[9]. A. G. Howard, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861.
[10]. A. Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images. Technical report, 2009.
[11]. S. Zagoruyko, et al. Wide Residual Networks. BMVC, 2016.

SLIDE 48

Change Log

  • Apr. 18, 2018. Version 1.
  • Fix Reference
  • May. 25, 2018. Version 2.
  • Change from Chinese to English
  • Update to final-term process
