
SLIDE 1

Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System

515030910586 F1503024


Report Date: Apr. 20, 2018
Last Edited: May. 17, 2018

SLIDE 2

Catalog

  • Introduction
  • Related Works
  • Proposed Framework
  • Experiments
  • Next Step
  • Reference

SLIDE 3

Introduction

  • Fast development of AI
  • Social Hotspot: AlphaGo Series [1]
  • Government Support [2]: SJTU AI Research [3]
  • Business Concern: HomePod (Apple), Amazon Echo

SLIDE 4

Introduction

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • The importance of Deep Learning in AI
  • Computer Vision
  • NLP
  • Computer Architecture
  • Network
  • ……
  • Requires a huge amount of computation
  • Requires a huge amount of data

SLIDE 5

Introduction

Traditional DL Computation Platform

  • Researcher: Desktop + high-performance GPU
  • Company: Server cluster / Data center
  • Advantages: High performance density
  • Disadvantages: No portability; High usage cost; High maintenance cost

SLIDE 6

Introduction

Traditional DL Computation Platform

  • Commercial products: DL applications based on Cloud Computing
  • Advantages: Low hardware requirement on client side; Advantages of Cloud Computing

  • Disadvantages: Poor user experience (Network Latency); Privacy issue

SLIDE 7

Introduction

Traditional DL Data Source

  • Publicly available datasets
  • Advantages: Convenient for comparing different algorithms; Easy to obtain
  • Disadvantages: Out of date; Far away from end users
  • Private datasets (in companies, hospitals, etc.)
  • Advantages: Large scale; Close to production workflow
  • Disadvantages: Not publicly available; Limited research value

SLIDE 8

Introduction

New computational platform and data source: Smart Phone and IoT devices

  • Increasing processing power of smart phones and smart devices.
  • A giant data source formed by enormous numbers of IoT devices (smart phones).
  • Low network latency in a LAN network structure.

SLIDE 9

Introduction

New computational platform and data source: Smart Phone and IoT devices

SLIDE 10

Introduction

New Challenges

  • As data producers, mobile and IoT devices are called End Devices

1. Enormous data generated by end devices, exceeding their computational capacity (see the back-of-envelope sketch below).
2. Limits on power consumption on end devices.
3. Limited internal storage for both program and model.
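
To make point 1 concrete, a back-of-envelope sketch with purely illustrative numbers (a hypothetical 30 fps camera against a device that sustains 5 inferences per second; neither figure is from this report):

```python
# Illustrative numbers only: one camera at 30 fps versus a mobile
# accelerator that classifies roughly 5 frames per second.
frames_per_day = 30 * 60 * 60 * 24          # 2,592,000 frames captured
inferences_per_day = 5 * 60 * 60 * 24       # 432,000 frames processed
print(frames_per_day / inferences_per_day)  # ~6x more data than compute
```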

SLIDE 11

Introduction

Summary

  • End devices (mobile phones and IoT devices) have the potential to perform DL applications, and many companies (Qualcomm [5], Apple, etc.) are working on it.

SLIDE 12

Related Works

  • Cloud Computing, Edge Computing, Fog Computing
  • Internet of Things (IoT)
  • Highly Heterogeneous Distributed System
  • S. Teerapittayanon, et al. [6]

SLIDE 13

Related Works

Cloud Computing; Edge and Fog Computing [7]

  • Features of Cloud Computing:
  • High-speed interconnection between workers
  • Virtualization and high scalability
  • Service abstraction (IaaS, PaaS, SaaS)
  • ……
  • Motivation of Edge, Fog Computing
  • Apply to smart devices and IoT devices
  • Improve processing efficiency
  • Improve Quality of Service (QoS)

SLIDE 14

Related Works

Internet of Things (IoT) [8]

  • Many scenarios for IoT:
  • Smart Home, Smart Retail, Smart City,
  • Smart Agriculture, Smart Transportation,
  • ……
  • Typical IoT devices:
  • Sensors
  • Embedded communication devices
  • Storage / computation middleware

SLIDE 15

Related Works

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • Distributed system with Cloud, Edge, and End devices [7]
  • 1. Highly heterogeneous:
  • Optimization space both locally and globally
  • 2. Controllable communication:
  • Leverage the communication latency
  • 3. Towards ubiquitous computing:
  • Smart collection of computational resources

SLIDE 16

Related Works

Distributed Deep Neural Network (DDNN) [6]

(S. Teerapittayanon et al.)

  • Advantages
  • A novel framework to discuss DL applications on heterogeneous distributed systems
  • Relieves cloud workload via a local exit (sketched after this list)
  • Disadvantages
  • Lack of experiments on multiple devices
  • Lack of discussion on communication latency
  • Lack of a general DL application test case (only multi-view tracking)
  • Lack of discussion on computing capabilities (all devices use BNN)
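
A minimal sketch of the local-exit idea in PyTorch, under stated assumptions: confidence is measured by normalized softmax entropy, the 0.5 threshold is arbitrary, and `send_to_cloud` is a placeholder for whatever RPC the deployment uses.

```python
import torch
import torch.nn.functional as F

def send_to_cloud(x):
    """Placeholder for the RPC that forwards the sample upstream."""
    raise NotImplementedError

def classify_with_local_exit(local_model, x, threshold=0.5):
    """Answer locally when the on-device model is confident; otherwise
    escalate to the cloud (the DDNN-style local exit)."""
    logits = local_model(x)                      # x: batch of size 1
    probs = F.softmax(logits, dim=-1)
    # Normalized entropy in [0, 1]: 0 = one-hot confident, 1 = uniform.
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    ent = ent / torch.log(torch.tensor(float(probs.shape[-1])))
    if ent.item() < threshold:
        return probs.argmax(dim=-1)              # exit locally
    return send_to_cloud(x)                      # fall back to the cloud
```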

SLIDE 17

Related Works

Distributed Deep Neural Network (DDNN) [6]

(S. Teerapittayanon et al.)


[Figure: DDNN overview, from [6]]
SLIDE 18

Related Works

Summary

  • Utilize the heterogeneous distributed framework (end device + edge device + cloud device) to perform DL applications, building on and improving DDNN.

SLIDE 19

Proposed Framework

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • Propose a framework with 3 types of heterogeneity:
  • 1. Computing Node Heterogeneity
  • 2. Neural Network Heterogeneity
  • 3. Deep Learning Task Heterogeneity

SLIDE 20

Proposed Framework

‘Optimization of Deep Learning Applications in Highly Heterogeneous Distributed System’

  • Optimization Target

1. Each device has locally optimal performance: DL performance, power consumption, model size.
2. The overall system has globally optimal performance: response time, load balancing.
3. Obtain scalability, robustness, and failure recovery.

SLIDE 21

Proposed Framework

Computing Node Heterogeneity

  • Cloud computing → Distribute workload to each node

  • Benefit:
  • Avoid long response delay between end user and the cloud
  • Avoid potential privacy leakage
  • Make full use of nearby computing resources

SLIDE 22

Proposed Framework

Neural Network Heterogeneity

  • Same NN structure for all nodes (like DDNN) → Optimized NN for each type

  • Benefit:
  • Choose different NN structure for each node
  • Make full use of available hardware resources
  • Able to reach a local optimum for performance, speed, model size, power consumption, ……


e.g., End devices use MobileNet [9] (resource-oriented); Cloud devices use ResNet [4] (performance-oriented)
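
A sketch of how per-node model selection might look, assuming PyTorch/torchvision; `wide_resnet50_2` stands in for the WRN variant cited in the slides, and the node-type keys are hypothetical.

```python
import torchvision.models as models

# Hypothetical node-type -> architecture table mirroring the slide:
# resource-oriented nets on end devices, performance-oriented in the cloud.
MODEL_FOR_NODE = {
    "end":   lambda: models.mobilenet_v2(num_classes=100),     # small, fast
    "edge":  lambda: models.resnet34(num_classes=100),         # balanced
    "cloud": lambda: models.wide_resnet50_2(num_classes=100),  # most accurate
}

model = MODEL_FOR_NODE["end"]()  # each node instantiates only its own network
```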

SLIDE 23

Proposed Framework

Deep Learning Task Heterogeneity

  • Same task for all nodes → Different subtask for each node

  • Benefit:
  • Choose different DL tasks according to hardware and the DNN loaded
  • Consider the network latency between nodes
  • Reach a global optimum for response time, overall performance, ……


e.g., In some cases, giving the user the coarse label ‘cat’ with very short delay is better than returning ‘American bobtail cat’ after a longer time.
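
One way to act on this trade-off, sketched with hypothetical label tables (CIFAR-100 defines the real 100-fine-to-20-coarse mapping; the names and cutoff below are made up for illustration):

```python
import numpy as np

# Stub fine->coarse grouping in the spirit of CIFAR-100's superclasses.
COARSE_OF = {0: "cat", 1: "cat", 2: "dog", 3: "dog"}                      # hypothetical
FINE_NAME = {0: "American bobtail", 1: "tabby", 2: "beagle", 3: "husky"}  # hypothetical

def label_for_budget(fine_probs, deadline_ms, coarse_cutoff_ms=50):
    """Return the coarse label under a tight deadline, the fine one otherwise."""
    k = int(np.argmax(fine_probs))
    return COARSE_OF[k] if deadline_ms < coarse_cutoff_ms else FINE_NAME[k]

print(label_for_budget(np.array([0.7, 0.1, 0.1, 0.1]), deadline_ms=20))  # 'cat'
```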

SLIDE 24

Proposed Framework

Scheduling Algorithm

  • Baseline 1 – ‘End’ scheme
  • All data are sent to end devices in sequential order
  • Lowest precision but highest speed
  • Does not utilize the distributed system

SLIDE 25

Proposed Framework

Scheduling Algorithm

  • Baseline 1 – ‘End’ scheme
  • Baseline 2 – ‘End-Cloud’ scheme
  • Slower, but higher performance
  • ‘Jam’ on the cloud device

SLIDE 26

Proposed Framework

Scheduling Algorithm

  • Baseline 1 – ‘End’ scheme
  • Baseline 2 – ‘End-Cloud’ scheme
  • Proposed – ‘Mapping’ scheme

SLIDE 27

Proposed Framework

Scheduling Algorithm

  • Proposed – ‘Mapping’ scheme
  • Find the ‘balance’ point of the distributed system
  • High accuracy
  • ‘Jam’ on the cloud less likely to happen
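
A minimal sketch of what a ‘Mapping’ scheduler could look like, under stated assumptions: each node is summarized by a measured per-frame time and a one-way latency, and every frame goes to the node with the earliest estimated completion time. The figures below loosely echo the measurements reported in the Experiments section and are illustrative only.

```python
import heapq

def dispatch(frames, nodes):
    """Greedy mapping: each frame goes to the node whose estimated
    completion time (latency + queued processing) is earliest."""
    # nodes: (per_frame_ms, one_way_latency_ms, name)
    heap = [(lat + per, per, name) for per, lat, name in nodes]
    heapq.heapify(heap)
    plan = []
    for frame in frames:
        finish, per, name = heapq.heappop(heap)
        plan.append((frame, name))
        heapq.heappush(heap, (finish + per, per, name))  # its queue grows
    return plan

# Illustrative figures: the end device is fast to reach but weak,
# the cloud is strong but far away.
nodes = [(5.2, 0.0, "end"), (5.5, 500.0, "edge"), (7.0, 1000.0, "cloud")]
print(dispatch(range(5), nodes))
```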

SLIDE 28

Proposed Framework

Privacy Protection

  • Encryption module
  • Encrypt on End/Edge devices
  • Only send processed data
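
A minimal sketch of the encryption module using the Python `cryptography` package (an assumption; the report does not name a cipher). Only processed features, already encrypted, leave the device.

```python
from cryptography.fernet import Fernet  # assumes the 'cryptography' package

key = Fernet.generate_key()   # in practice, provisioned securely per device
cipher = Fernet(key)

def send_processed(features: bytes) -> bytes:
    """Encrypt processed features on the end/edge device before transmission."""
    return cipher.encrypt(features)

def receive_processed(token: bytes) -> bytes:
    """Decrypt on a node that shares the key (e.g., the cloud)."""
    return cipher.decrypt(token)
```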

SLIDE 29

Proposed Framework

Fault Tolerance

  • Without / With device state monitoring


[Figure: without monitoring, Round 1 and Round 2 both dispatch work to Dev1, Dev2, Dev3]

SLIDE 30

Proposed Framework

Fault Tolerance

  • Without / With device state monitoring


[Figure: with monitoring, Round 1 dispatches to online devices Dev1, Dev2, Dev3; in Rounds 2-4 only Dev1 is online, so all work is reassigned to Dev1]
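
A sketch of the monitoring idea behind the figure, under assumed mechanics: devices send heartbeats, a device missing heartbeats is dropped from the online set, and each round's work is split only among online devices. The timeout value is an assumption.

```python
import time

HEARTBEAT_TIMEOUT = 3.0   # assumed: seconds of silence before a device is offline
last_seen = {}            # device name -> time of last heartbeat

def on_heartbeat(device):
    last_seen[device] = time.monotonic()

def online_devices():
    now = time.monotonic()
    return [d for d, t in last_seen.items() if now - t < HEARTBEAT_TIMEOUT]

def assign_round(frames):
    """Split this round's frames among the devices currently online, so work
    queued for a failed device is redistributed rather than lost."""
    devs = online_devices()
    return {d: frames[i::len(devs)] for i, d in enumerate(devs)} if devs else {}
```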

SLIDE 31

Proposed Framework

Summary

  • Propose a framework with three types of heterogeneity such that the hardware on each type of device is fully utilized, and the performance is both globally and locally optimal.

SLIDE 32

Experiments

  • Experiment environment
  • Testing methods and workflow
  • Effect of three types of heterogeneity
  • Scalability
  • Privacy protection

SLIDE 33

Experiments

Experiment environment

  • Use virtual machines to control the processing power of different devices
  • Comparison: End Devices < Edge Devices < Cloud Devices
  • Use a virtual network adapter to control the network latency between devices
  • Comparison: End Devices < Edge Devices < Cloud Devices

SLIDE 34

Experiments

Experiment environment

SLIDE 35

Experiments

Testing methods and workflow

  • Deep Learning application: Image Classification
  • Testing dataset: CIFAR-100 [10]
  • Widely used, openly accessible dataset in the Computer Vision research field
  • Fits ‘Task Heterogeneity’ through its hierarchical image labels
  • Medium-sized dataset
  • Rich labels, good extensibility
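
For reference, the standard torchvision loading path (the normalization constants are commonly used CIFAR-100 statistics; torchvision exposes the 100 fine labels directly, while the 20 coarse superclass labels live in the raw pickle files):

```python
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5071, 0.4865, 0.4409),   # commonly used CIFAR-100 stats
                (0.2673, 0.2564, 0.2762)),
])
train = torchvision.datasets.CIFAR100(root="./data", train=True,
                                      download=True, transform=transform)
test = torchvision.datasets.CIFAR100(root="./data", train=False,
                                     download=True, transform=transform)
```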

SLIDE 36

Experiments

Testing methods and workflow

  • Baseline
  • Simulate Cloud-only situations. Record the overall latency and performance
  • Ablation
  • Analyze the effect of three types of heterogeneity
  • Overall
  • The overall performance and processing speed with the scheduling algorithm and privacy protection

SLIDE 37

Experiments

Baseline

  • End nodes: MobileNet [9]
    - Limited computing & storage resources
    - Speed first
  • Fog nodes: ResNet-34 [4]
    - No limit on computing & storage resources
    - Balanced
  • Cloud nodes: Wide ResNet [11]
    - Best performance

SLIDE 38

Experiments

Baseline


| Network | MobileNet [9] | ResNet [4] | WRN [11] |
|---|---|---|---|
| Inference speed (fr/ms)* | 4.00 | 1.27 | 0.14 |
| Accuracy** | 0.50 | 0.73 | 0.83 |
| Model parameters (M) | 0.84 | 21.4 | 40.6 |
| Model size (MB) | 10.3 | 85.7 | 324.9 |

* Inference speed measured in the same environment for all networks
** Top-1 accuracy over 20 labels
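
A sketch of how such fr/ms figures can be obtained (an assumed methodology; the report only states that all networks were timed in the same environment):

```python
import time
import torch

def frames_per_ms(model, input_size=(1, 3, 32, 32), iters=100):
    """Rough throughput probe in the table's fr/ms unit; absolute values
    depend entirely on the hardware (or VM) the probe runs on."""
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        for _ in range(10):                      # warm-up
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    return iters / elapsed_ms
```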

SLIDE 39

Experiments

Ablation - Computing Node Heterogeneity

1. Fit for different devices


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| Inference speed on Cloud (ms/fr) | 0.25 | 0.79 | 6.99 |
| Inference speed on Edge device (ms/fr) | 2.03 | 5.53 | 227 |
| Inference speed on End device (ms/fr) | 5.20 | 15.7 | 1.17K |
| Model parameters (M) | 0.84 | 21.4 | 40.6 |
| Model size (MB) | 10.3 | 85.7 | 324.9 |

SLIDE 40

Experiments

Ablation - Computing Node Heterogeneity

2. Consider network latency


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| System process speed, no latency (ms/fr) | 7.41 | 5.99 | 7.36 |
| System process speed, 0.5 s latency (ms/fr) | 10.4 | 8.4 | 18.3 |
| System process speed, 1 s latency (ms/fr) | / | 10.8 | 31.0 |

SLIDE 41

Experiments

Ablation – Neural Network Heterogeneity

  • Proper performance, model size and speed


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| Accuracy* | 0.50 | 0.73 | 0.83 |
| Model parameters (M) | 0.84 | 21.4 | 40.6 |
| Model size (MB) | 10.3 | 85.7 | 324.9 |

* Top-1 accuracy over 20 labels

SLIDE 42

Experiments

Ablation – Deep Learning Task Heterogeneity

  • Trade-off between state space and speed / accuracy


| Devices (network) | End Device (MobileNet [9]) | Edge Device (ResNet [4]) | Cloud (WRN [11]) |
|---|---|---|---|
| Accuracy, 20 labels | 0.50 | 0.73 | 0.83 |
| Accuracy, 100 labels | / | 0.61 | 0.73 |

SLIDE 43

Experiments

Overall – Scheduling and Scalability

  • ‘Mapping’ scheme: high performance + high scalability

SLIDE 44

Experiments

Overall – Privacy Protection

  • High protection level with low computation overhead

SLIDE 45

Experiments

Overall – Fault Tolerance

  • Failure recovery with low computation overhead

SLIDE 46

Next Step

  • Put the test system on real devices
  • Consider other collaborative computing schemes between different types of devices
  • Pack the code as a module for other distributed systems, e.g., Spark
  • Refine modules, e.g., add differential privacy protection to the encryption module
  • ……

SLIDE 47

Reference

[1]. AlphaGo. DeepMind. https://deepmind.com/research/alphago/
[2]. Government Work Report of China, Mar. 5, 2017.
[3]. ScienceNet News, Jan. 18, 2018. http://news.sciencenet.cn/htmlnews/2018/1/400490.shtm
[4]. K. He, et al. Deep Residual Learning for Image Recognition. CVPR, 2016.
[5]. "We are making on-device AI ubiquitous". Qualcomm Blog. https://www.qualcomm.com/news/onq/2017/08/16/we-are-making-device-ai-ubiquitous
[6]. S. Teerapittayanon, et al. Distributed Deep Neural Networks over the Cloud, the Edge and End Devices. ICDCS, 2017.
[7]. K. Skala, et al. Scalable Distributed Computing Hierarchy: Cloud, Fog and Dew Computing. OJCC, 2015.
[8]. J. Gubbi, et al. Internet of Things (IoT): A vision, architectural elements, and future directions. FGCS, 2013.
[9]. A. G. Howard, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861.
[10]. A. Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images. Technical report, 2009.
[11]. S. Zagoruyko, et al. Wide Residual Networks. BMVC, 2016.

SLIDE 48

Change Log

  • Apr. 18, 2018. Version 1.
  • Fix Reference
  • May. 25, 2018. Version 2.
  • Change from Chinese to English
  • Update to final-term process
