Tsinghua University: Introduction, Deep Learning


Jul 05, 2023

Deep500 BOF 2018, Jidong Zhai, Tsinghua University. Introduction: deep learning has been widely used in many areas. There are many deep learning frameworks, compute libraries, and acceleration devices.

SLIDE 1

Deep500 BOF 2018

Jidong Zhai

Tsinghua University

SLIDE 2

Introduction

Deep learning has been widely used in many areas

SLIDE 3

Introduction

A lot of deep learning frameworks, compute libraries and acceleration devices

Frameworks: CNTK, ···
Compute Libraries: BLAS, ···
Compute Devices: TPU, ···

SLIDE 4

Introduction

However, how to evaluate?

Benchmark: ? ? ?

Frameworks: CNTK, ···
Compute Libraries: BLAS, ···
Compute Devices: TPU, ···

SLIDE 5

Introduction

However, how to evaluate?

Benchmark: ? ? ?

Frameworks: CNTK, ···
Compute Libraries: BLAS, ···
Compute Devices: TPU, ···

Which is better?
Running Time
Resource Use
Scalability
Efficiency
…

Set Optimization Target
Promote Development

SLIDE 6

Related Deep Learning Benchmarks

convnet-benchmarks [1]
Target: Framework, Compute Library
Model granularity: Neural Network
Model diversity: Only CNN
Dataset: ImageNet
Metrics: Time Per Iteration

DeepBench [2]
Target: Compute Library, Compute Device
Model granularity: Basic Operation
Model diversity: Training, Inference
Dataset: Dummy Data
Metrics: Time

DAWNBench [3]
Target: Compute Library, Framework
Model granularity: Neural Network
Model diversity: 2 CNN + 1 RNN
Dataset: CIFAR10, ImageNet, SQuAD
Metrics: Training Time and Cost to certain Accuracy

TensorFlow Benchmark [4]
Target: Framework
Model granularity: Neural Network
Model diversity: 4 CNN
Dataset: ImageNet
Metrics: Total Training Time

1. convnet-benchmarks: https://github.com/soumith/convnet-benchmarks
2. Baidu DeepBench: https://github.com/baidu-research/DeepBench
3. Cody A. Coleman et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS 2017
4. TensorFlow Benchmark: https://www.tensorflow.org/performance/benchmarks

Low Diversity, Limited Dataset, Single Metric

SLIDE 7

SLIDE 8

Related Deep Learning Benchmarks

MLPerf [1]

Evaluation Target: Framework, Compute Device

Characteristics
Granularity: Neural Network
Diversity:
1. Image (Classification, Detection)
2. NLP (Translation, Sentiment Analysis)
3. Speech (Recognition)
4. Reinforcement Learning & Recommendation

Dataset: ImageNet, COCO, WMT, Librispeech, MovieLens, …
Evaluation Metrics: Training Time, Power Use and Cost to certain Accuracy

Various Applications, Various Datasets

1. https://mlperf.org/
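The "time to a certain accuracy" style of metric can be illustrated with a minimal sketch. The function name `time_to_accuracy` and the log format (pairs of elapsed seconds and validation accuracy) are assumptions made for illustration; they are not taken from MLPerf or DAWNBench tooling, and the log values are made up.

```python
from typing import Iterable, Optional, Tuple


def time_to_accuracy(
    log: Iterable[Tuple[float, float]], target: float
) -> Optional[float]:
    """Return the elapsed time at which accuracy first reaches `target`.

    `log` holds (elapsed_seconds, validation_accuracy) pairs in
    chronological order. Returns None if the target is never reached.
    """
    for elapsed, accuracy in log:
        if accuracy >= target:
            return elapsed
    return None


# Hypothetical training log, for illustration only.
example_log = [(60.0, 0.52), (120.0, 0.71), (180.0, 0.76), (240.0, 0.75)]
print(time_to_accuracy(example_log, target=0.75))
```

Because the metric stops the clock at the first crossing of the target, a run whose accuracy later dips below the target still counts as having reached it.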

SLIDE 9

How to evaluate HPC systems for machine learning?

SLIDE 10

Our Work on Workload Analysis for Deep Learning

Applications: Image Classification, Machine Translation, Language Model, Question Answering

Models: VGG, ResNet, Seq2seq, RNN LM, AoA Reader

Dataset

Real Data: WikiText-2, CBTest, Cifar, Tatoeba

Dummy Data: Real time, Controllable, Easy to obtain, Generative

Preliminary workload analysis
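The dummy-data properties listed on this slide (generated in real time, controllable, easy to obtain) can be sketched as a small generator. This is only an illustration using the Python standard library; the name `dummy_batches` and its parameters are hypothetical and do not come from the slides.

```python
import random
from typing import Iterator, List


def dummy_batches(
    batch_size: int, sample_len: int, num_batches: int, seed: int = 0
) -> Iterator[List[List[float]]]:
    """Yield `num_batches` batches of random samples, generated on the fly.

    Each batch is a list of `batch_size` samples; each sample is a flat
    list of `sample_len` floats in [0, 1). A fixed seed makes runs
    repeatable.
    """
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [
            [rng.random() for _ in range(sample_len)]
            for _ in range(batch_size)
        ]


# Input size is a parameter, so it can be swept in a workload study.
for batch in dummy_batches(batch_size=4, sample_len=8, num_batches=2):
    print(len(batch), len(batch[0]))
```

Nothing is read from disk, so input size can be varied freely without preparing a new dataset.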

SLIDE 11

Our Work

Time
Time of every operation type within one iteration
Time of phases within one iteration

[Figure: time (ms) within one iteration for VGG, ResNet, RNN LM, AoA Reader, and Seq2seq, broken down into Data, Forward, Backward, Loss, and Update phases.]
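One way to collect per-phase times within an iteration is a small accumulator around `time.perf_counter`. The sketch below is an assumption about how such a measurement could be done, not the authors' tooling; `PhaseTimer` and `run_iteration` are hypothetical names, and the phase bodies are placeholder work rather than real framework calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase."""

    def __init__(self):
        self.totals_ms = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the phase body raises.
            self.totals_ms[name] += (time.perf_counter() - start) * 1000.0


def run_iteration(timer):
    # Placeholder work; a real benchmark would call the framework here.
    for name in ("data", "forward", "backward", "loss", "update"):
        with timer.phase(name):
            sum(i * i for i in range(10_000))


timer = PhaseTimer()
run_iteration(timer)
for name, ms in timer.totals_ms.items():
    print(f"{name}: {ms:.3f} ms")
```

On a GPU, kernel launches are asynchronous, so a host-side timer like this only gives meaningful phase times if the device is synchronized before each clock read.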

SLIDE 12

Workload Analysis

Memory Usage
Memory Usage Break Down
Memory Usage – Input Size

[Figure: memory use (MB) for VGG, ResNet, RNN LM, AoA Reader, and Seq2seq, broken down into Weight and Intermediate Result + Temp.]

[Figure: memory use (MB) for Training and Inference, and the Training/Inference ratio, versus picture area (pixel²).]

[Figure: memory use (MB) for Training and Inference, and the Training/Inference ratio, versus sequence length.]

SLIDE 13

Workload Characterization

Hardware Counters
For GPU

[Figure: normalized GPU hardware counters: GPU Occupancy, Warp Execution Efficiency, Warp Non-Pred Execution Efficiency, Bandwidth Utilization, TFLOPS. Values as extracted: 1, 0.46, 1.00, 1.00, 4.02, 5.65.]

SLIDE 14

Questions about an HPC Oriented Deep Learning Benchmark

Questions we need to think about:
Model Selection
Various application areas?
A synthetic model with main features?
Dataset
Fixed data set (ImageNet)?
Generated data?
Metrics
Time for training?
GFLOPS?
AI operations per second?
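To make the GFLOPS option concrete, the sketch below times a naive matrix multiply and divides an operation count by the elapsed time. It is an illustration only: the helper names are hypothetical, the count uses the common convention of 2·n·k·m operations for an (n × k) by (k × m) product, and pure-Python speed says nothing about any real system.

```python
import time


def matmul(a, b):
    """Naive dense matrix multiply on lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]


def matmul_flops(n, k, m):
    """Operation count for an (n x k) @ (k x m) product,
    counting one multiply and one add per inner-loop step."""
    return 2 * n * k * m


def gflops(flop_count, seconds):
    """Convert an operation count and a duration into GFLOPS."""
    return flop_count / seconds / 1e9


n = 64
a = [[1.0] * n for _ in range(n)]
b = [[2.0] * n for _ in range(n)]

start = time.perf_counter()
c = matmul(a, b)
elapsed = time.perf_counter() - start

print(f"{gflops(matmul_flops(n, n, n), elapsed):.4f} GFLOPS")
```

A rate like this depends on how operations are counted, which is one reason the slide sets it alongside training time as a candidate metric rather than a replacement for it.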

SLIDE 15