Distributed Deep Learning at Scale Soumith Chintala Facebook AI Research
Overview • Deep Learning Research at FAIR • Deep Learning on GPUs • Deep Learning at Scale • Emerging Trends
Deep Learning Research at Facebook AI Research
Image Intelligence: Classification
Image Intelligence: Language Translation from Visual Learning
Image Intelligence: Detection
Image Intelligence: Detection [figure: a VGG-based detection network; input x: 3×224×224 → VGG features 512×14×14, with a 1×1 conv and 2×2 pooling feeding two heads: f_segm(x), a 224×224 segmentation mask, and f_score(x), a 1×1 object score]
Image Intelligence https://code.facebook.com/posts/accessibility/
Video Intelligence
Image and Video Generation Predicting the Future
Natural Language Understanding chatbots, personal assistants • Memory networks • Language translation • Reading, writing and answering questions
Deep Learning at Scale
Deep Learning at Scale GPU-powered Convolutional Neural Networks
Deep Learning at Scale GPU-powered Convolutional Neural Networks Alex Krizhevsky
Deep Learning at Scale GPU-powered Convolutional Neural Networks • Convolutions and GEMMs take nearly all the time • Faster convolutions = faster research
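The point that convolutions reduce to GEMM can be illustrated with the classic im2col trick: unfold every k×k input patch into a column, and the convolution becomes a single matrix multiply. This is a hypothetical NumPy sketch of the idea, not FAIR's implementation; the function names `im2col` and `conv2d_gemm` are my own.

```python
import numpy as np

def im2col(x, k):
    """Unfold every k x k patch of a 2-D input into one column."""
    H, W = x.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[i:i + k, j:j + k].ravel()
            idx += 1
    return cols

def conv2d_gemm(x, w):
    """2-D valid convolution (cross-correlation) expressed as one GEMM."""
    k = w.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    return (w.ravel() @ im2col(x, k)).reshape(out_h, out_w)
```

With many filters, `w.ravel()` becomes a (filters × k²) matrix and the whole layer is one big GEMM, which is exactly the shape of work GPUs are fastest at.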
Deep Learning at Scale GPU-powered Convolutional Neural Networks Winograd-transform-based convolutions
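The Winograd idea can be shown in its smallest 1-D form, F(2,3): transform a 4-element input tile and a 3-tap filter, multiply elementwise, and transform back, producing 2 outputs with 4 multiplies instead of the 6 a direct convolution needs. This is a textbook sketch (the standard F(2,3) matrices), not the cuDNN or fbcunn kernel.

```python
import numpy as np

# Standard transform matrices for Winograd F(2,3):
# y = A^T [ (G g) * (B^T d) ] computes a 2-output, 3-tap valid convolution.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Convolve a 4-element tile d with a 3-tap filter g using 4 multiplies."""
    return AT @ ((G @ g) * (BT @ d))
```

The 2-D F(2×2, 3×3) variant used for conv layers nests the same transforms over rows and columns, cutting multiplies by about 2.25× per tile.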
Deep Learning at Scale GPU-powered Convolutional Neural Networks • The standard in deep learning: NVIDIA GPUs + CUDA + cuDNN
Deep Learning at Scale GPU-powered Convolutional Neural Networks • Exotic new hardware! • Custom chips (Yunji Chen et al., Nervana Systems)
Deep Learning at Scale Multi-GPU Training • Use multiple GPUs on a single machine
Deep Learning at Scale Multi-GPU Training • Data parallel
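Data parallelism, the first of the three schemes listed here, replicates the model and splits each batch across devices; gradients are averaged (an all-reduce) so every replica applies the identical update. A minimal single-process sketch, with hypothetical names (`data_parallel_step`, `lsq_grad`) and a list standing in for the GPUs:

```python
import numpy as np

def data_parallel_step(weights, shards, grad_fn, lr=0.1):
    """One data-parallel SGD step: each 'device' computes a gradient on
    its own shard, the gradients are averaged (all-reduce), and every
    replica applies the same update."""
    grads = [grad_fn(weights, s) for s in shards]  # one per device
    avg = sum(grads) / len(grads)                  # all-reduce (mean)
    return weights - lr * avg

def lsq_grad(w, shard):
    """Toy least-squares gradient on one shard (X, y): d/dw ||Xw - y||^2 / n."""
    X, y = shard
    return 2 * X.T @ (X @ w - y) / len(y)
```

With equal-size shards, the averaged shard gradients equal the full-batch gradient, which is why data-parallel training is (numerically) just large-batch SGD.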
Deep Learning at Scale Multi-GPU Training • Model parallel
Deep Learning at Scale Multi-GPU Training • Pipeline-parallel
Deep Learning at Scale Multi-GPU Training Bottleneck: interconnects
Deep Learning at Scale Multi-Machine Training • Multi-machine SGD: workers send gradients to the parameter server
Deep Learning at Scale Multi-Machine Training • Multi-machine SGD: the parameter server sends updated weights back to the workers
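The two slides above describe the parameter-server pattern: workers push gradients, the server applies them, and workers pull fresh weights before the next step. A toy in-process sketch of that protocol, with a hypothetical `ParameterServer` class (not Facebook's system, where workers would run in parallel over the network):

```python
import numpy as np

class ParameterServer:
    """Toy parameter server holding the canonical weights."""
    def __init__(self, weights, lr=0.1):
        self.weights = np.array(weights, dtype=float)
        self.lr = lr

    def push(self, grad):
        """Worker 'sends gradients'; server applies the update."""
        self.weights -= self.lr * np.asarray(grad, dtype=float)

    def pull(self):
        """Server 'sends weights' back to a worker."""
        return self.weights.copy()

def sync_round(server, shards, grad_fn):
    """One synchronous round: all workers pull the same weights, then push
    their (pre-averaged) shard gradients. In reality these run in parallel."""
    w = server.pull()
    for s in shards:
        server.push(grad_fn(w, s) / len(shards))
```

Dropping the shared `pull` at the start of the round turns this into asynchronous SGD, where workers push gradients computed against stale weights.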
Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! (Sixin Zhang, Anna Choromanska, Yann LeCun)
Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! Train synchronously Occasionally, check in with the master Don't go too far from everyone else
Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! Train synchronously Occasionally, check in with neighbors Don't go too far from everyone else
Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! • Empirical speedup of √N (N = number of nodes) • No communication overhead with pre-fetching • 128 GPUs (32 clients × 4 GPUs) • Parameters sharded over 64 CPU servers • τ = 10, prefetch = 5 • Zero overhead
Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! • Fun fact: trained AlexNet in 5 epochs of ImageNet data • Good success in training vision and text networks
Big Sur Open Compute for Deep Learning • Serviceability • Thermal Efficiency • Performance
Big Sur Open Compute for Deep Learning • Hot-swappable fan modules • Removable GPU baseboard • GPU removal using 2 thumb screws • Cables to change PCI-e topologies: swap topologies with incredible ease • Removable motherboard tray • Rails for in-rack servicing • 2.5" drive carriers
Big Sur PCI-e Topologies — Matter!
Torch
Emerging Trends
Emerging Trends Efficient Collectives + Imperative Programs • Data / model / pipeline parallel seems sufficient • Torch (nn / autograd / distlearn) • Caffe
Emerging Trends Computational Graph Toolkits • Intel CnC, Caffe, TensorFlow, MXNet, Theano • Graph placement hints + execution • DSLs to write the computation graphs
Silver Bullet Imperative Language + Graph Compiler • Best of both worlds • Hard problem of automatic graph placement • Limited heuristic-driven success
Presence at GTC 2016 If you want to chat in person, drop us an email • Big Sur Hardware: Kevin Lee kevinlee@fb.com, Doug Wimer dwimer@fb.com, Soumith Chintala soumith@fb.com • Multi-GPU / Multi-Machine Training: Nicolas Vasilache ntv@fb.com, Jeff Johnson jhj@fb.com, Soumith Chintala soumith@fb.com • Computation Graphs, Automatic Placement: Jeff Johnson jhj@fb.com, Andrew Tulloch tulloch@fb.com, Yangqing Jia jiayq@fb.com, Soumith Chintala soumith@fb.com