PYTORCH AND THE NEW CHALLENGES OF ML
LeCun's Law and the Rise of Deep Learning
Chart: citation count over time for "Gradient-Based Learning Applied to Document Recognition," LeCun et al., 1998.
Translation, Spark AR, Oculus VR, Blood Donations
400T+ predictions per day
1B+ phones running neural nets globally
WHAT IS PYTORCH?
- Simplicity over complexity
- Hardware-accelerated inference
- Distributed training
- Dynamic neural networks
- Eager & graph-based execution
BUILT BY THE COMMUNITY BUILT FOR PRODUCTION DESIGNED FOR RESEARCHERS
BUILT BY THE COMMUNITY
~1,200 contributors
50%+ YoY growth
22K PyTorch forum users
Growth in arXiv mentions in research papers
16K+ students enrolled in courses
21M minutes of watch time in the last 12 months
Udacity and fast.ai courses: Practical Deep Learning for Coders V3; Part 2: Deep Learning from the Foundations; Introduction to Machine Learning for Coders; A Code-First Introduction to Natural Language Processing
RESEARCH → PRODUCTION
CORE PRINCIPLES: BUILDING FOR SCALE, DEVELOPER EFFICIENCY
DEVELOPER EFFICIENCY
Enabling a high velocity of model iteration and innovation
CLEAN APIS
NAMED TENSORS (EXPERIMENTAL)
Today, we name and access dimensions by comment:

# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)
images.sum(dim=1)
images.select(dim=1, index=0)

But naming dimensions explicitly leads to more readable and maintainable code:

NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)
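A small illustrative sketch (not from the slides) of what naming buys you: once dimensions are named, reductions and indexing can refer to them by name, and names propagate through operations.

import torch

NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)

print(images.names)                     # ('N', 'C', 'H', 'W')
channel_sum = images.sum('C')           # reduce over the channel dim by name
first_channel = images.select('C', 0)   # index a dim by name
print(channel_sum.names)                # ('N', 'H', 'W') -- names propagate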
TORCHSCRIPT
Models are TorchScript programs, an optimizable subset of Python:
+ Same "models are programs" idea
+ Production deployment
+ No Python dependency
+ Compilation for performance optimization
CORE PRINCIPLES: BUILDING FOR SCALE, DEVELOPER EFFICIENCY
BUILDING FOR SCALE
High-performance execution for model training and inference
GROWTH OF DATA IN ML PIPELINES
FB data used in an ML pipeline: 30% in 2018 vs. 50% today
3X growth in ML data in one year
SCALE OF ML TRAINING AT FACEBOOK
Workflows trained: 3X increase
Ranking engineers: 2X increase
Compute consumed: 3X increase
OPTIMIZING FOR HARDWARE BACKENDS
Backends: MKL-DNN, CUDA/cuDNN, (Q)NNPACK, FBGEMM, XLA, Glow, TVM
Pipeline stages: 1. Feature Engineering, 2. Training, 3. Inference
Hardware: Lightning (30X flash drives, JBOF), Tioga Pass (dual CPU, high memory), Tioga Pass SXM2
QUANTIZATION
Efficient inference on server and mobile devices using reduced-precision math.
Simplicity of use, with accuracy & performance control: dynamic quantization, post-training quantization, quantization-aware training
4x less memory, 2-4x compute speedup
PYTORCH: RESEARCH PROTOTYPING + PRODUCTION DEPLOYMENT
NAMED TENSORS
PyTorch set the bar for ML developer UX by focusing on expressivity and productivity: "I want to write a program, not to (manually) build a graph." Where are similar areas for improvement today?
Data has semantic meaning!
But we force users to drop that context and use an abstract "Tensor" mathematical object
Key Insight: Named Dimensions
Inspired by and done in collaboration with Prof. Alexander Rush, now at Cornell Tech.
Today we name and access dimensions by comment, but naming dimensions explicitly leads to more readable and maintainable code.
By retaining semantic meaning, we also avoid common "Tensor Pitfalls"
- Accidental Broadcasting
- Accidental Alignment
Accidental Broadcasting
We didn't expect broadcasting to happen, but it did. Named tensors still broadcast by position, but they check that dimension names are aligned, so we can catch this automatically!
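A minimal sketch of the pitfall (hypothetical shapes and variable names, not from the slides): with unnamed tensors the addition silently broadcasts; with named tensors the mismatched names raise an error instead.

import torch

# Unnamed: a (4, 4) matrix plus a length-4 vector broadcasts over the last dim,
# even if we meant the vector as a per-row (per-'N') offset. No error, wrong math.
x = torch.randn(4, 4)
row_offset = torch.randn(4)
y = x + row_offset

# Named: broadcasting is still by position, but names are checked for alignment.
xn = torch.randn(4, 4, names=('N', 'C'))
offsets = torch.randn(4, names=('N',))
# xn + offsets  # error: 'N' lines up with 'C' from the right, and the names don't match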
Accidental Alignment
No 1->N broadcast flags the mistake: the dimensions are semantically distinct, but their sizes happen to match. And there are so many tensor formats! There is a "time bomb" if I ever normalize the wrong format and the "unaligned" dimensions happen to have the same size. If we broadcast by name (align_as), we only need a single normalize function for all formats.
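A minimal sketch of that single normalize function, assuming per-channel mean/std named over 'C' (illustrative values, not from the slides):

import torch

def normalize(images, mean, std):
    # align_as broadcasts by *name*: mean/std are permuted and unsqueezed to match
    # the layout of `images`, so NCHW, NHWC, CHW, ... all work with one function.
    mean = mean.align_as(images)
    std = std.align_as(images)
    return (images - mean) / std

mean = torch.randn(3, names=('C',))
std = torch.rand(3, names=('C',)) + 0.5

nchw = torch.randn(32, 3, 56, 56, names=('N', 'C', 'H', 'W'))
nhwc = torch.randn(32, 56, 56, 3, names=('N', 'H', 'W', 'C'))
out1 = normalize(nchw, mean, std)
out2 = normalize(nhwc, mean, std)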
What about mixing named and unnamed Tensors? I don't want to convert my entire program at once...
Coexistence with Unnamed
Named tensors can coexist with unnamed tensors. Let's remove the requirement that mean and stdv are named: refine_names lifts unnamed tensors to named tensors.
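A minimal follow-up sketch (same hypothetical normalize helper as above): refine_names attaches names to the unnamed inputs, after which the named code path applies unchanged.

import torch

def normalize(images, mean, std):
    # mean/std may arrive unnamed; refine_names lifts them to named tensors
    # (it only checks that the sizes are compatible with the given names).
    mean = mean.refine_names('C').align_as(images)
    std = std.refine_names('C').align_as(images)
    return (images - mean) / std

images = torch.randn(32, 3, 56, 56, names=('N', 'C', 'H', 'W'))
mean = torch.randn(3)        # unnamed
std = torch.rand(3) + 0.5    # unnamed
out = normalize(images, mean, std)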
Experimental in 1.3
Named Tensors Core Functionality: common torch operators are supported in eager mode; (unnamed) autograd is supported.
Tutorial: see our in-depth MultiheadedAttention tutorial.
Future Work
Expanded coverage: expanded NN package coverage, named autograd support; serialization, multiprocessing, distributed, JIT, mypy.
PyTorch JIT / TorchScript
A compiler and language infrastructure for machine learning
WHAT IS THE PYTORCH JIT?
Problem statement: we need a system that can:
- 1. Capture the structure of PyTorch programs → TorchScript
- 2. Use that structure to optimize → JIT Compiler

TORCHSCRIPT
A static, high-performance subset of Python.
- 1. Prototype your model with PyTorch
- 2. Control flow is preserved
- 3. First-class support for lists, dicts, etc.
PyTorch JIT
An optimizing just-in-time compiler for PyTorch programs.
- 1. Lightweight, thread-safe interpreter
- 2. Easy to write custom transformations
- 3. Not just for inference! Autodiff support.
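As a concrete, illustrative sketch (not from the slides) of the two pieces above: @torch.jit.script captures a program's structure, including data-dependent control flow and list use, and the JIT can then optimize and run it without Python.

import torch

@torch.jit.script
def clamp_rows(x: torch.Tensor, threshold: float):
    # Data-dependent control flow is captured as-is, not traced away.
    rows = []
    for i in range(x.size(0)):
        row = x[i]
        if bool(row.max() > threshold):
            rows.append(torch.relu(row))
        else:
            rows.append(row)
    return torch.stack(rows)

print(clamp_rows.graph)                    # inspect the captured TorchScript IR
print(clamp_rows(torch.randn(4, 8), 0.5))  # runs through the JIT interpreter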
CASE STUDY: RECURSIVE NEURAL NETWORK GRAMMARS
- Complex dynamic behavior based on the inputs
- Typically written in pure C++
Complex Control Flow
Use common data structures
Define your own classes
JIT as a Platform
WHAT'S NEXT?
QUANTIZATION: model quantization done safely and automatically using JIT transformations.
MOBILE: a lightweight interpreter that can run on-device.
BACKENDS: support for lowering models to static graph compilers, like TVM, Glow, XLA.
QUANTIZATION
- Neural network inference is expensive
- IoT and mobile devices have limited resources
- Design models for efficient inference at scale
OUR MISSION: SYSTEM-MODEL CO-DESIGN
Give tools for building and running efficient models
Can neural networks run in lower precision?
- float16, int8
- Supported by modern hardware: x86 CPU, ARM CPU, NVIDIA Volta & Turing, Qualcomm DSP, …
- Maintaining accuracy is hard: working approaches, ongoing research
N × float32 → N × uint8: 4x less memory, 2-4x compute speedup
Quantization scheme: scale (float32) and zero_point (int32)
float_val = (uint8_val - zero_point) × scale
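A small sketch (illustrative values, not from the slides) of that scheme using the torch.quantize_per_tensor API mentioned later in the deck:

import torch

x = torch.tensor([-1.0, 0.0, 0.5, 2.0])

scale, zero_point = 0.05, 64
q = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point, dtype=torch.quint8)

print(q.int_repr())     # the stored uint8 values
print(q.dequantize())   # (uint8_val - zero_point) * scale, approximately recovering x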
WORKFLOWS

Quantization                 | Quantizes                | Dataset requirements | Works best for              | Accuracy
Dynamic Quantization         | weights only             | none                 | LSTMs and MLPs, small batch | good
Post Training Quantization   | weights and activations  | calibration          | CNNs                        | good
Quantization-aware Training  | weights and activations  | fine-tuning          | all                         | best

Or build your own!
WORKFLOW: DYNAMIC QUANTIZATION
- How: one line API
- What: quantize weights once, activations at runtime
- Good for LSTMs and MLPs with small batch size
- Savings: 2x faster compute, 4x less memory
Diagram: nnqd.Linear quantized layer (W: int8, bias: float, X: float, Y: float)
# load or train your model
model = WordLanguageModel()
model.load_state_dict(torch.load("model.pt"))

# quantize
qmodel = quantize_dynamic(model, dtype=torch.quint8)

# use or deploy for C++ inference
output = qmodel(input)
WORKFLOW: POST TRAINING
- How: tweak model, calibrate on data, convert
- What: quantize weights and activations for the entire model or submodules
- Good for CNNs (if the accuracy drop is acceptable)
- Savings: 1.5-2x faster compute, 4x less memory
Diagram: Conv2d (W: float, bias: float, X: float, Y: float) with observers attached → calibrate to collect qparams → quantize → nnq.Conv (W: int8, bias: float, X: uint8, Y: uint8)
# load or train your model
model = ResNet50()
model.load_state_dict(torch.load("model.pt"))

# tweak model for best results
# change code directly or use manipulation APIs
model = quantization.fuse_modules(model, [["conv1", "bn1", "relu1"]])
[["conv1", "bn1", "relu1"]]) # specify which part to quantize and how qmodel = quantization.prepare(model, {"": quantization.default_qconfig}) # configurable! # collect calibration statistics qmodel.eval() for batch, target in data_loader: model(batch) # get the quantized model qmodel = quantization.convert(qmodel) # use or deploy for C++ inference qmodel(input) torch.jit.script(qmodel).save(“quantized.pt”)
SOON: JIT TO SIMPLIFY PREPARATION
Structural tweaks for TorchScript models are applied automatically: fusion, batch-norm folding, etc. Status: coming in 1.4; check the nightlies.
model = torch.jit.script(model)

# tweak model for best results
# change code or use manipulation APIs
model = quantization.fuse_modules(model, [["conv1", "bn1", "relu1"]])

qmodel = quantization.prepare_script(model, {"": quantization.default_qconfig})
...
qmodel = quantization.convert_script(qmodel)
qmodel.save("quantized.pt")

PYTORCH AT CORE
- Same framework, no conversion
- Same serialization
- Python or TorchScript
- Eager at its core
- Most logic is in python
- Extensibility, debuggers, stack traces
- Extensible API
- New layers
- Observers
- Quantization techniques
- Partial quantization
torch.quantization.*
torch.quantization.Observer
torch.quantization.FakeQuant
torch.nn.quantized.*
torch.nn.quantized.dynamic.*
torch.quantize_per_tensor
torch.quantize_per_channel
FRAMEWORK SUPPORT
- Basic support - enough for CNNs and RNNs
- Backends
- x86 CPU in 1.3 (via FBGEMM)
- ARM CPU early alpha (QNNPACK)
- In 1.4
- Broader ops coverage
- CUDA support
- API simplification for JIT models
Supported operators include: view, max_pool2d, avg_pool2d, clone, resize, slice, Conv2d, Linear, RNN, LSTM, *, topk, sort, upsample_nearest2d, interpolate, +, relu, max
TRY IT NOW
- Quantization core and workflows (experimental in 1.3): post-training, dynamic, and quantization-aware training; x86 and ARM CPU backends
- More backends and JIT workflow (coming in 1.4): simpler workflow for TorchScript, expanding operator coverage
- Quantized models and tutorials to obtain them (available in torch.hub): ResNet50, ResNext-101, InceptionV3, MobileNetV2, BERT, … more to come
pytorch.org
PYTORCH 1.3
THANK YOU!