TASO: Optimizing Deep Learning with Automatic Generation of Graph Substitutions
Zhihao Jia, Oded Padon, James Thomas, Todd Warszawski, Matei Zaharia, and Alex Aiken
Stanford University
SOSP'19, 12/14/19
Current Rule-based DNN
[Figure: a Rule-based Optimizer turns a Computation Graph (Input, conv3x3, conv1x1, conv3x3, add, and separate relu nodes) into an Optimized Graph (conv3x3 + relu, conv1x1 + relu, conv3x3, add, relu) by applying the rule "Fuse conv + relu": conv followed by relu becomes conv + relu.]
Rule-based Optimizer: Fuse conv + relu; Fuse conv + batch normalization; Fuse multiple convs
"When I turned on XLA (TensorFlow’s graph optimizer), the training speed is about 20% slower."
"With XLA, my program is almost 2x slower than without XLA."
Experts’ heuristics do not apply to all DNNs/hardware
[Figure: a chain of graph substitutions on a graph with Input, Conv3x3 + Relu, Conv1x1 + Relu, Conv3x3, Add, and Relu. Steps shown: Enlarge convs (Conv1x1 + Relu becomes Conv3x3 + Relu), Fuse convs (introduces a Split), Fuse conv & add, Fuse conv & relu.]
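The "Enlarge convs" step can be illustrated with a small numerical check. The sketch below assumes that enlarging means zero-padding a 1x1 kernel to 3x3 and that the convolution is stride 1 with "same" padding; the helper names `conv2d_same` and `enlarge_kernel` are hypothetical and are not part of TASO.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' convolution (cross-correlation, as in DNN frameworks).
    x: (C_in, H, W); w: (C_out, C_in, K, K) with odd K -> (C_out, H, W)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad the spatial dims only
    # windows[c, i, j] is the K x K patch of channel c centred on (i, j)
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum("chwij,ocij->ohw", windows, w)

def enlarge_kernel(w, k_new):
    """Zero-pad a (C_out, C_in, K, K) kernel to (C_out, C_in, k_new, k_new)."""
    p = (k_new - w.shape[-1]) // 2
    return np.pad(w, ((0, 0), (0, 0), (p, p), (p, p)))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))          # illustrative input, 3 channels
w_1x1 = rng.standard_normal((4, 3, 1, 1))   # illustrative 1x1 kernel
w_3x3 = enlarge_kernel(w_1x1, 3)

# The enlarged kernel should compute the same output as the original one.
enlarged_matches = np.allclose(conv2d_same(x, w_1x1), conv2d_same(x, w_3x3))
print(enlarged_matches)
```

On its own the enlarged convolution does more arithmetic than the original; in the chain of steps on the slide, its value is that the two convs end up with the same kernel size and can then be fused.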
[Figure: Input Comp. Graph in, Optimized graph out]
Subst. Generator Subst. Verifier Graph Optimizer
[Figure: candidate graphs, each mapping inputs I1 ... IK to outputs O1 ... OK]
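$\exists x, w_1, w_2.\ (\mathrm{Conv}(x, w_1), \mathrm{Conv}(x, w_2)) \neq \mathrm{Split}(\mathrm{Conv}(x, \mathrm{Concat}(w_1, w_2)))$
[Figure: source graph with Conv(X, W1) and Conv(X, W2) producing Y1 and Y2; target graph with Concat(W1, W2), a single Conv on X, and Split producing Y1 and Y2]
Substitution: (Conv(x, w1), Conv(x, w2)) ⇒ Split(Conv(x, Concat(w1, w2)))
$\mathrm{Conv}(x, \mathrm{Concat}(w_1, w_2)) = \mathrm{Concat}(\mathrm{Conv}(x, w_1), \mathrm{Conv}(x, w_2))$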
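The Conv/Concat/Split substitution above can be spot-checked numerically. This is a minimal sketch on random tensors, assuming Concat and Split act on the output-channel axis and using a naive NumPy convolution (the helper `conv2d` is hypothetical). A random test like this can only fail to find a counterexample; it is not a substitute for the formal check performed by the Subst. Verifier.

```python
import numpy as np

def conv2d(x, w):
    """Naive stride-1 'valid' convolution (cross-correlation).
    x: (C_in, H, W); w: (C_out, C_in, KH, KW) -> (C_out, H-KH+1, W-KW+1)."""
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]            # (C_in, KH, KW)
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))        # illustrative shapes only
w1 = rng.standard_normal((4, 3, 3, 3))
w2 = rng.standard_normal((5, 3, 3, 3))

# Source graph: two separate convolutions on the same input.
separate = (conv2d(x, w1), conv2d(x, w2))

# Target graph: concatenate the weights, run one convolution, split the result.
fused = conv2d(x, np.concatenate([w1, w2], axis=0))
y1, y2 = np.split(fused, [w1.shape[0]], axis=0)

same = np.allclose(separate[0], y1) and np.allclose(separate[1], y2)
print(same)
```

The check works because each output channel of a convolution depends only on its own filter, so stacking filters along the output-channel axis stacks the outputs along the same axis.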
[Figure: chart comparing ResNet-50, NasNet-A, ResNeXt-50, NasRNN, and BERT-Large; axis ticks from 3 to 15]
Not covered in TensorFlow
[Figure: Relative Speedup (y-axis, 1 to 3) versus Maximum Graph Substitution Size (x-axis, 1 to 4) for NasNet-A, ResNeXt-50, and BERT]
[Figure: a graph over Input1 and Input2 built from DWC 3x3, DWC 5x5, Conv 1x1, Avg 3x3, Add, and concat operators, with example substitutions on subgraphs over X, X1, and X2 with weights W1 to W4; the rewritten subgraphs use Concat]
Add: element-wise addition; Conv: standard conv; DWC: depth-wise conv
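[Figure: two example substitutions. First: a subgraph with add, DWC 3x3, DWC 5x5, and two conv 1x1 operators on X1, X2 with weights W1 to W4 is rewritten into a subgraph with DWC 5x5, conv 1x1, and concat operators. Second: a subgraph with add, two conv 3x3, and one conv 1x1 on X with weights W1 to W3 is rewritten into a subgraph with pad 3x3, concat of W1 and W2, and two conv 3x3.]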
[Figure: workflow diagram with the labels Operator Specifications, Candidate Substitutions, Verified Substitutions, Input Comp. Graph, and Optimized]
[Figure: chart comparing ResNet-50, NasNet-A, ResNeXt-50, NasRNN, and BERT-Large; axis ticks from 0.5 to 3]