Band-limited Training and Inference for Convolutional Neural Networks
FFT-based convolution (diagram):
Data: x → FFT(x) → xfft
Filter: y → FFT(y) → yfft
xfft ⊙ yfft → offt → IFFT(offt) → Out: o
References: Mathieu et al., “Fast Training of Convolutional Networks through FFTs”; “Fast Convolutional Nets With fbfft: A GPU Performance Evaluation”
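The data flow in the diagram can be sketched in a few lines of NumPy. This is a minimal 1-D illustration, not the implementation from the cited papers: the function name `fft_conv1d` and the example arrays are assumptions made here, and the real-input transforms `np.fft.rfft` / `np.fft.irfft` stand in for the FFT and IFFT boxes.

```python
import numpy as np

def fft_conv1d(x, y):
    """Full linear convolution of 1-D data x and filter y via the FFT.

    Hypothetical helper for illustration; names mirror the slide labels.
    """
    n = len(x) + len(y) - 1          # pad so circular convolution equals linear convolution
    xfft = np.fft.rfft(x, n=n)       # FFT(x)
    yfft = np.fft.rfft(y, n=n)       # FFT(y)
    offt = xfft * yfft               # element-wise product in the frequency domain
    return np.fft.irfft(offt, n=n)   # IFFT(offt) -> o

x = np.array([1.0, 2.0, 3.0, 4.0])   # data
y = np.array([1.0, 0.0, -1.0])       # filter
o = fft_conv1d(x, y)
```

Here `o` agrees with `np.convolve(x, y)` up to floating-point error. CNN layers usually compute cross-correlation rather than convolution; in this sketch that would correspond to multiplying by `np.conj(yfft)` instead of `yfft`.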
cuDNN: substantial memory workspace needed for intermediate results.
Band-limited FFT convolution (diagram):
Data: x → FFT(x) → xfft → Band-limited(xfft) → xCfft
Filter: y → FFT(y) → yfft → Band-limited(yfft) → yCfft
xCfft ⊙ yCfft → offt → IFFT(offt) → Out: o
Benefits: less memory used, faster computation
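The band-limiting step can be sketched as a truncation of the spectra before the element-wise product. The sketch below assumes that band-limiting keeps the lowest-frequency coefficients and that a compression rate of C% means dropping C% of the `rfft` coefficients; both conventions, and the helper names `band_limit` and `band_limited_conv1d`, are assumptions for illustration rather than details taken from the slides.

```python
import numpy as np

def band_limit(spectrum, compression_rate):
    """Keep the lowest-frequency part of an rfft spectrum.

    compression_rate: percentage of coefficients dropped (assumed convention).
    """
    keep = max(1, int(round(len(spectrum) * (1.0 - compression_rate / 100.0))))
    return spectrum[:keep]

def band_limited_conv1d(x, y, compression_rate):
    """1-D FFT convolution with both spectra truncated before the product."""
    n = len(x) + len(y) - 1
    xfft = np.fft.rfft(x, n=n)                  # FFT(x)
    yfft = np.fft.rfft(y, n=n)                  # FFT(y)
    xcfft = band_limit(xfft, compression_rate)  # Band-limited(xfft) -> xCfft
    ycfft = band_limit(yfft, compression_rate)  # Band-limited(yfft) -> yCfft
    offt = xcfft * ycfft                        # smaller product: fewer multiplies, less memory
    # irfft zero-pads the truncated spectrum back to n // 2 + 1 coefficients
    return np.fft.irfft(offt, n=n)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)                     # data
y = np.array([0.25, 0.5, 0.25])                 # smoothing filter
exact = band_limited_conv1d(x, y, 0)            # no compression
approx = band_limited_conv1d(x, y, 50)          # half of the coefficients dropped
```

With a compression rate of 0 the result matches ordinary convolution; at higher rates the output keeps its length but only approximates the exact result, since the discarded coefficients are treated as zero.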
[Figure sequence: frequency-domain coefficient map with labeled entries 1-j, 1+j, and DC.]
[Figure: Test accuracy (%) vs. compression rate (%).
ResNet-18 on CIFAR-10: annotated accuracies 93.5% and 92%.
DenseNet-121 on CIFAR-100: annotated accuracies 75.3% and 71.2%.]
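Energy is preserved between the time and frequency domains (Parseval's theorem):
$\sum_{n=0}^{N-1} |x[n]|^2 = \sum_{\omega=0}^{2\pi} |F_x[\omega]|^2$
where $F_x$ denotes the Fourier transform of $x$.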
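The energy identity above can be checked numerically. The sketch below is illustrative only: NumPy's `np.fft.fft` is unnormalized, so the frequency-domain sum needs a 1/N factor that the slide's form leaves implicit, and the helper `retained_energy_fraction` is a hypothetical name introduced here to show how much signal energy survives when only a low-frequency band is kept.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
N = len(x)
X = np.fft.fft(x)

energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / N   # 1/N because np.fft.fft is unnormalized

def retained_energy_fraction(x, keep_fraction):
    """Fraction of signal energy in the lowest keep_fraction of the band.

    Hypothetical helper: keep_fraction=1.0 keeps every frequency.
    """
    X = np.fft.fft(x)
    freqs = np.abs(np.fft.fftfreq(len(x)))   # normalized |frequency| in [0, 0.5]
    mask = freqs <= 0.5 * keep_fraction      # low-frequency coefficients to keep
    return np.sum(np.abs(X[mask]) ** 2) / np.sum(np.abs(X) ** 2)

full = retained_energy_fraction(x, 1.0)
half = retained_energy_fraction(x, 0.5)
```

Because energy is the same in both domains, the energy of the discarded coefficients is a direct measure of how much of the signal a band-limited representation gives up.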
[Figure: DenseNet-121 on CIFAR-100. Test accuracy (%) vs. inference compression rate (%); curves labeled C=0, C=50, C=75, C=85.]
[Figure: ResNet-18 on CIFAR-10. Normalized performance (%) vs. compression rate (%) for GPU memory allocated and epoch time.]
[Figure: ResNet-18 on CIFAR-10. Test accuracy (%) vs. inference compression rate (%); one curve per train compression rate (%): C=0, C=30, C=50, C=85.]
[Summary figure: GPU memory allocated and epoch time vs. compression rate (%); test accuracy (%) vs. compression rate (%).]
https://github.com/soumith/convnet-benchmarks/issues/93