SLIDE 1

Band-limited Training and Inference for Convolutional Neural Networks

SLIDE 2

SLIDE 3

FFT IFFT
SLIDE 4

SLIDE 5

[Diagram] FFT-based convolution: data x → FFT(x) → x_fft; filter y → FFT(y) → y_fft; o_fft = x_fft ⊙ y_fft; output o = IFFT(o_fft).

Mathieu et al., “Fast Training of Convolutional Networks through FFTs”; Vasilache et al., “Fast Convolutional Nets With fbfft: A GPU Performance Evaluation”
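The pipeline in the diagram above can be sketched in a few lines of NumPy. This is a minimal 1-D illustration (the networks in the talk use 2-D FFTs over feature maps), checked against a direct spatial-domain circular convolution:

```python
import numpy as np

# Minimal 1-D sketch of the FFT convolution pipeline on the slide:
# FFT both operands, multiply pointwise, inverse-transform the product.
def fft_conv(x, y):
    x_fft = np.fft.fft(x)                # x_fft = FFT(x)
    y_fft = np.fft.fft(y)                # y_fft = FFT(y)
    o_fft = x_fft * y_fft                # pointwise product in frequency
    return np.real(np.fft.ifft(o_fft))   # o = IFFT(o_fft)

# Reference: direct circular convolution in the spatial domain.
def direct_conv(x, y):
    n = len(x)
    return np.array([sum(x[(i - k) % n] * y[k] for k in range(n))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.0, -1.0, 0.0])
assert np.allclose(fft_conv(x, y), direct_conv(x, y))
```

By the circular convolution theorem the two computations agree exactly (up to floating-point error), which is what makes the frequency-domain route a drop-in replacement.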
SLIDE 9

cuDNN: substantial memory workspace needed for intermediate results.
SLIDE 10

[Diagram] Band-limited convolution: x_fft = FFT(x), y_fft = FFT(y); xC_fft = Band-limited(x_fft), yC_fft = Band-limited(y_fft); output o = IFFT(xC_fft ⊙ yC_fft).

Band-limiting = masking out high frequencies.
SLIDE 11

Less memory used.
SLIDE 12

Faster computation; less memory used.
SLIDE 13

Preserve enough of the spectrum to retain high model accuracy.
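The band-limiting idea can be sketched in NumPy. This is a 1-D illustration only; the `keep` parameter is a hypothetical knob standing in for the talk's compression rate, not the paper's exact masking scheme:

```python
import numpy as np

# Band-limited FFT convolution: a binary mask zeroes the highest
# frequencies of both spectra before the pointwise product, so only
# the retained coefficients need to be stored and multiplied.
def band_limited_conv(x, y, keep):
    n = len(x)
    x_fft = np.fft.fft(x)
    y_fft = np.fft.fft(y)
    mask = np.zeros(n)
    low_first = np.argsort(np.abs(np.fft.fftfreq(n)))  # lowest |freq| first
    mask[low_first[:keep]] = 1.0                       # keep `keep` lowest
    o_fft = (x_fft * mask) * (y_fft * mask)
    return np.real(np.fft.ifft(o_fft))

rng = np.random.default_rng(0)
x, y = rng.standard_normal(64), rng.standard_normal(64)
exact = band_limited_conv(x, y, keep=64)   # full spectrum: exact result
approx = band_limited_conv(x, y, keep=32)  # half the coefficients dropped
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

With `keep=64` no frequencies are masked and the result equals the ordinary FFT convolution; smaller `keep` trades accuracy for storing and multiplying fewer coefficients, which is the memory/compute saving shown on the previous slides.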

slide-14
SLIDES 14–22

Structure of the FFT coefficient map, built up step by step (example entries: 1+j and its conjugate 1-j):

  • 1. DC component
  • 2. Conjugate symmetry
  • 3. Real values
  • 4. No constraints
  • 5. 1st compression
  • 6. 2nd compression
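The conjugate-symmetry property listed above is what NumPy's real-input FFT already exploits: for real input, F[k] = conj(F[n-k]), so nearly half the complex coefficients are redundant and need not be stored. A quick check:

```python
import numpy as np

# For real input x of length n, the DFT satisfies F[k] == conj(F[n-k]),
# so only n//2 + 1 coefficients carry information. np.fft.rfft keeps
# exactly that non-redundant half.
x = np.arange(8.0)
full = np.fft.fft(x)     # 8 complex coefficients
half = np.fft.rfft(x)    # only 8//2 + 1 = 5 are kept

assert len(half) == len(x) // 2 + 1
assert np.allclose(full[:5], half)                       # kept half matches
assert np.allclose(full[5:], np.conj(full[1:4])[::-1])   # rest is redundant
assert np.allclose(np.fft.irfft(half, n=8), x)           # lossless round trip
```

This redundancy is a "free" first compression: the coefficient map can be cut roughly in half before any high frequencies are discarded.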
SLIDE 23

[Plot: test accuracy (%) vs. compression rate (%), ResNet-18 on CIFAR-10]
SLIDE 24

[Plot: test accuracy vs. compression rate, ResNet-18 on CIFAR-10; baseline accuracy 93.5%]
SLIDE 25

[Plot: test accuracy vs. compression rate, ResNet-18 on CIFAR-10; annotated accuracies 93.5% and 92%]
SLIDE 26

[Plots: test accuracy vs. compression rate. ResNet-18 on CIFAR-10: 93.5% → 92%. DenseNet-121 on CIFAR-100: 75.3% → 71.2%]
SLIDE 31

Cross-correlate input data and filter: x ∗c y.
Spectra: Fx(ω) = F[x(n)], Fy(ω) = F[y(n)].
Convolution theorem: x ∗c y = F⁻¹(Fx(ω) ⊙ Fy(ω)).
Spectrum of the convolution: S(ω) = Fx(ω) ⊙ Fy(ω).
Band-limiting with a binary mask Mc(ω) (1 on retained frequencies, 0 elsewhere):
x ∗c y ≈ F⁻¹[(Fx(ω) ⊙ Mc(ω)) ⊙ (Fy(ω) ⊙ Mc(ω))] = F⁻¹[S(ω) ⊙ Mc(ω)]

Energy (Parseval’s theorem): Σₙ |y(n)|² = (1/N) Σ_ω |Fy(ω)|²
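Both identities on this slide can be checked numerically. The sketch below uses NumPy in 1-D with an illustrative frequency cutoff of 0.25 (an assumption for the example, not a value from the talk):

```python
import numpy as np

# Check the masking identity: for a binary mask Mc (so Mc ⊙ Mc == Mc),
# masking both spectra equals masking the product spectrum S = Fx ⊙ Fy.
n = 16
rng = np.random.default_rng(1)
x, y = rng.standard_normal(n), rng.standard_normal(n)
Fx, Fy = np.fft.fft(x), np.fft.fft(y)
Mc = (np.abs(np.fft.fftfreq(n)) <= 0.25).astype(float)  # illustrative cutoff
S = Fx * Fy
assert np.allclose((Fx * Mc) * (Fy * Mc), S * Mc)

# Parseval: time-domain energy equals spectral energy divided by n
# (with NumPy's unnormalized forward-FFT convention).
assert np.allclose(np.sum(y ** 2), np.sum(np.abs(Fy) ** 2) / n)
```

The first assertion is why the mask can be applied once to each operand's spectrum instead of to the product: Mc² = Mc for a 0/1 mask. The second is what lets the retained fraction of spectral energy serve as a proxy for how much information the compression discards.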

SLIDE 33

[Plot: test accuracy vs. inference compression rate, DenseNet-121 on CIFAR-100; train compression C=50 and C=75]
SLIDE 34

[Plot: test accuracy vs. inference compression rate, DenseNet-121 on CIFAR-100; train compression C=0, 50, 75, 85]
SLIDE 35

[Plots: normalized GPU memory allocated and epoch time vs. compression rate, ResNet-18 on CIFAR-10]
SLIDE 36

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression rate C=0]
SLIDE 37

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression C=0 and C=85]
SLIDE 38

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression C=0 and C=85]

Smooth degradation of accuracy during inference.
SLIDE 39

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression C=0, 30, 50, 85]

Apply the same compression rate to training and inference.
SLIDE 40

[Plots: GPU memory allocated and epoch time vs. compression rate; test accuracy vs. compression rate for ResNet-18 on CIFAR-10 and DenseNet-121 on CIFAR-100]
“Speaking of longer term, it would be nice if the community migrated to a fully open sourced implementation for all of this [convolution operations, etc.]. This stuff is just too important to the progress of the field for it to be locked away in proprietary implementations. The more people working together on this the better for everyone. There's plenty of room to compete on the hardware implementation side.”

Scott Gray
https://github.com/soumith/convnet-benchmarks/issues/93