SLIDE 1

Band-limited Training and Inference for Convolutional Neural Networks

SLIDE 2

SLIDE 3

FFT IFFT
SLIDE 4

SLIDE 5

[Diagram] FFT-based convolution: data x → FFT(x) → x_fft; filter y → FFT(y) → y_fft; o_fft = x_fft ⊙ y_fft; output o = IFFT(o_fft).

Mathieu et al., “Fast Training of Convolutional Networks through FFTs”; Vasilache et al., “Fast Convolutional Nets With fbfft: A GPU Performance Evaluation”
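The pipeline in the diagram above can be sketched in a few lines of NumPy. This is a minimal 1-D illustration (the networks in the talk use 2-D FFTs over feature maps), checked against a direct spatial-domain circular convolution:

```python
import numpy as np

# Minimal 1-D sketch of the FFT convolution pipeline on the slide:
# FFT both operands, multiply pointwise, inverse-transform the product.
def fft_conv(x, y):
    x_fft = np.fft.fft(x)                # x_fft = FFT(x)
    y_fft = np.fft.fft(y)                # y_fft = FFT(y)
    o_fft = x_fft * y_fft                # pointwise product in frequency
    return np.real(np.fft.ifft(o_fft))   # o = IFFT(o_fft)

# Reference: direct circular convolution in the spatial domain.
def direct_conv(x, y):
    n = len(x)
    return np.array([sum(x[(i - k) % n] * y[k] for k in range(n))
                     for i in range(n)])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.0, -1.0, 0.0])
assert np.allclose(fft_conv(x, y), direct_conv(x, y))
```

By the circular convolution theorem the two computations agree exactly (up to floating-point error), which is what makes the frequency-domain route a drop-in replacement.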
SLIDE 9

cuDNN: substantial memory workspace needed for intermediate results.
SLIDE 10

[Diagram] Band-limited convolution: x_fft = FFT(x), y_fft = FFT(y); xC_fft = Band-limited(x_fft), yC_fft = Band-limited(y_fft); output o = IFFT(xC_fft ⊙ yC_fft).

Band-limiting = masking out high frequencies.
SLIDE 11

Less memory used.
SLIDE 12

Faster computation; less memory used.
SLIDE 13

Preserve enough of the spectrum to retain high model accuracy.
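The band-limiting idea can be sketched in NumPy. This is a 1-D illustration only; the `keep` parameter is a hypothetical knob standing in for the talk's compression rate, not the paper's exact masking scheme:

```python
import numpy as np

# Band-limited FFT convolution: a binary mask zeroes the highest
# frequencies of both spectra before the pointwise product, so only
# the retained coefficients need to be stored and multiplied.
def band_limited_conv(x, y, keep):
    n = len(x)
    x_fft = np.fft.fft(x)
    y_fft = np.fft.fft(y)
    mask = np.zeros(n)
    low_first = np.argsort(np.abs(np.fft.fftfreq(n)))  # lowest |freq| first
    mask[low_first[:keep]] = 1.0                       # keep `keep` lowest
    o_fft = (x_fft * mask) * (y_fft * mask)
    return np.real(np.fft.ifft(o_fft))

rng = np.random.default_rng(0)
x, y = rng.standard_normal(64), rng.standard_normal(64)
exact = band_limited_conv(x, y, keep=64)   # full spectrum: exact result
approx = band_limited_conv(x, y, keep=32)  # half the coefficients dropped
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
```

With `keep=64` no frequencies are masked and the result equals the ordinary FFT convolution; smaller `keep` trades accuracy for storing and multiplying fewer coefficients, which is the memory/compute saving shown on the previous slides.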

slide-14
SLIDES 14–22

Structure of the FFT coefficient map, built up step by step (example entries: 1+j and its conjugate 1-j):

  • 1. DC component
  • 2. Conjugate symmetry
  • 3. Real values
  • 4. No constraints
  • 5. 1st compression
  • 6. 2nd compression
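The conjugate-symmetry property listed above is what NumPy's real-input FFT already exploits: for real input, F[k] = conj(F[n-k]), so nearly half the complex coefficients are redundant and need not be stored. A quick check:

```python
import numpy as np

# For real input x of length n, the DFT satisfies F[k] == conj(F[n-k]),
# so only n//2 + 1 coefficients carry information. np.fft.rfft keeps
# exactly that non-redundant half.
x = np.arange(8.0)
full = np.fft.fft(x)     # 8 complex coefficients
half = np.fft.rfft(x)    # only 8//2 + 1 = 5 are kept

assert len(half) == len(x) // 2 + 1
assert np.allclose(full[:5], half)                       # kept half matches
assert np.allclose(full[5:], np.conj(full[1:4])[::-1])   # rest is redundant
assert np.allclose(np.fft.irfft(half, n=8), x)           # lossless round trip
```

This redundancy is a "free" first compression: the coefficient map can be cut roughly in half before any high frequencies are discarded.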
SLIDE 23

[Plot: test accuracy (%) vs. compression rate (%), ResNet-18 on CIFAR-10]
SLIDE 24

[Plot: test accuracy vs. compression rate, ResNet-18 on CIFAR-10; baseline accuracy 93.5%]
SLIDE 25

[Plot: test accuracy vs. compression rate, ResNet-18 on CIFAR-10; annotated accuracies 93.5% and 92%]
SLIDE 26

[Plots: test accuracy vs. compression rate. ResNet-18 on CIFAR-10: 93.5% → 92%. DenseNet-121 on CIFAR-100: 75.3% → 71.2%]
SLIDE 31

Cross-correlate input data and filter: x ∗c y.
Spectra: Fx(ω) = F[x(n)], Fy(ω) = F[y(n)].
Convolution theorem: x ∗c y = F⁻¹(Fx(ω) ⊙ Fy(ω)).
Spectrum of the convolution: S(ω) = Fx(ω) ⊙ Fy(ω).
Band-limiting with a binary mask Mc(ω) (1 on retained frequencies, 0 elsewhere):
x ∗c y ≈ F⁻¹[(Fx(ω) ⊙ Mc(ω)) ⊙ (Fy(ω) ⊙ Mc(ω))] = F⁻¹[S(ω) ⊙ Mc(ω)]

Energy (Parseval’s theorem): Σₙ |y(n)|² = (1/N) Σ_ω |Fy(ω)|²
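Both identities on this slide can be checked numerically. The sketch below uses NumPy in 1-D with an illustrative frequency cutoff of 0.25 (an assumption for the example, not a value from the talk):

```python
import numpy as np

# Check the masking identity: for a binary mask Mc (so Mc ⊙ Mc == Mc),
# masking both spectra equals masking the product spectrum S = Fx ⊙ Fy.
n = 16
rng = np.random.default_rng(1)
x, y = rng.standard_normal(n), rng.standard_normal(n)
Fx, Fy = np.fft.fft(x), np.fft.fft(y)
Mc = (np.abs(np.fft.fftfreq(n)) <= 0.25).astype(float)  # illustrative cutoff
S = Fx * Fy
assert np.allclose((Fx * Mc) * (Fy * Mc), S * Mc)

# Parseval: time-domain energy equals spectral energy divided by n
# (with NumPy's unnormalized forward-FFT convention).
assert np.allclose(np.sum(y ** 2), np.sum(np.abs(Fy) ** 2) / n)
```

The first assertion is why the mask can be applied once to each operand's spectrum instead of to the product: Mc² = Mc for a 0/1 mask. The second is what lets the retained fraction of spectral energy serve as a proxy for how much information the compression discards.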

SLIDE 33

[Plot: test accuracy vs. inference compression rate, DenseNet-121 on CIFAR-100; train compression C=50 and C=75]
SLIDE 34

[Plot: test accuracy vs. inference compression rate, DenseNet-121 on CIFAR-100; train compression C=0, 50, 75, 85]
SLIDE 35

[Plots: normalized GPU memory allocated and epoch time vs. compression rate, ResNet-18 on CIFAR-10]
SLIDE 36

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression rate C=0]
SLIDE 37

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression C=0 and C=85]
SLIDE 38

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression C=0 and C=85]

Smooth degradation of accuracy during inference.
SLIDE 39

[Plot: test accuracy vs. inference compression rate, ResNet-18 on CIFAR-10; train compression C=0, 30, 50, 85]

Apply the same compression rate to training and inference.
SLIDE 40

[Plots: GPU memory allocated and epoch time vs. compression rate; test accuracy vs. compression rate for ResNet-18 on CIFAR-10 and DenseNet-121 on CIFAR-100]
“Speaking of longer term, it would be nice if the community migrated to a fully open sourced implementation for all of this [convolution operations, etc.]. This stuff is just too important to the progress of the field for it to be locked away in proprietary implementations. The more people working together on this the better for everyone. There's plenty of room to compete on the hardware implementation side.”

Scott Gray
https://github.com/soumith/convnet-benchmarks/issues/93