Marc Riera, Jose Maria Arnau, Antonio González
Computation Reuse in DNNs by Exploiting Input Similarity
Sequence Processing Applications
4/06/2018 2
Speech Audio Signal
ISCA 2018
Speech Recognition: DNN executions classify a sequence of audio frames into phonemes
Benchmarks
DNN Name    DNN Type   DNN Application        #Parameters   Accuracy
Kaldi       MLP        Acoustic Scoring       4.7M          89.04%
EESEN       RNN        Speech Recognition     11M           68.85%
C3D         CNN        Video Classification   78M           93.48%
AutoPilot   CNN        Self-Driving Cars      1.6M          99.63%
Input Similarity
[Figure: bar chart of Input Similarity (%)] Kaldi 45%, C3D 69%, Autopilot 77%, EESEN 52%, Average 61%
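The slide reports input similarity per benchmark but does not spell out how it is measured. A minimal sketch, assuming similarity is the fraction of input positions whose (already quantized) value is identical in two consecutive frames; the function names and the toy frames are hypothetical, not taken from the presentation:

```python
def input_similarity(prev_frame, curr_frame):
    """Fraction of input positions whose value is unchanged between two frames."""
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same length")
    if not prev_frame:
        return 0.0
    unchanged = sum(1 for p, c in zip(prev_frame, curr_frame) if p == c)
    return unchanged / len(prev_frame)


def average_similarity(frames):
    """Mean similarity over all consecutive frame pairs of a sequence."""
    pairs = list(zip(frames, frames[1:]))
    if not pairs:
        return 0.0
    return sum(input_similarity(p, c) for p, c in pairs) / len(pairs)


# Toy sequence of three frames with four inputs each (made-up values).
frames = [
    [3, 7, 7, 1],
    [3, 7, 5, 1],  # one of four inputs changed
    [3, 7, 5, 1],  # nothing changed
]
print(average_similarity(frames))
```

Exact equality is only meaningful here because the inputs are assumed to be integers; with raw floating-point inputs a tolerance or a quantization step would be needed first.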
Exploiting Temporal Similarity Example
Baseline
Neuron N has inputs $I_0, I_1, I_2$ (the superscript is the frame index), weights $w_0, w_1, w_2$ and bias $b$.
Frame i: $O^{i} = I_0^{i} w_0 + I_1^{i} w_1 + I_2^{i} w_2 + b$
Frame i+1: $O^{i+1} = I_0^{i+1} w_0 + I_1^{i+1} w_1 + I_2^{i+1} w_2 + b$
Exploiting Temporal Similarity Example
Proposal
Neuron N has inputs $I_0, I_1, I_2$ (the superscript is the frame index), weights $w_0, w_1, w_2$ and bias $b$.
Frame i: $O^{i} = I_0^{i} w_0 + I_1^{i} w_1 + I_2^{i} w_2 + b$
Frame i+1: $O^{i+1} = O^{i} + (I_k^{i+1} - I_k^{i}) w_k$, where $k$ is the input that changed
Number of computations before = 6
Number of computations after = 2
Note: Subtraction of the inputs is almost negligible since it is performed once per input
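The example above can be checked numerically. The following is a sketch for a single neuron with three inputs, assuming integer inputs and weights so that the incremental result matches the full dot product exactly; the function names and the numeric values are illustrative only:

```python
def neuron_baseline(inputs, weights, bias):
    """Full dot product: one multiply and one add per input."""
    out = bias
    for x, w in zip(inputs, weights):
        out += x * w
    return out


def neuron_reuse(prev_out, prev_inputs, inputs, weights):
    """Correct the previous output using only the inputs that changed."""
    out = prev_out
    changed = 0
    for p, x, w in zip(prev_inputs, inputs, weights):
        if x != p:
            out += (x - p) * w  # one multiply and one add per changed input
            changed += 1
    return out, changed


weights = [2, -1, 3]
bias = 1
frame_i = [4, 5, 6]
frame_next = [4, 9, 6]  # only input 1 differs from frame_i

out_i = neuron_baseline(frame_i, weights, bias)
out_next, changed = neuron_reuse(out_i, frame_i, frame_next, weights)

# The corrected output equals a full recomputation on the new frame.
assert out_next == neuron_baseline(frame_next, weights, bias)
print(out_i, out_next, changed)
```

With one changed input, the correction costs one multiply and one add instead of three of each, which matches the 6 versus 2 count on the slide. With floating-point values the two results may differ by rounding error.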
Computation Reuse
[Figure: bar chart of Computation Reuse (%)] Kaldi 53%, C3D 74%, Autopilot 79%, EESEN 55%, Average 66%
DNN Processing Unit
Tile
FC Execution in the Reuse Accelerator (1)
FC Execution in the Reuse Accelerator (2)
FC Execution in the Reuse Accelerator (3)
Other Supported Layers
- Recurrent Neural Network (RNN)
- Convolutional Neural Network (CNN)
Evaluation Methodology
- Simulator to evaluate the performance and energy of the accelerator
- Design Compiler to obtain power and delay of logic modules
- 28/32nm library from Synopsys and the DesignWare logic modules
- CACTI used for SRAM and eDRAM memories
- Micron LPDDR4 for main memory
- Accelerator Configuration:
Memory Footprint Overheads
[Figure: Memory Increase (%) for the On-Chip IO Buffer and the Off-Chip Main Memory]
Results: Speedup
Results: Energy Savings
Conclusions
- More than 60% of the inputs remain unmodified with respect to the previous execution
- Our proposed scheme checks which inputs have changed:
  - Unmodified inputs are ignored, avoiding computations and memory accesses
  - Modified inputs are used to correct the previous output of each neuron
- On average, 63% energy savings and 3.5x speedup
- Small area overhead of less than 1%, mainly for additional storage
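The scheme summarized in these conclusions can be sketched in software for one fully-connected layer processing a sequence of frames. This is a plain-Python illustration under stated assumptions, not the accelerator design: the class name `ReuseFCLayer`, the multiply-accumulate counter, and the toy weights are hypothetical, inputs are assumed to be integers, and the returned values are pre-activation outputs.

```python
class ReuseFCLayer:
    """Fully-connected layer that buffers the previous inputs and outputs."""

    def __init__(self, weights, biases):
        self.weights = weights      # weights[n][k]: neuron n, input k
        self.biases = biases
        self.prev_inputs = None     # extra storage needed by the scheme
        self.prev_outputs = None

    def forward(self, inputs):
        """Return (outputs, number of multiply-accumulates performed)."""
        if self.prev_inputs is None:
            # First frame: nothing to reuse, compute every neuron from scratch.
            outputs = [
                b + sum(x * w for x, w in zip(inputs, row))
                for row, b in zip(self.weights, self.biases)
            ]
            macs = len(inputs) * len(self.weights)
        else:
            outputs = list(self.prev_outputs)
            macs = 0
            for k, (p, x) in enumerate(zip(self.prev_inputs, inputs)):
                if x == p:
                    continue        # unmodified input: skipped entirely
                delta = x - p       # computed once per input, shared by all neurons
                for n, row in enumerate(self.weights):
                    outputs[n] += delta * row[k]
                    macs += 1
        self.prev_inputs = list(inputs)
        self.prev_outputs = outputs
        return list(outputs), macs


layer = ReuseFCLayer(weights=[[1, 2, 3], [0, -1, 4]], biases=[0, 1])
first = layer.forward([1, 1, 1])    # full computation on the first frame
second = layer.forward([1, 2, 1])   # only input 1 changed
print(first, second)
```

Keeping the previous inputs and outputs is what the extra buffering pays for; in exchange, the work per frame scales with the number of changed inputs rather than with the layer size.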