

SLIDE 1

A Case for Dynamic Activation Quantization in CNNs

Karl Taht, Surya Narayanan, Rajeev Balasubramonian, University of Utah

SLIDE 2

Overview

  • Background
  • Proposal
  • Search Space
  • Architecture
  • Results
  • Future Work
SLIDE 3

Improving CNN Efficiency

  • Stripes: Bit-Serial Deep Neural Network Computing
      • Per-layer bit precisions net significant savings with <1% accuracy loss
      • Brute-force approach to find the best quantization – retraining at each step!
      • Good end result, but expensive!
  • Weight-Entropy-Based Quantization for Deep Neural Networks
      • Quantizes both weights and activations
      • Guided search to find the optimal quantization (entropy and clustering)
      • Still requires retraining, still a passive approach

Can we exploit adaptive reduced precision during inference?

SLIDE 4

Proposal: Adaptive Quantization Approach (AQuA)

  • Most images contain regions of information irrelevant to the classification task
  • Can we avoid such computations altogether?
  • Quantize those regions completely, down to 0 bits
  • More simply – crop them! (a minimal sketch follows below)
SLIDE 5

Proposal: Activation Cropping

SLIDE 6

Proposal: Activation Cropping

Concept: add a lightweight predictor early in the network to save computations in the later layers. [Figure: network pipeline annotated with the predictor's insertion point and the layers where computation is saved]

SLIDE 7

Search Space – How to Crop

[Figure: image with an N-pixel crop border on each side]

  • Exploit domain knowledge
      • Information is typically centered within the image (>55% in our tests)
  • Utilize a regular pattern
      • Less control logic required
      • Maps more easily to different hardware
  • Added bonus: while objects are centered, the majority of the area (and thus computation) is on the outside!

SLIDE 8

Proposal: Activation Cropping

[Figure: example crops with N = 25, 10, 8, 5, and 2]

Concept: scale feature maps proportionally.
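One way to read "scale feature maps proportionally" is to shrink the crop border at the same rate the spatial resolution shrinks through the network. A sketch (the layer sizes and rounding rule are assumptions, not from the talk):

```python
def scaled_crops(n0, sizes):
    """Scale the crop border as feature maps shrink through the network.

    n0:    crop border (pixels per side) at the input resolution
    sizes: spatial size of each layer's (square) feature map
    """
    return [round(n0 * s / sizes[0]) for s in sizes]

# Purely illustrative, AlexNet-like feature-map sizes
print(scaled_crops(25, [227, 55, 27, 13, 13, 13]))  # [25, 6, 3, 1, 1, 1]
```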

SLIDE 9

Search Space – Crop Directions

[Figure: example crops encoded [0 1 0 1], [0 1 0 0], [0 0 1 0], [0 0 0 1], [1 0 0 0], and [1 0 1 1]]

  • We consider 16 possible crops as permutations of top, bottom, left, and right crops, encoded as a vector: [ TOP , BOTTOM , LEFT , RIGHT ]
  • Unlike traditional pruning, AQuA can exploit image-based information to enhance pruning options (see the sketch below)
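A sketch of the encoding (the helper function, crop size, and tensor shape are illustrative):

```python
from itertools import product

import numpy as np

def apply_crop(fmap, crop_vec, n):
    """Apply a directional crop encoded as [TOP, BOTTOM, LEFT, RIGHT].

    A 1 crops n pixels from that edge; a 0 leaves it intact.
    """
    top, bottom, left, right = crop_vec
    h, w = fmap.shape[-2:]
    return fmap[..., n * top : h - n * bottom, n * left : w - n * right]

# The 16 crop options are all 0/1 permutations of the four edges
crop_vectors = list(product([0, 1], repeat=4))
fmap = np.random.rand(64, 56, 56)
for vec in crop_vectors:
    print(vec, apply_crop(fmap, vec, n=5).shape)
```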

SLIDE 10

Quantifying Potentials

  • To maintain the original Top-1 accuracy, 75% of images can tolerate some type of crop!
  • Greater savings are possible with top-5 predictions
  • The technique is invariant to weight quantization

[Chart: number of edges cropped for each weight set]

SLIDE 11

Exploiting Energy Savings with ISAAC

  • The activation cropping technique can be applied to any architecture
  • We use the ISAAC accelerator due to its flexibility
  • Future work includes leveraging additional variable-precision techniques

[Figure: ISAAC crossbar; inputs feed the rows, weights are bit-sliced across columns of 1- and 2-bit cells, and outputs are read through an 8-bit ADC]

SLIDE 12

Weight Precision Savings

[Figure: weight bit-slices across crossbar columns, read through a multiplexed 8-bit ADC]

  • 16-bit weights span 8 columns – 8 ADC operations
  • 10-bit weights span only 5 columns – 5 ADC operations
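The column count, and hence the ADC work, follows directly from the weight precision and the bits stored per cell. A back-of-the-envelope check (2 bits per cell assumed, as in ISAAC):

```python
def adc_ops(weight_bits, bits_per_cell=2):
    """Columns (and thus ADC conversions per input cycle) per weight.

    A w-bit weight is bit-sliced across ceil(w / bits_per_cell) crossbar
    columns, and each column's analog sum needs one ADC conversion.
    """
    return -(-weight_bits // bits_per_cell)  # ceiling division

print(adc_ops(16))  # 8 ADC operations for a 16-bit weight
print(adc_ops(10))  # 5 ADC operations for a 10-bit weight
```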

SLIDE 13

“FlexPoint” Support

[Same crossbar figure as Slide 12]

The shift amount in the column reduction can be varied to compute fixed-point operations with different exponents.
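A sketch of how a variable shift in the column reduction realizes different fixed-point exponents (the function and layout are illustrative, not ISAAC's actual datapath):

```python
def shift_add_columns(column_sums, bits_per_cell=2, exponent=0):
    """Combine per-column ADC results into one fixed-point value.

    Column i holds bit-slice [i*bits_per_cell, (i+1)*bits_per_cell) of
    the weights, so its result is shifted by i*bits_per_cell before
    adding. The extra global `exponent` shift reinterprets the same
    column sums under a different fixed-point exponent.
    """
    total = 0
    for i, s in enumerate(column_sums):
        total += s << (i * bits_per_cell)
    return total << exponent

print(shift_add_columns([3, 1, 2]))              # 3 + (1<<2) + (2<<4) = 39
print(shift_add_columns([3, 1, 2], exponent=3))  # same sums, exponent 3: 312
```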

SLIDE 14

Activation Quantization Savings

[Figure: buffered k-bit inputs (e.g. 1...1010101) streamed into the crossbar one bit per time step, over time steps 1..k]

K-bit activations (inputs) require K time steps.

SLIDE 15

Activation Quantization Savings

[Same figure as Slide 14: buffered k-bit inputs streamed one bit per time step]

K-bit activations (inputs) require K time steps.

Fewer computations mean higher throughput, smaller area requirements, and lower energy.
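A minimal model of the bit-serial feed (a pure-Python stand-in for the analog datapath):

```python
def bit_serial_dot(inputs, weights, k):
    """Bit-serial dot product: k-bit inputs are fed one bit per time step.

    Each step t applies bit t of every input to the weight array and
    shift-adds the partial sum, so runtime is proportional to k; fewer
    activation bits directly remove time steps.
    """
    acc = 0
    for t in range(k):                            # one cycle per input bit
        bits = [(x >> t) & 1 for x in inputs]
        acc += sum(b * w for b, w in zip(bits, weights)) << t
    return acc

inputs, weights = [5, 3, 7], [2, 4, 1]            # 3-bit activations
assert bit_serial_dot(inputs, weights, k=3) == sum(x * w for x, w in zip(inputs, weights))
```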

SLIDE 16

Naive Approach – Crop Everything

  • Cropping everything yields substantial energy savings, at a cost to accuracy
  • Theoretically, over 33% energy can be saved while maintaining the original accuracy!
SLIDE 17

Overall Energy Savings

  • Adaptive quantization saves 33% energy on average compared to an uncropped baseline
  • The technique can be applied in conjunction with weight quantization techniques, with nearly identical relative savings

SLIDE 18

Future Work

  • Predict unimportant regions
      • Using a “0th” layer with just a few gradient-based kernels (sketched below)
  • Use variable low-precision computations for unimportant regions (not just cropping)
  • Quantify the energy and latency changes: an additional prediction step, but fewer overall computations

[Figure: original image and its Sobel gradient]
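A hypothetical sketch of such a predictor: Sobel gradient energy in each border strip as a cheap importance signal (the function, strip width, and thresholding policy are all assumptions, not from the talk):

```python
import numpy as np
from scipy.ndimage import sobel

def border_importance(image, n):
    """Mean Sobel gradient magnitude in each n-pixel border strip.

    A low-energy strip suggests that edge is a safe crop candidate;
    this plays the role of the hypothetical '0th layer' built from a
    few gradient kernels.
    """
    g = np.hypot(sobel(image, axis=0), sobel(image, axis=1))
    return {
        "top": g[:n].mean(), "bottom": g[-n:].mean(),
        "left": g[:, :n].mean(), "right": g[:, -n:].mean(),
    }

image = np.random.rand(224, 224).astype(np.float32)
print(border_importance(image, n=25))
```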

SLIDE 19

Conclusion

  • Adaptive quantization saves 33% energy on average compared to an uncropped baseline
  • The technique can be applied in conjunction with weight quantization techniques, with nearly identical relative savings

SLIDE 20

Thank you!

Questions?