A Systems Approach to Computing in Beyond CMOS Fabrics
A. D. Patil, N. R. Shanbhag, L. R. Varshney, E. Pop, H.-S. P. Wong, S. Mitra, J. Rabaey, J. Weldon, L. Pileggi, S. Manipatruni, D. Nikonov, and I. A. Young
[Figure labels: DATA, INFORMATION, BMW, Sasi, Ian] [The Guardian, May 2017]
[Pop-NanoResearch-2010]
[Figure: memory / interface / processor architecture. Memory: row decoders, WL driver, precharge, sense amplifiers (SA), L:1 mux & buffer, stored words d0, d1, d2, d3, …. Memory interface: K-b bus. Processor: input buffer (𝑸), digital processor, decision (𝒛). Other label: ∆𝑾𝑪𝑴.]
[Figure: estimator/detector combining observations y_1, y_2, …, y_N into a corrected output ŷ, under an application-derived metric. Other labels: C, x, P, e, h.]
$\hat{y} = \arg\max_{y} P(y \mid y_1, \ldots, y_N)$
Principles of Statistical Information Processing
Prototypes in CMOS & Beyond CMOS
to Shannon/brain-inspired architectures to fundamental limits on energy efficiency
[Figure: deep in-memory arch. Blocks: FR row decoders, WL driver, precharge, BL processors (BLP), cross BL processor (CBLP), ADC & RDL, SA, L:1 mux & buffer, K-b bus, input buffer, decision; stored words d0, d1, d2, d3.]
4-bit HD ≈ 10000-bit
[Rabaey, Olshausen, Mitra, Wong]
Low resistance state: 1
High resistance state: 0
[Figure: RRAM cell between top electrode and bottom electrode; oxygen vacancy (VO), stochastic VO]
VTE < VSET: PSET < 1
1001100111……0100111101 (hyper-vector)
PSET: SET probability (switching from ‘0’ to ‘1’)
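The stochastic SET behaviour above can be pictured with a small software model of how a hyper-vector might be drawn from an array of cells. This is only a sketch under assumptions that are not in the slides: every cell is assumed to start in the high resistance state ('0'), cells are assumed to switch independently of one another, and the names `stochastic_set`, `num_cells`, and `p_set` are hypothetical. The value `p_set = 0.5` is an illustrative choice.

```python
import random

def stochastic_set(num_cells, p_set, rng):
    """Apply one SET pulse to num_cells cells that all start at '0'.

    Assumption: each cell switches to '1' independently with
    probability p_set (a software model, not a device simulation).
    """
    return [1 if rng.random() < p_set else 0 for _ in range(num_cells)]

rng = random.Random(0)  # fixed seed, for repeatability only
hv = stochastic_set(10_000, 0.5, rng)
density = sum(hv) / len(hv)
print(f"fraction of '1' bits: {density:.3f}")
```

With `p_set` near 0.5 the resulting bit string has roughly as many ones as zeros, which is the property a random hyper-vector needs.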
[Figure: experimental data. Percentage contours (0%, 25%, 50%, 75%, 100%) over pulse amplitude (V), 0.7 to 1.1, and pulse width (ns), 10^2 to 10^4.]
[H. Li,…, H.-S. P. Wong, IEDM, 2016]
[Figure: 1T-4R RRAM structure. TiN bottom electrode (BE), TiN/Ti top electrode (TE), Layer 1 (L1) to Layer 4 (L4), FinFET; pillar electrode, plane electrode, word line (WL), bit line (BL), select line (SL), RRAM cell; axes x, y, z. Dimension labels: 50 nm; TiN (20 nm); TiN/Ti (50 nm). Fab by NDL, Taiwan.]
[H. Li,…, H.-S. P. Wong, IEDM, 2016]
[Figure: measured in-memory operations on layers L1 to L4.
Multiplication: logic evaluation with input AB = pillar addr. = 10, giving C = 0, D = 1; plot of resistance (Ω) versus logic evaluation cycle (#).
Addition: plot of current (µA) versus addition cycle (#), with states 1111, 0111, 0011, 0001, 0000.
Pulse waveforms switch between VDD and gnd, 200 ns.
Measured HRS (400 kΩ to 1 MΩ); measured LRS (~10 kΩ).
Bit 1: up; bit 0: down.]
Layers: Letter (3 layers), Trigram (5 layers), XOR (1 layer), LangMap (6 layers), HamD Measure (21 layers)
Steps: input texts; 3-letter sequences; compute trigrams; generate (learn) language/text maps (one for each text); training: finish; inference: measure HamD & identify the ‘nearest’
Operations: binding; MAP (addition); (HamD)
Algorithm / Architecture / Device
One-shot learning
4 kb × 36 layers
Random HD vectors: sampling, projection
Operation sequence: PERM, ADD, XOR, XOR, Store, ADD, ADD, ADD, XOR, XOR, Store
MAP
PSET ≈ 50% [A. Rahimi et al., ISLPED, p. 64, 2016]
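The language-map flow above (random HD vectors, trigrams built with PERM and XOR, maps built with ADD, nearest map found by HamD) can be illustrated with a small software sketch. It is not the hardware mapping from the slides: the vector width `D = 2048`, the cyclic rotation used for PERM, the bitwise majority vote used for ADD, the toy training strings, and all function names are assumptions made for this example.

```python
import random

D = 2048  # hyper-vector width in bits (illustrative assumption)
MASK = (1 << D) - 1

def permute(hv, k=1):
    """Cyclic left rotation by k bits (stands in for PERM)."""
    k %= D
    return ((hv << k) | (hv >> (D - k))) & MASK

def encode_trigram(a, b, c, items):
    """Bind three letters: rot2(a) XOR rot1(b) XOR c."""
    return permute(items[a], 2) ^ permute(items[b], 1) ^ items[c]

def bundle(hvs):
    """Bitwise majority vote (stands in for ADD); ties resolve to 0."""
    counts = [0] * D
    for hv in hvs:
        for i in range(D):
            counts[i] += (hv >> i) & 1
    half = len(hvs) / 2
    return sum(1 << i for i in range(D) if counts[i] > half)

def text_map(text, items):
    """Learn one map per text by bundling all of its trigrams."""
    grams = [encode_trigram(text[i], text[i + 1], text[i + 2], items)
             for i in range(len(text) - 2)]
    return bundle(grams)

def hamming(a, b):
    """Hamming distance (HamD) between two hyper-vectors."""
    return bin(a ^ b).count("1")

rng = random.Random(1)  # fixed seed, for repeatability only
alphabet = "abcdefghijklmnopqrstuvwxyz "
items = {ch: rng.getrandbits(D) for ch in alphabet}  # random HD vectors

# Training: one map per (toy) text.
maps = {
    "A": text_map("the quick brown fox jumps over the lazy dog", items),
    "B": text_map("el veloz zorro marron salta sobre el perro", items),
}

# Inference: measure HamD and pick the nearest map.
query = text_map("the lazy dog jumps over the quick fox", items)
nearest = min(maps, key=lambda name: hamming(query, maps[name]))
print(nearest, {name: hamming(query, m) for name, m in maps.items()})
```

Each map is learned in a single pass over its text, which is the sense in which this style of training is one-shot. Texts that share many trigrams end up with maps that are close in Hamming distance, while unrelated texts sit near D/2 apart.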
[Figure (a): total area (µm²) versus HD vector size, 28nm LP versus 3D VRRAM]
HD vector size: 1 kb, 2 kb, 10 kb
28nm LP: 9.16E5, 1.78E6
3D VRRAM: 2223, 2691, 3394
Area ratio: 412×, 660×
[Figure (b): component area (µm²) versus HD vector size (1 kb, 2 kb, 10 kb); components: cell array, SA, MUX, decoder]
Routing: +1699.7, +2081.5, +2158.1
[Nikonov-JXCDC-2015]
stochastic regime
deterministic regime
[Patil, Shanbhag, Manipatruni, Nikonov, Young, MMM-Intermag’16, arXiv’17]
energy numbers from [Manipatruni, et al., Physical Review Applied’16]
[Figure: ε-noisy non-volatile AND gate with inputs A, B and output C, modeled as an error-free gate followed by a virtual gate operation emulating ε-noise. Plot label: ε = 0.5.]
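The gate model above, an error-free gate followed by a virtual operation that emulates the noise, can be sketched in software. The example assumes the common reading of an ε-noisy gate, in which the ideal output is flipped independently with probability ε. The function names and the value ε = 0.1 are hypothetical and chosen only for illustration.

```python
import random

def noisy_and(a, b, eps, rng):
    """Error-free AND followed by a virtual bit flip with probability eps."""
    ideal = a & b
    flip = 1 if rng.random() < eps else 0
    return ideal ^ flip

def estimate_error_rate(eps, trials, rng):
    """Fraction of trials in which the noisy output differs from a AND b."""
    errors = 0
    for _ in range(trials):
        a, b = rng.getrandbits(1), rng.getrandbits(1)
        if noisy_and(a, b, eps, rng) != (a & b):
            errors += 1
    return errors / trials

rng = random.Random(0)  # fixed seed, for repeatability only
rate = estimate_error_rate(0.1, 100_000, rng)
print(f"measured error rate: {rate:.4f}")
```

Because the flip is applied after the ideal gate, the measured output error rate should track ε regardless of the input statistics.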
error compensator is robust: 𝜗 < 10@^
error compensator is efficient: 𝑚 > 𝑟 & 𝑙 > 𝑛
[Zhang, Shanbhag, IEEE Trans. Signal Processing, 2016] [Abdallah, Shanbhag, IEEE J. Solid-State Circuits, 2013] [Gonugondla, Shim, Shanbhag, ICASSP, 2016]
Error distribution at the output of a 15-bit RCA
[Figure: three panels of error probability versus error magnitude: all delays equal; after PDB; after PDB & PDR]
“Maximally” slow network → “minimally” error-prone network, without energy increase
Generates a sparse error distribution
[Verma-JSSC-2010]
𝑨 = 1 𝑨 = 0
CHB-MIT EEG dataset
Gate count: Main Block 52.8k; Error Compensator 5.608k
network of ε-noisy gates
energy-vs-error rate model
(fits spintronics)
fundamental limit
error rate energy
design principle
[Chatterjee and Varshney,
[Figure: collage of nanofunction examples]
Restricted Boltzmann Machines; resistive memories; graphene dot product (inputs Φ1 to Φ4, V1 to V4, Vx)
Nano function with flag: function computation, flag computation (Nanoflag)
Deep NNs; MTJ-based random number generators; stochastic spin models; support vector machine
Spin torque transfer device: VSS, VDD, Isupply, ferromagnet, conducting channel, insulating partition, input magnet, output magnet
CNFET machine learning accelerator core; graphene MUX logic; oscillatory NNs; RRAM oscillators
[Weldon, Pileggi] [Mitra, Wong] [Pop, Grover] [Shanbhag, Young (Intel)] [Wong] [Weldon, Pileggi] [Pileggi]
Complexity Energy Device Error Rate
[S. Manipatruni, D. Nikonov, I. Young, N. Shanbhag]
Statistical Information Processing
Nanofunctions Beyond-CMOS devices Shannon & Brain-inspired Models of Computing Application requirements
Applications Architectures Circuits Devices Systems