Chapter 13: Bit-Level Arithmetic Architectures
Keshab K. Parhi
- Chap. 13
2
- A W-bit fixed-point two's complement number A is represented as $A = a_{W-1}.a_{W-2}\cdots a_1 a_0$, where the bits $a_i$, $0 \le i \le W-1$, are either 0 or 1, and the msb is the sign bit.
- The value of this number lies in the range $[-1, 1 - 2^{-W+1}]$ and is given by:
$$A = -a_{W-1} + \sum_{i=1}^{W-1} a_{W-1-i}\, 2^{-i}$$
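As a quick check of the value formula, the sketch below evaluates a W-bit fixed-point two's complement word with exact rational arithmetic. The function name `tc_value` and the msb-first bit list are illustrative choices, not part of the original slides.

```python
from fractions import Fraction
from itertools import product


def tc_value(bits):
    """Value of the W-bit fixed-point two's complement word a_{W-1}.a_{W-2}...a_0.

    `bits` is msb first, so bits[0] = a_{W-1} (sign bit) and bits[i] = a_{W-1-i}.
    """
    value = Fraction(-bits[0])              # sign bit has weight -1
    for i in range(1, len(bits)):
        value += Fraction(bits[i], 2 ** i)  # a_{W-1-i} has weight 2^{-i}
    return value


W = 4
values = [tc_value(list(b)) for b in product([0, 1], repeat=W)]
print(tc_value([1, 0, 1, 1]))    # 1.011 -> -1 + 1/4 + 1/8 = -5/8
print(min(values), max(values))  # -1 7/8, i.e. the range [-1, 1 - 2^{-W+1}]
```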
- For bit-serial implementations, constant word-length multipliers are considered. For a W×W-bit multiplication, the W most significant bits of the (2W-1)-bit product are retained.
- Parallel Multipliers:
$$A = a_{W-1}.a_{W-2}\cdots a_1 a_0 = -a_{W-1} + \sum_{i=1}^{W-1} a_{W-1-i}\, 2^{-i}$$
$$B = b_{W-1}.b_{W-2}\cdots b_1 b_0 = -b_{W-1} + \sum_{i=1}^{W-1} b_{W-1-i}\, 2^{-i}$$
Their product is given by:
$$P = -p_{2W-2} + \sum_{i=1}^{2W-2} p_{2W-2-i}\, 2^{-i}$$
In constant word-length multiplication, the W - 1 lower-order bits in the product P are ignored and the product is denoted as $X \Leftarrow P = A \times B$, where
$$X = -x_{W-1} + \sum_{i=1}^{W-1} x_{W-1-i}\, 2^{-i}$$
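A minimal sketch of constant word-length multiplication, assuming each W-bit fraction is stored as an integer scaled by $2^{W-1}$ (so $A = a_{int}/2^{W-1}$). Under that assumption, dropping the W - 1 lower-order bits of the product is an arithmetic right shift. The helper name `cwl_multiply` is hypothetical.

```python
def cwl_multiply(a_int, b_int, w):
    """Constant word-length product X <= A x B of two W-bit two's complement fractions.

    a_int and b_int are the words read as integers in [-2^{W-1}, 2^{W-1} - 1],
    i.e. A = a_int / 2^{W-1}. The exact product is scaled by 2^{2W-2}; shifting
    right by W-1 keeps the W most significant bits. Python's >> on negative
    integers rounds toward minus infinity, which matches two's complement truncation.
    """
    lo, hi = -(2 ** (w - 1)), 2 ** (w - 1) - 1
    assert lo <= a_int <= hi and lo <= b_int <= hi
    p = a_int * b_int        # exact product, scaled by 2^{2W-2}
    return p >> (w - 1)      # discard the W-1 lsbs


# W = 4: A = 0.101 = 5/8 (a_int = 5), B = 1.110 = -1/4 (b_int = -2)
x = cwl_multiply(5, -2, 4)
print(x, x / 2 ** 3)  # X = -2/8 = -0.25, versus the exact product -5/32
```

The single case A = B = -1 gives X = +1, which does not fit in W bits. The sketch does not guard against this overflow.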
- Parallel Multiplication with Sign Extension: Using Horner's rule, the multiplication of A and B can be written as
$$P = A \times \left(-b_{W-1} + \sum_{i=1}^{W-1} b_{W-1-i}\, 2^{-i}\right) = -A\, b_{W-1} + \left[A\, b_{W-2} + \left[A\, b_{W-3} + \left[\cdots + \left[A\, b_1 + A\, b_0\, 2^{-1}\right]2^{-1}\right]\cdots\right]2^{-1}\right]2^{-1}$$
where $2^{-1}$ denotes the scaling operation.
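The nested Horner form can be checked directly. In this sketch, each multiplication by `Fraction(1, 2)` models one scaling operator $2^{-1}$. Exact rational arithmetic is used, so no truncation is modelled. The name `horner_multiply` is illustrative.

```python
from fractions import Fraction


def horner_multiply(a, b_bits):
    """Evaluate P = -A*b_{W-1} + [A*b_{W-2} + [ ... [A*b_1 + A*b_0*2^-1]2^-1 ... ]2^-1]2^-1.

    a      : multiplicand A as an exact Fraction
    b_bits : multiplier bits msb first, [b_{W-1}, ..., b_1, b_0], with W >= 2
    Each multiplication by Fraction(1, 2) models one scaling operator 2^-1.
    """
    w = len(b_bits)
    assert w >= 2
    b = b_bits[::-1]                        # b[i] = b_i (lsb first)
    acc = a * b[0] * Fraction(1, 2)         # innermost term: A*b_0*2^-1
    for i in range(1, w - 1):
        acc = (a * b[i] + acc) * Fraction(1, 2)  # [A*b_i + acc]2^-1
    return -a * b[w - 1] + acc              # sign bit b_{W-1} has weight -1


A = Fraction(5, 8)                    # 0.101
print(horner_multiply(A, [1, 1, 0]))  # B = 1.10 = -1/2, so P = -5/16
```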
- In 2's complement, negating a number is equivalent to taking its 1's complement and adding 1 to the lsb, as shown below:
$$\begin{aligned}
-A &= a_{W-1} - \sum_{i=1}^{W-1} a_{W-1-i}\, 2^{-i} \\
&= -(1 - a_{W-1}) + 1 + \sum_{i=1}^{W-1} (1 - a_{W-1-i})\, 2^{-i} - \sum_{i=1}^{W-1} 2^{-i} \\
&= -(1 - a_{W-1}) + \sum_{i=1}^{W-1} (1 - a_{W-1-i})\, 2^{-i} + 2^{-W+1} \\
&= -\bar{a}_{W-1} + \sum_{i=1}^{W-1} \bar{a}_{W-1-i}\, 2^{-i} + 2^{-W+1}
\end{aligned}$$
where $\bar{a}_i = 1 - a_i$, and the third line uses $\sum_{i=1}^{W-1} 2^{-i} = 1 - 2^{-W+1}$.
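The identity above can be sketched at the bit level: invert every bit, then add 1 at the lsb and discard the carry out of the msb. The function name `negate` and the msb-first list format are illustrative assumptions.

```python
def negate(bits):
    """Two's complement negation of a W-bit word given msb first.

    Implements -A = (1's complement of A) + 2^{-W+1}: invert every bit, then
    add 1 at the lsb position, discarding any carry out of the msb.
    """
    w = len(bits)
    inverted = [1 - bit for bit in bits]          # 1's complement
    word = 0
    for bit in inverted:                          # read the word as an unsigned integer
        word = (word << 1) | bit
    word = (word + 1) % (1 << w)                  # add 1 at the lsb, drop carry out
    return [(word >> (w - 1 - k)) & 1 for k in range(w)]


print(negate([0, 1, 0, 1]))  # 0.101 = 5/8  ->  1.011 = -5/8
```

One value has no W-bit negation: -1 (1.000...0) maps back to itself, because +1 is outside the range $[-1, 1 - 2^{-W+1}]$.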
- The additions cannot be carried out directly because some terms have negative weight. Sign extension is used to solve this problem. For example,
$$A = -a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3} = -a_3\, 2 + a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3} = -a_3\, 2^2 + a_3\, 2 + a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3}$$
describes sign extension of A by 1 and 2 bits.
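A small sketch confirming that replicating the sign bit leaves the value unchanged. When a word is extended by k bits, the new msb has weight $-2^k$, and the k copies below it have weights $2^{k-1}, \ldots, 2, 1$. The function name `extended_value` is illustrative.

```python
from fractions import Fraction
from itertools import product


def extended_value(bits, k):
    """Value of a word after sign-extending it by k bits.

    bits is msb first ([a_3, a_2, a_1, a_0] for the 4-bit example). The sign
    bit a_{W-1} is replicated so that the new msb has weight -2^k and the k
    copies below it have weights 2^{k-1}, ..., 2, 1.
    """
    sign = bits[0]
    value = Fraction(-sign * 2 ** k)             # new sign position
    for j in range(k):
        value += sign * 2 ** j                   # replicated sign bits
    for i in range(1, len(bits)):
        value += Fraction(bits[i], 2 ** i)       # original fraction bits
    return value


# Extending by 1 or 2 bits never changes the value of any 4-bit word.
for word in product([0, 1], repeat=4):
    assert extended_value(list(word), 0) == extended_value(list(word), 1) == extended_value(list(word), 2)
```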
Tabular form of bit-level array multiplication
- Parallel Carry-Ripple Array Multipliers:
Bit-level dependence graph
Parallel carry-ripple multiplier
DG for 4×4-bit carry-save array multiplication; parallel carry-save array multiplier
- Baugh-Wooley Multipliers:
This scheme handles the sign bits of the multiplicand and multiplier efficiently.
Tabular form of bit-level Baugh-Wooley multiplication
- Parallel Multipliers with Modified Booth Recoding: Modified Booth recoding reduces the number of partial products to accelerate the multiplication process. The algorithm is based on the fact that fewer partial products need to be generated for groups of consecutive zeros and ones. Consider a group of m consecutive ones in the multiplier, i.e.,
$$\ldots 0\{11\cdots1\}0\ldots = \ldots 1\{00\cdots0\}0\ldots - \ldots 0\{00\cdots1\}0\ldots = \ldots 1\{00\cdots\bar{1}\}0\ldots$$
If signed-digit representation is used, only 2 partial products need to be generated for this group instead of m. Hence, in this multiplication scheme, the multiplier bits are first recoded into signed-digit representation with fewer non-zero digits; the partial products are then generated using the recoded multiplier digits and accumulated.
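Radix-4 Modified Booth Recoding Algorithm

| $b_{2i+1}$ | $b_{2i}$ | $b_{2i-1}$ | $b'_i$ | Operation | Comments |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | +0 | string of 0's |
| 0 | 0 | 1 | 1 | +A | end of 1's |
| 0 | 1 | 0 | 1 | +A | a single 1 |
| 0 | 1 | 1 | 2 | +2A | end of 1's |
| 1 | 0 | 0 | -2 | -2A | beginning of 1's |
| 1 | 0 | 1 | -1 | -A | a single 0 |
| 1 | 1 | 0 | -1 | -A | beginning of 1's |
| 1 | 1 | 1 | 0 | +0 | string of 1's |

The recoding operation can be described as:
$$b'_i = -2b_{2i+1} + b_{2i} + b_{2i-1}$$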
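The recoding formula can be sketched directly. A W-bit multiplier (W even) yields W/2 digits in {-2, -1, 0, 1, 2}, so only W/2 partial products are needed. For simplicity, the sketch reads the word as an integer. The fractional value is that integer divided by $2^{W-1}$, so the recoding is unchanged. The name `booth_radix4` is illustrative.

```python
def booth_radix4(bits):
    """Radix-4 modified Booth recoding of a W-bit two's complement multiplier.

    bits is msb first, [b_{W-1}, ..., b_0], with W even. Returns the digits
    b'_i = -2*b_{2i+1} + b_{2i} + b_{2i-1} for i = 0 .. W/2 - 1 (lsb digit
    first), using b_{-1} = 0. Each digit lies in {-2, -1, 0, 1, 2}.
    """
    w = len(bits)
    assert w % 2 == 0
    b = bits[::-1]                                  # b[j] = b_j
    digits = []
    for i in range(w // 2):
        b_prev = b[2 * i - 1] if i > 0 else 0       # b_{2i-1}, with b_{-1} = 0
        digits.append(-2 * b[2 * i + 1] + b[2 * i] + b_prev)
    return digits


# 0111 (= 7 when read as an integer) recodes to digits [-1, 2]: 7 = -1 + 2*4
print(booth_radix4([0, 1, 1, 1]))
```

Digit $b'_i$ carries weight $4^i$, which is why a partial product of -2A or +2A is just a shifted, possibly negated, copy of A.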
Interleaved Floor-Plan and Bit-Plane-Based Digital Filters
- A constant-coefficient FIR filter is given by:
$$y(n) = x(n) + f \cdot x(n-1) + g \cdot x(n-2)$$
where x(n) is the input signal, and f and g are filter coefficients.
- The main idea behind the interleaved approach is to perform the computation and accumulation of partial products associated with f and g simultaneously, thus increasing the speed.
- This also increases the accuracy, since truncation is done only at the final step.
- If the coefficients are interleaved in such a way that their partial products are computed in different rows, the resulting architecture is called a bit-plane architecture.
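The accuracy claim can be illustrated numerically. This sketch models only the truncation aspect, not the floor plan. It compares truncating each coefficient product separately against accumulating the f and g partial products together and truncating once. The 5-fractional-bit word-length and the coefficient values are hypothetical.

```python
import math
from fractions import Fraction

FRAC_BITS = 5  # hypothetical: 6-bit words, i.e. 5 fractional bits


def truncate(v, frac_bits=FRAC_BITS):
    """Two's complement truncation (floor) of v to frac_bits fractional bits."""
    scale = 2 ** frac_bits
    return Fraction(math.floor(v * scale), scale)


def fir_truncate_each(x0, x1, x2, f, g):
    """Each coefficient product is truncated before it is accumulated."""
    return x0 + truncate(f * x1) + truncate(g * x2)


def fir_truncate_once(x0, x1, x2, f, g):
    """Partial products of f and g are accumulated together; one final truncation."""
    return truncate(x0 + f * x1 + g * x2)


def max_error(fir, f, g):
    """Worst-case truncation error over all on-grid inputs (x0 = 0 w.l.o.g.)."""
    grid = [Fraction(k, 2 ** FRAC_BITS) for k in range(-(2 ** FRAC_BITS), 2 ** FRAC_BITS)]
    return max(f * x1 + g * x2 - fir(0, x1, x2, f, g) for x1 in grid for x2 in grid)


f, g = Fraction(13, 32), Fraction(-9, 32)  # hypothetical coefficients
print(max_error(fir_truncate_each, f, g), max_error(fir_truncate_once, f, g))
```

With two separate truncations the worst-case error approaches two lsbs. With a single final truncation it stays below one lsb.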
Bit-Serial Multipliers
- Lyon's Bit-Serial Multiplier using Horner's Rule:
- For the scaling operator, the first output bit $a_1$ should be generated at the same time instant when the first input bit $a_0$ enters the operator. Since input $a_1$ has not yet entered the system, the scaling operator is non-causal and cannot be implemented in hardware.
Derivation of an implementable bit-serial 2's complement multiplier
Lyon's bit-serial 2's complement multiplier
Design of Bit-Serial Multipliers Using Systolic Mappings
Here, $d^T = [1\ 0]$, $s^T = [1\ 1]$ and $p^T = [0\ 1]$:

| e | $p^T e$ | $s^T e$ |
|---|---|---|
| a(0,1) | 1 | 1 |
| b(1,0) | 0 | 1 |
| carry(1,0) | 0 | 1 |
| x(-1,1) | 1 | 0 |

- Design of Lyon's bit-serial multiplier by systolic mapping using the DG of ripple-carry multiplication.
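The edge-mapping tables on these slides follow from two dot products per edge: $p^T e$ gives the processor displacement and $s^T e$ gives the number of delays. The sketch below computes them from the edge vectors listed on this slide. It also checks constraints commonly imposed on a systolic mapping ($p^T d = 0$, $s^T d \ne 0$, $s^T e \ge 0$). The function name `map_edges` is illustrative.

```python
def map_edges(d, s, p, edges):
    """Map each DG edge e to (p^T e, s^T e): processor displacement and delays.

    Checks constraints commonly imposed on a systolic mapping: p^T d = 0,
    s^T d != 0, and s^T e >= 0 for every edge (no negative delays).
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    assert dot(p, d) == 0, "processor space vector must be orthogonal to d"
    assert dot(s, d) != 0, "nodes projected onto one processor need distinct times"
    mapped = {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}
    assert all(delay >= 0 for _, delay in mapped.values()), "negative edge delay"
    return mapped


# Edge vectors of the ripple-carry multiplication DG, as given on this slide
edges = {"a": (0, 1), "b": (1, 0), "carry": (1, 0), "x": (-1, 1)}
print(map_edges(d=(1, 0), s=(1, 1), p=(0, 1), edges=edges))
```

The same function reproduces the other mappings in this section by changing d, s, p and the edge set.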
- Design of a bit-serial multiplier by systolic mapping using the DG of ripple-carry multiplication and the following: $d^T = [0\ 1]$, $s^T = [0\ 1]$ and $p^T = [1\ 0]$

| e | $p^T e$ | $s^T e$ |
|---|---|---|
| a(0,1) | 0 | 1 |
| b(1,0) | 1 | 0 |
| carry(1,0) | 1 | 0 |
| x(-1,1) | -1 | 1 |
- Design of a bit-serial multiplier by systolic mapping using the DG for carry-save array multiplication and the following: $d^T = [1\ 0]$, $s^T = [1\ 1]$ and $p^T = [0\ 1]$

| e | $p^T e$ | $s^T e$ |
|---|---|---|
| a(0,1) | 1 | 1 |
| b(1,0) | 0 | 1 |
| carry(1,0) | 0 | 1 |
| x(-1,1) | 1 | 0 |
Dependence graph for carry-save Baugh-Wooley multiplication with carry-ripple vector merging
- Design of a bit-serial Baugh-Wooley multiplier by systolic mapping using the DG for Baugh-Wooley multiplication and the following: $d^T = [0\ 1]$, $s^T = [0\ 1]$ and $p^T = [1\ 0]$

| e | $p^T e$ | $s^T e$ |
|---|---|---|
| a(0,1) | 0 | 1 |
| b(1,0) | 1 | 0 |
| carry(0,1) | 0 | 1 |
| x(1,1) | 1 | 1 |
| carry-vm(-1,0) | -1 | 0 |

Here, carry-vm denotes the carry outputs in the vector-merging portion.
Bit-Serial Baugh-Wooley Multiplier
DG of the bit-serial Baugh-Wooley multiplier with the carry-save array and the vector-merging portion treated as two separate planes
Bit-serial Baugh-Wooley multiplier using the DG having two separate planes for the carry-save array and the vector-merging portion
Bit-Serial FIR Filter
Bit-level pipelined bit-serial FIR filter, $y(n) = (-7/8)\,x(n) + (1/2)\,x(n-1)$, where the constant-coefficient multiplications are implemented as shifts and adds: $y(n) = -x(n) + x(n)\,2^{-3} + x(n-1)\,2^{-1}$. (a) Filter architecture with scaling operators; (b) feasible bit-level pipelined architecture.
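A short check that the shift-and-add form computes the same filter: $-7/8 = -1 + 2^{-3}$ and $1/2 = 2^{-1}$. The sketch uses exact fractions, so each $2^{-k}$ scaling is an exact division. In hardware it would be a right shift by k. The function names are illustrative, and the input values are arbitrary.

```python
from fractions import Fraction


def fir_direct(x):
    """y(n) = (-7/8)x(n) + (1/2)x(n-1), with x(-1) = 0."""
    prev, y = Fraction(0), []
    for xn in x:
        y.append(Fraction(-7, 8) * xn + Fraction(1, 2) * prev)
        prev = xn
    return y


def fir_shift_add(x):
    """Same filter as shifts and adds: y(n) = -x(n) + x(n)2^-3 + x(n-1)2^-1."""
    prev, y = Fraction(0), []
    for xn in x:
        y.append(-xn + xn / 2 ** 3 + prev / 2 ** 1)  # 2^-k scaling = right shift by k
        prev = xn
    return y


x = [Fraction(1, 2), Fraction(-3, 4), Fraction(5, 8)]
print(fir_direct(x) == fir_shift_add(x))
```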
Bit-Serial IIR Filter
- Consider the implementation of the IIR filter
$$y(n) = (-7/8)\,y(n-1) + (1/2)\,y(n-2) + x(n)$$
where the signal word-length is assumed to be 8.
- The filter equation can be rewritten as follows:
$$w(n) = (-7/8)\,y(n-1) + (1/2)\,y(n-2), \qquad y(n) = w(n) + x(n)$$
which can be implemented as an FIR section from y(n-1), with an addition and a feedback loop (see figure).
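The decomposition can be sketched at the word level. An FIR section produces w(n) from the fed-back outputs, and a single addition with x(n) closes the loop. The function names and zero initial state are illustrative assumptions, and the arithmetic is exact (no word-length effects).

```python
from fractions import Fraction


def iir_direct(x):
    """y(n) = (-7/8)y(n-1) + (1/2)y(n-2) + x(n), zero initial state."""
    y1 = y2 = Fraction(0)
    out = []
    for xn in x:
        yn = Fraction(-7, 8) * y1 + Fraction(1, 2) * y2 + xn
        out.append(yn)
        y1, y2 = yn, y1
    return out


def iir_split(x):
    """Same filter as an FIR section on the fed-back output plus one addition."""
    y1 = y2 = Fraction(0)  # contents of the word-level delays y(n-1), y(n-2)
    out = []
    for xn in x:
        wn = Fraction(-7, 8) * y1 + Fraction(1, 2) * y2  # FIR section fed by y(n-1)
        yn = wn + xn                                     # y(n) = w(n) + x(n)
        out.append(yn)
        y1, y2 = yn, y1                                  # feedback loop
    return out


impulse = [Fraction(1), 0, 0, 0]
print(iir_split(impulse) == iir_direct(impulse))
```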
- Steps for deriving a bit-serial IIR filter architecture:
A bit-level pipelined bit-serial implementation of the FIR section needs to be derived. The input signal x(n) is added to the output of the bit-serial FIR section, w(n). The resulting signal y(n) is connected to the signal y(n-1). The number of delay elements in the edge marked ?D needs to be determined (see the figure on the next page).
- For systems containing loops, the total number of delay elements in the loops should be consistent with the original SFG in order to maintain synchronization and correct functionality.
- Loop delay synchronization involves matching the number of loop delay elements in the word-level SFG with that in the bit-serial architecture. The number of bit-level delay elements in the bit-serial loops should be $W \times N_D$, where W is the signal word-length and $N_D$ denotes the number of delay elements in the word-level SFG.
(a) Bit-level pipelined bit-serial architecture, without synchronization delay elements. (b) Bit-serial IIR filter. Note that this implementation requires a minimum feasible word-length of 6.
Note:
To compute the total number of delays in the bit-level architecture, the paths with the largest number of delay elements in the switching elements should be counted. Input synchronizing delays are also referred to as shimming delays or skewing delays.
It is also possible that the loops in the intermediate bit-level pipelined architecture contain more than $W \times N_D$ bit-level delay elements, in which case the word-length needs to be increased. The architecture without the two loop synchronizing delays can function correctly with a signal word-length of 6, which is the minimum word-length for the bit-level pipelined bit-serial architecture.
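The $W \times N_D$ rule can be turned into two small calculations. The first gives the number of synchronizing delays to add to a loop. The second gives the smallest word-length W at which no loop already holds more than $W \times N_D$ bit-level delays. The delay counts below are hypothetical, because the actual counts come from the slide figures, which are not reproduced here. This sketch covers only the loop-delay constraint, and both function names are illustrative.

```python
def sync_delays(w, n_d, loop_bit_delays):
    """Bit-level delays to add so a bit-serial loop holds exactly W * N_D delays.

    w               : signal word-length W
    n_d             : word-level delays N_D in the corresponding SFG loop
    loop_bit_delays : bit-level delays already in the pipelined loop
    Returns None when the loop already holds more than W * N_D delays,
    in which case the word-length must be increased instead.
    """
    required = w * n_d
    if loop_bit_delays > required:
        return None
    return required - loop_bit_delays


def min_word_length(loops):
    """Smallest W with W * N_D >= existing bit-level delays in every loop.

    loops is a list of (n_d, loop_bit_delays) pairs.
    """
    return max(-(-bits // n_d) for n_d, bits in loops)  # ceiling division


# Hypothetical loops: (N_D = 1, 4 bit-level delays) and (N_D = 2, 13 bit-level delays)
print(sync_delays(8, 2, 13), sync_delays(6, 2, 13), min_word_length([(1, 4), (2, 13)]))
```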
- Associativity transformation:
The loop iteration bound of the IIR filter can be reduced from one-multiply-two-add to one-multiply-add by the associative transformation.
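One way to see the reduction, assuming the transformation regroups the two additions so that x(n) is first combined with the y(n-2) term. This regrouping is an illustration, not read from the slide figure. $T_m$ and $T_a$ denote assumed multiply and add times.

$$y(n) = \Big[(-7/8)\,y(n-1) + (1/2)\,y(n-2)\Big] + x(n) \;\longrightarrow\; y(n) = (-7/8)\,y(n-1) + \Big[(1/2)\,y(n-2) + x(n)\Big]$$

Before regrouping, the loop through the single delay y(n-1) contains one multiplication and two additions, giving a bound of $T_m + 2T_a$. After regrouping, that loop contains one multiplication and one addition, giving $T_m + T_a$. The loop through y(n-2) still contains $T_m + 2T_a$ of computation, but it spans two delays, giving $(T_m + 2T_a)/2 \le T_m + T_a$.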
Bit-serial IIR filter after the associative transformation. This implementation requires a minimum feasible word-length of 5.
Canonic Signed Digit Arithmetic
- Encoding a binary number such that it contains the fewest number of non-zero bits is called canonic signed digit (CSD).
- The following are the properties of CSD numbers:
– No 2 consecutive bits in a CSD number are non-zero.
– The CSD representation of a number contains the minimum possible number of non-zero bits, thus the name canonic.
– The CSD representation of a number is unique.
– CSD numbers cover the range (-4/3, 4/3), out of which the values in the range [-1, 1) are of greatest interest.
– Among the W-bit CSD numbers in the range [-1, 1), the average number of non-zero bits is $W/3 + 1/9 + O(2^{-W})$. Hence, on average, CSD numbers contain about 33% fewer non-zero bits than two's complement numbers, which average about W/2 non-zero bits.
- Conversion of a W-bit number to CSD format:
– $A = a'_{W-1}.a'_{W-2}\cdots a'_1 a'_0$ = 2's complement number
– Its CSD representation is $a_{W-1}.a_{W-2}\cdots a_1 a_0$
- Algorithm to obtain the CSD representation:
– $a'_{-1} = 0$; – $\gamma_{-1} = 0$; – $a'_W = a'_{W-1}$; – for ($i = 0$ to $W-1$)
{
$\theta_i = a'_i \oplus a'_{i-1}$;
$\gamma_i = \bar{\gamma}_{i-1}\,\theta_i$;
$a_i = (1 - 2a'_{i+1})\,\gamma_i$;
}
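The recursion above translates almost line for line into a sketch. Here $\gamma_i$ uses the complement of $\gamma_{i-1}$, which is what makes a run of ones produce only one non-zero digit. Digit $a_i$ has positive weight $2^{-(W-1-i)}$ in the CSD word. The name `to_csd` and the msb-first list format are illustrative.

```python
def to_csd(bits):
    """CSD digits of a W-bit two's complement word, following the slide's recursion.

    bits is msb first: [a'_{W-1}, ..., a'_1, a'_0]. Returns [a_{W-1}, ..., a_0]
    (msb first), each digit in {-1, 0, 1}; digit a_i has weight 2^{-(W-1-i)}.
    """
    w = len(bits)
    a = bits[::-1] + [bits[0]]   # a[i] = a'_i for 0 <= i < W, and a'_W = a'_{W-1}
    prev = 0                     # a'_{-1} = 0
    gamma = 0                    # gamma_{-1} = 0
    digits = []
    for i in range(w):
        theta = a[i] ^ prev                        # theta_i = a'_i XOR a'_{i-1}
        gamma = (1 - gamma) & theta                # gamma_i = NOT(gamma_{i-1}) AND theta_i
        digits.append((1 - 2 * a[i + 1]) * gamma)  # a_i = (1 - 2a'_{i+1}) gamma_i
        prev = a[i]
    return digits[::-1]


print(to_csd([1, 0, 1, 1, 1, 0, 0, 1, 1]))  # 1.01110011
```

Applied to -7/32 (1.11001) and 3/4 (0.11), it gives the encodings 0.0(-1)001 and 1.0(-1) used for the bit-serial CSD architecture at the end of this chapter.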
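| i | 9 (W) | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | -1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $a'_i$ | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| $\theta_i$ | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | |
| $\gamma_i$ | | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| $1 - 2a'_{i+1}$ | | -1 | -1 | 1 | -1 | -1 | -1 | 1 | 1 | -1 | |
| $a_i$ | | 0 | -1 | 0 | 0 | -1 | 0 | 1 | 0 | -1 | |

Table showing the computation of the CSD representation for the number 1.01110011 (W = 9); the resulting CSD number is $0.\bar{1}00\bar{1}010\bar{1}$.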
CSD Multiplication
A CSD multiplier using a linear arrangement of adders to compute x × 0.10100100101001
- Horner's rule for precision improvement: this involves delaying the scaling operations common to the 2 partial products, thus increasing accuracy.
- For example, $x \cdot 2^{-5} + x \cdot 2^{-3}$ can be implemented as $(x \cdot 2^{-2} + x)\,2^{-3}$ to increase the accuracy.
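The accuracy gain in this example can be measured with a short sketch. It assumes x is an integer input word and models each scaling $2^{-k}$ as an arithmetic right shift by k, which truncates. Errors are in units of the input lsb. The function names are illustrative.

```python
from fractions import Fraction


def direct(x):
    """x*2^-5 + x*2^-3 with each scaled term truncated (arithmetic right shifts)."""
    return (x >> 5) + (x >> 3)


def horner(x):
    """(x*2^-2 + x)*2^-3: the common scaling 2^-3 is applied once, after the add."""
    return ((x >> 2) + x) >> 3


def max_truncation_error(fn, xs):
    """Largest gap between the exact value 5x/32 and fn(x), in units of the lsb."""
    return max(Fraction(5 * x, 32) - fn(x) for x in xs)


xs = range(-256, 256)  # x is an integer input word, in units of its lsb
print(max_truncation_error(direct, xs), max_truncation_error(horner, xs))
```

In the direct form, both truncations can lose almost a full step at their own scale, so their errors add. In the Horner form, the first truncation error is itself scaled by $2^{-3}$ before the final truncation, so the total stays below one lsb.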
Using Horner's rule for partial product accumulation to reduce the truncation error.
Rearrangement of the CSD multiplication of x × 0.10100100101001 using Horner's rule for partial product accumulation to reduce the truncation error.
Use of Tree-Height Reduction for Latency Reduction
(a) Linear arrangement; (b) tree arrangement. Combination of tree-type arrangement and Horner's rule for the accumulation of partial products in CSD multiplication.
Bit-serial architecture using CSD. In this case the coefficient $-7/32 = -1/4 + 1/32$ is encoded as $0.0\bar{1}001$ and $3/4 = 1 - 1/4$ is encoded as $1.0\bar{1}$.