Chapter 13: Bit Level Arithmetic Architectures, Keshab K. Parhi (PowerPoint presentation)

SLIDE 1

Chapter 13: Bit Level Arithmetic Architectures

Keshab K. Parhi

SLIDE 2

Chap. 13


A W-bit fixed point two's complement number A is represented as: $A = a_{W-1}.a_{W-2}\ldots a_1 a_0$

where the bits $a_i$, $0 \le i \le W-1$, are either 0 or 1, and the msb is the sign bit.

The value of this number is in the range $[-1, 1 - 2^{-W+1}]$ and is given by: $A = -a_{W-1} + \sum_{i=1}^{W-1} a_{W-1-i}2^{-i}$

For bit-serial implementations, constant word length multipliers are considered. For a W×W bit multiplication, the W most-significant bits of the (2W-1)-bit product are retained.
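The value formula and its range can be checked by enumerating every W-bit pattern. A minimal Python sketch, assuming the bits are passed as a list with the sign bit first; `frac_value` is a hypothetical helper name, not from the slides:

```python
from fractions import Fraction

def frac_value(bits):
    """Value of a W-bit two's complement fraction A = a_{W-1}.a_{W-2}...a_0.

    bits[0] is the sign bit a_{W-1}; bits[i] (i >= 1) is a_{W-1-i}, weight 2^-i:
    A = -a_{W-1} + sum_{i=1}^{W-1} a_{W-1-i} 2^-i
    """
    value = Fraction(-bits[0])
    for i in range(1, len(bits)):
        value += Fraction(bits[i], 2 ** i)
    return value

W = 4
# Every W-bit pattern, sign bit first.
patterns = [[(n >> (W - 1 - k)) & 1 for k in range(W)] for n in range(2 ** W)]
values = [frac_value(p) for p in patterns]
print(min(values), max(values))  # -1 7/8, i.e. [-1, 1 - 2^(-W+1)]
```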

SLIDE 3


Parallel Multipliers:

$$A = a_{W-1}.a_{W-2}\ldots a_1 a_0 = -a_{W-1} + \sum_{i=1}^{W-1} a_{W-1-i}2^{-i}$$

$$B = b_{W-1}.b_{W-2}\ldots b_1 b_0 = -b_{W-1} + \sum_{i=1}^{W-1} b_{W-1-i}2^{-i}$$

Their product is given by:

$$P = -p_{2W-2} + \sum_{i=1}^{2W-2} p_{2W-2-i}2^{-i}$$

In constant word length multiplication, the $W-1$ lower-order bits in the product P are ignored and the product is denoted as $X \Leftarrow P = A \times B$, where

$$X = -x_{W-1} + \sum_{i=1}^{W-1} x_{W-1-i}2^{-i}$$
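A minimal sketch of constant word-length truncation, assuming the fractions are held as integers scaled by 2^(W-1), so that dropping the W-1 lower-order bits of the product becomes an arithmetic right shift (truncation toward minus infinity). `truncated_product` is a hypothetical name. The only product that does not fit is (-1) × (-1) = +1, which the sketch flags:

```python
from fractions import Fraction

def truncated_product(a_int, b_int, W):
    """X <= P = A x B for W-bit two's complement fractions.

    A = a_int / 2^(W-1) and B = b_int / 2^(W-1). The exact product
    a_int * b_int is P scaled by 2^(2W-2) (2W-1 bits); shifting right by
    W-1 drops the lower-order bits (truncation toward -infinity).
    """
    lo, hi = -(1 << (W - 1)), (1 << (W - 1)) - 1
    assert lo <= a_int <= hi and lo <= b_int <= hi
    x_int = (a_int * b_int) >> (W - 1)   # keep the W most-significant bits
    if x_int > hi:                       # only (-1) x (-1) = +1 overflows
        raise OverflowError("(-1) x (-1) is not representable")
    return Fraction(x_int, 1 << (W - 1))

print(truncated_product(5, -3, 4))   # A = 5/8, B = -3/8, P = -15/64 -> -1/4
```

The truncation error is always in [0, 2^(-W+1)), one lsb of X.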

SLIDE 4


Parallel Multiplication with Sign Extension: Using Horner's rule, the multiplication of A and B can be written as

$$P = A \times \Big(-b_{W-1} + \sum_{i=1}^{W-1} b_{W-1-i}2^{-i}\Big) = -A\,b_{W-1} + \Big[A\,b_{W-2} + \big[A\,b_{W-3} + [\cdots + [A\,b_1 + A\,b_0\,2^{-1}]2^{-1}\cdots]2^{-1}\big]2^{-1}\Big]2^{-1}$$

where $2^{-1}$ denotes the scaling operation.

In 2's complement, negating a number is equivalent to taking its 1's complement and adding 1 to the lsb, as shown below:

$$\begin{aligned}
-A &= a_{W-1} - \sum_{i=1}^{W-1} a_{W-1-i}2^{-i} \\
   &= -(1 - a_{W-1}) + \sum_{i=1}^{W-1}(1 - a_{W-1-i})2^{-i} + 1 - \sum_{i=1}^{W-1}2^{-i} \\
   &= -\bar{a}_{W-1} + \sum_{i=1}^{W-1}\bar{a}_{W-1-i}2^{-i} + 2^{-W+1}
\end{aligned}$$
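The bracketed Horner form and the negation identity can both be checked exactly with rational arithmetic. A minimal Python sketch, assuming bits are stored LSB first (index i holds $b_i$); the helper names are hypothetical:

```python
from fractions import Fraction

def to_bits(n_int, W):
    """LSB-first bits b_0..b_{W-1} of a W-bit two's complement integer."""
    return [(n_int >> i) & 1 for i in range(W)]

def frac(bits):
    """Fractional value -b_{W-1} + sum_{i=1}^{W-1} b_{W-1-i} 2^-i of LSB-first bits."""
    W = len(bits)
    return -bits[W - 1] + sum(Fraction(bits[W - 1 - i], 2 ** i) for i in range(1, W))

def horner_multiply(A, b_bits):
    """P = -A b_{W-1} + [A b_{W-2} + [ ... + [A b_1 + A b_0 2^-1] 2^-1 ... ] 2^-1] 2^-1."""
    W = len(b_bits)
    acc = Fraction(0)
    for i in range(W - 1):               # innermost bracket (b_0) first
        acc = (acc + A * b_bits[i]) / 2  # add A*b_i, then apply the 2^-1 scaling
    return -A * b_bits[W - 1] + acc      # the sign bit carries negative weight

def negate(bits):
    """-A: 1's complement of every bit, then add 1 at the lsb (mod 2^W)."""
    W = len(bits)
    ones_complement = sum((1 - b) << i for i, b in enumerate(bits))
    return to_bits((ones_complement + 1) % (1 << W), W)

W = 5
A, b = Fraction(5, 16), -7        # A = 0.0101, B = -7/16
print(horner_multiply(A, to_bits(b, W)), A * Fraction(b, 16))  # -35/256 -35/256
```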

SLIDE 5


The additions cannot be carried out directly due to terms having negative weight. Sign extension is used to solve this problem. For example,

$$A = -a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3} = -a_3 2 + a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3} = -a_3 2^2 + a_3 2 + a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3}$$

describes sign extension of A by 1 and 2 bits.

Tabular form of bit-level array multiplication
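A quick exhaustive check of the sign extension example, assuming a 4-bit word $A = a_3.a_2 a_1 a_0$: extending the sign bit by k positions (new leading copy weighted $-2^k$, the other copies $+2^{k-1}, \ldots, +1$) leaves the value unchanged. `sign_extended_value` is a hypothetical helper:

```python
from fractions import Fraction

def sign_extended_value(bits, k):
    """Value of A = a3.a2 a1 a0 (bits = [a3, a2, a1, a0]) after sign extension by k bits.

    The word becomes a3 ... a3 a3.a2 a1 a0 (k extra copies): the new leading
    copy has weight -2^k, the remaining copies have weights 2^(k-1), ..., 2, 1.
    """
    s = bits[0]
    value = Fraction(-s * 2 ** k)
    value += sum(s * 2 ** j for j in range(k))
    value += sum(Fraction(b, 2 ** i) for i, b in enumerate(bits[1:], start=1))
    return value

print(sign_extended_value([1, 0, 1, 1], 0), sign_extended_value([1, 0, 1, 1], 2))  # -5/8 -5/8
```

With every partial product sign-extended to a common width, only the leading position carries negative weight, so the rows can be added directly.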

SLIDE 6


Parallel Carry-Ripple Array Multipliers:

Bit-level dependence graph

SLIDE 7


Parallel carry-ripple multiplier

SLIDE 8


DG for 4×4-bit carry-save array multiplication; parallel carry-save array multiplier

SLIDE 9


Baugh-Wooley Multipliers:

Handles the sign bits of the multiplicand and multiplier efficiently. Tabular form of bit-level Baugh-Wooley multiplication
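The slide gives the Baugh-Wooley idea only as a table. As a hedged illustration, the sketch below uses one common (modified) Baugh-Wooley formulation in integer form. This is an assumption: the slide works with fractions, and its table may place the correction constants differently. The negative-weight partial products that involve a sign bit are replaced by their complements plus constant 1's at weights $2^W$ and $2^{2W-1}$, so every row is nonnegative and can be summed by an ordinary adder array; the result is kept modulo $2^{2W}$:

```python
def baugh_wooley(a_int, b_int, W):
    """Product of two W-bit two's complement integers from nonnegative rows only.

      P = a_{W-1} b_{W-1} 2^(2W-2) + sum_{i,j<W-1} a_i b_j 2^(i+j)
        + 2^(W-1) sum_{i<W-1} NOT(a_{W-1} b_i) 2^i
        + 2^(W-1) sum_{j<W-1} NOT(b_{W-1} a_j) 2^j
        + 2^W + 2^(2W-1)                       (mod 2^(2W))
    """
    a = [(a_int >> i) & 1 for i in range(W)]
    b = [(b_int >> i) & 1 for i in range(W)]
    total = (a[W - 1] & b[W - 1]) << (2 * W - 2)      # sign x sign is positive
    for i in range(W - 1):
        for j in range(W - 1):
            total += (a[i] & b[j]) << (i + j)          # ordinary partial products
    for i in range(W - 1):
        total += (1 - (a[W - 1] & b[i])) << (W - 1 + i)  # complemented sign rows
        total += (1 - (b[W - 1] & a[i])) << (W - 1 + i)
    total += (1 << W) + (1 << (2 * W - 1))             # correction constants
    total %= 1 << (2 * W)                              # keep 2W bits
    # Reinterpret the 2W-bit pattern as a signed integer.
    return total - (1 << (2 * W)) if total >> (2 * W - 1) else total

print(baugh_wooley(-3, 5, 4))   # -15
```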

SLIDE 10


Parallel Multipliers with Modified Booth Recoding: Reduces the number of partial products to accelerate the multiplication process. The algorithm is based on the fact that fewer partial products need to be generated for groups of consecutive zeros and ones. For a group of "m" consecutive ones in the multiplier, i.e.,

$$\ldots 0\{11\ldots 1\}0\ldots = \ldots 1\{00\ldots 0\}0\ldots - \ldots 0\{00\ldots 1\}0\ldots = \ldots 1\{00\ldots \bar{1}\}0\ldots$$

instead of "m" partial products, only 2 partial products need to be generated if signed digit representation is used. Hence, in this multiplication scheme, the multiplier bits are first recoded into signed-digit representation with fewer nonzero digits; the partial products are then generated using the recoded multiplier digits and accumulated.
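A small worked instance of the identity above, with m = 4 ones (an illustrative value chosen here, not taken from the slides):

```latex
0111100_2 = 1000000_2 - 0000100_2 = 1000\bar{1}00 \qquad (60 = 64 - 4)
```

The four 1's would otherwise produce four partial products; the signed-digit form needs only two, $+A \cdot 2^6$ and $-A \cdot 2^2$ (integer weights).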

SLIDE 11


Radix-4 Modified Booth Recoding Algorithm

b_{2i+1} b_{2i} b_{2i-1} | b'_i | Operation | Comments
0 0 0 | 0 | +0 | string of 0’s
0 0 1 | 1 | +A | end of 1’s
0 1 0 | 1 | +A | a single 1
0 1 1 | 2 | +2A | end of 1’s
1 0 0 | -2 | -2A | beginning of 1’s
1 0 1 | -1 | -A | a single 0
1 1 0 | -1 | -A | beginning of 1’s
1 1 1 | 0 | 0 | string of 1’s

Recoding operation can be described as: $b'_i = -2b_{2i+1} + b_{2i} + b_{2i-1}$
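The recoding formula translates directly into code. A minimal sketch in integer form, assuming an even word length W and $b_{-1} = 0$; the function names are hypothetical. Each radix-4 digit selects one partial product from {0, ±A, ±2A} with weight $4^i$:

```python
def booth_radix4(b_int, W):
    """Radix-4 modified Booth digits b'_i = -2 b_{2i+1} + b_{2i} + b_{2i-1}.

    b_int: W-bit two's complement multiplier (integer view, W even), b_{-1} = 0.
    Returns [b'_0, ..., b'_{W/2-1}], each digit in {-2, -1, 0, 1, 2}.
    """
    assert W % 2 == 0, "this sketch assumes an even word length"

    def bit(k):
        return 0 if k < 0 else (b_int >> k) & 1

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(W // 2)]

def booth_multiply(a_int, b_int, W):
    """One partial product (0, +-A or +-2A) per recoded digit, weighted by 4^i."""
    return sum(d * a_int * 4 ** i for i, d in enumerate(booth_radix4(b_int, W)))

print(booth_radix4(0b01111100, 8))  # [0, -1, 0, 2]: B = 2*4^3 - 4 = 124
```

The 8-bit multiplier 01111100 contains five 1's but yields only two nonzero digits, hence two partial products.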

SLIDE 12


Interleaved Floor-Plan and Bit-Plane-Based Digital Filters

A constant coefficient FIR filter is given by:

$y(n) = x(n) + f \cdot x(n-1) + g \cdot x(n-2)$, where x(n) is the input signal, and f and g are filter coefficients.

The main idea behind the interleaved approach is to perform the computation and accumulation of partial products associated with f and g simultaneously, thus increasing the speed.

This also increases the accuracy, as truncation is done only at the final step.

If the coefficients are interleaved in such a way that their partial products are computed in different rows, the resulting architecture is called bit-plane architecture.
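To illustrate the accuracy argument, the sketch below compares truncating each coefficient product separately against accumulating the partial-product rows of f and g in one array and truncating once. Assumptions (not from the slides): nonnegative coefficients given as equal-length bit lists after the binary point, exact rational arithmetic inside the array, F = 3 output fraction bits, and hand-picked sample values:

```python
import math
from fractions import Fraction

F = 3  # hypothetical number of fractional output bits (lsb = 2^-F)

def trunc(v):
    """Truncate an exact value to F fractional bits (toward -infinity)."""
    return Fraction(math.floor(v * 2 ** F), 2 ** F)

def separate(xn, x1, x2, f_bits, g_bits):
    """Form f*x(n-1) and g*x(n-2) separately, truncating each product."""
    f = sum(Fraction(b, 2 ** k) for k, b in enumerate(f_bits, start=1))
    g = sum(Fraction(b, 2 ** k) for k, b in enumerate(g_bits, start=1))
    return xn + trunc(f * x1) + trunc(g * x2)

def interleaved(xn, x1, x2, f_bits, g_bits):
    """Accumulate the partial-product rows of f and g in one array
    (f row, then g row, at each weight 2^-k) and truncate once at the end."""
    acc = Fraction(xn)
    for k, (fk, gk) in enumerate(zip(f_bits, g_bits), start=1):
        acc += fk * x1 / 2 ** k   # f partial product at weight 2^-k
        acc += gk * x2 / 2 ** k   # g partial product at the same weight
    return trunc(acc)

args = (Fraction(1, 8), Fraction(1, 4), Fraction(1, 4), [1, 1], [1, 1])  # f = g = 0.11
print(separate(*args), interleaved(*args))  # 3/8 1/2
```

The exact result here is 1/2: the single final truncation is exact, while truncating the two products separately loses a full lsb.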

SLIDE 13


Bit-Serial Multipliers

Lyon’s Bit-Serial Multiplier using Horner’s Rule:
For the scaling operator, the first output bit $a_1$ should be generated at the same time instance when the first input bit $a_0$ enters the operator. Since input $a_1$ has not entered the system yet, the scaling operator is non-causal and cannot be implemented in hardware.
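The non-causality can be seen in a behavioral model. A minimal sketch, assuming LSB-first transmission (so $a_0$ arrives at clock 0) and a W-bit word whose top bit is the sign; `scale_stream` and `value` are hypothetical helpers. Output bit t needs input bit t+1, so an implementable version must emit its output at least one clock later:

```python
def scale_stream(bits):
    """Scale an LSB-first bit-serial word by 2^-1 (bits[t] arrives at clock t).

    Output bit t is input bit t+1, with the sign bit repeated in the top
    position, so output clock 0 already needs a_1: the operator is non-causal.
    """
    W = len(bits)
    return [bits[min(t + 1, W - 1)] for t in range(W)]

def value(bits):
    """Integer value of LSB-first two's complement bits."""
    W = len(bits)
    return sum(b << i for i, b in enumerate(bits[:-1])) - (bits[-1] << (W - 1))

W = 6
word = [(-13 >> i) & 1 for i in range(W)]   # -13 as a 6-bit word, LSB first
print(value(scale_stream(word)))            # -7, i.e. -13 scaled by 2^-1 and truncated
# How many clocks ahead each output bit looks: a 1 means a non-causal dependence.
print([min(t + 1, W - 1) - t for t in range(W)])   # [1, 1, 1, 1, 1, 0]
```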

SLIDE 14


Derivation of implementable bit-serial 2’s complement multiplier

SLIDE 15


Lyon’s bit-serial 2’s complement multiplier

SLIDE 16


Design of Bit-Serial Multipliers Using Systolic Mappings

Here, $d^T = [1\ 0]$, $s^T = [1\ 1]$ and $p^T = [0\ 1]$:

e | p^T e | s^T e
x(-1,1) | 1 | 0
carry(1,0) | 0 | 1
b(1,0) | 0 | 1
a(0,1) | 1 | 1

Design of Lyon’s bit-serial multiplier by systolic mapping using the DG of ripple-carry multiplication.
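The edge-mapping tables in these systolic designs can be produced mechanically: each DG edge e maps to a processor displacement $p^T e$ and to $s^T e$ delays. A minimal Python sketch with hypothetical names, which also checks $p^T d = 0$, $s^T d \ne 0$ and causality ($s^T e \ge 0$). Changing d, s, p and the edge list reproduces the tables for the other mappings in this chapter:

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def map_edges(d, s, p, edges):
    """Map each DG edge e to (p^T e, s^T e): the processor displacement and the
    number of delays on the corresponding edge of the systolic array."""
    assert dot(p, d) == 0, "p must be orthogonal to the projection vector d"
    assert dot(s, d) != 0, "nodes projected onto one processor need distinct times"
    table = {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}
    assert all(delays >= 0 for _, delays in table.values()), "causality: s^T e >= 0"
    return table

ripple_edges = {"x": (-1, 1), "carry": (1, 0), "b": (1, 0), "a": (0, 1)}
print(map_edges(d=(1, 0), s=(1, 1), p=(0, 1), edges=ripple_edges))
# {'x': (1, 0), 'carry': (0, 1), 'b': (0, 1), 'a': (1, 1)}
```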

SLIDE 17


Design of bit-serial multiplier by systolic mapping using the DG of ripple-carry multiplication and the following:

$d^T = [0\ 1]$, $s^T = [0\ 1]$ and $p^T = [1\ 0]$

e | p^T e | s^T e
x(-1,1) | -1 | 1
carry(1,0) | 1 | 0
b(1,0) | 1 | 0
a(0,1) | 0 | 1

SLIDE 18


Design of bit-serial multiplier by systolic mapping using the DG for carry-save array multiplication and the following:

$d^T = [1\ 0]$, $s^T = [1\ 1]$ and $p^T = [0\ 1]$

e | p^T e | s^T e
x(-1,1) | 1 | 0
carry(1,0) | 0 | 1
b(1,0) | 0 | 1
a(0,1) | 1 | 1

SLIDE 19


Dependence graph for carry-save Baugh-Wooley multiplication with carry-ripple vector merging

SLIDE 20


Design of bit-serial Baugh-Wooley multiplier by systolic mapping using the DG for Baugh-Wooley multiplication and the following:

$d^T = [0\ 1]$, $s^T = [0\ 1]$ and $p^T = [1\ 0]$

e | p^T e | s^T e
carry-vm(-1,0) | -1 | 0
x(1,1) | 1 | 1
b(1,0) | 1 | 0
carry(0,1) | 0 | 1
a(0,1) | 0 | 1

Here, carry-vm denotes the carry outputs in the vector merging portion.

SLIDE 21


Bit-Serial Baugh-Wooley Multiplier

SLIDE 22


DG of bit-serial Baugh-Wooley multiplier with carry-save array and vector merging portion treated as two separate planes

SLIDE 23


Bit-serial Baugh-Wooley multiplier using the DG having two separate planes for the carry-save array and the vector merging portion

SLIDE 24


Bit-Serial FIR Filter

Bit-level pipelined bit-serial FIR filter, $y(n) = (-7/8)x(n) + (1/2)x(n-1)$, where the constant coefficient multiplications are implemented as shifts and adds as $y(n) = -x(n) + x(n)2^{-3} + x(n-1)2^{-1}$. (a) Filter architecture with scaling operators; (b) feasible bit-level pipelined architecture.
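The shift-and-add form can be checked against the direct coefficients. A minimal sketch, assuming integer samples (fixed point with an implicit scale) and arithmetic right shifts for the $2^{-k}$ scalings; the sample values are hypothetical multiples of 8, so that no bits are lost to truncation:

```python
from fractions import Fraction

def fir_direct(x, n):
    """y(n) = (-7/8) x(n) + (1/2) x(n-1), with x(-1) = 0."""
    x1 = x[n - 1] if n >= 1 else 0
    return Fraction(-7, 8) * x[n] + Fraction(1, 2) * x1

def fir_shift_add(x, n):
    """Same filter with only negation, shifts and adds on integer samples:
    y(n) = -x(n) + x(n) 2^-3 + x(n-1) 2^-1 (right shifts truncate)."""
    x1 = x[n - 1] if n >= 1 else 0
    return -x[n] + (x[n] >> 3) + (x1 >> 1)

x = [8, -16, 24, 40]   # hypothetical samples
print([fir_shift_add(x, n) for n in range(len(x))])  # [-7, 18, -29, -23]
```

The $-x(n)$ term carries the negative part of $-7/8 = -1 + 1/8$, so no general multiplier is needed.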

SLIDE 25


Bit-Serial IIR Filter

Consider the implementation of the IIR filter

$y(n) = (-7/8)y(n-1) + (1/2)y(n-2) + x(n)$, where the signal word-length is assumed to be 8.

The filter equation can be rewritten as follows:

$w(n) = (-7/8)y(n-1) + (1/2)y(n-2)$, $y(n) = w(n) + x(n)$, which can be implemented as an FIR section from y(n-1) with an addition and a feedback loop as shown below:
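At the word level, w(n) is the FIR section of SLIDE 24 (same coefficients -7/8 and 1/2) driven by u(n) = y(n-1). A minimal reference model in exact arithmetic, with hypothetical function names, that a bit-serial implementation could be checked against:

```python
from fractions import Fraction

def fir_section(u, n):
    """FIR section (-7/8) u(n) + (1/2) u(n-1), with u(-1) = 0."""
    u1 = u[n - 1] if n >= 1 else 0
    return Fraction(-7, 8) * u[n] + Fraction(1, 2) * u1

def iir(x):
    """y(n) = w(n) + x(n), where w is the FIR section driven by u(n) = y(n-1)."""
    y, u = [], []
    for n, xn in enumerate(x):
        u.append(y[n - 1] if n >= 1 else 0)  # feedback: FIR input is y(n-1)
        w = fir_section(u, n)                # w(n) = (-7/8) y(n-1) + (1/2) y(n-2)
        y.append(w + xn)                     # the addition closes the loop
    return y

print(*iir([1, 0, 0, 0]))   # 1 -7/8 81/64 -791/512 (impulse response)
```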

SLIDE 26


Steps for deriving a bit-serial IIR filter architecture:

– A bit-level pipelined bit-serial implementation of the FIR section needs to be derived.
– The input signal x(n) is added to the output w(n) of the bit-serial FIR section.
– The resulting signal y(n) is connected to the signal y(n-1).
– The number of delay elements in the edge marked ?D needs to be determined (see the figure on the next slide).

For systems containing loops, the total number of delay elements in the loops should be consistent with the original SFG, in order to maintain synchronization and correct functionality.

Loop delay synchronization involves matching the number of word-level loop delay elements and that in the bit-serial architecture. The number of bit-level delay elements in each bit-serial loop should be $W \times N_D$, where W is the signal word-length and $N_D$ denotes the number of delay elements in the corresponding loop of the word-level SFG.
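The $W \times N_D$ rule is simple bookkeeping. With W = 8, the loop through y(n-1) ($N_D$ = 1) must hold 8 bit-level delays and the loop through y(n-2) ($N_D$ = 2) must hold 16. A minimal sketch with hypothetical helper names; the "already pipelined" delay counts are made-up illustrations, since in practice they are read off the bit-level pipelined architecture:

```python
import math

def sync_delays(W, N_D, pipelined_delays):
    """Bit-level delays to add to a loop so that it holds exactly W * N_D.

    N_D: word-level delays in the loop of the original SFG.
    pipelined_delays: bit-level delays already in the loop after pipelining.
    """
    missing = W * N_D - pipelined_delays
    if missing < 0:
        raise ValueError("loop already exceeds W * N_D delays: increase W")
    return missing

def min_word_length(loops):
    """Smallest W with W * N_D >= pipelined delays for every (N_D, delays) loop."""
    return max(math.ceil(delays / n_d) for n_d, delays in loops)

print(sync_delays(W=8, N_D=1, pipelined_delays=5))   # 3 (hypothetical loop)
print(min_word_length([(1, 6), (2, 9)]))             # 6 (hypothetical loops)
```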

SLIDE 27


(a) Bit-level pipelined bit-serial architecture, without synchronization delay elements. (b) Bit-serial IIR filter. Note that this implementation requires a minimum feasible word-length of 6.

SLIDE 28


Note:

To compute the total number of delays in the bit-level architecture, the paths with the largest number of delay elements in the switching elements should be counted. Input synchronizing delays are also referred to as shimming delays or skewing delays.

It is also possible that the loops in the intermediate bit-level pipelined architecture contain more than $W \times N_D$ bit-level delay elements, in which case the word-length needs to be increased. The architecture without the two loop synchronizing delays can function correctly with a signal word-length of 6, which is the minimum word-length for the bit-level pipelined bit-serial architecture.

SLIDE 29


Associativity transformation:

The loop iteration bound of the IIR filter can be reduced from one-multiply-two-add to one-multiply-add by the associative transformation.
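The gain can be quantified with the iteration bound, the maximum over all loops of loop computation time divided by the number of loop delays. A minimal sketch, assuming hypothetical unit times $T_M$ = 4 for a multiply and $T_A$ = 1 for an add; the regrouping $y(n) = (-7/8)y(n-1) + [(1/2)y(n-2) + x(n)]$ moves one addition out of the single-delay loop:

```python
from fractions import Fraction

def iteration_bound(loops):
    """T_inf = max over loops of (computation time in loop) / (delays in loop)."""
    return max(Fraction(t, d) for t, d in loops)

T_M, T_A = 4, 1   # hypothetical multiply and add times

# Before: y(n) = [(-7/8) y(n-1) + (1/2) y(n-2)] + x(n)
before = [(T_M + 2 * T_A, 1),   # loop via y(n-1): multiply, two adds, 1 delay
          (T_M + 2 * T_A, 2)]   # loop via y(n-2): multiply, two adds, 2 delays
# After: y(n) = (-7/8) y(n-1) + [(1/2) y(n-2) + x(n)]
after = [(T_M + T_A, 1),        # loop via y(n-1): multiply, one add
         (T_M + 2 * T_A, 2)]    # loop via y(n-2): unchanged, still 2 delays

print(iteration_bound(before), iteration_bound(after))   # 6 5
```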

SLIDE 30


Bit-serial IIR filter after associative transformation. This implementation requires a minimum feasible word-length of 5.

SLIDE 31


Canonic Signed Digit Arithmetic

Encoding a binary number such that it contains the fewest number of non-zero bits is called canonic signed digit (CSD).

The following are the properties of CSD numbers:

– No 2 consecutive bits in a CSD number are non-zero.
– The CSD representation of a number contains the minimum possible number of non-zero bits, thus the name canonic.
– The CSD representation of a number is unique.
– CSD numbers cover the range (-4/3, 4/3), out of which the values in the range [-1, 1) are of greatest interest.
– Among the W-bit CSD numbers in the range [-1, 1), the average number of non-zero bits is $W/3 + 1/9 + O(2^{-W})$. Hence, on average, CSD numbers contain about 33% fewer non-zero bits than two's complement numbers (which average W/2 non-zero bits).

SLIDE 32


Conversion of a W-bit number to CSD format:

– $A = a'_{W-1}.a'_{W-2}\ldots a'_1 a'_0$ is a 2's complement number
– Its CSD representation is $a_{W-1}.a_{W-2}\ldots a_1 a_0$

Algorithm to obtain the CSD representation:

– $a'_{-1} = 0$; $\gamma_{-1} = 0$; $a'_W = a'_{W-1}$;
– for (i = 0 to W-1)

{

$\theta_i = a'_i \oplus a'_{i-1}$; $\gamma_i = \bar{\gamma}_{i-1}\theta_i$; $a_i = (1 - 2a'_{i+1})\gamma_i$; }
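A direct transcription of the recurrence, assuming the bits are supplied lsb first and reading the γ update as the complement of $\gamma_{i-1}$ ANDed with $\theta_i$ (with $\gamma_{-1} = 0$, an uncomplemented update would make every γ zero). `csd` is a hypothetical name. Each returned digit has weight $+2^i$ in the integer view (divide by $2^{W-1}$ for the fractional view), so the msb digit carries positive weight:

```python
def csd(a_bits):
    """CSD digits a_0 ... a_{W-1} (each in {-1, 0, 1}) of a W-bit 2's complement word.

    a_bits[i] is a'_i (index 0 = lsb, index W-1 = sign bit).
    """
    W = len(a_bits)
    a = list(a_bits) + [a_bits[W - 1]]   # a'_W = a'_{W-1} (sign extension)
    prev_a, gamma = 0, 0                 # a'_{-1} = 0, gamma_{-1} = 0
    digits = []
    for i in range(W):
        theta = a[i] ^ prev_a            # theta_i = a'_i XOR a'_{i-1}
        gamma = (1 - gamma) * theta      # gamma_i = NOT(gamma_{i-1}) AND theta_i
        digits.append((1 - 2 * a[i + 1]) * gamma)   # a_i = (1 - 2a'_{i+1}) gamma_i
        prev_a = a[i]
    return digits

bits = [1, 1, 0, 0, 1, 1, 1, 0, 1]     # 1.01110011, listed lsb first
print(csd(bits))                       # [-1, 0, 1, 0, -1, 0, 0, -1, 0]
```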

SLIDE 33


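Table showing the computation of the CSD representation for the number 1.01110011 (W = 9):

i | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | -1
a'_i | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0
θ_i | | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
γ_i | | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0
1 - 2a'_{i+1} | | -1 | -1 | 1 | -1 | -1 | -1 | 1 | 1 | -1 |
a_i | | 0 | -1 | 0 | 0 | -1 | 0 | 1 | 0 | -1 |

The resulting CSD representation is $0.\bar{1}00\bar{1}010\bar{1} = -1/2 - 1/16 + 1/64 - 1/256 = -141/256$, the same value as 1.01110011.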

SLIDE 34


CSD Multiplication: A CSD multiplier using a linear arrangement of adders to compute x × 0.10100100101001

Horner’s rule for precision improvement: This involves delaying the scaling operations common to the 2 partial products, thus increasing accuracy.

For example, $x \cdot 2^{-5} + x \cdot 2^{-3}$ can be implemented as $(x \cdot 2^{-2} + x)2^{-3}$ to increase the accuracy.
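A minimal sketch of the precision gain, assuming x is an integer in lsb units, every $2^{-k}$ scaling is an arithmetic right shift (truncation), and all nonzero digits are +1 (a simplification: CSD digits of -1 would subtract instead). The position lists are hypothetical inputs and the helper names are illustrative. For positions $k_1 < \ldots < k_n$, Horner's rule starts from x at $k_n$, shifts by the gap to the next position, adds x, and applies the common $2^{-k_1}$ scaling only once:

```python
from fractions import Fraction

def linear_accumulate(x, positions):
    """Sum x * 2^-k over the nonzero digit positions k, truncating every term."""
    return sum(x >> k for k in positions)

def horner_accumulate(x, positions):
    """Horner's rule: shift by the gap between successive positions, add x,
    and apply the common 2^-k_1 scaling once at the end."""
    ks = sorted(positions)
    acc = x
    for hi, lo in zip(reversed(ks[1:]), reversed(ks[:-1])):
        acc = (acc >> (hi - lo)) + x
    return acc >> ks[0]

x = 7  # hypothetical input, in lsb units
print(linear_accumulate(x, [3, 5]), horner_accumulate(x, [3, 5]), Fraction(x * 5, 32))
# 0 1 35/32
```

For x = 7 the slide's example $x \cdot 2^{-5} + x \cdot 2^{-3}$ loses everything when each term is truncated, while the Horner form keeps the integer part.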

SLIDE 35


Using Horner’s rule for partial product accumulation to reduce the truncation error.

SLIDE 36


Rearrangement of the CSD multiplication of x × 0.10100100101001 using Horner’s rule for partial product accumulation to reduce the truncation error.

SLIDE 37


Use of Tree-Height Reduction for Latency Reduction: (a) linear arrangement; (b) tree arrangement. Combination of tree-type arrangement and Horner’s rule for the accumulation of partial products in CSD multiplication.
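The latency difference between the two arrangements is the adder depth on the critical path. A minimal sketch, assuming two-input adders and a hypothetical helper name; the coefficient 0.10100100101001 has six nonzero digits, i.e. six partial products:

```python
def adder_depth(n_terms):
    """Two-input adders on the critical path when summing n_terms partial
    products: (linear chain, balanced tree)."""
    linear = max(n_terms - 1, 0)
    tree = (n_terms - 1).bit_length() if n_terms > 1 else 0   # ceil(log2(n_terms))
    return linear, tree

print(adder_depth(6))   # (5, 3)
```

Both arrangements use n - 1 adders in total; only the depth, and hence the latency, changes.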

SLIDE 38


Bit-serial architecture using CSD. In this case the coefficient $-7/32 = -1/4 + 1/32$ is encoded as $0.0\bar{1}001$ and $3/4 = 1 - 1/4$ is encoded as $1.0\bar{1}$.
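A short check of the two encodings, writing a $\bar{1}$ digit as -1. The decomposition $-1/4 + 1/32$ evaluates to -7/32, and the digit signs below are an assumption consistent with the stated decompositions, since overbars do not survive plain-text extraction. The helper names are hypothetical; the last two functions show how each coefficient with two nonzero CSD digits reduces to one addition or subtraction of shifted copies (shifts truncate):

```python
from fractions import Fraction

def sd_value(int_digit, frac_digits):
    """Value of a signed-digit word d0.d1 d2 ... with digits in {-1, 0, 1}."""
    return int_digit + sum(Fraction(d, 2 ** k) for k, d in enumerate(frac_digits, start=1))

def times_minus_7_32(x):
    """x * 0.0(-1)001 = -x 2^-2 + x 2^-5."""
    return -(x >> 2) + (x >> 5)

def times_3_4(x):
    """x * 1.0(-1) = x - x 2^-2."""
    return x - (x >> 2)

print(sd_value(0, [0, -1, 0, 0, 1]), sd_value(1, [0, -1]))   # -7/32 3/4
print(times_minus_7_32(64), times_3_4(64))                   # -14 48
```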