Chapter 13: Bit-Level Arithmetic Architectures, Keshab K. Parhi - PowerPoint PPT Presentation


SLIDE 1

Chapter 13: Bit-Level Arithmetic Architectures

Keshab K. Parhi

SLIDE 2
  • Chap. 13

  • A W-bit fixed-point two's complement number A is represented as $A = a_{W-1}.a_{W-2} \cdots a_1 a_0$, where the bits $a_i$, $0 \le i \le W-1$, are either 0 or 1, and the msb is the sign bit.

  • The value of this number is in the range $[-1,\ 1 - 2^{-W+1}]$ and is given by $A = -a_{W-1} + \sum_{i=1}^{W-1} a_{W-1-i}\, 2^{-i}$.

  • For bit-serial implementations, constant word length multipliers are considered. For a W×W-bit multiplication, the W most significant bits of the (2W-1)-bit product are retained.
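The value formula and the constant word length rule can be checked numerically. The following is a minimal sketch, not a hardware model: the function names are hypothetical, bits are passed msb first, and truncation (rounding toward minus infinity) is assumed as the way the low-order bits are dropped.

```python
import math
from fractions import Fraction


def twos_complement_value(bits):
    """Value of a W-bit fixed-point two's complement word.

    bits[0] is the sign bit a_{W-1}; bits[-1] is the lsb a_0.
    """
    value = -Fraction(bits[0])  # the sign bit carries weight -1
    for i, bit in enumerate(bits[1:], start=1):
        value += Fraction(bit, 2 ** i)  # a_{W-1-i} has weight 2^{-i}
    return value


def truncated_product(a_bits, b_bits):
    """Multiply two W-bit words exactly, then keep only W bits.

    Assumption: the W-1 low-order product bits are dropped by truncation.
    """
    w = len(a_bits)
    exact = twos_complement_value(a_bits) * twos_complement_value(b_bits)
    scale = 2 ** (w - 1)  # the lsb of the retained word has weight 2^{-(W-1)}
    return Fraction(math.floor(exact * scale), scale)
```

For example, with W = 4 the word 0.111 is 7/8, the exact square is 49/64, and the retained 4-bit result is 6/8.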

SLIDE 3

  • Parallel Multipliers:

$A = a_{W-1}.a_{W-2} \cdots a_1 a_0 = -a_{W-1} + \sum_{i=1}^{W-1} a_{W-1-i}\, 2^{-i}$

$B = b_{W-1}.b_{W-2} \cdots b_1 b_0 = -b_{W-1} + \sum_{i=1}^{W-1} b_{W-1-i}\, 2^{-i}$

Their product is given by:

$P = -p_{2W-2} + \sum_{i=1}^{2W-2} p_{2W-2-i}\, 2^{-i}$

In constant word length multiplication, the W-1 lower-order bits of the product P are ignored and the product is denoted as $X \Leftarrow P = A \times B$, where

$X = -x_{W-1} + \sum_{i=1}^{W-1} x_{W-1-i}\, 2^{-i}$

SLIDE 4

  • Parallel Multiplication with Sign Extension: Using Horner's rule, multiplication of A and B can be written as

$P = A \times \left(-b_{W-1} + \sum_{i=1}^{W-1} b_{W-1-i}\, 2^{-i}\right) = -A \cdot b_{W-1} + [A \cdot b_{W-2} + [A \cdot b_{W-3} + [\cdots + [A \cdot b_1 + A \cdot b_0\, 2^{-1}]\, 2^{-1}] \cdots ]\, 2^{-1}]\, 2^{-1}$

where $2^{-1}$ denotes the scaling operation.

  • In 2's complement, negating a number is equivalent to taking its 1's complement and adding 1 to the lsb, as shown below:

$-A = a_{W-1} - \sum_{i=1}^{W-1} a_{W-1-i}\, 2^{-i}$

$\quad = a_{W-1} + \sum_{i=1}^{W-1} (1 - a_{W-1-i})\, 2^{-i} - \sum_{i=1}^{W-1} 2^{-i}$

$\quad = a_{W-1} + \sum_{i=1}^{W-1} (1 - a_{W-1-i})\, 2^{-i} - 1 + 2^{-W+1}$

$\quad = -(1 - a_{W-1}) + \sum_{i=1}^{W-1} (1 - a_{W-1-i})\, 2^{-i} + 2^{-W+1}$
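The Horner's rule expansion of P can be illustrated with exact rational arithmetic. This is a minimal sketch under stated assumptions: `horner_multiply` is a hypothetical name, the multiplier bits are passed msb first, and no truncation is modeled, so the result equals the exact product.

```python
from fractions import Fraction


def horner_multiply(a, b_bits):
    """Evaluate A x B with the nested (Horner) form of the product.

    a is the value of A; b_bits[0] is the sign bit b_{W-1} and
    b_bits[-1] is the lsb b_0.
    """
    acc = Fraction(0)
    # Start at the innermost bracket (b_0) and work outward to b_{W-2}.
    for bit in reversed(b_bits[1:]):
        acc = (acc + a * bit) / 2  # add the partial product, then scale by 2^-1
    # The sign bit has negative weight, so its partial product is subtracted.
    return acc - a * b_bits[0]
```

With A = -5/8 and B = 1.011 (also -5/8), the nested evaluation returns 25/64, matching the direct product.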

SLIDE 5

  • The additions cannot be carried out directly because some terms have negative weight. Sign extension is used to solve this problem. For example,

$A = -a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3}$

$\quad = -a_3 2 + a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3}$

$\quad = -a_3 2^2 + a_3 2 + a_3 + a_2 2^{-1} + a_1 2^{-2} + a_0 2^{-3}$

describes sign extension of A by 1 and 2 bits.

Tabular form of bit-level array multiplication
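That sign extension leaves the value unchanged can be confirmed with a small check. The helper names below are hypothetical; `int_bits` is an illustrative parameter giving the number of extra integer positions, so the sign bit has weight $-2^{\text{int\_bits}}$.

```python
from fractions import Fraction


def sign_extend(bits, extra):
    """Replicate the sign bit `extra` times in front of an msb-first word."""
    return [bits[0]] * extra + list(bits)


def value(bits, int_bits=0):
    """Two's complement value of an msb-first word with sign weight -2**int_bits."""
    total = -Fraction(bits[0]) * 2 ** int_bits
    for k, bit in enumerate(bits[1:], start=1):
        total += Fraction(bit) * Fraction(2) ** (int_bits - k)
    return total
```

For the 4-bit word 1.011, `value` gives -5/8 both before extension and after extending by 1 or 2 bits with `int_bits` set accordingly.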

SLIDE 6

  • Parallel Carry-Ripple Array Multipliers:

Bit-level dependence graph

SLIDE 7

Parallel carry-ripple multiplier

SLIDE 8

DG for 4×4-bit carry-save array multiplication

Parallel carry-save array multiplier

SLIDE 9

  • Baugh-Wooley Multipliers: Handles the sign bits of the multiplicand and multiplier efficiently.

Tabular form of bit-level Baugh-Wooley multiplication

SLIDE 10

  • Parallel Multipliers with Modified Booth Recoding: Reduces the number of partial products to accelerate the multiplication process. The algorithm is based on the fact that fewer partial products need to be generated for groups of consecutive zeros and ones. For a group of m consecutive ones in the multiplier, i.e.,

$\ldots 0\{11 \ldots 1\}0 \ldots = \ldots 1\{00 \ldots 0\}0 \ldots - \ldots 0\{00 \ldots 1\}0 \ldots = \ldots 1\{00 \ldots \bar{1}\}0 \ldots$

instead of m partial products, only 2 partial products need to be generated if a signed-digit representation is used. Hence, in this multiplication scheme, the multiplier bits are first recoded into a signed-digit representation with fewer nonzero digits; the partial products are then generated using the recoded multiplier digits and accumulated.

SLIDE 11

Radix-4 Modified Booth Recoding Algorithm

b_{2i+1}   b_{2i}   b_{2i-1}   b'_i   Operation   Comments
   0         0         0         0       +0       string of 0's
   0         0         1         1       +A       end of 1's
   0         1         0         1       +A       a single 1
   0         1         1         2       +2A      end of 1's
   1         0         0        -2       -2A      beginning of 1's
   1         0         1        -1       -A       a single 0
   1         1         0        -1       -A       beginning of 1's
   1         1         1         0       -0       string of 1's

The recoding operation can be described as: $b'_i = -2 b_{2i+1} + b_{2i} + b_{2i-1}$
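The recoding formula can be applied directly to a multiplier word. The sketch below is illustrative only: the function names are hypothetical, the word length is assumed to be even, and $b_{-1} = 0$ is assumed for the first digit (a common convention that the slide does not state). The multiplier is treated as a two's complement integer so that the recoded digits can be checked against it.

```python
def booth_radix4_digits(b_bits):
    """Radix-4 modified Booth digits b'_i in {-2, -1, 0, 1, 2}.

    b_bits is msb first with even length W; the result lists b'_0 first.
    """
    w = len(b_bits)
    if w % 2:
        raise ValueError("word length must be even for radix-4 recoding")
    lsb_first = b_bits[::-1]
    digits = []
    for i in range(w // 2):
        b_prev = lsb_first[2 * i - 1] if i > 0 else 0  # assumed b_{-1} = 0
        digits.append(-2 * lsb_first[2 * i + 1] + lsb_first[2 * i] + b_prev)
    return digits


def digits_to_int(digits):
    """Integer value of radix-4 digits, least significant digit first."""
    return sum(d * 4 ** i for i, d in enumerate(digits))
```

A W-bit multiplier yields W/2 digits, so W/2 partial products are accumulated instead of W. For instance, 1011 (integer value -5) recodes to the digits [-1, -1], and -1 + (-1)·4 = -5.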

SLIDE 12

Interleaved Floor-Plan and Bit-Plane-Based Digital Filters

  • A constant coefficient FIR filter is given by y(n) = x(n) + f·x(n-1) + g·x(n-2), where x(n) is the input signal, and f and g are filter coefficients.

  • The main idea behind the interleaved approach is to perform the computation and accumulation of partial products associated with f and g simultaneously, thus increasing the speed.

  • This increases the accuracy, as truncation is done at the final step.

  • If the coefficients are interleaved in such a way that their partial products are computed in different rows, the resulting architecture is called a bit-plane architecture.

SLIDE 13

Bit-Serial Multipliers

  • Lyon's Bit-Serial Multiplier using Horner's Rule:
  • For the scaling operator, the first output bit a1 should be generated at the same time instance when the first input bit a0 enters the operator. Since input a1 has not entered the system yet, the scaling operator is non-causal and cannot be implemented in hardware.

SLIDE 14

Derivation of implementable bit-serial 2's complement multiplier

SLIDE 15

Lyon's bit-serial 2's complement multiplier

SLIDE 16

Design of Bit-Serial Multipliers Using Systolic Mappings

  • Design of Lyon's bit-serial multiplier by systolic mapping using the DG of ripple-carry multiplication. Here, $d^T = [1\ 0]$, $s^T = [1\ 1]$ and $p^T = [0\ 1]$.

e             p^T e   s^T e
a(0,1)          1       1
b(1,0)          0       1
carry(1,0)      0       1
x(-1,1)         1       0
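The edge mapping for a given choice of d, s and p is a pair of dot products per edge. The sketch below is a minimal illustration with hypothetical function names; the two feasibility checks ($p^T d = 0$ and $s^T d \ne 0$) come from the general systolic design methodology and are an assumption here, since this slide does not state them.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def map_edges(edges, d, s, p):
    """Return {edge name: (p^T e, s^T e)} for every dependence edge e.

    p^T e is the displacement between processors and s^T e is the
    number of delays placed on the mapped edge.
    """
    if dot(p, d) != 0:
        raise ValueError("processor space vector must be orthogonal to d")
    if dot(s, d) == 0:
        raise ValueError("scheduling vector must satisfy s^T d != 0")
    return {name: (dot(p, e), dot(s, e)) for name, e in edges.items()}


# Edges of the ripple-carry multiplication DG as listed on the slide.
RIPPLE_CARRY_EDGES = {"a": (0, 1), "b": (1, 0), "carry": (1, 0), "x": (-1, 1)}
```

Calling `map_edges(RIPPLE_CARRY_EDGES, d=(1, 0), s=(1, 1), p=(0, 1))` gives the pairs for the mapping above; passing d = (0, 1), s = (0, 1), p = (1, 0) gives the alternative mapping.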

SLIDE 17

  • Design of bit-serial multiplier by systolic mapping using the DG of ripple-carry multiplication and the following: $d^T = [0\ 1]$, $s^T = [0\ 1]$ and $p^T = [1\ 0]$

e             p^T e   s^T e
a(0,1)          0       1
b(1,0)          1       0
carry(1,0)      1       0
x(-1,1)        -1       1

SLIDE 18

  • Design of bit-serial multiplier by systolic mapping using the DG for carry-save array multiplication and the following: $d^T = [1\ 0]$, $s^T = [1\ 1]$ and $p^T = [0\ 1]$

[Table: $p^T e$ and $s^T e$ for the edges e = a(0,1), b(1,0), carry(1,0), x(-1,1)]

SLIDE 19

Dependence graph for carry-save Baugh-Wooley multiplication with carry-ripple vector merging

SLIDE 20

  • Design of bit-serial Baugh-Wooley multiplier by systolic mapping using the DG for Baugh-Wooley multiplication and the following: $d^T = [0\ 1]$, $s^T = [0\ 1]$ and $p^T = [1\ 0]$

e                p^T e   s^T e
a(0,1)             0       1
carry(0,1)         0       1
b(1,0)             1       0
x(1,1)             1       1
carry-vm(-1,0)    -1       0

Here, carry-vm denotes the carry outputs in the vector merging portion.

SLIDE 21

Bit-serial Baugh-Wooley multiplier

SLIDE 22

DG for bit-serial Baugh-Wooley multiplier with carry-save array and vector merging portion treated as two separate planes

SLIDE 23

Bit-serial Baugh-Wooley multiplier using the DG having two separate planes for the carry-save array and the vector merging portion

SLIDE 24

Bit-Serial FIR Filter

Bit-level pipelined bit-serial FIR filter, y(n) = (-7/8)x(n) + (1/2)x(n-1), where constant coefficient multiplications are implemented as shifts and adds as $y(n) = -x(n) + x(n)2^{-3} + x(n-1)2^{-1}$. (a) Filter architecture with scaling operators; (b) feasible bit-level pipelined architecture
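The shift-and-add form of this FIR filter can be compared against the coefficient form at the word level. This is a minimal sketch with hypothetical function names; it uses exact rationals, assumes x(-1) = 0, and does not model bit-serial timing or truncation.

```python
from fractions import Fraction


def fir_direct(samples):
    """y(n) = (-7/8) x(n) + (1/2) x(n-1), with x(-1) assumed to be 0."""
    out, prev = [], Fraction(0)
    for x in map(Fraction, samples):
        out.append(Fraction(-7, 8) * x + Fraction(1, 2) * prev)
        prev = x
    return out


def fir_shift_add(samples):
    """y(n) = -x(n) + x(n) 2^-3 + x(n-1) 2^-1, using only scalings and adds."""
    out, prev = [], Fraction(0)
    for x in map(Fraction, samples):
        out.append(-x + x / 8 + prev / 2)  # /8 and /2 stand for right shifts
        prev = x
    return out
```

Both functions return identical sequences for any input, because -1 + 1/8 = -7/8.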

SLIDE 25

Bit-Serial IIR Filter

  • Consider implementation of the IIR filter y(n) = (-7/8)y(n-1) + (1/2)y(n-2) + x(n), where the signal word-length is assumed to be 8.

  • The filter equation can be rewritten as follows:

w(n) = (-7/8)y(n-1) + (1/2)y(n-2)
y(n) = w(n) + x(n)

which can be implemented as an FIR section from y(n-1) with an addition and a feedback loop as shown below:

SLIDE 26

  • Steps for deriving a bit-serial IIR filter architecture:

    – A bit-level pipelined bit-serial implementation of the FIR section needs to be derived.
    – The input signal x(n) is added to the output of the bit-serial FIR section w(n).
    – The resulting signal y(n) is connected to the signal y(n-1).
    – The number of delay elements in the edge marked ?D needs to be determined (see the figure on the next slide).

  • For systems containing loops, the total number of delay elements in the loops should be consistent with the original SFG in order to maintain synchronization and correct functionality.

  • Loop delay synchronization involves matching the number of word-level loop delay elements and that in the bit-serial architecture. The number of bit-level delay elements in the bit-serial loops should be $W \times N_D$, where W is the signal word-length and $N_D$ denotes the number of delay elements in the word-level SFG.

SLIDE 27

(a) Bit-level pipelined bit-serial architecture, without synchronization delay elements. (b) Bit-serial IIR filter. Note that this implementation requires a minimum feasible word-length of 6.

SLIDE 28

Note:

To compute the total number of delays in the bit-level architecture, the paths with the largest number of delay elements in the switching elements should be counted.

Input synchronizing delays (also referred to as shimming delays or skewing delays).

It is also possible that the loops in the intermediate bit-level pipelined architecture may contain more than $W \times N_D$ bit-level delay elements, in which case the word-length needs to be increased. The architecture without the two loop synchronizing delays can function correctly with a signal word-length of 6, which is the minimum word-length for the bit-level pipelined bit-serial architecture.
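The loop delay bookkeeping described above reduces to simple arithmetic per loop. The sketch below uses hypothetical function names, and the per-loop delay counts in the usage note are made-up illustrative numbers, not values read from the missing figure.

```python
import math


def synchronizing_delays(word_length, word_level_delays, bit_level_delays):
    """Extra delays needed so a loop holds exactly W x N_D bit-level delays."""
    required = word_length * word_level_delays
    if bit_level_delays > required:
        raise ValueError("loop already exceeds W x N_D; increase the word-length")
    return required - bit_level_delays


def minimum_word_length(loops):
    """Smallest W with W x N_D >= bit-level delays in every loop.

    loops is a list of (N_D, bit_level_delays) pairs.
    """
    return max(math.ceil(bit / n_d) for n_d, bit in loops)
```

As a hypothetical example, a loop with $N_D = 1$ that already holds 6 bit-level delays needs 2 synchronizing delays at W = 8, and together with a second loop of $N_D = 2$ holding 10 delays the minimum feasible word-length is 6.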

SLIDE 29

  • Associativity transformation: The loop iteration bound of the IIR filter can be reduced from one-multiply-two-add to one-multiply-add by associative transformation.

SLIDE 30

Bit-serial IIR filter after associative transformation. This implementation requires a minimum feasible word-length of 5.

SLIDE 31

Canonic Signed Digit Arithmetic

  • Encoding a binary number such that it contains the fewest number of non-zero bits is called canonic signed digit (CSD).

  • The following are the properties of CSD numbers:

    – No 2 consecutive bits in a CSD number are non-zero.
    – The CSD representation of a number contains the minimum possible number of non-zero bits, thus the name canonic.
    – The CSD representation of a number is unique.
    – CSD numbers cover the range (-4/3, 4/3), out of which the values in the range [-1, 1) are of greatest interest.
    – Among the W-bit CSD numbers in the range [-1, 1), the average number of non-zero bits is $W/3 + 1/9 + O(2^{-W})$. Hence, on average, CSD numbers contain about 33% fewer non-zero bits than two's complement numbers.

SLIDE 32

  • Conversion of a W-bit number to CSD format:

    – $A = a'_{W-1}.a'_{W-2} \cdots a'_1 a'_0$ is the 2's complement number.
    – Its CSD representation is $a_{W-1}.a_{W-2} \cdots a_1 a_0$.

  • Algorithm to obtain the CSD representation:

    a'_{-1} = 0;
    γ_{-1} = 0;
    a'_W = a'_{W-1};
    for (i = 0 to W-1)
    {
        θ_i = a'_i ⊕ a'_{i-1};
        γ_i = \bar{γ}_{i-1} θ_i;      (\bar{γ} is the complement of γ)
        a_i = (1 - 2a'_{i+1}) γ_i;
    }
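The conversion loop can be written out as a short function. This is a sketch under one explicit assumption: the update for γ uses the complement of the previous γ, that is γ_i = (1 - γ_{i-1})·θ_i, since with γ_{-1} = 0 an uncomplemented product would keep γ at 0 forever. The function names are hypothetical, and words are passed msb first.

```python
from fractions import Fraction


def to_csd(bits):
    """Convert an msb-first two's complement word to CSD digits in {-1, 0, 1}."""
    w = len(bits)
    a = {i: bits[w - 1 - i] for i in range(w)}  # a[i] holds a'_i
    a[-1] = 0          # a'_{-1} = 0
    a[w] = a[w - 1]    # a'_W = a'_{W-1} (sign extension)
    gamma_prev = 0     # gamma_{-1} = 0
    digits_lsb_first = []
    for i in range(w):
        theta = a[i] ^ a[i - 1]
        gamma = (1 - gamma_prev) & theta  # assumed complemented gamma_{i-1}
        digits_lsb_first.append((1 - 2 * a[i + 1]) * gamma)
        gamma_prev = gamma
    return digits_lsb_first[::-1]


def csd_value(digits):
    """Value of msb-first signed digits, with the first digit at weight 2^0."""
    return sum(Fraction(d, 2 ** k) for k, d in enumerate(digits))
```

For the word 1.01110011 this yields the digits 0, -1, 0, 0, -1, 0, 1, 0, -1, which evaluate to the same value as the original word (-141/256) using four non-zero digits instead of six.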

SLIDE 33

i                  W   W-1    7    6    5    4    3    2    1    0   -1
a'_i               1    1     0    1    1    1    0    0    1    1    0
θ_i                     1     1    0    0    1    0    1    0    1
γ_i                     0     1    0    0    1    0    1    0    1    0
1 - 2a'_{i+1}          -1    -1    1   -1   -1   -1    1    1   -1
a_i                     0    -1    0    0   -1    0    1    0   -1

Table showing the computation of the CSD representation for the number 1.01110011 (W = 9).

SLIDE 34

CSD Multiplication

A CSD multiplier using a linear arrangement of adders to compute x × 0.10100100101001

  • Horner's rule for precision improvement: This involves delaying the scaling operations common to the 2 partial products, thus increasing accuracy.

  • For example, $x \cdot 2^{-5} + x \cdot 2^{-3}$ can be implemented as $(x \cdot 2^{-2} + x)\, 2^{-3}$ to increase the accuracy.
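The accuracy gain from the Horner arrangement can be seen with integer arithmetic. In this sketch x is held as an integer count of lsb units and each scaling by $2^{-k}$ is modeled as an arithmetic right shift that truncates; both the truncation model and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def direct_sum(n):
    """x 2^-5 + x 2^-3 with each partial product truncated separately."""
    return (n >> 5) + (n >> 3)


def horner_sum(n):
    """(x 2^-2 + x) 2^-3: the common scaling by 2^-3 is applied last."""
    return ((n >> 2) + n) >> 3


def exact_sum(n):
    """Untruncated reference value, 5x/32, in lsb units."""
    return Fraction(5 * n, 32)
```

For n = 127 the exact result is 635/32 (about 19.84 lsb units); the direct form gives 18 while the Horner form gives 19, so postponing the common shift loses less to truncation in this case.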

SLIDE 35

Using Horner's rule for partial product accumulation to reduce the truncation error.

SLIDE 36

Rearrangement of the CSD multiplication of x × 0.10100100101001 using Horner's rule for partial product accumulation to reduce the truncation error.

SLIDE 37

Use of Tree-Height Reduction for Latency Reduction

(a) Linear arrangement (b) tree arrangement

Combination of tree-type arrangement and Horner's rule for the accumulation of partial products in CSD multiplication

SLIDE 38

Bit-serial architecture using CSD. In this case the coefficient -7/32 = -1/4 + 1/32 is encoded as $0.0\bar{1}001$ and 3/4 = 1 - 1/4 is encoded as $1.0\bar{1}$.