A Low Power Design of Gray and T0 Codecs for the Address Bus Encoding for System Level Power Optimization

Prabhat K. Saraswat, Ghazal Haghani and Appiah Kubi Bernard Advanced Learning and Research Institute, ALaRI, University of Lugano, Switzerland

ABSTRACT

This report describes the design of Gray and T0 codecs used to encode the addresses sent on the processor-memory address bus. Since switching activity is one of the most important contributors to the power consumption of VLSI circuits, it is imperative to encode the bits in such a way that the switching activity on the bus is reduced. However, encoding does not always reduce power: the trade-off between the power consumed by the codec hardware and the power saved by reducing switching transitions is also examined. Different codecs may perform differently for different address sequences. We have generated address sequences of specified sequentiality and evaluated the performance of both codecs. The codecs are designed in VHDL and synthesized using Synopsys tools. The VHDL models are then simulated in order to measure the dynamic power they consume when the bits are encoded and decoded. The total power, including the power consumed by the bus, is calculated and compared with the uncoded binary scheme. An optimum bus capacitance, which makes the use of the codecs beneficial, is also calculated. We have also tried another scheme in which the bus lines are interchanged in order to reduce the power consumption due to crosstalk. The results obtained are discussed and explained in the report.

Keywords: Gray Encoding, Zero Transition Encoding, Bus load

1. BUS ENCODING FUNDAMENTALS - GRAY AND T0 CODECS

The bandwidth of data transfers has increased considerably due to the high speed needed between microprocessors and system interfaces. A considerable amount of power is dissipated at the I/O pins of a microprocessor due to the intrinsic capacitance of the bus lines. By minimizing the switching transitions on the system-level bus lines, the average power consumption can be reduced substantially. Various bus encoding schemes achieve this, e.g. the Gray code and the T0 code [1].

1.1. Gray Encoding

It has been observed that the addresses generated by a program are often sequential in nature. The simplest way to encode the generated addresses is plain binary, which causes many transitions and thus increases the switching activity. One often-cited solution, proposed by Su, Tsui and Despain [2], is to use Gray encoding to minimize the number of transitions. Gray encoding guarantees exactly one transition between consecutive addresses.
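The single-transition property can be checked with a few lines of Python. This is only a behavioral sketch, not the VHDL from the appendix; the function names `gray_encode`, `gray_decode` and `hamming` are hypothetical, and the 16-bit range is taken from the bus width assumed in this report.

```python
def gray_encode(b: int) -> int:
    """Binary-reflected Gray code: output bit i is B[i] xor B[i+1]."""
    return b ^ (b >> 1)


def gray_decode(g: int) -> int:
    """Invert the encoding by XOR-ing together all right-shifted copies of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def hamming(x: int, y: int) -> int:
    """Number of bus lines that toggle between two successive words."""
    return bin(x ^ y).count("1")


# Every pair of consecutive 16-bit addresses differs in exactly one Gray bit.
single_transition = all(
    hamming(gray_encode(a), gray_encode(a + 1)) == 1 for a in range(0xFFFF)
)
```

In plain binary the same step can toggle many lines at once, for example `hamming(0x00FF, 0x0100)` counts nine toggling lines.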

1.2. T0 Encoding

In T0 encoding, the sequentiality of the addresses is signalled to the receiving subsystem through an additional redundant line on the bus, in order to avoid transferring consecutive addresses. The redundant line (INC) is asserted when two successive addresses on the bus are consecutive; the address lines are then held unchanged, which prevents unnecessary switching, and the receiver computes the new address itself. The T0 code therefore guarantees zero transitions as its asymptotic performance for in-sequence addresses [1].
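A behavioral sketch of this scheme is given below. It is not the authors' VHDL: the function names are hypothetical, and both the flag polarity (1 meaning "in sequence") and the stride of 1 are assumptions of this sketch.

```python
MASK = 0xFFFF  # 16-bit address bus, as assumed in this report


def t0_encode(addresses, stride=1):
    """Yield (inc, bus) pairs; the bus is frozen while addresses are in sequence."""
    prev_addr = None
    bus = 0
    for addr in addresses:
        if prev_addr is not None and addr == (prev_addr + stride) & MASK:
            inc = 1             # in sequence: keep the old bus value
        else:
            inc, bus = 0, addr  # out of sequence: send the address itself
        prev_addr = addr
        yield inc, bus


def t0_decode(words, stride=1):
    """Rebuild the address stream from (inc, bus) pairs."""
    addr = 0
    for inc, bus in words:
        addr = (addr + stride) & MASK if inc else bus
        yield addr


stream = [100, 101, 102, 103, 500, 501, 7]
encoded = list(t0_encode(stream))
decoded = list(t0_decode(encoded))
```

While the addresses run in sequence the address lines stay at the base value (100, then 500), so only the flag line can toggle; the decoder recovers the original stream from the flag alone.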

2. PROBLEM STATEMENT AND MOTIVATION

A 16-bit address bus is assumed. Two bus encoding schemes, Gray and T0, have to be implemented in VHDL. The program's memory accesses have to be modeled by generating address streams of varying sequentiality. The codecs are evaluated on the basis of switching activity and power consumption. Synthesis and evaluation of the codec power consumption are done using Synopsys Power Compiler. The minimum bus load that makes bus encoding convenient has to be calculated. The main motivation for the project is to appreciate and understand the effectiveness of various encoding schemes for address streams of different sequentiality.

Further author information: (Send correspondence to Prabhat Kumar Saraswat) Prabhat Kumar Saraswat: E-mail: prabhat.saraswat@alari.ch, Telephone: 0041 786295106


3. DESIGN AND IMPLEMENTATION OF VHDL MODELS

The first and foremost step is to design the VHDL models efficiently, so as to reduce the overhead due to the codec hardware itself. Both the Gray and the T0 codecs were implemented.

The Gray encoding algorithm is implemented by comparing each bit with the next higher bit of the generated address. If the address is represented as B with bits B[0] ... B[n-1], the Gray code G is formed as G[i] = B[i] xor B[i+1] for i = 0 ... n-2, with the most significant bit passed through unchanged (G[n-1] = B[n-1]). The advantage of applying the Gray algorithm before sending the data to the bus is that there are fewer switching transitions and consequently less power usage. Decoding is built from the same XOR operation, B[i] = G[i] xor B[i+1], evaluated from the most significant bit downwards, so the encoder and decoder use the same kind of hardware. The hardware configuration of the Gray encoder and decoder implemented in VHDL is shown in Figure 1.

The zero-transition (T0) codec is very efficient for purely sequential addressing. In this case we need one extra line in the connection path to use as a flag. When the system accesses sequential addresses in memory, we simply freeze the first address on the bus and set the flag, which informs the receiver to calculate the addresses from the base address. The hardware configuration of the T0 codec implemented in VHDL is shown below:

Figure 1. Implemented Hardware for GRAY and T0 CODEC respectively

The corresponding VHDL code can be found in the appendix attached to this report.

4. ADDRESS SEQUENCE GENERATION WITH A SPECIFIED SEQUENTIALITY

We have attempted to generate addresses with a specified sequentiality value. The objective is to model the pattern of memory accesses issued by a processor when software runs on it. The software may contain loops in which sequential memory locations are accessed. However, there are also cases (branches etc.) in which the memory accesses are not sequential. The problem to be addressed is how to model and simulate those cases, and how to generate the resulting address streams when a specific value, say a sequentiality percentage, is given.

One primitive attempt to model this problem is described here. This method is certainly not the best way to generate such streams, but it is hoped that it raises questions and issues that allow further refinement and understanding of the problem. Our approach starts from a basic argument that defines non-sequentiality. Non-sequentiality, we presume, is observed when a totally chaotic (random) distribution of numbers is generated, i.e. when there is no correlation between the numbers of the stream. We therefore need a random number generator to produce such a stream. Let the minimum address accessed be 0 and the maximum address that can be reached be NUM. We have coded a function which generates a random number X according to a uniform distribution:

$$X \sim \mathrm{Uniform}(0, \mathrm{NUM})$$

We now define two parameters a and b which set the sequentiality level of the generated numbers. They are related as:

$$b = 1 - a$$

Both a and b range between 0 and 1, and the value of a defines the sequentiality percentage. The final number is calculated by a simple function. Let I(n) represent a sequential stream from 0 to NUM and X(n) a uniformly random stream from 0 to NUM, as described above. The number to be generated is defined as:

$$\mathrm{genNum} = a \times I(n) + b \times X(n)$$

The function was implemented in a C++ program, and graphs were generated by varying a from 1 to 0 in steps of 0.1. A purely sequential stream was thus gradually made non-sequential; a 50% sequential stream occurs at a = 0.5. Graphs of sample generated streams for various sequentiality values can be seen in the attached appendix.
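The blending rule can be sketched as follows. The original generator was written in C++ and is not reproduced here; this Python version is an illustration only, and the function name `generate_stream`, the fixed seed, the wrap-around of I(n) at NUM and the truncation to an integer are all assumptions.

```python
import random


def generate_stream(a, num, length, seed=0):
    """Blend a sequential ramp I(n) with uniform noise X(n): a*I(n) + b*X(n)."""
    b = 1.0 - a
    rng = random.Random(seed)        # fixed seed so that runs are repeatable
    stream = []
    for n in range(length):
        i_n = n % (num + 1)          # sequential stream 0..NUM (assumed to wrap)
        x_n = rng.randint(0, num)    # uniform integer in [0, NUM]
        stream.append(int(a * i_n + b * x_n))  # truncation is an assumption
    return stream


sequential = generate_stream(a=1.0, num=1023, length=8)   # a = 1: pure ramp
mixed = generate_stream(a=0.5, num=1023, length=8)        # a = 0.5: half noise
```

With a = 1 the noise term vanishes and the stream is the ramp 0, 1, 2, ...; with a = 0 only the uniform term remains.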

5. DESCRIPTION OF THE SIMULATION ENVIRONMENT AND METHODOLOGY

The simulation methodology can be broken down into the following steps, which are further illustrated by a figure showing some of the important components of the simulation and report generation:

1. The VHDL files for the Gray encoder and decoder and for the T0 encoder and decoder are written.
2. Address streams are generated for the various sequentiality values, from 10 (purely sequential) to 0 (purely random).
3. The address streams are applied to the DIN signal at the encoder inputs of both the T0 and the Gray codec. This is achieved by automatically generating the corresponding do files. The outputs of the encoder modules form the inputs of the decoders. The outputs of the decoders are compared against the original stream to verify behavioral correctness.
4. Test benches are also generated for the encoder and decoder of both the Gray and T0 codecs using the generated address stream. The test benches are generated together with the do files so that both use the same stream. This is necessary because the random numbers are calculated from a seed value which depends on the system time and various other parameters.
5. The VHDL models are synthesized using Design Compiler. The test benches are executed outside the DC shell to produce a Switching Activity Interchange Format (SAIF) file. The switching activity obtained by simulating the RTL model of the codec is then applied to the synthesized design.
6. The reports on power consumption and switching activity are generated. The process is repeated for all sequentiality values from 10 to 0, and for address streams corresponding to both bit-addressable and byte-addressable memory.

The following figure shows the simulation environment and methodology:

Figure 2. Simulation Environment Description


6. RESULTS

This section presents the comparisons and results obtained from the simulations. The assumptions are explained where they are used.

6.1. Switching Activity Reductions

The Hamming distances between consecutive code words were calculated and summed in order to estimate the number of transitions for each codec. Three comparisons were made: between the binary-coded and the Gray-coded bus, between binary and T0, and between T0 and Gray. The graphs are presented below:

Figure 3. Total transitions for various address sequentialities, comparing (a) BIN-GRAY, (b) BIN-T0 and (c) T0-GRAY

The graphs clearly indicate the superiority of the coding schemes over the binary-coded bus. The values on the x-axis are the values of b in the equation given in Section 4. Small values of b correspond to address streams of high sequentiality. The Gray transitions are nearly half the binary transitions at b = 0, which corresponds to a purely sequential address stream. As the sequentiality decreases, the reduction in the number of transitions due to the Gray code also decreases. Towards high values of b on the x-axis, the reductions due to Gray encoding are small and often inconsistent. This may be attributed to the highly random nature of the address streams.
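The transition count described above amounts to summing Hamming distances over successive bus words. The sketch below illustrates it for a purely sequential stream; the helper names are hypothetical, the stream length of 256 is arbitrary, and the T0 model (flag packed as the 17th bit, 1 meaning "in sequence") is an assumption rather than the simulated VHDL.

```python
def total_transitions(words):
    """Sum of Hamming distances between successive bus words."""
    return sum(bin(p ^ q).count("1") for p, q in zip(words, words[1:]))


def gray_words(stream):
    """Gray-code every address of the stream."""
    return [w ^ (w >> 1) for w in stream]


def t0_words(stream, width=16):
    """Pack the assumed INC flag above the address lines as one extra bit."""
    words, bus, prev = [], 0, None
    for addr in stream:
        in_seq = prev is not None and addr == prev + 1
        if not in_seq:
            bus = addr               # out of sequence: drive the new address
        words.append((int(in_seq) << width) | bus)
        prev = addr
    return words


stream = list(range(256))  # purely sequential stream (b = 0)
counts = {
    "binary": total_transitions(stream),
    "gray": total_transitions(gray_words(stream)),
    "t0": total_transitions(t0_words(stream)),
}
```

For this stream the sketch counts 502 binary transitions, 255 Gray transitions and a single T0 transition (the flag line rising once), in line with the roughly two-to-one ratio noted above.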

In the case of the T0 codec, the number of transitions for the purely sequential stream is 1, which makes it the best choice for streams with a very high degree of sequentiality. It is interesting to note that when the sequentiality is 70% or lower, the T0 codec actually performs worse than the Gray code in terms of the number of transitions observed. This can be seen in graph (c), which compares the T0 and Gray codecs.

We have also computed the total number of bit transitions on each bus line for the encoding schemes as a function of the sequentiality value. The graphs for the Gray-coded and T0-coded buses are shown below. Note that the 17th bit of the T0-coded bus is the INC line, which changes little because it toggles only when the sequentiality changes. The lines of a bus often do not have the same capacitance [2]. The bus lines towards the inner regions have higher capacitance, so it is beneficial to reduce the transitions on those lines. The relevant literature also raises a concern about power consumption due to crosstalk between lines, and advises placing lines with few transitions between lines with many transitions. We have tried something similar by pre-profiling various streams to observe which bits change the most. Note that the higher-order bits do not change because the generated address streams did not span the full 16 bits.

Figure 4. Transition profiles of various bits in Gray and T0 encoded bus
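A per-line profile of this kind can be obtained by counting toggles bit by bit. The following is a minimal sketch with a hypothetical helper name and a deliberately short stream; it is not the profiling script used for Figure 4.

```python
def bit_transition_profile(words, width):
    """Return a list whose entry i counts the toggles seen on bus line i."""
    profile = [0] * width
    for prev, cur in zip(words, words[1:]):
        diff = prev ^ cur                 # lines that changed in this step
        for i in range(width):
            profile[i] += (diff >> i) & 1
    return profile


stream = list(range(16))  # short sequential stream, uses only 4 of 16 lines
binary_profile = bit_transition_profile(stream, width=16)
gray_profile = bit_transition_profile([w ^ (w >> 1) for w in stream], width=16)
```

Here the binary profile starts 15, 7, 3, 1 and the Gray profile 8, 4, 2, 1, while all higher lines stay at zero because the stream does not span the full bus width, which is the same effect noted for the higher-order bits above.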


6.2. Power Consumption Results from Synopsys Power Compiler

The power consumption of the codec hardware was evaluated using Power Compiler. The switching activity (SA) was generated using the pregenerated test benches corresponding to the various address streams. We performed a simulation-based estimation of the SA: a Switching Activity Interchange Format (SAIF) file is generated by simulating the RTL code with the test bench outside the DC shell, and the SA is then applied to the design. The switching power is measured for both the encoder and the decoder of the T0 and Gray codes. We have calculated the power for two variants of memory, bit-addressable and byte-addressable, by adjusting the address streams generated for each type. The results below are for the encoder and decoder of Gray and T0 respectively. The quantitative total power calculation is given in the next section.

Figure 5. Switching Power Consumption for enc and dec of Gray and T0 respectively.

It is clearly seen from the graphs that the switching power consumption is lower for highly sequential address streams. This agrees with the earlier observation that sequential streams cause fewer transitions. When the streams are not sequential, the power consumption patterns are quite unpredictable, as can be seen in the graphs towards the higher values of b on the x-axis. The power consumption of the T0 codec is higher than that of the Gray codec, largely owing to the greater complexity of the T0 circuit.

6.3. Power Consumption Calculations due to Codecs

We now present the equations used to calculate the total power consumption of the codecs. Let $P_{Gtot}$ be the total power consumption for the Gray codec. It is the sum of two terms, the power consumed by the codec and the power consumed by the bus lines, referred to as $P_{Gcodec}$ and $P_{Gbus}$ from now on. $P_{Gcodec}$ is the sum of the contributions of the encoder and the decoder:

$$P_{Gcodec} = P_{Genc} + P_{Gdec}$$

$$P_{Gcodec} = P_{Genc,\mathrm{dynamic}} + P_{Genc,\mathrm{static}} + P_{Gdec,\mathrm{dynamic}} + P_{Gdec,\mathrm{static}}$$

The values of these parameters are taken directly from the Power Compiler report. It is assumed that the total number of bus lines is the same as the number of address bits. In general the bus lines have different values of capacitance [2]. The power consumed can be modeled by a simple equation corresponding to the energy consumed by a capacitor, weighted by the switching activity factor of each line. The bus consumption for all 16 lines is given by:

$$P_{Gbus} = \frac{1}{2} V^2 F \sum_{i=0}^{15} C_i S_i$$

where
V = voltage of the binary levels
F = frequency of the circuit (here 100 MHz)
$C_i$ = capacitance of bus line i
$S_i$ = switching activity of bus line i

However, for ease of calculation and tractability it is assumed that $\forall i,\ C_i = C_{eq}$, i.e. all bus lines have the same constant capacitance $C_{eq}$. The bus equation then becomes:

$$P_{Gbus} = \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i$$

and thus

$$P_{Gtot} = P_{Gcodec} + P_{Gbus} = P_{Genc,\mathrm{dynamic}} + P_{Genc,\mathrm{static}} + P_{Gdec,\mathrm{dynamic}} + P_{Gdec,\mathrm{static}} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i$$


Similarly, for the T0 codec we get the same equation, except that the number of bus lines is 17:

$$P_{T0tot} = P_{T0codec} + P_{T0bus} = P_{T0enc,\mathrm{dynamic}} + P_{T0enc,\mathrm{static}} + P_{T0dec,\mathrm{dynamic}} + P_{T0dec,\mathrm{static}} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{16} S_i$$
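As a numerical illustration of the total-power model, the sketch below evaluates the codec-plus-bus sum for one set of inputs. The function names are hypothetical and every number except the 100 MHz frequency is a made-up placeholder, not a value from the Power Compiler reports.

```python
def bus_power(switching, c_eq, v, f):
    """P_bus = 1/2 * V^2 * F * C_eq * sum(S_i), in watts for SI inputs."""
    return 0.5 * v ** 2 * f * c_eq * sum(switching)


def total_power(p_codec, switching, c_eq, v, f):
    """P_tot = P_codec + P_bus, where P_codec covers encoder and decoder."""
    return p_codec + bus_power(switching, c_eq, v, f)


# Hypothetical placeholder inputs, NOT results from the report.
V, F = 1.0, 100e6        # volts (assumed); 100 MHz as stated in the text
s_gray = [0.05] * 16     # assumed switching activity of each of the 16 lines
p_tot = total_power(p_codec=1e-4, switching=s_gray, c_eq=10e-12, v=V, f=F)
```

For the T0 codec the same function applies with a 17-entry switching list, the extra entry being the INC line.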

7. OPTIMUM BUS LOAD CALCULATIONS

The bus load is the capacitance of the bus lines. The higher the load, the higher the power consumed per transition. This means that with a high load, the power reduction obtained by encoding the information (and thus reducing the SA) is large. The savings with respect to the original design are then high, and there may be a net benefit even though an encoder and a decoder (which themselves consume power) are added. On the other hand, if the bus has a very limited load, the power reduction obtained by minimizing the SA through encoding is limited, so adding the power of the codec may increase the global power with respect to the original design.∗ Thus, for the encoding to be convenient, the total power consumed with the codec must be less than the total power consumed without the codec hardware (with a simple binary scheme). In the same way as above, the power consumed by the unencoded binary stream is given by:

$$P_{uncoded} = \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i^{un}$$

where $S_i^{un}$ is the switching activity of bus line i due to binary transitions, obtained from the simulations.

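For the encoding to be convenient and profitable, the following inequalities must be satisfied.

For the Gray codec, $P_{Gtot} < P_{uncoded}$:

$$P_{Gcodec} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i < \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i^{un}$$

For the T0 codec, $P_{T0tot} < P_{uncoded}$:

$$P_{T0codec} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{16} S_i < \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i^{un}$$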

The only unknown in both inequalities is $C_{eq}$, which was calculated for both the Gray and T0 codecs. The calculations were done only for the bit-addressable memory. The calculated values are shown below:

Codec    Optimum bus load
Gray     42 pF
T0       57 pF
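Rearranging either inequality for the single unknown gives the break-even load explicitly. The following is a sketch of that algebra, not a derivation taken from the original report. It assumes that the encoded bus switches less than the uncoded one (otherwise no bus load makes the codec worthwhile); $N$ is a placeholder for the last line index of the encoded bus (15 for Gray, 16 for T0) and $P_{codec}$ stands for the Gray or T0 codec power as appropriate.

$$P_{codec} < \frac{1}{2} V^2 F C_{eq} \left( \sum_{i=0}^{15} S_i^{un} - \sum_{i=0}^{N} S_i \right)$$

$$C_{eq} > \frac{2 P_{codec}}{V^2 F \left( \sum_{i=0}^{15} S_i^{un} - \sum_{i=0}^{N} S_i \right)}$$

Read this way, the tabulated value is the smallest bus load above which the codec saves more bus power than it consumes itself.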

8. CONCLUSIONS AND FUTURE WORK

This report evaluated two bus encoding schemes, Gray and Zero Transition (T0), for system-level power optimization. The overhead due to the power consumed by the codec hardware itself was examined. Finally, an optimum bus load was calculated that makes encoding convenient. The effect of interchanging bus lines on the crosstalk power consumption was also considered. One direction for future work is to model this in hardware and evaluate the effect quantitatively. It was also seen that address sequentiality plays a large role in the power consumption. A logic block could therefore be placed before the encoder to arrange the generated addresses so as to favor a particular encoding scheme.

ACKNOWLEDGMENTS

We would like to thank Prof. Enrico Macii and Prof. Alberto Macii for being so helpful both in and outside of class. The email conversations helped us to understand the task at hand well and allowed us to explore new avenues. We would also like to thank each other for complementing each other so well during the project.

REFERENCES

1. L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, “Address bus encoding techniques for system-level power optimization,” 1997.
2. C.-L. Su, C.-Y. Tsui, and A. M. Despain, “Saving power in the control path of embedded processors,” IEEE Design and Test of Computers, Vol. 11, No. 4, pp. 24-30, Winter 1994.

∗With reference to Prof. Macii’s e-mail.