1 Consider the execution of the following block of SPIM code. The - - PDF document

▶

Jun 23, 2023 501 likes •1.34k views

1 Consider the execution of the following block of SPIM code. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. .data start: .word 0x32, 44, Top, 0x33 str: .asciiz Top .align 8 end: .space 16

SLIDE 1

1 Consider the execution of the following block of SPIM code. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. 1 (a) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding con- tents. 1 (b) What are the values of the following. exit 0x00400028 Top 0x00400010 addi $t3, $t3, -1 (encoding) 0x216bffff .data start: .word 0x32, 44, Top, 0x33 str: .asciiz “Top” .align 8 end: .space 16 .text .globl main main: li $t3, 4 la $t0, start la $t1, end Top: lw $t5, 0($t0) sw $t5, 0($t1) addi $t0, $t0, 4 addi $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $zero, Top exit: li $v0 syscall The value in 0x10010008 depends on how the preceding instructions are

emcoded. “la $t1, end” is encoded into two instructions while “la $t0, start” is

encoded into one instruction. As long as your assumptions are consistent you shouldnot lose points. Everything else is 0x00000000 upto and beyond 0x10010100 (this is the point of the .align 8 directive)

Data Segment Addresses Data Segment Contents 0x10010000 0x00000032 0x10010004 0x0000002c 0x10010008 0x00400010 0x1001000c 0x00000033 0x10010010 0x00706f54 0x10010014 0x00000000 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00000000 0x10010024 0x00000000

SLIDE 2

1 (c) Provide a set of data directives that will perform the memory allocation implied by the following high level language declarations.

character My_Char;

integer Limit;

integer array[10];

1 (d) Given the following sequences, what is the range of feasible addresses for the label tar- get on the MIPS architecture? start: beq$1, $2, target add$4, $5, $6 . . . target: sub$6, $4, $8 1 (e) What i s the difference between the j and jal instructions? .data .byte 0x00 .align 2 .word 0x0 .data .space 40 ** Note there are many solutions * start + 8 <= target =< start+4 +215-1 (words) Both instructions work identically with regard to the program control flow, i.e., which instruction is executed next and how he address fields of the instruction are

endcoded. The jal instruction also causes the value of PC+4 to be stored in regis-

ter $31.

SLIDE 3

2 The following is block of SPIM code to compute the sum of the elements of a diagonal in a matrix stored as shown. For the matrix shown below the sum in $t2 should end up being 1+12+23+34 = 70. Fill in any missing pieces of code to complete the program. Use any additional registers as necessary to keep track of the number of loop iterations and any

ther information you require. You may complete your code on the next page as necessary.

If does not have to fit on the spaces shown. You cannot change the loop body. Remember there are many distinct, correct, feasible solutions! .data row1: .word 1,2,3,4 row1: .word 11,12,13,14 row1: .word 21,22,23,24 row1: .word 31,32,33,34 .text la $t0, row1 loop: lw $t1, 0($t0) #loop body that performs the sum add $t2, $t2, $t1 bne $t7, $0, loop # check if all elements have been accessed li $v0, 10 # exit the program syscall li $t6, 20 # load offset to next diagonal element (16+4) li $t7, 4 # load counter for number of elements add $t0, $t0, $t6 addi $t7, $t7, -1

SLIDE 4

3 Consider the following MIPs code taken from some program. We wish to turn this piece of code into a procedure. Note that no procedure call conventions have been followed and the procedure must be re-written to follow the MIPs procedure call conventions. 3 (a) What registers will have to be saved on the stack by the procedure if the MIPS register procedure call convention is to be followed and you do not modify the code shown above? 3 (b) Which registers are used to pass parameters to this procedure? How many registers do you think will be required for parameter passing ? 3 (c) If the procedure is stored in memory starting at location 0x00400020, what will be the encoding of the jal procA instruction that is in the calling program? procA: lw $12, 0($8) sw $12, 0($15) addi $8, $8, 4 addi $15, $15, 4 addi $16, $16, -1 bne $16, $0, procA jr $31 $s0, $fp (assuming we use the frame pointer). $ra does not have to saved since no other procedure/ function is called. The values of $8, $15, and $16 would be required in the procedure and are passed using registers $a0, $a1, and $a2. 0x0c100008

SLIDE 5

4 Consider the execution of the following block of SPIM code on a multicycle datapath. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume immediate instructions take 4 cycles. 4 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle

f program execution is cycle number 0!

4 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding con- tents.

Cycle ALUSrcA ALUOp RegWrite Instruction Register IorD 14 1 00 0x8d0d0000 X 35 1 01 0x1560fffb X

.data start: .word 21, 22, 23,24 str: .asciiz “CmpE” .align 4 .word 24, 0x77 .text .globl main main: li $t3, 4 lui $t0, 0x1001 lui $t1, 0x1002 move: lw $t5, 0($t0) sw $t5, 0($t1) addiu $t0, $t0, 4 addiu $t1, $t1, 4 addi $t3, $t3, -1 end: bne $t3, $zero, move done:

Data Segment Addresses Data Segment Contents 0x10010000 0x00000015 0x10010004 0x00000016 0x10010008 0x00000017 0x1001000c 0x00000018 0x10010010 0x45706d43 0x10010014 0x00000000 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00000018 0x10010024 0x00000077

SLIDE 6

4 (c) Show a sequence of SPIM instructions that will branch to the label loop if the contents

f register $t0 is 13.

4 (d) What are the values of end, done, start, and main? 4 (e) What does the following bit pattern represent assuming that it is one of the following. 10101101 00101101 00000000 00000000. Provide your answer in the form indicated. addi $t1, $t0, -13 beq $t1, $zero, loop start: 0x10010000 main: 0x00400000 end: 0x00400020 done: 0x00400024 A single precision IEEE 754 floating point number A MIPS instruction (show in symbolic code) _sw $t5, 0($t1)__ ( -1.0101101 x 2-37_ show actual value in scientific notation)

SLIDE 7

5 Consider the execution of the following block of SPIM code on a multicycle datapath. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume immediate instructions take 4 cycles. 5 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle

f program execution is cycle number 0!

5 (b) Show a sequence of SPIM instructions to implement the following: x = a[i*4], where a[i] is the access of element i from array a[]. Use whatever register allocation you need.

Cycle ALUSrcB ALUOp RegDst Instruction Register PCWriteCond 9 11 00 X 0x3c091002 20 00 00 0xad2d0000

.data start: .word 21, 22, 23,24 str: .asciiz “CmpE” .align 4 .word 24, 0x77 .text .globl main main: li $t3, 4 lui $t0, 0x1001 lui $t1, 0x1002 move: lw $t5, 0($t0) sw $t5, 0($t1) addiu $t0, $t0, 4 addiu $t1, $t1, 4 addi $t3, $t3, -1 end: bne $t3, $zero, move done: .text add $t5, $t0, $t0 # $t0 contains the value of i add $t5, $t5, $t5 # two additions to compute i4 add $t5, $t5, $t5 # two additions to compute i8 add $t5, $t5, $t5 # two additions to compute i*16 add $t1, $t5, $t3 # base address contained in $t3 lw $t2, 0($t1) # value of x stored in $t2

SLIDE 8

6 Answer the following questions with respect to the MIPS program shown below. Assume that the data segment starts at 0x10010000 and the text segment starts at 0x00400000. 6 (a) How much space in bytes does the program occupy in the data segment and text seg- ment? 6 (b) With respect to the above program,

what are the values of the following?

.data L1: .word 0x32, 104 .asciiz “Test 1” .align 2 Blank: .word L2: .space 64 .text main: li $t0, 4 move $t2, $zero li $a0, 1 loop: jal solo addi $t0, $t0, -1 addi $a0, $a0, 1 add $t2, $t2, $v0 slt $t1, $t0, $zero bne $t1, $zero, loop li $v0, 10 syscall Data segment (21 words, 84 bytes) - remember must also count reserved space Text segment (11 words, 44 bytes) L2 _0x10010014__ loop 0x0040000c_ memory location 0x1001000c 0x00003120______

SLIDE 9

What information would be stored in the symbol table for the above program? Be specific

about the above program. Do not provide a generic answer. 6 (c) What is the encoding of the following instructions? 6 (d) Which of the instructions in the above program, if any, must be identified as requiring relocation information ? 6 (e) Suppose the procedure solo is independently compiled as a separate module. When it is linked with the main program the first instruction in solo is placed in memory at address

0x0040100c. Provide the hexadecimal encoding of the jal solo instruction.

Upon completion of compilation, the label solo and its address (relative to the start of this module). This is information to be used by the linker to find and link the module corresponding to solo. bne $t1, $zero, loop 0x1520fffa______ add $t0, $t0, $v0 _0x01024020________ The jal instruction since this is the only instruction that relies on absolute addresses. 0x0c100403

SLIDE 10

7 Consider the program shown on the previous question. You have probably noticed that it does not follow the MIPS register saving convention. Rewrite the code showing any changes that would be required by the MIPS register saving conventions. 7 (a) What is the difference between the jal and j instructions? .data L1: .word 0x32, 104 .asciiz “Test 1” .align 2 Blank: .word L2: .space 64 .text main: li $t0, 4 move $t2, $zero li $a0, 1 loop: subu $sp, $sp, 32 sw $t0, 0($sp) sw $t2, 8($sp) sw $a0, 12($sp) sw $t1, 16($sp) jal solo lw $t0, 0($sp) lw $t2, 8($sp) lw $t1, 16($sp) lw $a0, 12($sp) addu $sp, $sp, 32 addi $t0, $t0, -1 addi $a0, $a0, 1 add $t2, $t2, $v0 slt $t1, $t0, $zero bne $t1, $zero, loop li $v0, 10 syscall The execution of the jal instruction also causes the return address to be saved in $ra. The target address is computed in the same manner for both instructions.

SLIDE 11

8 Answer the following questions with respect to the MIPS program shown below. For the purpose of this test, assume that each instruction can be stored in one word, i.e., do not worry about pseudo instructions! Further, assume the data segment starts at 0x1001000 and that the text segment starts are 0x04000000. 8 (a) What are the values of the labels L2 and Loop? 8 (b) Provide the hexadecimal encodings of the following instructions? 8 (c) Show the contents of the data segment. Show both addresses and values at those addresses. 8 (d) The instruction BLT (Branch on less than or equal too) is not a native instruction. Show the SPIM code that could be used to implement this test. move $t0,$0 loop: mul $t8, $t0, 4 add $t1, $a0, $t8 sw $0, 0($t1) addi $t0, $t0, 1 slt $t7, $t0, $a1 bne $t7, $0, loop .data .word 24, 0x16 .byte, 77,66,55,44 .text L2: L2 0x10010008______ loop _0x04000004___________ bne $t7, $0, loop add $t1, $a0, $t8 0x15e0fffa 0x00984820 0x10010000 0x18 0x10010004 0x16 0x10010008 0x2c37424d slt $t1, $t3, $t2 beq $t1, $0 loop ble $t2, $t3, loop

SLIDE 12

9 (20 pts) This question deals with procedure calls. Consider the following SPIM program

segment. For the purpose of this test, assume that each instruction can be stored in one

word, i.e., do not worry about pseudo instructions! Further, assume the data segment starts at 0x1001000 and that the text segment starts are 0x04000000. 9 (a) Consider the first time the procedure func is called. What is the contents of $ra when the program is executing func. 9 (b) Based on the MIPS register saving conventions which registers should be stored on the stack in the calling stack frame when func is called. .data .word 0x33, 0x2 .byte 4, 3, 0, 0 .ascii”Final” .text li $t0, 4 jal func slt $a0, $0, $t9 bne $t0, $0,loop li $v0, 10 syscall addi $a0, $a0, -1 jr $ra first: str: main: loop func: move $a0, $t0 move $v0, $a0 move $a0, $v0 0x0400000c $t0, $t9, $a0 Note in this case $a0 is stored as a matter of convention since the calling procedure is not relying on the availavbility of the value that $a0 contained before the call.

SLIDE 13

10 Answer the following questions with respect to the MIPS program shown below. Assume that the data segment starts at 0x10010000 and that the text segment starts at 0x00400000. Assume that the instruction done is encoded in one word. 10 (a) Write the values of the words stored at the following memory locations. Provide your answer in hexadecimal notation. 10 (b) Consider the process of assembly of the above program.

What would the symbol table contain?

.data label: .word 24, 28 .byte 64, 32 .asciiz “Example Program” .text main: jal push jal pop done pop: lw $fp, 0($sp) lw $ra, 4($sp) addiu $sp, $sp, 32 ret1: jr $ra push: subiu $sp, $sp, 32 sw $fp, 0($sp) sw $ra, 4($sp) ret2: jr $ra Word Address Value _0x10010008________ _0x00400000________ _0x0040000c______ _78452040__ _8fbe0000__ _0c100007_________ All of the labels that the linker needs to know about: push and pop and main (for programs where the loader creates code that jumps to main). We do not need to store labels from the data segment.

SLIDE 14

Assuming that the memory map is fixed, what relocation information would be

recorded if any? 10 (c) What are the values of the following labels? 10 (d) Assume that the procedures push and pop were assembled in a distinct file and linked with the main program. Further assume that the procedures were placed in memory starting at location 0x00400040. Provide the encoding of the ‘jal pop’ instruction. The jal instructions generate absolute addresses and must have relocation information stored to permit placement of the program starting elsewhere in memory. ret1 0x00400018_ push 0x0040001c_____ 0x0c100010

SLIDE 15

11 (a) Provide the MIPS code to do the following.

implement the greater-than-or-equal-to-zero test using only native instructions.
initialize a register to the value 0x33334444.

11 (b) Provide two advantages and two disadvantages of RISC vs. CISC machines. Example for a general test of $t0 greater-than-or-equal-to $t1 slt $t2, $t0, $t1 bne $t2, $0, False j True Where True and False are the labels of the instructions that correspond to the blocks of code to be executed depending on the outcome of the test. move $t0, $zero lui $t0 0x3333

ri $t0, $t0, 0x4444

From text

SLIDE 16

12 Answer the following questions with respect to the MIPS program shown below. Assume that the data segment starts at 0x10010000 and the text segment starts at 0x00400000. 12 (a) What function does the last two instructions in the main program realize? 12 (b) With respect to the above program,

what are the values of the following symbols?
What is the total number of bytes required for this program, both data segment and

text segment storage? .data .word 0x21, 32 .byte 4, 3 .ascii”Test” .text li $4, 4 jal func slt $7, $0, $4 bne $7, $0,loop li $v0, 10 syscall addi $4, $4, -1 jr $31 first: str: main: loop: func: .align 2 end: Program Exit. Ref SPIM documentation end 0x00400014___ loop 0x04000004_ func 0x04000018_ str 0x1001000c_______ data segment - 16 text segment - 32

SLIDE 17

12 (c) What is the encoding of the following instructions? 12 (d) Using only native instructions, how can you implement the following test: branch-on- greater-than-or-equal-to-5? bne $7, $0, loop 0x14e0fffd______ jal func _0x0c100006______ addi $4, $4, -1 _0x2084ffff_________ Example: slt $t2, $t0, $t1 bne $t2, $0, False j True Where True and False are the labels of the instructions that correspond to the blocks of code to be executed depending on the outcome of the test.

SLIDE 18

13 Consider the case where Procedure A calls procedure B, and procedure B calls procedure C. Procedures A and B both use registers $15, $16, $17 and $24, and procedure C uses regis- ters $9, $10, and $11. The top of the stack initially points to the top of an active frame. 13 (a) Show the contents of the stack when Procedure C is executing. Assume the MIPS register saving conventions as defined in the text. State, or clearly indicate the number of stack frames and which registers are saved by each procedure. 13 (b) Would callee save have been more efficient? Justify your answer. 7FFFFFFC $sp stack grows down $15 $24 $16 $17 $15 $24 saved by A saved by B C’s stack frame Note: I have not shown the procedures saving $ra and $fp which should be

illustrated. Procedure C does not use any of the $sx registers and does not

call other procedures. Therefore C does not need to save any of the registers.

No. Each procedure would have to save all the registers used in

the procedure. This is clearly less efficient, e.g., look at proce- dure C.

SLIDE 19

14 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the text segment starts at 0x00400000 and that the data segment starts at

0x10010000. Assume that registers $7, $8 and $9 have initial values 16, 0x10010020, and

0x10020020 respectively. 14 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle

f program execution is cycle number 1

14 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. 14 (c) Assume the exception/interrupt handler is stored at location 0x80000020. Provide the SPIM instruction(s) for transferring program control to begin executing instructions at this

address. Do not worry about setting the return address.

Cycle ALUSelA ALUOp PCWrite IR (in HEX!) ALUOutput 8 1 00 0xad2c0000 0x10020020 20 1 00 0x20e7ffff 0xf

.data str: .asciiz “Start” .align 2 .word 24, 0x6666 .text move: lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 end: bne $7, $0, move

Data Segment Addresses Data Segment Contents 0x10010000 0x72617453 0x10010004 0x00000074 0x10010008 0x00000018 0x1001000c 0x00006666 lui $1, 0x8000 addi $1, $1, 0x0020 jr $1

SLIDE 20

15 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the text segment starts at 0x00400000 and that the data segment starts at

0x10010000. Assume that registers $7, $8 and $9 have initial values 16, 0x10010020, and

0x10020020 respectively. 15 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle

f program execution is cycle number 1

15 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. 15 (c) What is the difference between signed and unsigned instructions and what is the moti-

Cycle ALUSelB ALUOp RegDst IR (in HEX!) MemToReg 7 11 00 X 0xad2c0000 X 17 10 00 0x25290004

.data str: .asciiz “Exams” .align 4 .word 24, 0x6666 .text move: lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 end: bne $7, $0, move

Data Segment Addresses Data Segment Contents 0x10010000 0x6d617845 0x10010004 0x00000073 0x10010008 0x00000000 0x1001000c 0x00000000 0x10010010 0x00000018 0x10010014 0x00006666

SLIDE 21

vation for making this distinction? 15 (d) What are the values of end and str? 15 (e) What does the following bit pattern represent assuming that it is one of the following. 00101001 10111010 00000000 00000000. Provide your answer in the form indicated. Signed instructions interpret operands as 2’s complement numbers They also will set conditions such as overflow, negative, etc. Unsigned numbers will be treated as n bit binary numbers. They will typically not set conditions such as overflow. Certain classes of numbers natually involve unsigned arithmetic, e.g., memory address computations. In order to detect these conditions in some instances and ignore them in others, MIPS uses two different types of instructions. str: 0x10010000 end: 0x00400014 A single precision IEEE 754 floating point number A MIPS instruction (show in symbolic code) _slti $26, $13, 0__ ( 1.011101 x 2-44_ show in scientific notation)

SLIDE 22

16 Answer the following questions with respect to the MIPS program shown below. Note that this simple program does not use the register saving conventions followed in class. Assume that each instruction is a native instruction and can be stored in one word! Further, assume the data segment starts at 0x10001000 and that the text segment starts are 0x00400000. 16 (a) What does the above program do? 16 (b) State the values of the labels loop and main? 16 (c) what are the hexadecimal encodings of the following instructions? .data label: .word 8,16,32,64 .byte 64, 32 .text .globl main main: la $4, label li $5, 16 jal func done func: move $2, $4 move $3, $5 add $3, $3, $2 move $9, $0 loop: lw $22, 0($2) add $9, $9, $22 addi $2, $2, 4 slt $8, $2, $3 bne $8, $0, loop move $2, $9 jr $31 Sum the elements stored starting at address label loop main _ 0x00400020 0x00400000 bne $8, $0, loop add $9, $9, $22 0x1500fffb 0x01364820

SLIDE 23

16 (d) The above program does not implement any register saving conventions on a proce- dure call. Assume we will use the MIPS register saving convention i.e., Caller/Callee

Save. Ignore the storage of local variables on the stack. What will be the contents of the top

frame on the stack when the procedure func is executing? 16 (e) Identify the local and global labels in the program? $fp, $22, since no other procedure is called $ra is not stored main is a global label all other labels are local labels

SLIDE 24

17 (a) What is an unresolved reference? How and when does it become resolved? 17 (b) Write the SPIM instruction sequence that you could use to implement the branch-on- greater-than-or-equal-to-test. For example, how would implement “bge $7, $6, loop”. A reference to a label that is undefined It is resolved by the linker which has access to all of the labels accessed by the programs being linked slt $1, $7, $6 beq $1, $0, loop

SLIDE 25

18 Consider the following isolated block of code. Note that some initialization code is miss- ing. 18 (a) If the text segment starts at 0x04000000, what is the value of the label end?. 18 (b) Let is assume we wish to implement the above function as a procedure.

How many arguments would the procedure have and how would they be passed to

this procedure from a calling program?

name the registers that would be stored on the stack after this procedure is invoked

18 (c) Show the hexadecimal encoding of the last instruction. 18 (d) What does the following bit pattern represent assuming that it is one of the following. 00101100 01110000 11000000 00000000 .text move: lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 end: bne $7, $0, move 0x04000014 Three passed in argument registers $a0-$a2 $7, $8, $9, $12 would have to be stored by the caller, if at all. $fp is the only register that is stored following the steps in the text. 0x14e0ffffa A single precision IEEE 754 floating point number A MIPS instruction (symbolic code!) _sltiu $16, $3, 0xc000________ (in scientific notation)) _1.111000011 x 2-39_________

SLIDE 26

19 Consider the following isolated block of SPIM code. Ignore initialization code. The data segment starts at 0x1001000 and the text segment starts at 0x00400000. Note that this block of code does not use registers saving conventions on a procedure call. 19 (a) What are the values of the following labels?. 19 (b) What are the encodings of the following instructions? 19 (c) (6 pts) Let us suppose the procedure mystery is independently compiled and linked when the program is run. After linking the program is stored in memory and mystery has a value of 0x00400010. What is the encoding of the jal mystery instruction.

start: addi $2, $0, 85 addi $3, $0, 1 jal mystery end: j exit mystery: add $4, $0, $0 loop: andi $5, $2, 1 beq $5, $0, skip add $4, $4, $3 skip: srl $2, $2, 1 bne $2, $0, loop exit: jr $31 li $v0, 10 syscall

loop ____0x00400014 skip _0x00400020 addi $3, $0, 1 0x20030001 beq $5, $2, skip 0x10a00001____ 0x0c100004

SLIDE 27

19 (d) Which of the following are valid MIPS instructions? If they are invalid you must state the correct form or clearly state why it is incorrect.

lw $t0, $t1($t3)
sltiu $t1, $t2, 0x44
bne $t1, label, loop

19 (e) Suppose I want to store the following information in the data segment in the order

shown. Show a sequence of SPIM data directives that will correctly do so. The text string,

each reserved word and the byte array should be labeled so that programs may reference them easily.

the text string “Enter a SPIM instruction”
reserve space for 4 words
store an array of 16 bytes (pick your own values)

The offset must be a label or numeric constant. Examples of the correct form are: lw $t0, 4($t3) or lw $t0, label2($t3) This is correct. Immediates can be specified in hexadecimal notation or base 10 notations. The arguments to the instruction must be registers, or one of the arguments can actually be an immediate. A correct form would be bne $t1, $s3, loop .data str: .asciiz “Enter a SPIM instruction” .align 2 L1: .space 4 L2: .space 4 L3: .space 4 L4: .space 4 array: .byte 1,2,3,4,5,6,7,8,9,11,22,33,44,55,66,77

SLIDE 28

19 (f) (2 pts). In the preceding question (1(f)) what assembler directive would cause the byte array to start on a 64 byte boundary? 19 (g) What does the program in question 1 do? You have to specific and state the function or

peration performed. No partial credit.

.align 6 this moves allocation to the next 26 byte boundary Counts the number of bit set to 1 in $2.

SLIDE 29

20 (a) What is the difference between native instructions and pseudoinstructions? 20 (b) Let us consider the program shown in the previous question. The procedure mystery does not appear to follow any of the SPIM procedure call conventions. If you were to rewrite mystery to follow the MIPS caller/callee saving convention, what registers would you expect to find in the stack frame for mystery while it is executing? 20 (c) Native instructions are implemented by the underlying hardware. pseudoinstructions are translated into native instructions by the assembler. pseudoinstructions provide more powerful instructions to the programmer without affecting the complexity of the hardware since they are not interpreted by the hardware. Based on our class discussions only $fp. Suppose that a 32-bit integer A is stored in memory, starting at the address given in register $t1, and that another 32-bit integer B is stored in the next sequential word after A. Write a MIPS procedure Swap that exchanges the values A and B in memory if A > B. The procedure should return to its caller when it is finished. Assume a purely caller-save convention for preserving registers is being used. Full credit is the program fits in the table

below. Provide comments.

label instruction comment Swap: lw $t0, 0($t1) Load A lw $t2, 4($t1) Load B slt $t3, $t2, $t0 Check if B<A beq $t3, $0, exit If not true (result=0)exit sw $t0, 4($t1) Swap A and B sw $t2, 0($t1) exit: jr $31 Return Another solutiojn would expect to find the address of A in $a0 and use this register instead of $t1. Since we are using caller-save and this procedure does not call anyone else, the return address does not need to be saved. With no saving requirements there is no stack frame that needs to be allocated/deallocated.

SLIDE 30

22 For the following questions, consider the single cycle datapath shown in Figure 5.24 on page 314. The value of loop is 0x00400004, the contents of register $t0, $t1, and $t2 are 0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is 0x10010008 22 (a) Fill in the values of control signals in the following table for each instruction. The jump instruction does not utilize any portion of the datapath other than the portion that updates the PC with the jump address. However, the register file does utilize bits of the encoded address as register addresses for rs and rt. The input to the write data port of memory is the contents of register rt. Therefore first encode the j instruction to find the value of rt. It will be 16. 22 (b) Errors in fabrication are a major concern in the design of a datapath. Suppose that MemToReg and RegDst multiplexor control signals were stuck at 0, i.e., you cannot assert these signals due to a fabrication defect. Which of the instructions supported by the datap- ath will still work? 22 (c) This question deals with the changes to the datapath required to implement the addition

f the jal instruction. Full credit requires you to be specific.
what are the structural changes, if any, that must be made to the datapath

Instruction ALUSrc ALUOp RegDst MemToReg Write Data Input to Data Memory j loop X XX X X contents of $16 sw $t2, 8($at) 1 00 X X 0x000000c0 sw, beq, and j. This follows from the truth table for the

controller. Which rows can still produce valid control signal

values?

We need $31 as a hardwired input address to the RegDst Mux. RegDst is

now a 2 bit control signal

We need PC+4 as an input to the MemToReg Mux since it is now a candidate

for writing into the register file (into $31). The MemToReg signal is now a 2 bit control signal

The jal control signal is used to select the jump address from the PC source mux.

This mux is is expanded to a 2 bit signal.

SLIDE 31

Show the changes required to the main controller truth table and the PLA implementation
f this controller.

RegDst AluSrc MemToReg RegWrite MemRead MemWrite Br AluOp Jal 10 X 10 1 0 0 0 XX 1 The PLA implementation illustarted in Figure C.2.5 will be changed to add another gate to recognize the jal instruction. The RegDst and MemToReg outputs must be modified to produce two output signals. The most significant bit is simply the jal while the least significant remains as before. We also introduce a jal output.

SLIDE 32

23 Consider the state of following procedure executing within the single cycle SPIM data path shown in Figure 5.24 on pg. 314. The first instruction is at address 0x00400440 and regis- ters $4 and $5 contain the values 0x10010440 and 0x00000040 respectively. Assume that all instructions are native instructions. 23 (a) Fill in the values of the signals shown for the following instructions (first time through the loop). 23 (b) Suppose in our design the controller was implemented in a separate chip. Due to a fab- rication defect the MemToReg control signal has been found to be defective. What modifi- cations can we make to any part of the datapath (excluding the controller) such that the datapath still functions correctly. (Hint: This is equivalent to asking how can I get rid of the MemToReg control signal).

Instruction ALUOutput Value at the Write Data Input of Data Memory RegDst ALUOp ALUSrc MemTo Reg beq $3, $2, end 0x0000003c 0x100100444 X 01 X lw $22, 0($2) 0x10010440 0x00000000 00 1 1

func: add $2, $4, $0 add $3, $5, $0 add $3, $3, $2 add $9, $0, $0 add $22, $0, $0 loop: lw $22, 0($2) add $9, $9, $22 addi $2, $2, 4 beq $3, $2, end j loop end: move $2, $9 jr $31 Invert the RegDst signal and use it as MemToReg

SLIDE 33

23 (c) Now suppose the above program was executed on a multicycle datapath. Assume that the first cycle for fetching the first instruction is numbered cycle 0. Assume immediate instructions take the same number of cycles as the R-format instructions. On which cycle do the following events take place.

add $9, $0, $0 completes execution

15___(the end of the instuction execution not the EX cycle)

the branch address for beq $2, $3, end is computed (for the first time through the loop)

_34 23 (d) What the principal advantage of the multicycle datapath over the single cycle datapath. Be specific and provide an example to illustrate your point. Each instruction only takes as long as it needs to. For example, if the clock cycle was 10 ns and the load takes 5 cycles (50 ns) branches now only take 30 ns (3 cycles and R-format instuctionsonly take 40 ns (4 cycles). The net time savings

ver the execution fo a program can be substantial.

SLIDE 34

24 Show the hexadecimal value of the single precision normalized floating point representa- tions of the following numbers. Use the IEEE 754 standard.Your representation should be fully compliant with the definition. 24 (a) Perform the following floating point operations assuming that the number representa- tions are IEEE 754 compliant. Your answers can be represented in scientific notation.

multiplication: 0xbe000000 x 0xbf800000 (
1.001 x 2131 + 1.11001 x 2126

24 (b) Provide an example of a number that is too large to be represented in single precision notation but can be represented in double precision. Infinity

4.5 ______________________

0x7f800000 0xc0900000

0.125 x -1.0 = 0.125

1.001 x 2131 + 0.0000111001 x 2131 = 1.0010111001 x 2131 1.0 x 2277

SLIDE 35

25 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle number 0. Assume that addiu and addi require the same number of states as an add instruction. 25 (a) How many cycles does this code sequence take to execute to completion? 25 (b) On what cycle is the add instruction in the body of the loop executed for the second time (remember the first cycle is cycle number 0!)? 25 (c) Fill in the following values during the requested cycle (remember the first cycle is cycle number 0!). The first row corresponds to the fetch cycle of the add instruction. The second row corre- sponds to third cycle of the branch instruction. Cycle ALUSrcB ALUOp RegDst PCWrite MemToReg PCSource 13 01 00 1 00 27 00 01 01 add $t4, $0, $0 addi $t3, $0, 4 loop: lw $t0, 0($t1) add $t4, $t4, $t0 addiu $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $0, loop 88 (the loop takes 20 cycles and executes 4 times). Cycle no. 35 for the EX cycle of the add instruction.

SLIDE 36

26 (a) Show the single precision normalized floating point representations of the following

numbers. Use the 754 standard and present your solution in hexadecimal notation.

26 (b) How would you interpret the following hexadecimal number assuming that it is a sin- gle precision IEEE 754 floating point number: 0x00140000? 26 (c) Assuming IEEE 754 rules and conventions, what is the result of the following operation in hexadecimal notation.

1.101 x 2142 X 1.01 x 2122 (note that the exponent values are represented in

base 10).

0.125 ___________________________________________________________

0.0 ___________________________________________________________

10.03125 ___________________________________________________________ 0xbe000000 0x41208000 It is a denormalized number since the exponent is 0, but the significand is non-zero. 1.000001 x 2138

SLIDE 37

26 (d) Assuming that we have 6 significant digits (including the implicit bit). What is the value of the guard, round and sticky bits in the course of the following computation? Repeat for 4 significant digits. 1.001 x 2121 + 1.1 x 2128 After adjusting the expoent value of the smaller number have 1.1000000000 x 2128 + 0.0000001001 x 2128

1.1000001001 x 2128

With 6 significant digits we have the guard, round and sticky bits as 011 With 4 significant digits we have 001. Note the sticky bit is set if here are non-zero bits to the right of the guard and round bits.

SLIDE 38

27 For the following questions, consider the single cycle datapath shown in Figure 5.24 on page 314. The value of loop is 0x00400004, the contents of register $t0, $t1, and $t2 are 0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is 0x10010008 27 (a) Fill in the values of control signals in the following table for each instruction. The jal instruction requires some additions to the datapath. However, the register file does utilize bits of the encoded instruction as register addresses for rs and rt. The input to the write data port of memory is the contents of register rt. Therefore encode the jal instuction to find the value of rt. It will be 16. 27 (b) Your are designer and wish to reduce the number of control signals in this datapath. Provide one hardware modification that will eliminate the MemToReg control signal. 27 (c) This question deals with the changes to the datapath required to implement the addition

f the addi instruction. Full credit requires you to be specific.
what are the structural changes, if any, that must be made to the datapath

Instruction ALUSrc ALUOp RegDst MemToReg Write Data Input to Data Memory jal loop x x 10 10 contents of $16 bne $t1, $t2, loop 01 X X 0x000000c0

We need $31 as a hardwired input address to the RegDst Mux. RegDst is

now a 2 bit control signal

We need PC+4 as an input to the MemToReg Mux since it is now a candidate

for writing into the register file (into $31). The MemToReg signal is now a 2 bit control signal Use the complement of RegDst as the value of this signal. This is not the

nly solution but one which is readily apparent from the truth table.

none are required. We have the connections necessary to add an immediate operand to the contents of a register. The issue is simply

ne of control.

SLIDE 39

Show the changes required to the main controller truth table and the PLA implementation
f this controller.

RegDst AluSrc MemToReg RegWrite MemRead MemWrite Br AluOp 0 1 0 1 0 0 0 00 The values of AluSrc and RegWrite become modified to include another OR term. Note the change in these two columns on the truth table. Also we have an additional gate on the input to recognize the addi instruction.

SLIDE 40

28 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle number 0. Assume that li and addi require the same number of states as an add instruction, that jal and slt require the same number of states as a bne instruction. 28 (a) How many cycles does this code sequence take to execute to completion? 28 (b) On what cycle is the slt instruction in the body of the loop fetched for the first time (remember the first cycle is cycle number 0!)? 28 (c) Fill in the following values during the requested cycle (remember the first cycle is cycle number 0!). The first row corresponds to a decode cycle. The second row corresponds to a write back cycle. Cycle ALUSrcB IorD PCWrite RegDst MemToReg PCSource 13 11 00 26 00 1 00 .text main: li $t0, 2 add $t2, $zero, $zero li $a0, 1 loop: jal solo addi $t0, $t0, -1 addi $a0, $a0, 1 add $t2, $t2, $v0 slt $t1, $t0, $zero bne $t1, $zero, loop 33 27

SLIDE 41

29 (a) Show the hexadecimal value of the double precision normalized floating point repre- sentations of the following numbers. Use the IEEE 754 standard. 29 (b) Assuming a total of 4 significant digits including in the implicit bit, is there any inac- curacy in the following IEEE 754 computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the mag- nitude of the error if any. 29 (c) When using Booth’s algorithm, does it matter which of the operands are the mutliplier? Justify your answer. Infinity 0x7ff0000000000000________

1.0 _0xbff0000000000000_______

(1.01 x 2125) X (1.001 x 2129) 1.001000 x 1.01

1.01101 x 2125+129-127=127

Rounding down gives us 1.011 x 2127 There is an error and the magnitude of the error is 0.00001 x 2127 The number of addition operations depends on the distribution of 1’s in the multiplier. We can pick the operand that will produce the minimum number of addition operations. If shifts are faster than additions, we can speed up the opera- ton with this optimization.

SLIDE 42

29 (d) Consider the operation 58 * 4 using version 3 of the multiplication algorithm in chapter 4 (not Booth’s algorithm). Recall that this is the most efficient version of this algorithm. What are the initial and final contents of the multiplicand and product registers? Assume that the multiplier and multiplicand have the same number of bits.

Initial Final multiplicand 000100 000100 Product 000000111010 000011101000

SLIDE 43

30 Consider a base, multi-cycle SPIM machine operating at 200 Mhz, with the following pro- gram execution characteristics. Each of the following parts are independent of each other. 30 (a) The designers decided to add a new addressing mode to help with array addressing where we often add a byte offset to a base address stored in a register. Therefore a sequence of instructions such as code block 1 below can be replaced by the instruction shown in code block 2. We have 40% of the lw instructions affected in this manner. Assum- ing that the cycle time does not change and the new instruction also takes 5 cycles, what is the speedup produced by the use of this new addressing mode?

Instruction class CPI Frequency R-Type + Immediate 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3%

add $t2, $t1, $t0 lw $t3, 0($t2) lwx $t2, $t1+$t0 becomes code block 1 code block 2 CPI_old = 0.454 + 0.255 + 0.204 + 0.073 + 0.033 = 4.15 Ex_old = #Instr 4.15 * clock_cycle_time With the optimization the number of instructions has been changed. If 40% of the lw instructions have been affected that means 0.4 * 0.25 = 0.1 or 10% of the total number of instructions have been affected. For each of these instructions we can eliminate one instruction by combining two instructions. The new total number of instructions is there- fore (0.9 * #Instr). The eliminated instructions are all R-format instructions. Therefore the percentage of such instructions are reduced by 10%. CPI_new = 0.354 + 0.255 + 0.24 + 0.073 + 0.033)/0.9 = 4.17 Ex_new = (0.9 #Instr) * 4.17 * clock_cycle_time Comparing the execution times we can see that this is a good trade-off.

SLIDE 44

30 (b) Is it possible to halve the execution time of programs by simply speeding up R-format and immediate instructions, i.e., realize a speedup of 2? Justify your answer. 30 (c) What is the maximum MIPS rating of the multicycle datapath? From Amdahl’s Law we have Speedup = 1/(s + ((1-s)/n)) where s is the fraction that is unaffected, and n is the factor by which this component can be sped up. Let us assume that R-format and immediate instructions are infinitely sped up, i.e., n = infinity. In this case we have Speedup = 1/(0.55) < 2. Therefore the answer is no. The fastest instruction takes 3 cycles, i.e., has a CPI of 3. This is the theoretically lowest CPI possible for any program (although it is unrealistic to have a program that only branches opr jumps!) MIPS Rating = (Clock Rate in MHZ)/CPI = 200/3 = 66.7 MIPS

SLIDE 45

31 We wish to make our datapath implement instructions such as those that are found in the Power PC. Specifically we are interested in adding an instruction that will increment a counter, and branch to a label if the counter value is 0: icb $t0, loop.This instructon oper- ates just as a branch instruction would if the branch is taken. Asume that this instruction uses the sameformat as a branch instuction (with the rt field set to 0). Assume that the

pcode for this new instruction is 101000. Explicitly state any assumptions you need to

answer the following questions. 31 (a) State any additions/modifications required to the hardware data path of Figure 5.28 or describe how the instruction can be implemented with no physical modifications. 31 (b) Show the modified state machine that will handle this instruction. Provide the values of relevant control signals in each additional state. Any control signals not specified will assumed to be X. The execution of icb requires the contents of the register to be decremented. We can provide this value of 1 as an input to the AluSrcB mux which makes the number of inputs 5, and therefore the AluSrcB mux requires a three bit control signal. The RegDst field must be exapanded to a 2 bit field since the rs register must now be written using rs as a destination address. Thus this instruction has a WB state. from decode AluSrcA = 1 AluSrcB = 100 AluOp = 00 PCWriteCond PCSource = 10 RegWrite RegDst = 10 MemToReg = 0 to fetch

SLIDE 46

31 (c) Show the additions to the microcode to implement this instruction. The existing micro- coded implementation of the state machine before addition of this instruction is shown

below. You need only show additions/modifications. Note that there is more than one

approach to solving this problem. Soem solutions may require additions to the microin- struction format in your text. If so state them explicitly.

label ALU control src1 src2 Register Control Memory PCwrite Control Sequencing fetch add PC 4 read PC ALU seq add PC extshft dispatch-1 Mem1 add A extend dispatch-2 LW2 read ALU seq write MDR fetch SW2 write ALU fetch Rformat1 Func A B seq write ALU fetch BEQ1 Subt A B ALUOut-Cond fetch JUMP1 jmp addr fetch ICB add A +1 ALUOut-cond seq write ALU fetch

Dispatch ROM 1

Op Name Value 000000 R-format 0110 000010 jmp 1001 000100 beq 1000 100011 lw 0010 101011 sw 0010 101000 icb 1010

Dispatch ROM 2

Op Name Value 100011 lw 0011 101011 sw 0101

SLIDE 47

32 Consider the manner in which exceptions are accommodated in the SPIM datapath, and the manner in which the data path controller is modified to support exceptions. 32 (a) In order to support n exceptions, how many additional states should be added to the state machine? How many additional microinstructions should added? 32 (b) Consider an illegal memory address exception. Show the changes to the state machine for this exception. Make any assumptions that you feel may be necessary. For each addi- tional state clearly show the values of relevant control signals. Any signals left unspecified will be assumed to be X (don’t care). n IntCause = 10 CauseWrite AluSrcA = 0 AluSrcB = 01 AluOp = 01 EPCWrite PCwrite PCSource = 11 to fetch code for illegal memory address is 10 The illegal memory address exception is detected in any state from

State 0
State 3
State 5

that performs a memory access.

SLIDE 48

32 (c) What is the minimum number of cycles from the occurrence of an exception to the time the first instruction of the exception handler is fetched?

One. Exceptions are handled in one state in this datapath (ref. Figure

5.39 and Figure 5.40). The next state is the fetch state.

SLIDE 49

33 (a) Show the hexadecimal value of the double precision normalized floating point repre- sentations of the following numbers. Use the IEEE 754 standard. 33 (b) Assuming a total of 4 significant digits including in the implicit bit, is there any inac- curacy in the following IEEE 754 computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the mag- nitude of the error if any. 33 (c) Using Booths algorithm and n bit multipliers and multiplicands, how many steps are required? NaN _________________________________

0.125 _0xbfc0000000000000______________

0x7ff0000000000000 (more than one answer:significand must be nonzero) (1.01 x 2125) + (1.001 x 2129) 0.000101 x 2129 1.001000 x 2129

1.001101 x 2129

Rounding up gives us 1.01 x 2129 There is an error and the magnitude of the error is 0.000011 x 2129 n steps

SLIDE 50

33 (d) Consider the operation 18 * 8 using version 3 of the multiplication algorithm in chapter 4 (not Booth’s algorithm). Recall that this is the most efficient version of this algorithm. What are the initial and final contents of the multiplicand and product registers? Assume that the multiplier and multiplicand have the same number of bits. Note that I have left out the sign bits for these numbers. The solu- tion could also have been presented using 6 bits for the multipli- cand and 11 bits for the product (10 bit result and the sign bit).

Initial Final Multiplicand 10010 10010 Product 0000001000 0010010000

SLIDE 51

34 Consider a base, multi-cycle SPIM machine operating at 200 Mhz, with the following pro- gram execution characteristics. The following parts are independent of each other. 34 (a) The designers decided to add a new addressing mode wherein registers can be automat- ically incremented by 4. Therefore a sequence such as code block 1 below can be replaced by the instruction shown in code block 2. We have 30% of the lw instructions affected in this manner. Assuming that the cycle time does not change and the new instruction also takes 5 cycles, what is the speedup produced by the use of this new addressing mode?

Instruction class CPI Frequency R-Type + Immediate 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3%

lw $t0, 0($t1) addi $t1, $t1, 4 lw $t0, ($t1)+ becomes code block 1 code block 2 CPI_old = 0.454 + 0.255 + 0.204 + 0.073 + 0.033 = 4.15 Ex_old = #Instr 4.15 * clock_cycle_time With the optimization the number of instructions has been changed. If 30% of the lw instructions have been affected that means 0.3 * 0.25 = 0.075 or 7.5% of the total number

f instructions have been affected. For each of these instructions we can eliminate one

instruction by combining two instructions. The new total number of instructions is there- fore (0.925 * #Instr). The eliminated instructions are all immediate instructions. Therefore the percentage of such instructions are reduced by 7.5%. CPI_new = 0.3754 + 0.255 + 0.24 + 0.073 + 0.033)/0.925 = 4.17 Ex_new = (0.925 #Instr) * 4.17 * clock_cycle_time Comparing the execution times we can see that this is a good trade-off.

SLIDE 52

34 (b) New ALU implementation technology has made it possible to double the speed of the R-format and immediate instructions. What is overall the speedup that can be realized for programs with the instructions statistics shown in the table? 34 (c) What is the maximum MIPS rating of the multicycle datapath? From Amdahl’s Law we have Speedup = 1/(s + ((1-s)/n)) where s is the fraction that is unaffected, and n is the factor by which this component can be sped up. Plugging in the values from the table we have Speedup = 1/(0.55 + (0.45/2)) = 1.29 or 29% speedup. The fastest instruction takes 3 cycles, i.e., has a CPI of 3. This is the theoretically lowest CPI possible for any program (although it is unrealistic to have a program that only branches opr jumps!) MIPS Rating = (Clock Rate in MHZ)/CPI = 200/3 = 66.7 MIPS

SLIDE 53

35 We wish to make our datapath implement instructions such as those that are found in the Power PC. Specifically we are interested in adding the following instruction: lwrf $t0, $t1+$t2. As you may recall, this instruction uses the sum of the contents of registers $t1 and $t2 as addresses. Assume that this instruction uses R-format encodings where the shamt and func fields are ignored and an opcode 101000. Explicitly state any assumptions you need to answer the following questions. 35 (a) State any additions/modifications required to the hardware data path of Figure 5.39 or describe how the instruction can be implemented with no physical modifications. 35 (b) Show the modified state machine that will handle this instruction. Provide the values of relevant control signals in each additional state. Any control signals not specified will assumed to be X. No changes are required. Simply add rs and rt to obtain the memory address. ALUSrcA = 1 ALUSrcB = 00 ALUOp = 00 MemRead IorD = 1 RegDst = 1 RegWrite MemToReg = 1 from decode to fetch

SLIDE 54

35 (c) Show the additions to the microcode to implement this instruction. The existing micro- coded implementation of the state machine before addition of this instruction is shown

below. You need only show additions/modifications. Note that there is more than one

approach to solving this problem. If any changes are necessary to the microinstruction for- mat, state them explicitly.

label ALU control src1 src2 Register Control Memory PCwrite Control Sequencing fetch add PC 4 read PC ALU seq add PC extshft dispatch-1 Mem1 add A extend dispatch-2 LW2 read ALU seq write MDR fetch SW2 write ALU fetch Rformat1 Func A B seq write ALU fetch BEQ1 Subt A B ALUOut-Cond fetch JUMP1 jmp addr fetch LWRF add A B seq read ALU seq write MDR-rd fetch

Dispatch ROM 1

Op Name Value 000000 R-format 0110 000010 jmp 1001 000100 beq 1000 100011 lw 0010 101011 sw 0010 101000 lw+ 1010

Dispatch ROM 2

Op Name Value 100011 lw 0011 101011 sw 0101

SLIDE 55

36 Consider the manner in which exceptions are accommodated in the SPIM datapath, and the manner in which the data path controller is modified to support exceptions. 36 (a) How are exceptions different from procedure calls? 36 (b) Consider an illegal instruction exception. Show the changes to the state machine for this exception. Make any assumptions that you feel may be necessary. For each additional state clearly show the values of relevant control signals. Any signals left unspecified will be assumed to be X (don’t care).

Procedure calls perform state saving operations and make use of the

stack Exceptions need not be concerned with state saving operations

Exceptions must encode the cause of the exception or use a vectored

implementation to decode the exception

Exception logic stores the address of the offending instruction rather

than the next instruction.

Procedure calls are normal changes in the flow of control rather than

unexpected changes.

Exceptions can lead to unexpected program termination.

IntCause = 0 CauseWrite AluSrcA = 0 AluSrcB = 01 AluOp = 01 EPCWrite PCwrite PCSource = 11 from decode to fetch code for illegal instruction is 10 The illegal instruction exception can be detected in the decode

state. A transition is made to the exception state where the code

is recorded and the state machine moves to the fetch state to start fetching the first instruction of the exception handler.

SLIDE 56

36 (c) What is the minimum number of cycles from the occurrence of an exception to the time the first instruction of the exception handler is fetched?

One. Exceptions are handled in one state in this datapath (ref. Figure

5.39 and Figure 5.40). The next state is the fetch state.

SLIDE 57

37 (a) Show the single precision normalized floating point representations of the following

numbers. Use the 754 standard and present your solution in hexadecimal notation.

37 (b) Assuming IEEE 754, what is the result of the following operations

1.1101 x 2102 X 1.01 x 2172 (note that the exponent values are represented in

base 10). 37 (c) Consider a double precision floating point number (IEEE 754). As it turns out, the range of numbers that it can represent is not sufficient for your application. Keeping the total number of bits constant, how would you change the format of the number? 37 (d) What is the smallest positive number you can represent in IEEE 754 double precision

format. Represent this number in scientific notation.

1.0 x 2-1022

0.0625 ___________0xbd800000______________________

0.0 _0x00000000___

1.0010001 x 2148 The number of bits in the exponent determines the range. This number should be increased at the expense of the number of bits in the significand.

SLIDE 58

38 How would you modify the single bit ALU to produce an overflow bit. Clearly state which

f the 32 single bit ALUs you would modify and modify the circuit below to show how you

would compute the overflow bit. 1 2 3 a b c_in

binvert result set

verflow

less + c_out Compute the EX-OR of the carry in to the ALU31 and the carry out from ALU 31. Use a single EX-OR gate.

SLIDE 59

39 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle 0. 39 (a) Fill in the following values at the end of the requested cycle. The first row corresponds to the WB cycle for the load instruction. The second row corre- sponds to the decode cycle. 39 (b) On what cycle is the branch instruction fetched for the last time?

165. The loop executes 8 times. Each immediate instuctions takes 4 cycles.

Cycle ALUSrcB ALUOp RegDst PCWrite MemToReg PCSource 12 00 00 1 00 18 11 00 00

add $t4, $0, $0 addi $t3, $0, 8 loop: lw $t0, 0($t1) add $t4, $t4, $t0 addiu $t1, $t1, 4 addi $t3, $t3, -1 bne $t3, $0, loop

SLIDE 60

40 (a) Show the hexadecimal value of the double precision normalized floating point repre- sentations of the following numbers. Use the IEEE 754 standard. 40 (b) Clearly state two advantages of using Booth’s algorithm for multiplication over the standard multiplication algorithm provided in the book. 40 (c) Assuming a total of 6 significant digits including the implicit bit, is there any inaccu- racy in the following computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any.

1.0 ___0xbff0000000000000_______________________________________

0.0625 _0x3fb0000000000000___________________________

1. Potentially reduces the number of addition
perations performed depending on the

value of the multiplier

2. Naturally handles two’s complement

representations performing signed multiplication (-1.11 x 2-2) + (1.01 x 23) The rounded value that is obtained in 1.0011 x 23. The difference between this value and the actual value (assuming a sufficient number of bits) is given by 0.0000001. The correct value is 1.0011001 x 23.

SLIDE 61

41 Consider a base, multi-cycle SPIM machine operating at 50 Mhz, with the following program execution characteristics. We can increase reduce the clock cycle time by 15% at the expense adding an additional cycle for data mem-

ry accesses.

41 (a) Is this a good tradeoff. Provide a clear and quantitative justification for your answer. CPIold = 0.44 + 0.255 + 0.254 + 0.073 + 0.033 = 4.15 Execution Timeold = I 4.15 * clock_cycle CPInew = 0.44 + 0.256 + 0.25 * 5 + 0.073 + 0.033 = 4.65 Execution Timenew = I * 4.65 * 0.85* clock_cycle time Execution Timenew < Execution Timeold 41 (b) What is the minimum improvement in clock cycle time that is necessary for this approach to be profit- able? x* 4.65 < 4.15 x < 4.15/4.65 = 0.892.

Instruction class CPI Frequency R-Type 4 40% lw 5 25% sw 4 25% beq 3 7% j 3 3%

SLIDE 62

42 Consider the multicycle datapath. 42 (a) What is the minimum number of cycles for the execution of any instruction? 42 (b) For a datapath with a N state controller, what is the size of the microcode store (in num- ber of microinstructions)? 42 (c) What is it that determines the duration of the clock cycle time? 3

2 N log The state (or cycle) in the datapath that has the longest execution path or delay.

SLIDE 63

43 (a) Show the hexadecimal value of the double precision normalized floating point repre- sentations of the following numbers. Use the IEEE 754 standard. 43 (b) Using Booths algorithm and n bit multipliers and multiplicands, what is the range of numbers that can be accurately multiplied? 43 (c) Assuming a total of 6 significant digits including the implicit bit, is there any inaccu- racy in the following computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the magnitude of the error if any.

1.5 0xbff800000000000

0.03125 0x3fa0000000000000

2n-1 to 2n-1-1

(-1.1 x 2-1) + (1.001 x 24) Correct answer: 1.000101 x 24 Rounded answer: 1.00011 x 24 Error: 0.000001

SLIDE 64

44 (20 pts) Consider a base, multi-cycle SPIM machine operating at 50 Mhz, with the follow- ing program execution characteristics. 44 (a) With clever algorithms for register allocation we can reduce the number of lw and sw instructions by 10%. What is the improvement in execution time? 44 (b) Would a 10% reduction in clock cycle time justify a single cycle increase in the number

f cycles for memory accesses?

Instruction class CPI Frequency R-Type 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3%

CPIold = 40.45 + 50.25 + 40.2 + 30.07 + 30.03 = 4.15 CPInew = (0.454 + 50.250.9 + 40.20.9 + 30.07 + 30.03)/I” = 3.945/I” I” = 0.45 + 0.250.9 + 0.20.9 + 0.07 + 0.03 = 0.955 CPInew = 3.945/0.955 = 4.13 Execution Time Old = I * 4.15 * clock_cycle_time Execution Time New = 0.955* I * 4.13 * clock_cycle_time Compare! CPInew = (0.454 + 60.25 + 50.2 + 30.07 + 30.03) = 4.6 Execution Time New = I 4.6 * 0.9 * clock_cycle_time = I * 4.14 * clock_cycle_time Barely better

SLIDE 65

45 (a) Show the single precision normalized floating point representations of the following

numbers. Use the 754 standard and present your solution in hexadecimal notation.

45 (b) Assuming IEEE 754, what is the result of the following operations

1.1101 x 282 X 1.01 x 2122 (note that the exponent values are represented in

base 10).

1.001 x 294 + 1.1 x 297 (note that the exponent values are represented in base

10). 0.375 _0x3ec00000____________________________________________

1.5 ___0xbfc00000_____________________________________________

1.0010001 x 278 (do not forget to substract the extra 127 bias!) 1.101001 x 297

SLIDE 66

46 Consider a base, multi-cycle SPIM machine operating at 100 Mhz, with the following pro- gram execution characteristics. Compiler optimizations reduce the number of instructions in each class to the amount shown. 46 (a) What is the base CPI with and without compiler optimizations? 46 (b) What is the speedup due to compiler optimizations?

Instruction class CPI Frequency Compiler Opt R-Type 4 45% 95% lw 5 30% 90% sw 4 20% 95% beq 3 5% 75%

Without Opt = 0.454 + 0.35 + 0.24 + 0.053 = 4.25 With Optimization = Cycles/#instr

=(0.4540.95+0.350.9+0.240.95+0.0530.75)/(0.450.95+0.30.9+0.20.95+0.050.75) = CPInew

Speedup = Told/Tnew

= (I * 4.25 * cycle_time)/(I” * CPInew* cycle_time)

where I” =

(0.450.95+0.30.9+0.20.95+0.050.75) ( from above: this is the new number

instructions)

SLIDE 67

47 Consider the annotated multicycle datapath shown in your text. Fill in the fill in the follow- ing table for the sequence of instructions shown below. Each entry should contain the value

f the requested signal at the end of the requested cycle. The initial contents of registers $1,

$2 and $3 are 0x0000088, 0x0000022 and 0x00000033 respectively. The address of Loop is 0x00400004.

PC Instruction Cycle ALUOutput Value at the Output of the ALU Rt ALUOp 0x0040000c beq $1, $2, Loop 3 0x00400004 0x00000066 0x00000022 01 0x00400010 sw $1, 4($2) 4 0x00000026 0x00000026 0x00000088 00

SLIDE 68

48 (a) Show the double precision normalized floating point representations of the following

numbers. Use the 754 standard and present your solution in hexadecimal notation.

48 (b) Assuming IEEE 754 rules and conventions, what is the result of the following opera-

tions. Assume 5 significant digits including the implicit bit.
1.11 x 212 x 1.001 x 239.
1.0001 x 2133 + 1.0101 x 2139

48 (c) Provide an example of a number that is too small to be represented with single preci- sion notation but can be represented with double precision.

0.1875 _____oxbfc0000000000000___________________________

0.0 ___________________________________________________________

212 x 239. = 251-127 = 2-76 Remember that the exponent is biased and therefore we have to substract 127 from the sum of the exponents. The fact that the result is negative indicates that the result is underflow. 1.0001 x 2133 + 1.0101 x 2139 = 0.0000010001 x 2139 + 1.0101 x 2139 = 1.0101 x 2139 (the result is rounded down according to the rules for 754). 1.0 x 2-130 This number is just smaller than the smallest number that can be represented in single preceision numbers.

SLIDE 69

49 For the following questions, consider the single cycle datapath shown in the text. The value of loop is 0x00400004, the contents of register $t1, $t2, and $t3 are 0x01000033, 0x00004008, and 0x000000c0, respectively, and the contents of register $1 is 0x10010008 49 (a) Fill in the values of control signals in the following table for each instruction. 49 (b) You are designer and wish to reduce the number of control signals that the controller has to generate. Provide one hardware modification that will eliminate the RegDst control signal (but not the mux). Instruction ALUSrc ALUOp RegDst MemToReg Write Data Input to Data Memory lw $t3, 4($t2) 1 00 1 0x000000c0 Note that the RegsDst signal is the complement of the MemToReg signal. This the RegDst mux can be controlled by the MemToReg signal passed through an inverter.

SLIDE 70

50 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the fetch cycle for the first instruction is cycle number 0. Assume that la, li and addi require the same number of states as an add instruction. 50 (a) What would be the CPI and MIPS rating for a 266 MHz machine for the program shown above? 50 (b) Fill in the following values during the requested cycle (remember the first cycle is cycle number 0!). Cycle # ALUSrcB IorD PCWrite RegDst MemToReg PCSource 17 01 1 X X 00 40 XX X 1 XX .data L1: .word 2,3,4,5 L2: .space 16 .text la $t8, L1 la $t9, L2 li $t7, 4 move: lw $t2, 0($t8) sw $t2, 0($t9) addi $t8, $t8, 4 addi $t9, $t9, 4 addi $t7, $t7, -1 end: bne $t7, $0, move The loop executes four times. Each time through the loop the instructions take a total

f 24 cycles. The first three instructions take 12 cycles. The total number of cycles is
108. The number of instructions in the loop are 6. Therefore the total number of

instructions executed is 6*4+3 = 27. The CPI = 108/27 = 4.0. MIPS rating = 266/CPI The first row corresponds to the fetch cycle for the sw instruction. The second row cor- responds to the write back cycle for the lw instuction executed for the second time through the loop.

SLIDE 71

50 (c) On the figure of the multicycle datapath attached overleaf, show any modifications that would be required to add the jr instruction. Below show the modifications to the state machine how any additional states would fit in the state machine. For each additional state that you have added, show the values of the relevant control signals. Any signals that you

mit from the description of a state are assumed to be deasserted.

According to the format definition for the jr instruction, the rs field identifies the register. This value will be available in the A register at the end of the decode cycle. Add another sig- nal from the A register to the PCSource mux. The value of PCSource of 11 will now select the contents of the A register to be written to the PC. This writeback can be handled in

ne extra state after decode with PCWrite = 1 and PCSource = 11.

IF ID PCWrite = 1 PCSource = 11 jr

SLIDE 72

51 Consider the execution of the following block of SPIM code on a multicycle datapath. The text segment starts at 0x00400000 and that the data segment starts at 0x10010000. Assume immediate instructions take 4 cycles. 51 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle

f program execution is cycle number 0!

51 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents.

Cycle ALUSrcB MemToReg RegWrite Branch Address RegDst 12 01 X 0x00400030 X 28 00 X 0x0041e05c X

.data start: .word 21, 22, 0x44, 0x12 str: .asciiz “Done ” .align 4 .byte 0x88, 0x32 .text .globl main main: li $t3, 2 lui $t0, 0x1001 lui $t1, 0x1001 addi $t1, $t1, 8 move: lw $t5, 0($t0) lw $t6, 0($t1) add $t7, $t5, $t6 addi $t0, $t0, 4 addi $t1, $t1, 4 addi $t3, $t3, -1 end: bne $t3, $zero, move done:

Data Segment Addresses Data Segment Contents 0x10010000 0x00000015 0x10010004 0x00000016 0x10010008 0x00000044 0x1001000c 0x00000012 0x10010010 0x656e6f44 0x10010014 0x00000020 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00003288 0x10010024

SLIDE 73

51 (c) Show the SPIM program to implement a for-loop that executes 12 times. 51 (d) What are the values of move, str and end? 51 (e) What does the following bit pattern represent assuming that it is one of the following. 00111100 00001001 01000000 00000000. Provide your answer in the form indicated. . li $t1, 12 loop: # loop body # loop body .. .. addi $t1, $t1, -1 bne $t1, $zero, loop move end str 0x00400010 0x00400028 0x10010010 A single precision IEEE 754 floating point number A MIPS instruction (show in symbolic code) lui $t1, 0x4000 ( 1.000100101 x 2-7_________ show actual value in scientific notation)

SLIDE 74

52 (a) Show the hexadecimal value of the single precision normalized floating point repre- sentations of the following numbers. Use the IEEE 754 standard. 52 (b) Assuming a total of 4 significant digits including in the implicit bit, is there any inac- curacy in the following IEEE 754 computation assuming the use of the guard and round bit? Justify your answer by showing how this value is calculated, and identifying the mag- nitude of the error if any. 52 (c) Consider the operation 17 * 4 using version 3 of the multiplication algorithm in chapter 4 (not Booth’s algorithm). Recall that this is the most efficient version of this algorithm. What are the initial and final contents of the multiplicand and product registers? Assume that the multiplier and multiplicand have the same number of bits. Use the minimum num- ber of bits as determined by the above operands. Infinity a denormalized number 0x00011000_______ 0x7f800000 (1.11 x 2114) X (1.01 x 2119) The product is 1.00011 x 2107 With only 4 significant digits and the guard and round bit being 11, we round up to give 1.001 x 2107 The maginitude of the error is 0.00001 x 2107

Initial Final Multiplicand 010001 010001 Product 000000000100 000001000100

SLIDE 75

52 (d) Compare the critical path delays for carry lookahead logic and ripple carry logic for a 32 bit ALU. Ripple carry delay through single bit ALU is 2 gate delays. Delay through a single node in the carry lookahead tree is also 2 gate delays. Ripple carry = 32 * 2 = 64 gate delays. Carry lookahead = 1 (for generate and propagate signal generation) + 5 2 (up the tree) + 4 2 (down the tree) + 2 (through the last ALU) = 21 gate delays

SLIDE 76

53 Consider a base, multi-cycle SPIM machine operating at 300 Mhz, with the following pro- gram execution characteristics. Questions 48(a), 48(b), and 48(c) are independent of each other. 53 (a) The designers propose adding a new branch instruction, decrement-and-branch-if-zero: dcb $t0, loop. This instructions decrements the register $t0. If the resulting value is equal to 0 then the program branches to the instruction at label loop. Therefore a sequence of instructions such as code block 1 below can be replaced by the instruction shown in code block 2. We have 35% of the branch instructions affected in this manner. Assuming that the cycle time does not change and the new instruction takes 4 cycles, is this a good idea?

Instruction class CPI Frequency R-Type or Immediate 4 40% lw 5 25% sw 4 20% bne 3 10% j 3 5%

addi $t2, $t1, -1 bne $t2, $0, loop dcb $t2, loop becomes code block 1 code block 2 CPI_old = 0.44 + 0.255 + 0.204 + 0.103 + 0.053 = 4.10 Ex_old = #Instr 4.1 * clock_cycle_time With the optimization the number of instructions has been changed. If 35% of the branch instructions have been affected that means 0.35 * 0.1 = 0.035 or 3.5% of the total number of instructions have been affected. For each of these instructions we can eliminate one instruction by combining two instructions. The new total number of instructions is therefore (0.965 * #Instr). The eliminated instructions are all immediate

instructions. Therefore the percentage of such instructions are reduced by 3.5%. How-

ever they are replaced by instructions taking 4 cycles. Therefore the net effect is the reduction of the number of branch instructions by 35% (or to 65% of the original). CPI_new = 0.3654 + 0.255 + 0.24 + 0.130.65 + 0.053 + 40.035)/0.965 = 4.1 Ex_new = (0.965 #Instr) * 4.1* clock_cycle_time Comparing the execution times we can see that this is a good trade-off.

SLIDE 77

53 (b) If improved register allocation techniques in the compiler can reduce the number of lw and sw instructions by 10% what is the improvement in the MIPS rating? 53 (c) Suppose in a typical program executable, 20% of the instructions is from system librar- ies and implementing operating system related functionality.What is the maximum speedup that can be obtained by speeding up the application code sequences? Current MIPS rating is 300/4.1 = 73.17 MIPS With the reduced number of instructions the new CPI is given by (0.44 + 0.2255 + 0.184 + 0.13 + 0.053)/I’ where I’ is given by (0.4+ 0.90.25 + 0.9*0.2 + 0.1 + 0.05) The new MIPS rating is 300/CPInew The improvement is given by their differences. This goverened by Amdahl’s Law. We the maximum speedup as 1/0.2 = 5.

SLIDE 78

54 We wish to have our multicycle datapath implement the slti $t0, $t1, 0x4000 instruction. Explicitly state any assumptions you need to answer the following questions. 54 (a) Either state any additions/modifications required to the hardware data path of Figure 5.33, or describe how the instruction can be implemented with no physical modifications. 54 (b) Show the modified state machine that will handle this instruction. Provide the values of relevant control signals in each additional state. Any control signals not specified will assumed to be X. We can simply have the ALU perform a slt than operation on the inputs and select the ALU inputs accordingly. The key problem is getting the ALU controller to issue a slt

pcode to the ALU (111). This is the main hardware modification that is required.

Suppose we use the unused ALUOp value of 11 to signifiy this. Note that the truth table for the ALU controller must be reimplemented since we do not have don’t cares in one of the ALUOp bit fields any more. ALUSrcA = 1 ALUSrcB = 10 ALUOp = 11 from decode RegDst = 0 RegWrite MemToReg = 0 to fetch

SLIDE 79

54 (c) Show the additions to the microcode to implement this instruction. The existing micro- coded implementation of the state machine before addition of this instruction is shown

below. You need only show additions/modifications. Note that there is more than one

approach to solving this problem. Some solutions may require additions to the microin- struction format in the text. If so state them explicitly.

label ALU control src1 src2 Register Control Memory PCwrite Control Sequencing fetch add PC 4 read PC ALU seq add PC extshft dispatch-1 Mem1 add A extend dispatch-2 LW2 read ALU seq write MDR fetch SW2 write ALU fetch Rformat1 Func A B seq write ALU fetch BEQ1 Subt A B ALUOut-Cond fetch JUMP1 jmp addr fetch SLTI SLTOP A Extend seq Write ALU-rt fetch

Dispatch ROM 1

Op Name Value 000000 R-format 0110 000010 jmp 1001 000100 beq 1000 100011 lw 0010 101011 sw 0010 001010 slti 1010

Dispatch ROM 2

Op Name Value 100011 lw 0011 101011 sw 0101

SLIDE 80

55 Consider the manner in which exceptions are accommodated in the SPIM datapath, and the manner in which the data path controller is modified to support exceptions. You certainly do not want your programs to jump to addresses in the data segment, or for that matter let us say to any address greater than 0x10010000. If j instructions generate such an illegal address you want to generate an exception.This questions addresses the implementation of such an exception in the SPIM datapath. 55 (a) How can you test for the occurrence of this exception? 55 (b) Describe the hardware modifications required to control to support such an exception. The occurrence of the exception is detected by using an ALU to compare the jump address with 0x10010000 and generating a sin- gle bit execption single to denote the result of this comparison. From state 1001 we now make a transition to the exception state. To make this transi- tion we must modify the next state logic of the state machine implementation. Cur- rentlky the next state after state 1001 is state 0000. This must now be modified to move to either state 0000 (if no exception) or state 1010 (exception state) of an exception

ccured. You can use a table with these two alternatives just as we did for the next state

logic out of state 0010. The input to the table is a single bit signal denoting whether the exception occured or not (from 5(a)).

SLIDE 81

55 (c) Show the required modifications to the state machine to support such an exception.All control signals left unspecified are assumed to be X. \ IntCause = 10 CauseWrite AluSrcA = 0 AluSrcB = 01 AluOp = 01 EPCWrite PCwrite PCSource = 11 from jump to fetch code for illegal address is 10

encoded into one instruction. As long as your assumptions are consistent you shouldnot lose points. Everything else is 0x00000000 upto and beyond 0x10010100 (this is the point of the .align 8 directive)

Data Segment Addresses Data Segment Contents 0x10010000 0x00000032 0x10010004 0x0000002c 0x10010008 0x00400010 0x1001000c 0x00000033 0x10010010 0x00706f54 0x10010014 0x00000000 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00000000 0x10010024 0x00000000

1 (c) Provide a set of data directives that will perform the memory allocation implied by the following high level language declarations.

integer Limit;

ter $31.

4 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding con- tents.

Cycle ALUSrcA ALUOp RegWrite Instruction Register IorD 14 1 00 0x8d0d0000 X 35 1 01 0x1560fffb X

.data start: .word 21, 22, 23,24 str: .asciiz “CmpE” .align 4 .word 24, 0x77 .text .globl main main: li $t3, 4 lui $t0, 0x1001 lui $t1, 0x1002 move: lw $t5, 0($t0) sw $t5, 0($t1) addiu $t0, $t0, 4 addiu $t1, $t1, 4 addi $t3, $t3, -1 end: bne $t3, $zero, move done:

Data Segment Addresses Data Segment Contents 0x10010000 0x00000015 0x10010004 0x00000016 0x10010008 0x00000017 0x1001000c 0x00000018 0x10010010 0x45706d43 0x10010014 0x00000000 0x10010018 0x00000000 0x1001001c 0x00000000 0x10010020 0x00000018 0x10010024 0x00000077

4 (c) Show a sequence of SPIM instructions that will branch to the label loop if the contents

5 (b) Show a sequence of SPIM instructions to implement the following: x = a[i*4], where a[i] is the access of element i from array a[]. Use whatever register allocation you need.

Cycle ALUSrcB ALUOp RegDst Instruction Register PCWriteCond 9 11 00 X 0x3c091002 20 00 00 0xad2d0000

9 (20 pts) This question deals with procedure calls. Consider the following SPIM program

11 (a) Provide the MIPS code to do the following.

From text

12 Answer the following questions with respect to the MIPS program shown below. Assume that the data segment starts at 0x10010000 and the text segment starts at 0x00400000. 12 (a) What function does the last two instructions in the main program realize? 12 (b) With respect to the above program,

call other procedures. Therefore C does not need to save any of the registers.

the procedure. This is clearly less efficient, e.g., look at proce- dure C.

14 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the text segment starts at 0x00400000 and that the data segment starts at

0x10020020 respectively. 14 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle

Cycle ALUSelA ALUOp PCWrite IR (in HEX!) ALUOutput 8 1 00 0xad2c0000 0x10020020 20 1 00 0x20e7ffff 0xf

.data str: .asciiz “Start” .align 2 .word 24, 0x6666 .text move: lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 end: bne $7, $0, move

Data Segment Addresses Data Segment Contents 0x10010000 0x72617453 0x10010004 0x00000074 0x10010008 0x00000018 0x1001000c 0x00006666 lui $1, 0x8000 addi $1, $1, 0x0020 jr $1

15 Consider the execution of the following block of SPIM code on a multicycle datapath. Assume that the text segment starts at 0x00400000 and that the data segment starts at

0x10020020 respectively. 15 (a) Fill in the following values at the end of the requested cycle. Assume that the first cycle

15 (b) Show the addresses and corresponding contents of the locations in the data segment that are loaded by the above SPIM code. Clearly associate addresses and corresponding contents. 15 (c) What is the difference between signed and unsigned instructions and what is the moti-

Cycle ALUSelB ALUOp RegDst IR (in HEX!) MemToReg 7 11 00 X 0xad2c0000 X 17 10 00 0x25290004

.data str: .asciiz “Exams” .align 4 .word 24, 0x6666 .text move: lw $12, 0($8) sw $12, 0($9) addiu $8, $8, 4 addiu $9, $9, 4 addi $7, $7, -1 end: bne $7, $0, move

Data Segment Addresses Data Segment Contents 0x10010000 0x6d617845 0x10010004 0x00000073 0x10010008 0x00000000 0x1001000c 0x00000000 0x10010010 0x00000018 0x10010014 0x00006666

16 (d) The above program does not implement any register saving conventions on a proce- dure call. Assume we will use the MIPS register saving convention i.e., Caller/Callee

frame on the stack when the procedure func is executing? 16 (e) Identify the local and global labels in the program? $fp, $22, since no other procedure is called $ra is not stored main is a global label all other labels are local labels

18 Consider the following isolated block of code. Note that some initialization code is miss- ing. 18 (a) If the text segment starts at 0x04000000, what is the value of the label end?. 18 (b) Let is assume we wish to implement the above function as a procedure.

this procedure from a calling program?

start: addi $2, $0, 85 addi $3, $0, 1 jal mystery end: j exit mystery: add $4, $0, $0 loop: andi $5, $2, 1 beq $5, $0, skip add $4, $4, $3 skip: srl $2, $2, 1 bne $2, $0, loop exit: jr $31 li $v0, 10 syscall

loop ________0x00400014________ skip _________0x00400020________ addi $3, $0, 1 __0x20030001______________ beq $5, $2, skip __0x10a00001______________ 0x0c100004

19 (d) Which of the following are valid MIPS instructions? If they are invalid you must state the correct form or clearly state why it is incorrect.

19 (e) Suppose I want to store the following information in the data segment in the order

each reserved word and the byte array should be labeled so that programs may reference them easily.

19 (f) (2 pts). In the preceding question (1(f)) what assembler directive would cause the byte array to start on a 64 byte boundary? 19 (g) What does the program in question 1 do? You have to specific and state the function or

.align 6 this moves allocation to the next 26 byte boundary Counts the number of bit set to 1 in $2.

Instruction ALUSrc ALUOp RegDst MemToReg Write Data Input to Data Memory j loop X XX X X contents of $16 sw $t2, 8($at) 1 00 X X 0x000000c0 sw, beq, and j. This follows from the truth table for the

values?

now a 2 bit control signal

for writing into the register file (into $31). The MemToReg signal is now a 2 bit control signal

This mux is is expanded to a 2 bit signal.

Instruction ALUOutput Value at the Write Data Input of Data Memory RegDst ALUOp ALUSrc MemTo Reg beq $3, $2, end 0x0000003c 0x100100444 X 01 X lw $22, 0($2) 0x10010440 0x00000000 00 1 1

func: add $2, $4, $0 add $3, $5, $0 add $3, $3, $2 add $9, $0, $0 add $22, $0, $0 loop: lw $22, 0($2) add $9, $9, $22 addi $2, $2, 4 beq $3, $2, end j loop end: move $2, $9 jr $31 Invert the RegDst signal and use it as MemToReg

____15_______(the end of the instuction execution not the EX cycle)

24 (b) Provide an example of a number that is too large to be represented in single precision notation but can be represented in double precision. Infinity

0x7f800000 0xc0900000

1.001 x 2131 + 0.0000111001 x 2131 = 1.0010111001 x 2131 1.0 x 2277

26 (a) Show the single precision normalized floating point representations of the following

26 (b) How would you interpret the following hexadecimal number assuming that it is a sin- gle precision IEEE 754 floating point number: 0x00140000? 26 (c) Assuming IEEE 754 rules and conventions, what is the result of the following operation in hexadecimal notation.

base 10).

0.0 ___________________________________________________________

10.03125 ___________________________________________________________ 0xbe000000 0x41208000 It is a denormalized number since the exponent is 0, but the significand is non-zero. 1.000001 x 2138

With 6 significant digits we have the guard, round and sticky bits as 011 With 4 significant digits we have 001. Note the sticky bit is set if here are non-zero bits to the right of the guard and round bits.

Instruction ALUSrc ALUOp RegDst MemToReg Write Data Input to Data Memory jal loop x x 10 10 contents of $16 bne $t1, $t2, loop 01 X X 0x000000c0

now a 2 bit control signal

for writing into the register file (into $31). The MemToReg signal is now a 2 bit control signal Use the complement of RegDst as the value of this signal. This is not the

none are required. We have the connections necessary to add an immediate operand to the contents of a register. The issue is simply

RegDst AluSrc MemToReg RegWrite MemRead MemWrite Br AluOp 0 1 0 1 0 0 0 00 The values of AluSrc and RegWrite become modified to include another OR term. Note the change in these two columns on the truth table. Also we have an additional gate on the input to recognize the addi instruction.

(1.01 x 2125) X (1.001 x 2129) 1.001000 x 1.01

Initial Final multiplicand 000100 000100 Product 000000111010 000011101000

Instruction class CPI Frequency R-Type + Immediate 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3%

31 (c) Show the additions to the microcode to implement this instruction. The existing micro- coded implementation of the state machine before addition of this instruction is shown

approach to solving this problem. Soem solutions may require additions to the microin- struction format in your text. If so state them explicitly.

Dispatch ROM 1

Op Name Value 000000 R-format 0110 000010 jmp 1001 000100 beq 1000 100011 lw 0010 101011 sw 0010 101000 icb 1010

Dispatch ROM 2

Op Name Value 100011 lw 0011 101011 sw 0101

that performs a memory access.

32 (c) What is the minimum number of cycles from the occurrence of an exception to the time the first instruction of the exception handler is fetched?

5.39 and Figure 5.40). The next state is the fetch state.

0x7ff0000000000000 (more than one answer:significand must be nonzero) (1.01 x 2125) + (1.001 x 2129) 0.000101 x 2129 1.001000 x 2129

Rounding up gives us 1.01 x 2129 There is an error and the magnitude of the error is 0.000011 x 2129 n steps

Initial Final Multiplicand 10010 10010 Product 0000001000 0010010000

Instruction class CPI Frequency R-Type + Immediate 4 45% lw 5 25% sw 4 20% beq 3 7% j 3 3%

35 (c) Show the additions to the microcode to implement this instruction. The existing micro- coded implementation of the state machine before addition of this instruction is shown

approach to solving this problem. If any changes are necessary to the microinstruction for- mat, state them explicitly.

loop ____0x00400014 skip _0x00400020 addi $3, $0, 1 0x20030001 beq $5, $2, skip 0x10a00001____ 0x0c100004

15___(the end of the instuction execution not the EX cycle)