HARDWARE SPECULATION Mahdi Nazm Bojnordi Assistant Professor - - PowerPoint PPT Presentation


SLIDE 1

HARDWARE SPECULATION

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor, School of Computing, University of Utah

SLIDE 2

Overview

• Announcement
  - Homework 3 is due tonight (11:59 PM)
  - Midterm exam: Oct. 14th (right after Fall break)

• This lecture
  - Out-of-order pipeline
    - Issue queue
    - Register renaming
    - Branch recovery
    - Speculative execution

SLIDE 3

Recall: Out-of-Order Execution

• Producer-consumer chains on the fly
  - Register renaming: remove anti- and output dependences via register tags
  - Limited by the number of instructions in the instruction window (ROB)

• Out-of-order issue (dispatch)
  - Broadcast tags to waiting instructions
  - Wake up ready instructions and select among them

• Out-of-order execute/complete
• In-order fetch/decode and commit
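The renaming bullet above can be checked with a small Python sketch (an illustration, not part of the original slides). It classifies RAW, WAR, and WAW dependences in the renaming example used later in the deck (with the branch left out, since it has no destination), then renames every destination to a fresh tag. The tuple instruction format, the helper names `dependences` and `rename`, and the tag names T0, T1, … are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical instruction format: (destination register, [source registers])
Instr = Tuple[str, List[str]]
Dep = Tuple[str, int, int]  # (kind, earlier index, later index)


def dependences(prog: List[Instr]) -> Set[Dep]:
    """Find RAW, WAR and WAW register dependences between instructions."""
    deps: Set[Dep] = set()
    last_writer = {}   # register -> index of its most recent writer
    readers = {}       # register -> indices that read it since that write
    for j, (dst, srcs) in enumerate(prog):
        for r in srcs:
            if r in last_writer:
                deps.add(("RAW", last_writer[r], j))   # true dependence
        for i in readers.get(dst, []):
            deps.add(("WAR", i, j))                    # anti-dependence
        if dst in last_writer:
            deps.add(("WAW", last_writer[dst], j))     # output dependence
        for r in srcs:
            readers.setdefault(r, []).append(j)
        last_writer[dst] = j
        readers[dst] = []   # j's write starts a new value of dst
    return deps


def rename(prog: List[Instr]) -> List[Instr]:
    """Give each destination a fresh tag and read sources through a RAT."""
    rat = {}
    renamed = []
    for n, (dst, srcs) in enumerate(prog):
        new_srcs = [rat.get(r, r) for r in srcs]  # unmapped: architectural register file
        rat[dst] = f"T{n}"                         # update the RAT after reading sources
        renamed.append((f"T{n}", new_srcs))
    return renamed


# The deck's renaming example without the branch (BEQ has no destination).
PROG: List[Instr] = [
    ("R1", ["R1", "R2"]),   # R1 <- R1 + R2
    ("R2", ["R1", "R3"]),   # R2 <- R1 - R3
    ("R3", ["R1", "R2"]),   # R3 <- R1 ^ R2
    ("R1", ["R3", "R2"]),   # R1 <- R3 & R2
]

if __name__ == "__main__":
    for kind in ("RAW", "WAR", "WAW"):
        before = sorted(d[1:] for d in dependences(PROG) if d[0] == kind)
        after = sorted(d[1:] for d in dependences(rename(PROG)) if d[0] == kind)
        print(kind, "before:", before, "after:", after)
```

Renaming leaves the five RAW pairs untouched while the four WAR pairs and the one WAW pair disappear, so only the producer-consumer chains remain.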

SLIDE 4

Out-of-Order Pipelines

[Figure: pipeline stages Fetch, Decode, Rename, Issue, Execute, Complete, Commit; instruction memory and decoder, four reservation stations (Res. Station) in front of the functional units (FU-1 … FU-n, Branch, Data Memory), register file, and re-order buffer (ROB)]

• Distributed reservation stations
  - In-order issue/dispatch

SLIDE 5

Out-of-Order Pipelines

[Figure: the same pipeline with an issue queue (IQ) feeding FU-1 … FU-n, Branch, and Data Memory; register file and re-order buffer (ROB)]

• Out-of-order issue/dispatch to functional units
  - Out-of-order issue/dispatch
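A toy Python model of the difference between in-order issue (slide 4) and out-of-order issue (slide 5). It assumes one issue slot per cycle and that each instruction's operand-ready cycle is known in advance, which is a simplification (in a real pipeline those cycles depend on when the producers issue). The function names are hypothetical.

```python
from typing import List, Optional


def in_order_issue(ready: List[int]) -> List[int]:
    """One issue per cycle in program order: a stalled instruction blocks all younger ones."""
    cycles, t = [], 0
    for r in ready:
        t = max(t, r)       # wait until this instruction's operands are ready
        cycles.append(t)
        t += 1              # the next instruction issues no earlier than the next cycle
    return cycles


def out_of_order_issue(ready: List[int]) -> List[int]:
    """One issue per cycle, choosing the oldest instruction whose operands are ready."""
    issued: List[Optional[int]] = [None] * len(ready)
    t = 0
    while any(c is None for c in issued):
        candidates = [i for i, r in enumerate(ready) if issued[i] is None and r <= t]
        if candidates:
            issued[min(candidates)] = t   # oldest ready instruction wins the slot
        t += 1
    return [c for c in issued if c is not None]


if __name__ == "__main__":
    ready = [5, 0, 0, 0]   # the oldest instruction waits, e.g. on a load
    print("in-order:    ", in_order_issue(ready))
    print("out-of-order:", out_of_order_issue(ready))
```

With `ready = [5, 0, 0, 0]`, in-order issue finishes at cycle 8, whereas out-of-order issue sends the three independent instructions in cycles 0 to 2 and the stalled one at cycle 5.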

SLIDE 6

Out-of-Order Issue Queue

• Two-step wakeup and select logic

[Figure: one issue-queue entry with fields Rdy-1, Reg-1, Reg-2, Rdy-2; Reg-1 and Reg-2 are each compared (=) against the broadcast tag]

SLIDE 7

Out-of-Order Issue Queue

• Two-step wakeup and select logic

[Figure: multiple issue-queue entries (Rdy-1, Reg-1, Reg-2, Rdy-2), each with two comparators (=) on the broadcast tag]

SLIDE 8

Out-of-Order Issue Queue

• Two-step wakeup and select logic

[Figure: issue-queue entries (Rdy-1, Reg-1, Reg-2, Rdy-2) with comparators (=) on the broadcast tag feed the selection logic, which outputs the selected instructions]
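The two steps can be modelled in a few lines of Python. This is a minimal sketch, assuming each entry holds two source tags with ready bits (mirroring Rdy-1/Reg-1/Reg-2/Rdy-2) plus a hypothetical `age` field for program order, so that selection picks the oldest ready instruction, in line with the later "Keep the program order to avoid starvation" step. Class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IQEntry:
    name: str            # label for printing (hypothetical)
    age: int             # program order: smaller means older
    reg1: Optional[str]  # tag of source 1, or None if its value is already captured
    rdy1: bool
    reg2: Optional[str]
    rdy2: bool

    def ready(self) -> bool:
        return self.rdy1 and self.rdy2


def wakeup(queue: List[IQEntry], tag: str) -> None:
    """Step 1: every entry compares both source tags with the broadcast tag."""
    for e in queue:
        if not e.rdy1 and e.reg1 == tag:
            e.rdy1 = True
        if not e.rdy2 and e.reg2 == tag:
            e.rdy2 = True


def select(queue: List[IQEntry], width: int) -> List[IQEntry]:
    """Step 2: grant up to `width` ready entries, oldest first, and remove them."""
    chosen = sorted((e for e in queue if e.ready()), key=lambda e: e.age)[:width]
    for e in chosen:
        queue.remove(e)
    return chosen


if __name__ == "__main__":
    # Issue-queue state of slides 25 to 28: T3 waits for T2, T4 is ready.
    iq = [
        IQEntry("T3=V1+T2", age=3, reg1=None, rdy1=True, reg2="T2", rdy2=False),
        IQEntry("T4=V1+V9", age=4, reg1=None, rdy1=True, reg2=None, rdy2=True),
    ]
    wakeup(iq, "T2")                              # T2's result is broadcast
    print([e.name for e in select(iq, width=1)])  # ['T3=V1+T2']: one ALU, older wins
```

Sorting by age is the simplest oldest-first policy; hardware typically approximates it with a priority encoder over queue positions.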

SLIDE 9

Register Renaming

• Register aliasing table for fast lookup

[Figure: out-of-order pipeline with a register aliasing table (RAT) at the rename stage, plus issue queue (IQ), register file, and re-order buffer (ROB)]

SLIDE 10

Register Renaming

• Register aliasing table for fast lookup

[Figure: pipeline with RAT, as on slide 9]

Where to write?

SLIDE 11

Register Renaming

• Register aliasing table for fast lookup

[Figure: pipeline with RAT, as on slide 9]

How to add entries to the RAT? Where to write?

SLIDE 12

Register Renaming

• Register aliasing table for fast lookup

[Figure: pipeline with RAT, as on slide 9]

How to add entries to the RAT? Where to write? Search the table for an unallocated tag!

SLIDE 13

Register Renaming

• Free register list for fast register renaming

[Figure: pipeline with a register aliasing table (RAT) and a free register list at the rename stage, plus issue queue (IQ), register file, and re-order buffer (ROB)]

SLIDE 14

Register Renaming

• Free register list for fast register renaming

[Figure: pipeline with RAT and free register list, as on slide 13]

Proceed only if there is free space in the IQ, the ROB, and the free list.

SLIDE 15

Register Renaming

• Free register list for fast register renaming

[Figure: pipeline with RAT and free register list, as on slide 13]

Proceed only if there is free space in the IQ, the ROB, and the free list. When is it safe to free a tag?
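A minimal Python sketch of a rename stage built from a RAT and a free list, including the "proceed only if there is free space" check. The class name, the capacities, and the tag numbering (starting at T0 instead of the slides' T2) are assumptions for illustration; freeing tags at commit is not modelled here.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class RenameStage:
    """Rename with a RAT and a free tag list; all capacities are hypothetical parameters."""

    def __init__(self, num_tags: int, iq_size: int, rob_size: int) -> None:
        self.rat: Dict[str, str] = {}                              # architectural reg -> tag
        self.free: Deque[str] = deque(f"T{i}" for i in range(num_tags))
        self.iq: List[Tuple[str, List[str]]] = []                  # (dest tag, operands)
        self.rob: Deque[Tuple[str, str]] = deque()                 # (tag, architectural dest)
        self.iq_size = iq_size
        self.rob_size = rob_size

    def rename(self, dst: str, srcs: List[str]) -> Optional[str]:
        """Return the new tag, or None if the instruction must stall in decode."""
        if len(self.iq) >= self.iq_size or len(self.rob) >= self.rob_size or not self.free:
            return None                                  # no space in IQ, ROB, or free list
        operands = [self.rat.get(r, r) for r in srcs]   # unmapped: read the register file
        tag = self.free.popleft()                        # pop the list head, no table search
        self.rat[dst] = tag                              # update only after reading sources
        self.iq.append((tag, operands))
        self.rob.append((tag, dst))
        return tag


if __name__ == "__main__":
    # The three instructions of slides 20 to 23. The two-entry IQ is an
    # assumption chosen so that the third instruction has to wait.
    rs = RenameStage(num_tags=5, iq_size=2, rob_size=5)
    for dst, srcs in [("R2", ["R1", "R2"]), ("R3", ["R1", "R2"]), ("R5", ["R1", "R9"])]:
        print(dst, "->", rs.rename(dst, srcs))
    print(rs.iq)
```

Taking the head of a free list replaces the "search the table for an unallocated tag" step with a constant-time pop, which is the motivation given for the free register list.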

SLIDE 16

Example: Instruction Commit

• Update value in ROB

Register file: R1 = ??, R2 = V2, R3 = V3, …
RAT: R1 -> T0, R10 -> T1
Free list: T2, T3, T4, …
ROB (entries T0 to T4, head at T0): T0 = (+, R1), T1 = (ld, R10)
Executing: T0 = V2 + V3 (ALU), T1 = Mem[10]

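SLIDE 17

Example: Instruction Commit

• Update value in ROB

Register file: R1 = ??, R2 = V2, R3 = V3, …
RAT: R1 -> T0, R10 -> T1
Free list: T2, T3, T4, …
ROB: T0 = (+, R1), T1 = (ld, R10)
Executing: T0 = V2 + V3 (ALU), T1 = Mem[10]
Result: (T0, V1) is written into ROB entry T0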

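SLIDE 18

Example: Instruction Commit

• Register file write

Register file: R1 = ??, R2 = V2, R3 = V3, …
RAT: R1 -> T0, R10 -> T1
Free list: T2, T3, T4, …
ROB: T0 = (+, R1), T1 = (ld, R10)
Executing: ALU idle, T1 = Mem[10]
Commit: the ROB head (T0) holds (R1, V1), which is written to the register file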

SLIDE 19

Example: Instruction Commit

• Update tables and ROB

Register file: R1 = V1, R2 = V2, R3 = V3, …
RAT: R10 -> T1
Free list: T2, T3, T4, …, T0
ROB: T1 = (ld, R10)
Executing: ALU idle, T1 = Mem[10]
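The commit steps of slides 16 to 19 can be sketched in Python as follows. This assumes the ROB is a FIFO of entries holding a tag, a destination, and a value, and that a RAT entry is removed only if it still points at the committing tag (otherwise a younger writer owns that mapping). `ROBEntry`, `complete`, and `commit` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class ROBEntry:
    tag: str
    dest: str                    # architectural destination register
    value: Optional[str] = None  # filled in when the result comes back
    done: bool = False


def complete(rob: Deque[ROBEntry], tag: str, value: str) -> None:
    """Write a finished result into its ROB entry (any entry, out of order)."""
    for e in rob:
        if e.tag == tag:
            e.value, e.done = value, True
            return
    raise KeyError(tag)


def commit(rob: Deque[ROBEntry], regfile: Dict[str, Optional[str]],
           rat: Dict[str, str], free_list: Deque[str]) -> bool:
    """Retire the ROB head if it has finished; commit stays in program order."""
    if not rob or not rob[0].done:
        return False
    head = rob.popleft()
    regfile[head.dest] = head.value      # architectural register file write
    if rat.get(head.dest) == head.tag:   # no younger instruction has remapped dest
        del rat[head.dest]               # later readers now use the register file
    free_list.append(head.tag)           # the tag can be reused
    return True


if __name__ == "__main__":
    # State of slide 16: T0 produces R1, a load is pending in T1.
    regfile: Dict[str, Optional[str]] = {"R1": None, "R2": "V2", "R3": "V3"}
    rat = {"R1": "T0", "R10": "T1"}
    free_list: Deque[str] = deque(["T2", "T3", "T4"])
    rob: Deque[ROBEntry] = deque([ROBEntry("T0", "R1"), ROBEntry("T1", "R10")])

    complete(rob, "T0", "V1")                     # slide 17
    print(commit(rob, regfile, rat, free_list))   # slides 18 and 19: True
    print(regfile["R1"], rat, list(free_list))    # V1 {'R10': 'T1'} ['T2', 'T3', 'T4', 'T0']
    print(commit(rob, regfile, rat, free_list))   # False: the load in T1 is not done
```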

SLIDE 20

Example: Instruction Rename

• Allocate entries in the ROB, IQ, and FL (free list)

Decode queue: R2 = R1 + R2; R3 = R1 + R2; R5 = R1 + R9
Register file: R1 = V1, R2 = V2, R3 = V3, …
RAT: R10 -> T1
Free list: T2, T3, T4, …, T0
ROB: T1 = (ld, R10)
Executing: ALU idle, T1 = Mem[10]

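SLIDE 21

Example: Instruction Rename

• Allocate entries in the ROB, IQ, and FL (free list)

Decode queue: R3 = R1 + R2; R5 = R1 + R9
Register file: R1 = V1, R2 = ??, R3 = V3, …
RAT: R10 -> T1, R2 -> T2
Free list: T3, T4, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2)
Issue queue: T2 = V1 + V2
Executing: ALU idle, T1 = Mem[10]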

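SLIDE 22

Example: Instruction Rename

• Allocate entries in the ROB, IQ, and FL (free list)

Decode queue: R5 = R1 + R9
Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3
Free list: T4, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3)
Issue queue: T2 = V1 + V2, T3 = V1 + T2
Executing: ALU idle, T1 = Mem[10]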

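SLIDE 23

Example: Instruction Rename

• The next instruction (R5 = R1 + R9) has to wait for free resources

Decode queue: R5 = R1 + R9
Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3
Free list: T4, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3)
Issue queue: T2 = V1 + V2, T3 = V1 + T2
Executing: ALU idle, T1 = Mem[10]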

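SLIDE 24

Example: Instruction Issue

• Issue a ready instruction if a free FU exists

Decode queue: R5 = R1 + R9
Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3
Free list: T4, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3)
Issue queue: T3 = V1 + T2
Executing: T2 = V1 + V2 (ALU), T1 = Mem[10]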

SLIDE 25

Example: Instruction Issue

• Out-of-order issue is now possible

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Issue queue: T4 = V1 + V9, T3 = V1 + T2
Executing: T2 = V1 + V2 (ALU), T1 = Mem[10]

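SLIDE 27

Example: Instruction Issue

• Out-of-order issue is now possible

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Issue queue: T4 = V1 + V9, T3 = V1 + T2
Executing: T2 = V1 + V2 (ALU), T1 = Mem[10]
Result broadcast: (T2, V2)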

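SLIDE 28

Example: Instruction Issue

• Wakeup and select

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Issue queue: T4 = V1 + V9, T3 = V1 + V2
Executing: ALU idle, T1 = Mem[10]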

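SLIDE 29

Example: Instruction Issue

• Keep the program order to avoid starvation (the older T3 is selected before T4)

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Issue queue: T4 = V1 + V9
Executing: T3 = V1 + V2 (ALU), T1 = Mem[10]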

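SLIDE 30

Example: Instruction Issue

• Keep the program order to avoid starvation

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Issue queue: T4 = V1 + V9
Executing: T3 = V1 + V2 (ALU), T1 = Mem[10]
Result broadcast: (T3, V3)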

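SLIDE 31

Example: Instruction Execute

• Issue ready instructions

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Issue queue: empty
Executing: T4 = V1 + V9 (ALU), T1 = Mem[10]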

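SLIDE 32

Example: Instruction Execute

• Issue ready instructions

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Executing: T4 = V1 + V9 (ALU), T1 = Mem[10]
Result broadcast: (T4, V4)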

SLIDE 33

Example: Instruction Execute

• Update ROB

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Executing: ALU idle, T1 = Mem[10]

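SLIDE 34

Example: Instruction Execute

• Update ROB

Register file: R1 = V1, R2 = ??, R3 = ??, …
RAT: R10 -> T1, R2 -> T2, R3 -> T3, R5 -> T4
Free list: T5, …, T0
ROB: T1 = (ld, R10), T2 = (+, R2), T3 = (+, R3), T4 = (+, R5)
Executing: ALU idle, T1 = Mem[10]
Result broadcast: (T1, V10)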

SLIDE 35

Register Renaming Example

• Where are values stored?

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
Tags: R1, R2, R3 not renamed yet
Issue queue: empty
Reorder buffer: ROB 1 (T0), ROB 2 (T1), ROB 3 (T2), ROB 4 (T3), ROB 5 (T4), all free
[Figure: decode queue, tags, register file, reorder buffer, issue queue, and functional units]

SLIDE 36

Register Renaming Example

• Where are values stored?

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
Tags: R1 -> T0
Issue queue: T0 ← R1 + R2
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1) to ROB 5 (T4) free
[Figure: decode queue, tags, register file, reorder buffer, issue queue, and functional units]

SLIDE 37

Register Renaming Example

• Where are values stored?

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
Tags: R1 -> T0, R2 -> T1
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2) to ROB 5 (T4) free
[Figure: decode queue, tags, register file, reorder buffer, issue queue, and functional units]

SLIDE 38

Register Renaming Example

• Where are values stored?

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
Tags: R1 -> T0, R2 -> T1
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3; BEQ T1, R0
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2): BEQ (no destination); ROB 4 (T3), ROB 5 (T4) free
[Figure: decode queue, tags, register file, reorder buffer, issue queue, and functional units]

SLIDE 39

Register Renaming Example

• Where are values stored?

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
Tags: R1 -> T0, R2 -> T1, R3 -> T3
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3; BEQ T1, R0; T3 ← T0 ^ T1
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2): BEQ; ROB 4 (T3): R3; ROB 5 (T4) free
[Figure: decode queue, tags, register file, reorder buffer, issue queue, and functional units]

SLIDE 40

Register Renaming Example

• Where are values stored?

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
Tags: R1 -> T4, R2 -> T1, R3 -> T3
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3; BEQ T1, R0; T3 ← T0 ^ T1; T4 ← T3 & T1
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2): BEQ; ROB 4 (T3): R3; ROB 5 (T4): R1
[Figure: decode queue, tags, register file, reorder buffer, issue queue, and functional units]
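The renaming in slides 35 to 40 can be reproduced with a short Python sketch, assuming each instruction's tag is simply the index of the ROB entry it occupies (so the branch consumes T2 without creating a mapping) and that sources with no tag read the register file. The encoding and the function name `rename_with_rob_tags` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (destination or None for a branch, operator, source registers)
Instr = Tuple[Optional[str], str, List[str]]

PROGRAM: List[Instr] = [
    ("R1", "+", ["R1", "R2"]),
    ("R2", "-", ["R1", "R3"]),
    (None, "BEQ", ["R2", "R0"]),
    ("R3", "^", ["R1", "R2"]),
    ("R1", "&", ["R3", "R2"]),
]


def rename_with_rob_tags(prog: List[Instr]) -> Tuple[List[str], Dict[str, str]]:
    """Rename in order; each instruction's tag is the ROB entry it occupies."""
    rat: Dict[str, str] = {}
    renamed: List[str] = []
    for rob_index, (dst, op, srcs) in enumerate(prog):
        tag = f"T{rob_index}"                      # ROB 1 -> T0, ROB 2 -> T1, ...
        ops = [rat.get(r, r) for r in srcs]        # no tag yet: read the register file
        if dst is None:
            renamed.append(f"{op} {', '.join(ops)}")  # branch: uses a ROB entry, no RAT update
        else:
            renamed.append(f"{tag} <- {ops[0]} {op} {ops[1]}")
            rat[dst] = tag                         # later readers of dst wait on this tag
    return renamed, rat


if __name__ == "__main__":
    lines, rat = rename_with_rob_tags(PROGRAM)
    print("\n".join(lines))
    print(rat)   # {'R1': 'T4', 'R2': 'T1', 'R3': 'T3'}
```

It prints the five issue-queue entries listed for slide 40 and the final tags R1 -> T4, R2 -> T1, R3 -> T3.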

SLIDE 41

Branch Recovery

• How to handle branches?

[Figure: out-of-order pipeline with a branch predictor, register aliasing table (RAT) and free register list at rename, issue queue (IQ), register file, and re-order buffer (ROB)]

SLIDE 42

Revisit Branch Prediction

• Problem: find the average number of stall cycles per instruction caused by branches in a pipeline where the branch misprediction penalty is 20 cycles, the branch predictor accuracy is 90%, and the branch target buffer (BTB) hit rate is 80%. Every fifth instruction is a branch; 30% of branches are actually taken.

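SLIDE 43

Revisit Branch Prediction

• Problem: find the average number of stall cycles per instruction caused by branches in a pipeline where the branch misprediction penalty is 20 cycles, the branch predictor accuracy is 90%, and the branch target buffer (BTB) hit rate is 80%. Every fifth instruction is a branch; 30% of branches are actually taken.
  - A branch avoids the penalty if it is taken, correctly predicted, and hits in the BTB, or if it is not taken and correctly predicted
  - Average fraction of branches that pay the penalty = 1 - (0.3 × 0.9 × 0.8 + 0.7 × 0.9) = 1 - 0.846 = 0.154
  - Average stalls per instruction = 20 × 0.2 × 0.154 ≈ 0.62 cycles (0.2 = one branch every five instructions)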
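The same calculation as a small Python function, so the inputs can be varied. It follows the model of the solution above (a taken branch needs both a correct prediction and a BTB hit; a not-taken branch needs only a correct prediction); the function and parameter names are hypothetical.

```python
from typing import Tuple


def branch_stalls(penalty: float, accuracy: float, btb_hit: float,
                  branch_freq: float, taken_frac: float) -> Tuple[float, float]:
    """Return (fraction of branches that pay the penalty, stall cycles per instruction)."""
    # Taken branch: needs a correct prediction AND a BTB hit to supply the target.
    # Not-taken branch: only needs a correct prediction.
    no_penalty = taken_frac * accuracy * btb_hit + (1 - taken_frac) * accuracy
    miss = 1 - no_penalty
    return miss, penalty * branch_freq * miss


if __name__ == "__main__":
    miss, stalls = branch_stalls(penalty=20, accuracy=0.9, btb_hit=0.8,
                                 branch_freq=1 / 5, taken_frac=0.3)
    print(f"miss fraction = {miss:.3f}, stalls per instruction = {stalls:.3f}")
    # miss fraction = 0.154, stalls per instruction = 0.616
```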

SLIDE 44

Speculative Execution

• Problem: branches may significantly limit performance
  - especially a branch that consumes the result of a load or another long-latency instruction

• Solution: speculative instruction execution
  - Fetch and decode instructions speculatively
  - Issue and execute speculative instructions
  - Branch resolution
    - Nullify the impact of speculative instructions if the branch was mispredicted
    - Commit speculative instructions (writes to the register file/memory) only if the prediction was correct

SLIDE 45

Branch Recovery

• Squash all mispredicted entries

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
RAT: R1 -> T4, R2 -> T1, R3 -> T3
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3; BEQ T1, R0; T3 ← T0 ^ T1; T4 ← T3 & T1
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2): BEQ; ROB 4 (T3): R3; ROB 5 (T4): R1
[Figure: decode queue, RAT, register file, reorder buffer, issue queue, and functional units]

SLIDE 46

Branch Recovery

• Squash all mispredicted entries

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
RAT: R1 -> T4, R2 -> T1, R3 -> T3
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3; BEQ T1, R0; T3 ← T0 ^ T1; T4 ← T3 & T1
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2): BEQ; ROB 4 (T3): R3; ROB 5 (T4): R1
Speculated: the instructions after BEQ (ROB 4 and ROB 5)
[Figure: decode queue, RAT, register file, reorder buffer, issue queue, and functional units]

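SLIDE 47

Branch Recovery

• Squash all mispredicted entries

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
RAT: R1 -> T4, R2 -> T1, R3 -> T3
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3; BEQ T1, R0; T3 ← T0 ^ T1; T4 ← T3 & T1
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2): BEQ; ROB 4 (T3): R3; ROB 5 (T4): R1
Speculated: the instructions after BEQ (ROB 4 and ROB 5)
Checkpoint (RAT at the branch): R1 -> T0, R2 -> T1
[Figure: decode queue, RAT, register file, reorder buffer, issue queue, and functional units]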
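Checkpoint-based recovery, as outlined on slides 45 to 47, can be sketched in Python. Assumptions: the ROB is a Python list ordered oldest to youngest, each branch stores a full copy of the RAT when it is renamed, and squashing only trims the ROB (clearing the same tags from the issue queue and functional units is omitted). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ROBEntry:
    tag: str
    text: str
    checkpoint: Optional[Dict[str, str]] = None   # RAT copy, taken only at branches


def dispatch(rob: List[ROBEntry], rat: Dict[str, str], tag: str, text: str,
             dest: Optional[str] = None, is_branch: bool = False) -> None:
    """Append an instruction to the ROB; a branch snapshots the RAT first."""
    rob.append(ROBEntry(tag, text, dict(rat) if is_branch else None))
    if dest is not None:
        rat[dest] = tag


def recover(rob: List[ROBEntry], rat: Dict[str, str], branch_tag: str) -> List[str]:
    """Squash every entry younger than the mispredicted branch and restore its checkpoint."""
    i = next(k for k, e in enumerate(rob) if e.tag == branch_tag)
    checkpoint = rob[i].checkpoint
    if checkpoint is None:
        raise ValueError(f"{branch_tag} has no checkpoint")
    squashed = [e.tag for e in rob[i + 1:]]
    del rob[i + 1:]                  # wrong-path instructions never commit
    rat.clear()
    rat.update(checkpoint)           # mappings made after the branch disappear
    return squashed


if __name__ == "__main__":
    rob: List[ROBEntry] = []
    rat: Dict[str, str] = {}
    dispatch(rob, rat, "T0", "T0 <- R1 + R2", dest="R1")
    dispatch(rob, rat, "T1", "T1 <- T0 - R3", dest="R2")
    dispatch(rob, rat, "T2", "BEQ T1, R0", is_branch=True)
    dispatch(rob, rat, "T3", "T3 <- T0 ^ T1", dest="R3")
    dispatch(rob, rat, "T4", "T4 <- T3 & T1", dest="R1")
    print(rat)                        # {'R1': 'T4', 'R2': 'T1', 'R3': 'T3'}
    print(recover(rob, rat, "T2"))    # ['T3', 'T4']
    print(rat)                        # {'R1': 'T0', 'R2': 'T1'}
```

Copying the whole RAT per branch is the simplest checkpoint; it costs one table copy per in-flight branch but makes recovery a single restore instead of a walk over the squashed entries.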

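SLIDE 48

Physical Register File

• Avoid copying register values multiple times

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
RAT: R1 -> T4, R2 -> T1, R3 -> T3
Issue queue: T0 ← R1 + R2; T1 ← T0 - R3; BEQ T1, R0; T3 ← T0 ^ T1; T4 ← T3 & T1
Reorder buffer: ROB 1 (T0): R1; ROB 2 (T1): R2; ROB 3 (T2): BEQ; ROB 4 (T3): R3; ROB 5 (T4): R1
[Figure: decode queue, RAT, register file, reorder buffer, issue queue, and functional units; a "copy" arrow moves the values of T0 and T1 from the reorder buffer to the register file]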

SLIDE 49

Physical Register File

• Avoid copying register values multiple times

Program (decode queue): R1 ← R1 + R2; R2 ← R1 - R3; BEQ R2, R0; R3 ← R1 ^ R2; R1 ← R3 & R2
Front RAT: R1 -> P4, R2 -> P1, R3 -> P3
Issue queue: P0 ← R1 + R2; P1 ← P0 - R3; BEQ P1, R0; P3 ← P0 ^ P1; P4 ← P3 & P1
Reorder buffer: ROB 1: (R1, P0); ROB 2: (R2, P1); ROB 3: BEQ; ROB 4: (R3, P3); ROB 5: (R1, P4)
Retire RAT: R1, R2, R3 (the figure also shows P0 and P1)
Physical register file: P0, P1, P2, P3, P4, P5, …

Note 1: Only a subset of the physical register file is committed at any time.
Note 2: There is no need to store values in the ROB or IQ.

SLIDE 50

Double RAT Architecture

• What is the size of the ROB?

[Figure: out-of-order pipeline with a branch predictor, front RAT and free register list at rename, issue queue (IQ), physical register file, re-order buffer (ROB), and retire RAT]
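A Python sketch of what commit does in the double-RAT design of slides 49 and 50: since values live only in the physical register file, retiring an instruction just points the retire RAT at its physical register. The ROB contents and front RAT come from slide 49; an initially empty retire RAT and the function names are assumptions.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple

# ROB entry: (architectural destination, physical register), or (None, None) for a branch.
RobEntry = Tuple[Optional[str], Optional[str]]


def commit(rob: Deque[RobEntry], retire_rat: Dict[str, str]) -> RobEntry:
    """Retire the ROB head: only the retire RAT changes, no value is copied."""
    dest, phys = rob.popleft()
    if dest is not None and phys is not None:
        retire_rat[dest] = phys
    return dest, phys


def committed_registers(retire_rat: Dict[str, str]) -> Set[str]:
    """Physical registers that currently hold committed (architectural) state."""
    return set(retire_rat.values())


if __name__ == "__main__":
    # Front RAT and ROB contents as on slide 49.
    front_rat = {"R1": "P4", "R2": "P1", "R3": "P3"}
    rob: Deque[RobEntry] = deque([("R1", "P0"), ("R2", "P1"), (None, None),
                                  ("R3", "P3"), ("R1", "P4")])
    retire_rat: Dict[str, str] = {}
    commit(rob, retire_rat)
    commit(rob, retire_rat)
    print(sorted(committed_registers(retire_rat)))   # ['P0', 'P1']
    while rob:
        commit(rob, retire_rat)
    print(retire_rat == front_rat)                   # True: everything has committed
```

After the first two commits only P0 and P1 hold committed state (compare Note 1), and once the ROB drains the retire RAT matches the front RAT.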

SLIDE 51

Physical Register Release

• Example: when is it safe to free p30 (R1)?

ADD R1, R2, R3
SUB R2, R1, R3
…
ADD R3, R1, R2
…
SUB R1, R3, R2
ADD R2, R1, R3

[Figure: architectural registers R1, R2, R3 and physical registers p33, p34, p35, p30, p31, p32]

SLIDE 54

Physical Register Release

• Example: when is it safe to free p30 (R1)?

ADD R1, R2, R3
SUB R2, R1, R3
…
ADD R3, R1, R2
…
SUB R1, R3, R2
ADD R2, R1, R3

[Figure: architectural registers R1, R2, R3 and physical registers p33, p34, p35, p30, p31, p32; p34 is marked]

Answer: at the retirement of the second write to R1
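The release rule can be sketched in Python: when an instruction that writes R is renamed, its ROB entry remembers the physical register R was mapped to before; when that instruction retires, the remembered register is freed. The starting mappings p20 to p22 and the free list p30 to p35 are hypothetical, and the elided "…" instructions are left out, so later register numbers need not match the figure. In this sketch p30 is freed exactly when the second write to R1 retires.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Renamer:
    """Physical-register renaming that records, per ROB entry, the mapping it replaces."""

    def __init__(self, initial: Dict[str, str], free: List[str]) -> None:
        self.rat = dict(initial)                           # arch reg -> physical reg
        self.free: Deque[str] = deque(free)
        self.rob: Deque[Tuple[str, str, str]] = deque()    # (text, new phys, previous phys)

    def rename(self, text: str, dest: str, srcs: List[str]) -> Tuple[str, List[str]]:
        ops = [self.rat[s] for s in srcs]   # sources read the current mappings
        new = self.free.popleft()
        prev = self.rat[dest]               # superseded mapping, still needed by older readers
        self.rat[dest] = new
        self.rob.append((text, new, prev))
        return new, ops

    def retire(self) -> str:
        """Retire the oldest instruction and release the register it superseded."""
        _text, _new, prev = self.rob.popleft()
        self.free.append(prev)
        return prev


# The visible instructions of the example; the elided "…" lines are left out.
PROGRAM = [
    ("ADD R1, R2, R3", "R1", ["R2", "R3"]),
    ("SUB R2, R1, R3", "R2", ["R1", "R3"]),
    ("ADD R3, R1, R2", "R3", ["R1", "R2"]),
    ("SUB R1, R3, R2", "R1", ["R3", "R2"]),
    ("ADD R2, R1, R3", "R2", ["R1", "R3"]),
]

if __name__ == "__main__":
    # Hypothetical starting state: older mappings p20-p22, free list p30-p35.
    r = Renamer({"R1": "p20", "R2": "p21", "R3": "p22"},
                ["p30", "p31", "p32", "p33", "p34", "p35"])
    for text, dest, srcs in PROGRAM:
        print(text, "->", r.rename(text, dest, srcs))
    for text, *_ in list(r.rob):
        print("retire", text, "frees", r.retire())
```

Freeing p30 any earlier, for example when the second write to R1 is renamed or completes, would be unsafe: older instructions may still need to read p30, and the second write itself could be squashed by a misprediction.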