Basic Pipelining Wrap-up (from Slide Set 20); Exploiting ILP


Jan 31, 2023


SLIDE 1

1 Slide Set #21: Exploiting ILP

Chapter 6 and beyond

2 Basic Pipelining Wrap-up (from Slide Set 20)

3 Pipelining

Improve performance by increasing instruction throughput

[Figure: program execution order (in instructions) versus time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg. Top panel (not pipelined): a new instruction starts every 800 ps; time axis 200 to 1800 ps. Bottom panel (pipelined): a new instruction starts every 200 ps and each stage takes 200 ps; time axis 200 to 1400 ps.]

Ideal speedup is number of stages in the pipeline. Do we achieve this?
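The question can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the numbers read off the figure (5 stages, 800 ps per instruction without pipelining, 200 ps per pipelined stage), and the function names are made up for this example.

```python
# Timing sketch for the three-lw example.
# Assumptions (taken from the figure, not from any stated formula):
#   5 stages, 800 ps per instruction when not pipelined, 200 ps per pipelined stage.

def nonpipelined_time_ps(n_instructions, instr_time_ps=800):
    """Each instruction runs to completion before the next one starts."""
    return n_instructions * instr_time_ps

def pipelined_time_ps(n_instructions, n_stages=5, stage_time_ps=200):
    """The first instruction needs n_stages cycles; each later one finishes one cycle after."""
    return (n_stages + n_instructions - 1) * stage_time_ps

def speedup(n_instructions):
    return nonpipelined_time_ps(n_instructions) / pipelined_time_ps(n_instructions)

if __name__ == "__main__":
    print(nonpipelined_time_ps(3))  # 2400
    print(pipelined_time_ps(3))     # 1400
    print(round(speedup(3), 3))     # 1.714
```

With only three instructions the speedup is about 1.7. For very long instruction streams it approaches 800/200 = 4 under these numbers, still short of the stage count of 5, which suggests the five stages are not perfectly balanced.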

4 Big Picture

Remember the single-cycle implementation

– Inefficient because low utilization of hardware resources
– Each instruction takes one long cycle

Two possible ways to improve on this: Multicycle, Pipelined

– Split instruction into multiple stages (1 per cycle)?
– Each stage has its own set of hardware?
– How many instructions executing at once?
– Clock cycle time (vs. single cycle)
– Amount of hardware used (vs. single cycle)

SLIDE 2

5 Pipelining and Beyond

6 Exploiting More ILP

ILP = __________________ _________________ ________________

(parallelism within a single program)

How can we exploit more ILP?
1. ________________________

(Split execution into many stages)

2. ___________________________

(Start executing more than one instruction each cycle)

7 Example – Multiple Issue

How many cycles does it take for this code to execute on a 2-issue CPU?

    add $t0, $t1, $t2
    lw  $s1, 0($s2)
    add $t0, $t0, $t4
    sw  $s1, 0($s3)

Answer?
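One way to reason about the four-instruction sequence above is to simulate issue cycle by cycle. The sketch below uses a deliberately simplified, hypothetical model and is not the official answer: instructions issue in program order, at most `width` per cycle, and an instruction cannot issue in the same cycle as an earlier instruction whose result it reads or whose destination it also writes. Results are assumed to be forwarded in time for the next cycle, so load-use delays are ignored.

```python
# Minimal in-order multiple-issue sketch (hypothetical model; all names are illustrative).

def issue_cycles(program, width=2):
    """program: list of (text, dest_or_None, [source registers]).
    Returns the issue cycle of each instruction."""
    cycles = []
    cycle, used = 1, 0
    written_this_cycle = set()   # destinations of instructions already issued this cycle
    for text, dest, sources in program:
        conflict = (any(s in written_this_cycle for s in sources)
                    or dest in written_this_cycle)
        if used == width or conflict:
            # Start a new issue cycle; in-order issue means later instructions wait too.
            cycle += 1
            used = 0
            written_this_cycle = set()
        cycles.append(cycle)
        used += 1
        if dest is not None:
            written_this_cycle.add(dest)
    return cycles

PROGRAM = [
    ("add $t0, $t1, $t2", "$t0", ["$t1", "$t2"]),
    ("lw $s1, 0($s2)",    "$s1", ["$s2"]),
    ("add $t0, $t0, $t4", "$t0", ["$t0", "$t4"]),
    ("sw $s1, 0($s3)",    None,  ["$s1", "$s3"]),
]

if __name__ == "__main__":
    print(issue_cycles(PROGRAM))           # [1, 1, 2, 2]
    print(issue_cycles(PROGRAM, width=1))  # [1, 2, 3, 4]
```

Under these assumptions the code issues in 2 cycles on a 2-issue CPU versus 4 on a single-issue one. A model that charges a load-use delay would push the sw later.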

8 Multiple Issue Processors

Key metric: CPI or IPC (IPC = 1/CPI)
Key questions:
1. What set of instructions can be issued together?
2. Who decides which instructions to issue together?

– Static multiple issue
– Dynamic multiple issue
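A short illustration of why IPC is the more convenient metric once more than one instruction can issue per cycle. The numbers are hypothetical (4 instructions issued over 2 cycles, ignoring pipeline fill time).

```python
# CPI and IPC are reciprocals; on a multiple-issue machine CPI can drop below 1.

def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions

def ipc(cycles, instructions):
    """Instructions per cycle."""
    return instructions / cycles

if __name__ == "__main__":
    print(cpi(2, 4))  # 0.5
    print(ipc(2, 4))  # 2.0
```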

SLIDE 3

9 Multiple Issue Processors

What extra hardware do we need to do Static Multiple Issue?
What else for Dynamic Multiple Issue?

10 Example – MIPS Static Multiple Issue

11 Example – Dynamic Multiple Issue Scheduling

[Figure: an instruction fetch and decode unit (in-order issue) feeds several reservation stations; each reservation station feeds a functional unit (Integer, Integer, ..., Floating point, Load/Store) that executes out of order; a commit unit performs in-order commit.]

12 Exercise #1

Assume you must execute the following instructions in order. In any one cycle you can issue at most one integer op and one load or store. Show the resulting pipeline diagram. What’s the total number of cycles? If you can’t issue an instruction on a certain cycle, wait for the next cycle.

    lw  $t0, 0($s2)
    sub $s1, $t0, $s3
    lw  $t2, 0($s2)
    add $a0, $a1, $a2
    add $a0, $a0, $a3
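A pipeline diagram can be cross-checked with a small issue-slot simulation. The sketch below is one possible model, not the official solution, and the latencies are assumptions: an ALU result is usable by an instruction issued one cycle later, a loaded value by an instruction issued two cycles later, and an instruction issued in cycle c of a 5-stage pipeline completes in cycle c + 4.

```python
# Hypothetical issue-slot model: in-order issue, at most one integer op and
# one load/store per cycle. All names and latencies here are assumptions.

LATENCY = {"int": 1, "mem": 2}   # cycles until a result can be read by a later issue

def schedule(program):
    """program: list of (text, kind, dest_or_None, [sources]); kind is 'int' or 'mem'.
    Returns the issue cycle of each instruction."""
    ready = {}      # register -> first cycle in which its new value may be read
    issue = []
    cycle = 1
    busy = set()    # slot kinds already used in the current cycle
    for text, kind, dest, sources in program:
        operands_ready = max([ready.get(r, 1) for r in sources], default=1)
        # Wait (in order) until the slot is free and all operands are available.
        while kind in busy or operands_ready > cycle:
            cycle += 1
            busy = set()
        issue.append(cycle)
        busy.add(kind)
        if dest is not None:
            ready[dest] = cycle + LATENCY[kind]
    return issue

def total_cycles(program, stages=5):
    """Cycle in which the last instruction leaves the pipeline."""
    return schedule(program)[-1] + stages - 1

EXERCISE_1 = [
    ("lw $t0, 0($s2)",    "mem", "$t0", ["$s2"]),
    ("sub $s1, $t0, $s3", "int", "$s1", ["$t0", "$s3"]),
    ("lw $t2, 0($s2)",    "mem", "$t2", ["$s2"]),
    ("add $a0, $a1, $a2", "int", "$a0", ["$a1", "$a2"]),
    ("add $a0, $a0, $a3", "int", "$a0", ["$a0", "$a3"]),
]

if __name__ == "__main__":
    print(schedule(EXERCISE_1))      # [1, 3, 3, 4, 5]
    print(total_cycles(EXERCISE_1))  # 9
```

Different latency assumptions give different totals, so treat the output as a check on a hand-drawn diagram rather than as the answer. The same function accepts a reordered instruction list, which makes it easy to compare candidate schedules.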

SLIDE 4

13 Exercise #2

Use the same assumptions as in Exercise #1, but first schedule (reorder) the code to try to eliminate stalls. Show the new pipeline diagram and total number of cycles.

    lw  $t0, 0($s2)
    sub $s1, $t0, $s3
    lw  $t2, 0($s2)
    add $a0, $a1, $a2
    add $a0, $a0, $a3

14 Exercise #3: Static vs. Dynamic Multiple Issue

Which do you think has been commercially successful – static or dynamic issue? Why?

15 Exercise #4

Look ahead at the slide for Idea #4 – loop unrolling. What is the possible bug?

16 Ideas for improving Multiple Issue

1. Non-blocking caches
2. Speculation
3. Register renaming
4. Loop unrolling

SLIDE 5

17 Idea #3: Register renaming

    lw $t0, 0($s0)
    sw $t0, 4($s0)
    lw $t0, 0($s2)
    sw $t0, 4($s2)

Problem? Solution?
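The idea behind renaming can be sketched in a few lines. This is an illustrative model only: the register names p0, p1 are hypothetical physical registers, and real hardware also tracks when they can be freed. Every write gets a fresh register, and later reads are redirected to the most recent name, so reusing $t0 no longer ties the second lw/sw pair to the first.

```python
# Register renaming sketch (hypothetical; 'p0', 'p1', ... are made-up physical registers).

def rename(program, free_registers):
    """program: list of (op, dest_or_None, [sources]).
    Gives every write a fresh register and redirects later reads to it.
    Raises StopIteration if free_registers runs out."""
    current = {}                  # architectural name -> most recent renamed register
    free = iter(free_registers)
    renamed = []
    for op, dest, sources in program:
        # Reads use whatever name currently holds the register's latest value.
        new_sources = [current.get(s, s) for s in sources]
        new_dest = None
        if dest is not None:
            new_dest = next(free)
            current[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed

PROGRAM = [
    ("lw", "$t0", ["$s0"]),
    ("sw", None,  ["$t0", "$s0"]),
    ("lw", "$t0", ["$s2"]),
    ("sw", None,  ["$t0", "$s2"]),
]

if __name__ == "__main__":
    for instr in rename(PROGRAM, ["p0", "p1"]):
        print(instr)
```

After renaming, the first pair uses p0 and the second pair uses p1, so the only remaining dependences are the true ones from each lw to its own sw.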

18 Idea #4: Loop unrolling

    Loop: lw   $t0, 0($s1)
          sw   $t0, 0($s2)
          addi $s1, $s1, -4
          addi $s2, $s2, -4
          bne  $s1, $zero, Loop

    Loop: lw   $t0, 0($s1)
          lw   $t1, 4($s1)
          lw   $t2, 8($s1)
          lw   $t3, 12($s1)
          sw   $t0, 0($s2)
          sw   $t1, 4($s2)
          sw   $t2, 8($s2)
          sw   $t3, 12($s2)
          addi $s1, $s1, -16
          addi $s2, $s2, -16
          bne  $s1, $zero, Loop

Why is this a good idea?
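Part of the answer shows up in a simple instruction count. The sketch below counts executed instructions only, assuming the word count is a multiple of the unroll factor. It says nothing about whether the unrolled code is correct. The body sizes (5 and 11) are counted from the two loops above, and the 1000-word workload is a made-up example.

```python
# Dynamic instruction count for copying `words` words (illustrative arithmetic only).

def dynamic_instructions(words, body_instructions, words_per_iteration):
    """Instructions executed by a loop whose body copies words_per_iteration words."""
    if words % words_per_iteration != 0:
        raise ValueError("words must be a multiple of words_per_iteration")
    iterations = words // words_per_iteration
    return iterations * body_instructions

if __name__ == "__main__":
    original = dynamic_instructions(1000, body_instructions=5, words_per_iteration=1)
    unrolled = dynamic_instructions(1000, body_instructions=11, words_per_iteration=4)
    print(original, unrolled)  # 5000 2750
```

The loop overhead (two addi plus one bne) is paid once per four words instead of once per word. The unrolled body also contains four lw/sw pairs that use different registers, which gives a multiple-issue processor more independent work to pair up.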

19

[Figure: implementation styles arranged along a Slower-to-Faster axis and an Instructions per clock (IPC = 1/CPI) axis: Single-cycle (Section 5.4), Multicycle (Section 5.5), Pipelined, Deeply pipelined, Multiple-issue pipelined (Section 6.9), Multiple issue with deep pipeline (Section 6.10).]


Chapter 6 Summary