Basic Pipelining Wrap-up (from Slide Set 20) and Exploiting ILP




Slide Set #21: Exploiting ILP

Chapter 6 and beyond

Basic Pipelining Wrap-up (from Slide Set 20)


Pipelining

  • Improve performance by increasing instruction throughput

[Figure: program execution order (in instructions) versus time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg.]

[Top panel, not pipelined: time axis 200 to 1800 ps, intervals labeled 800 ps.]

[Bottom panel, pipelined: time axis 200 to 1400 ps, intervals labeled 200 ps.]

The ideal speedup equals the number of stages in the pipeline. Do we achieve this?
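The numbers in the figure can be checked with a little arithmetic. The sketch below assumes an ideal five-stage pipeline with no stalls, an 800 ps single-cycle clock, and a 200 ps pipelined clock (the last two taken from the figure labels). The function names are illustrative only.

```python
def single_cycle_time_ps(n_instructions, cycle_ps=800):
    """Total time when each instruction occupies one long cycle."""
    return n_instructions * cycle_ps


def pipelined_time_ps(n_instructions, stages=5, stage_ps=200):
    """Total time for an ideal pipeline with no stalls (an assumption).

    The first instruction needs `stages` cycles; each later instruction
    finishes one cycle after its predecessor.
    """
    return (stages + n_instructions - 1) * stage_ps


def speedup(n_instructions):
    """Ratio of single-cycle time to pipelined time."""
    return single_cycle_time_ps(n_instructions) / pipelined_time_ps(n_instructions)


print(single_cycle_time_ps(3))        # 2400
print(pipelined_time_ps(3))           # 1400
print(round(speedup(1_000_000), 2))   # 4.0
```

With these figures the long-run speedup approaches 800/200 = 4 rather than 5, because the pipelined clock has to fit the slowest stage and 5 × 200 ps is more than the 800 ps single-cycle time.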


Big Picture

  • Remember the single-cycle implementation
      – Inefficient because low utilization of hardware resources
      – Each instruction takes one long cycle
  • Two possible ways to improve on this:

Multicycle vs. Pipelined:

  • Clock cycle time (vs. single cycle)
  • How many instructions executing at once?
  • Each stage has its own set of hardware?
  • Split instruction into multiple stages (1 per cycle)?
  • Amount of hardware used (vs. single cycle)

Pipelining and Beyond


Exploiting More ILP

  • ILP = __________________ _________________ ________________

(parallelism within a single program)

  • How can we exploit more ILP?
  • 1. ________________________

(Split execution into many stages)

  • 2. ___________________________

(Start executing more than one instruction each cycle)


Example – Multiple Issue

How many cycles does it take for this code to execute on a 2-issue CPU?

    add $t0, $t1, $t2
    lw  $s1, 0($s2)
    add $t0, $t0, $t4
    sw  $s1, 0($s3)

Answer?
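One way to reason about this question is to assign issue cycles mechanically. The sketch below is a simplified model and not necessarily the answer the slide intends. It assumes in-order issue, that any two instructions may pair, and that a dependent instruction must issue in a later cycle than its producer. The `load_latency` parameter is hypothetical and sets how many cycles after a `lw` its result becomes usable. Structural limits and write-after-write conflicts are ignored.

```python
def schedule_in_order(program, width=2, load_latency=1):
    """Assign a 1-based issue cycle to each instruction, in program order.

    program: list of (opcode, dest_register_or_None, [source_registers]).
    Simplified model (assumptions, not a real CPU): only read-after-write
    dependences and the issue width limit are checked.
    """
    ready = {}          # register -> first cycle its new value can be read
    cycles = []
    cycle, used = 1, 0  # current issue cycle and slots already filled in it
    for op, dest, srcs in program:
        # Earliest cycle at which every source operand is available.
        earliest = max([ready.get(r, 1) for r in srcs], default=1)
        if earliest > cycle:
            # Stall until the operands are ready, starting a new issue group.
            cycle, used = earliest, 0
        elif used == width:
            # Current group is full; move to the next cycle.
            cycle, used = cycle + 1, 0
        cycles.append(cycle)
        used += 1
        if dest is not None:
            latency = load_latency if op == "lw" else 1
            ready[dest] = cycle + latency
    return cycles


program = [
    ("add", "$t0", ["$t1", "$t2"]),
    ("lw",  "$s1", ["$s2"]),
    ("add", "$t0", ["$t0", "$t4"]),
    ("sw",  None,  ["$s1", "$s3"]),
]

print(schedule_in_order(program))                  # [1, 1, 2, 2]
print(schedule_in_order(program, load_latency=2))  # [1, 1, 2, 3]
```

Under these assumptions the four instructions issue in two cycles. If a load result is not usable until two cycles after the `lw` issues, the `sw` slips to cycle 3. On a five-stage pipeline, the last instruction would complete four cycles after it issues.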


Multiple Issue Processors

  • Key metric: CPI, or its reciprocal IPC
  • Key questions:
  • 1. What set of instructions can be issued together?
  • 2. Who decides which instructions to issue together?

      – Static multiple issue
      – Dynamic multiple issue


Multiple Issue Processors

  • What extra hardware do we need to do Static Multiple Issue?
  • What else for Dynamic Multiple Issue?


Example – MIPS Static Multiple Issue


Example – Dynamic Multiple Issue Scheduling

[Figure: dynamic multiple-issue pipeline. An instruction fetch and decode unit issues in order to several reservation stations. Each reservation station feeds a functional unit (Integer, Integer, ..., Floating point, Load/Store), and the functional units execute out of order. Results go to a commit unit, which commits in order.]


Exercise #1

Assume you must execute the following instructions in order. In any one cycle you can issue at most one integer op and one load or store. Show the resultant pipeline diagram. What’s the total number of cycles? If you can’t issue an instruction on a certain cycle, wait for the next cycle.

    lw  $t0, 0($s2)
    sub $s1, $t0, $s3
    lw  $t2, 0($s2)
    add $a0, $a1, $a2
    add $a0, $a0, $a3


Exercise #2

Use the same assumptions as in Exercise #1, but first schedule the code to try to eliminate stalls. Show the new pipeline diagram and total number of cycles.

    lw  $t0, 0($s2)
    sub $s1, $t0, $s3
    lw  $t2, 0($s2)
    add $a0, $a1, $a2
    add $a0, $a0, $a3

Exercise #3: Static vs. Dynamic Multiple Issue

  • Which do you think has been commercially successful – static or dynamic issue? Why?


Exercise #4

  • Look ahead at the slide for Idea #4 – loop unrolling. What is the possible bug?


Ideas for improving Multiple Issue

  • 1. Non-blocking caches
  • 2. Speculation
  • 3. Register renaming
  • 4. Loop unrolling

Idea #3: Register renaming

    lw $t0, 0($s0)
    sw $t0, 4($s0)
    lw $t0, 0($s2)
    sw $t0, 4($s2)

Problem? Solution?
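The idea named in this slide's title can be sketched in a few lines. The code below is a minimal illustration, not a description of any real processor's rename logic. It gives every register write a fresh name (the `p0`, `p1`, ... names are hypothetical) and redirects later reads to the most recent name. Memory dependences between the stores and loads are not modeled.

```python
from itertools import count


def rename(program):
    """Give every register write a fresh physical register name.

    program: list of (opcode, dest_or_None, [sources]).
    Reads are redirected to the most recent name of their register;
    registers never written in the program keep their original names.
    """
    fresh = (f"p{i}" for i in count())   # hypothetical physical registers
    current = {}                          # architectural -> latest physical
    renamed = []
    for op, dest, srcs in program:
        # Rename sources first: they must see the mapping from before
        # this instruction's own write.
        new_srcs = [current.get(r, r) for r in srcs]
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            current[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


program = [
    ("lw", "$t0", ["$s0"]),
    ("sw", None,  ["$t0", "$s0"]),
    ("lw", "$t0", ["$s2"]),
    ("sw", None,  ["$t0", "$s2"]),
]

for instr in rename(program):
    print(instr)
```

After renaming, the first load/store pair uses `p0` and the second uses `p1`. The two pairs no longer share a register name, so the second `lw` does not have to wait for the first `sw` to read `$t0`.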


Idea #4: Loop unrolling

Original loop:

    Loop: lw   $t0, 0($s1)
          sw   $t0, 0($s2)
          addi $s1, $s1, -4
          addi $s2, $s2, -4
          bne  $s1, $zero, Loop

Unrolled loop:

    Loop: lw   $t0, 0($s1)
          lw   $t1, 4($s1)
          lw   $t2, 8($s1)
          lw   $t3, 12($s1)
          sw   $t0, 0($s2)
          sw   $t1, 4($s2)
          sw   $t2, 8($s2)
          sw   $t3, 12($s2)
          addi $s1, $s1, -16
          addi $s2, $s2, -16
          bne  $s1, $zero, Loop

Why is this a good idea?
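One part of the answer is loop overhead. The rough count below assumes the number of words copied is an exact multiple of the words handled per iteration, and it counts only executed instructions. The helper name `dynamic_instructions` is illustrative.

```python
def dynamic_instructions(elements, body_instrs, elements_per_iteration):
    """Instructions executed to process `elements` words.

    Assumes `elements` is an exact multiple of `elements_per_iteration`;
    anything else is rejected rather than guessed at.
    """
    if elements % elements_per_iteration != 0:
        raise ValueError("element count must be a multiple of the unroll factor")
    iterations = elements // elements_per_iteration
    return iterations * body_instrs


# Original loop: lw, sw, addi, addi, bne = 5 instructions per word.
original = dynamic_instructions(16, body_instrs=5, elements_per_iteration=1)
# Unrolled loop: 4 lw, 4 sw, addi, addi, bne = 11 instructions per 4 words.
unrolled = dynamic_instructions(16, body_instrs=11, elements_per_iteration=4)

print(original, unrolled)  # 80 44
```

The two `addi` instructions and the `bne` are paid once per four words instead of once per word. The unrolled body also contains several loads and stores that do not depend on one another, which gives a multiple-issue processor more instructions to choose from each cycle.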


Chapter 6 Summary

[Figure: design points placed on a Slower to Faster axis and an Instructions per clock (IPC = 1/CPI) axis. Points shown: Multicycle (Section 5.5), Single-cycle (Section 5.4), Deeply pipelined, Pipelined, Multiple issue with deep pipeline (Section 6.10), Multiple-issue pipelined (Section 6.9).]