Runahead High Level Description - PowerPoint PPT Presentation




Runahead Processor

Finale Doshi
Ravi Palakodety

Outline

  • Motivation
  • High Level Description
  • Microarchitecture
  • Results
  • Conclusions

Where We Left Off…

  • Lab 3 – Building a 4-stage pipelined SMIPS processor
  • Critical Path – Load-α
  • Fetch → Decode → Execute → ReadDataCache → Writeback
  • Data Cache Miss?
      • Stall until data returns from Main Memory

A Baaad Example

  • Ld-α, Ld-β, Ld-γ, …
  • If latency = 100 cycles from main memory to cache, then:
      • Initiate Ld-α request
      • Stall for 100 cycles
      • Initiate Ld-β request
      • Stall for 100 cycles
      • And so on…
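The serialized stalls above can be sketched with a toy cycle count (a hypothetical model to illustrate the slide's numbers, not the actual processor):

```python
# Toy timing model for the blocking pipeline (illustrative only).
MISS_LATENCY = 100  # cycles from main memory to cache, per the slide

def blocking_cycles(num_loads, latency=MISS_LATENCY):
    # A blocking pipeline pays the full latency for every miss:
    # 1 cycle to initiate the request, then a full stall.
    return num_loads * (1 + latency)

# Ld-α, Ld-β, Ld-γ back to back:
print(blocking_cycles(3))  # 303
```

Three independent misses cost about 300 cycles, almost all of it stalling.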


Key Insight

“Runahead” to see whether there are memory accesses in the near future

  • With an instruction sequence Ld-α, Ld-β:
      • Initiate memory request for Ld-α
      • Continue execution
      • Initiate memory request for Ld-β
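Extending the toy model from the earlier example shows why overlapping the misses pays off (the enter/exit `overhead` value is an assumption, not a measured number):

```python
MISS_LATENCY = 100  # cycles, as in the earlier example

def blocking_cycles(num_loads, latency=MISS_LATENCY):
    # Blocking pipeline: every miss is initiated, then stalled on in full.
    return num_loads * (1 + latency)

def runahead_cycles(num_loads, latency=MISS_LATENCY, overhead=2):
    # Runahead: while the first miss is outstanding, execution continues
    # and the remaining misses are initiated, so their latencies overlap
    # with the first. `overhead` is an assumed enter/exit penalty.
    return num_loads + latency + overhead

print(blocking_cycles(2), runahead_cycles(2))  # 202 104
```

With two back-to-back misses, the independent latencies collapse into roughly one.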

Outline

  • Motivation
  • High Level Description
  • Microarchitecture
  • Results
  • Conclusions

DataCache Miss Occurs…

  • Backup the register file
  • Keep running instructions
  • Use INV as the result of any ops that:
      • Are DataCache misses
      • Depend on calculations involving DataCache misses

Data Returns…

  • Cache is updated from MainMem:
      • Restore the register file
      • Rerun the original “offending” instruction
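The checkpoint/poison/restore protocol on these two slides can be sketched as follows (class and method names are illustrative, not the actual SMIPS design):

```python
INV = object()  # sentinel marking results poisoned during runahead

class RunaheadCore:
    """Toy model of the slides' protocol: back up the register file on a
    miss, propagate INV through dependent ops, restore on data return."""

    def __init__(self, regs=None):
        self.regs = dict(regs or {})
        self.checkpoint = None

    def miss_occurs(self, dest):
        # DataCache miss: back up the register file, mark the load's
        # destination INV, and keep running instructions.
        self.checkpoint = dict(self.regs)
        self.regs[dest] = INV

    def execute(self, dest, srcs, fn):
        # Any op that depends on an INV source also produces INV.
        vals = [self.regs[s] for s in srcs]
        self.regs[dest] = INV if any(v is INV for v in vals) else fn(*vals)

    def data_returns(self, dest, value):
        # Data arrives: restore the register file and rerun the
        # "offending" load, which now hits in the updated cache.
        self.regs = self.checkpoint
        self.regs[dest] = value

core = RunaheadCore({"r1": 5})
core.miss_occurs("r2")                                # Ld-α misses
core.execute("r3", ["r1", "r2"], lambda a, b: a + b)  # depends on INV -> INV
core.execute("r4", ["r1", "r1"], lambda a, b: a + b)  # independent -> 10
core.data_returns("r2", 7)                            # restore + rerun
print(core.regs)  # runahead results discarded; r2 holds the loaded value
```

Note that everything computed during runahead is thrown away; its only purpose is to get memory requests in flight early.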


Follow the Rules

  • Do NOT:
      • Update the DataCache while in Runahead mode
      • Initiate Memory Requests that depend on INV addresses
      • Branch when predicate depends on INV data
      • Initiate Memory Requests that cause collisions in DataCache
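As a sketch, the last three rules read naturally as guard predicates checked before each runahead action (helper names are hypothetical):

```python
INV = object()  # poison sentinel for values produced during runahead

def may_issue_request(addr, line_in_use):
    # Never issue a memory request whose address is INV, and never
    # issue one that would collide with a DataCache line an in-flight
    # runahead request already targets.
    return addr is not INV and not line_in_use

def may_resolve_branch(predicate):
    # Don't resolve a branch whose predicate depends on INV data.
    return predicate is not INV

# The first rule (no DataCache updates in runahead mode) would live in
# the cache write path: stores are dropped or diverted while runahead.
```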

Outline

  • Motivation
  • High Level Description
  • Microarchitecture
  • Results
  • Conclusions

Processor Side / Cache Side

[Microarchitecture block diagram]


Execution – Enter Runahead

Execution – In Runahead

Execution – Exit Runahead

Design Explorations

  • Store Cache Optimization
  • Decisions when to exit runahead


Store Cache

  • Ld-α, St-β, Ld-β
  • Rather than return Ld-β as INV, return the value that was just stored.
  • Use 4-entry table, as in Branch Predictor
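A minimal sketch of the 4-entry store table, assuming it is direct-mapped and indexed by low address bits the way the branch predictor's table is (the slide doesn't specify the indexing):

```python
INV = object()  # poison sentinel, as in the runahead protocol

class StoreCache:
    """Tiny table of stores seen during runahead, so a later load to
    the same address can forward the value instead of returning INV."""

    def __init__(self, entries=4):
        self.entries = entries
        self.table = [None] * entries   # each slot: (addr, value)

    def store(self, addr, value):       # e.g. St-β during runahead
        self.table[addr % self.entries] = (addr, value)

    def load(self, addr):               # e.g. Ld-β during runahead
        slot = self.table[addr % self.entries]
        if slot is not None and slot[0] == addr:
            return slot[1]              # forward the just-stored value
        return INV                      # no match: fall back to INV

sc = StoreCache()
sc.store(0x20, 42)       # St-β
print(sc.load(0x20))     # Ld-β now returns 42 instead of INV
```

With only 4 direct-mapped entries, two stores whose addresses alias the same slot evict each other, which is safe: the load simply falls back to INV.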

When to Exit Runahead?

  • When the “offending” miss returns? OR
  • When all memory requests that are currently in-flight are processed?

Outline

  • Motivation
  • High Level Description
  • Microarchitecture
  • Results
  • Conclusions

Key Parameters

  • Vary Latency of Main Memory
      • As the latency increases, the impact of runahead becomes more significant
      • At small latencies, the penalty for entering/exiting runahead can reduce performance


Key Parameters

  • Vary Size of FIFOs
      • As the FIFOs get larger, the processor is able to run further ahead and generate more parallel memory requests.
      • As the FIFOs get larger, the penalty for exiting runahead becomes more severe.

Testing Strategy

  • Latencies of 1, 20, and 100 cycles
  • FIFOs of length 2, 5, 8, 15
  • Standard benchmarks; focus on vvadd

We’ll focus on length-15 FIFOs here since they allowed for the most extensive runahead.

Results

[Performance graphs]


Results

Outline

  • Motivation
  • High Level Description
  • Microarchitecture
  • Results
  • Conclusions

Conclusions

  • Runahead is good.

Conclusions

  • Runahead is a cheap and simple way to improve IPS.
  • The enter/exit runahead penalty is small enough that the IPS is always comparable to the Lab 3 processor.
  • The control structure is (fairly) straightforward, with most improvements done on the cache side.


Extensions

  • Aggressive Branch Prediction
      • Don’t stall when branch predicate is INV
      • Save valid runahead computations
  • Aggressive Prefetching
      • Predict addresses for Ld, St, when the given address is INV.