Complexity-Effective Issue Queue Design Under Load-Hit Speculation
Tali Moreshet and R. Iris Bahar
Brown University, Division of Engineering
WCED 2002
Motivation
Pipelines are getting deeper
- Higher clock frequencies
- Increased architectural complexity
Speculatively issued instructions are particularly sensitive to pipeline depth
- Branch prediction
- Load hit prediction
Pipeline
[Figure: pipeline block diagram. Stages: Fetch, Decode, Issue, Execute. Blocks: Instruction Cache, Register Rename Unit, Issue Queue, Register File, Functional Units, Data Cache. Labeled paths: Load Resolution Loop, forwarding.]
Load Hit Prediction
Issue instructions dependent on a load as soon as possible
- Assume the load hits in the DL1
BUT…
Load hit status is known only after dependent instructions may have issued
Example
[Figure: timing diagram over cycles 1 to 8 showing the Issue and Exec stages of LOAD, MULT, SUB and ADD, with the speculative window marked.]
Example
[Figure: timing diagram over cycles 1 to 9 showing the Issue and Exec stages of LOAD, MULT, SUB and ADD, with the speculative window marked.]
Example
[Figure: timing diagram over cycles 1 to 10 showing the Issue and Exec stages of LOAD, MULT, SUB and ADD, with the speculative window marked.]
What Happens On a Load Miss?
Re-issue instructions in the speculative window after a load miss
Keep post-issue instructions in the issue queue long enough to ensure re-issuing will not be necessary
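The retention rule above can be illustrated with a minimal Python sketch. The `Entry` fields and the `resolve_load` helper are hypothetical names chosen for illustration and are not taken from the slides. The sketch also assumes a single outstanding load, so every speculative entry belongs to that load's window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    name: str
    issued: bool = False       # True once sent to execute (post-issue)
    speculative: bool = False  # issued inside the load's speculative window


def resolve_load(queue: List[Entry], load_hit: bool) -> List[Entry]:
    """Return the queue contents after the load's hit/miss status is known."""
    survivors = []
    for e in queue:
        if e.issued and e.speculative:
            if load_hit:
                continue  # speculation confirmed: the entry can be freed
            # Load missed: the entry was kept in the queue, so it can re-issue.
            e.issued = False
            e.speculative = False
        survivors.append(e)
    return survivors


queue = [Entry("MULT", True, True), Entry("SUB", True, True), Entry("ADD")]
print([e.name for e in resolve_load(queue, load_hit=False)])
# ['MULT', 'SUB', 'ADD']
```

On a hit the two post-issue entries would be freed and only ADD would remain. Until the load resolves, however, they occupy queue slots even though they have already issued.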
Complexity-Effective Load Hit Speculation
- As pipeline depth increases:
- Retain performance benefit
- Consider complexity of re-issue and prediction policies
- Consider impact on issue queue design
Re-Issue Policies
- 4 different load hit speculation policies:
1) No load hit speculation
2) Perfect load hit speculation
3) Replay only instructions dependent on the load that missed
4) Replay all instructions in the speculative window
- Load hit/miss predictor to limit re-issuing
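The difference between policies 3 and 4 comes down to which instructions are selected for replay after a miss. The sketch below is illustrative only: the function names, the `deps` map, and the dependence chain among LOAD, MULT, SUB and ADD are assumptions, not details from the slides. It also assumes the window is listed in program order, so producers appear before their consumers.

```python
from typing import Dict, List, Set


def replay_dependents(window: List[str], deps: Dict[str, Set[str]],
                      load: str) -> List[str]:
    """Policy 3: replay only instructions that depend, directly or
    transitively, on the load that missed."""
    tainted = {load}
    replay = []
    for instr in window:  # program order: producers before consumers
        if deps.get(instr, set()) & tainted:
            tainted.add(instr)
            replay.append(instr)
    return replay


def replay_all(window: List[str]) -> List[str]:
    """Policy 4: replay every instruction issued in the speculative window."""
    return list(window)


# Hypothetical dependences: MULT reads the load result, SUB reads MULT,
# ADD is independent of the load.
window = ["MULT", "SUB", "ADD"]
deps = {"MULT": {"LOAD"}, "SUB": {"MULT"}, "ADD": set()}

print(replay_dependents(window, deps, "LOAD"))  # ['MULT', 'SUB']
print(replay_all(window))                       # ['MULT', 'SUB', 'ADD']
```

Policy 4 needs no dependence tracking but replays independent work such as ADD. Policy 3 avoids that at the cost of propagating the dependence information.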
Performance Impact
[Figure: performance increase relative to no load speculation (y-axis from -5% to 45%) for Exe1, Exe3, Exe5 and Exe7. Series: Perfect_Int, Dep_Int, Dep_Pred_Int, Seq_Int, Seq_Pred_Int, Perfect_FP, Dep_FP, Dep_Pred_FP, Seq_FP, Seq_Pred_FP.]
Impact on Issue Queue Occupancy
[Figure: average number of instructions in the issue queue (axis from 5 to 40), split into pre-issue and post-issue. Panels: No Load Speculation, Integer Benchmarks; No Load Speculation, Floating Point Benchmarks; Dependent Load Speculation, Integer Benchmarks; Dependent Load Speculation, Floating Point Benchmarks.]
Impact on Issue Queue Occupancy
[Figure: percentage of post-issue instructions in the issue queue (y-axis from 0% to 70%) for Exe1, Exe3, Exe5 and Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Impact on Issue Queue Occupancy
As pipeline depth increases:
- Issue queue gets cluttered with post-issue instructions (average 55%)
- Limits the available ILP
- Inefficient use of complexity in instruction bid/grant arbitration logic
The Bid / Grant Loop
[Figure: the M entries of the issue queue each send a request (req) to the Prioritize & Select logic, which returns grants for N-wide issue. Labeled steps: bid for issue slot, broadcast grant.]
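The bid/grant loop can be sketched as selecting at most N of the M requesting entries each cycle. The slide does not state the priority scheme, so the oldest-first ordering below is an assumption, as are the function name `select` and its parameters.

```python
from typing import List


def select(ready: List[bool], age: List[int], width: int) -> List[int]:
    """Grant up to `width` of the requesting entries.

    ready[i] is True when entry i bids for an issue slot.
    age[i] is a hypothetical age counter (larger means older).
    Returns the indices of the granted entries in ascending order.
    """
    bidders = [i for i, r in enumerate(ready) if r]
    bidders.sort(key=lambda i: age[i], reverse=True)  # oldest first (assumed)
    return sorted(bidders[:width])


# Four-entry queue (M = 4), two issue slots (N = 2).
print(select([True, False, True, True], [1, 9, 5, 3], width=2))  # [2, 3]
```

Every entry takes part in the arbitration regardless of its state, so entries that never bid still add to the size of the logic that a hardware selector must cover.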
Issue Queue Utilization Problem
Complexity of bid/grant arbitration logic increases with size of the IQ
IQ consists largely of post-issue instructions
- Limits the available ILP that a large IQ is supposed to provide
- Not a complexity-effective design
IQ Design Options
Increase the IQ size
☺ Improve performance: increase available ILP
☹ Increase complexity
Simplify arbitration logic: use slower circuitry
☺ Reduce complexity
☹ Hurt performance
Reduce IQ size
☺ Reduce complexity
☹ Hurt performance
Double Latency of Issue Queue
[Figure: performance increase relative to a 64-entry issue queue with dependent load speculation (y-axis from -70% to 0%) for Exe1, Exe3, Exe5 and Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Smaller IQ (48 Entry)
[Figure: performance increase relative to a 64-entry issue queue with dependent load speculation (y-axis from -25% to 5%) for Exe1, Exe3, Exe5 and Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Complexity-Effective Issue Queue
Goal
- Reduce complexity
- Do not degrade performance
Solution: The Dual Issue Queue
- Move post-issue instructions from the main queue to a separate replay queue
- Increase available ILP
- Reduce size of main IQ
Dual Issue Queue
[Figure: pipeline block diagram with a Main Issue Queue (MIQ) and a Replay Issue Queue (RIQ). Other blocks: Register Rename Unit, Register File, Functional Units, Data Cache. Labeled signals: from Fetch unit, Replay_req.]
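A rough behavioral sketch of the dual-queue idea follows, under several simplifying assumptions that the slides do not confirm: one instruction issues at a time in FIFO order, all entries in the replay queue belong to a single load's speculative window, and entries stay in the replay queue after a miss until a later resolution frees them. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class DualIssueQueue:
    def __init__(self, miq_size: int, riq_size: int) -> None:
        self.miq_size = miq_size
        self.riq_size = riq_size
        self.miq: Deque[str] = deque()  # pre-issue instructions
        self.riq: Deque[str] = deque()  # post-issue instructions

    def dispatch(self, instr: str) -> bool:
        """Insert a newly renamed instruction into the main queue."""
        if len(self.miq) >= self.miq_size:
            return False  # main queue full: the front end must stall
        self.miq.append(instr)
        return True

    def issue(self) -> Optional[str]:
        """Issue the oldest main-queue entry and move it to the replay queue."""
        if not self.miq or len(self.riq) >= self.riq_size:
            return None
        instr = self.miq.popleft()
        self.riq.append(instr)
        return instr

    def load_resolved(self, hit: bool) -> List[str]:
        """On a hit, free the replay queue. On a miss, return replay requests."""
        if hit:
            self.riq.clear()
            return []
        return list(self.riq)  # entries stay parked until a later resolution


q = DualIssueQueue(miq_size=2, riq_size=4)
q.dispatch("MULT")
q.dispatch("SUB")
print(q.dispatch("ADD"))       # False: main queue is full
q.issue()                      # MULT moves to the replay queue
print(q.dispatch("ADD"))       # True: issuing freed a main-queue slot
print(q.load_resolved(False))  # ['MULT']
```

In this sketch, issuing an instruction immediately frees a main-queue slot, which is how a smaller main queue can still expose pre-issue instructions to the select logic.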
Dual Issue Queue Performance
[Figure: performance increase relative to the standard issue queue with dependent load speculation (y-axis from -8% to 10%) for Exe1, Exe3, Exe5 and Exe7. Series: compress, ijpeg, bzip, Int_avg, apsi, swim, art, wupwise, FP_avg.]
Conclusion
Load hit speculation is critical for high performance in deeper pipelines
- Larger percentage of post-issue instructions in the issue queue
Complexity-effective issue queue scheme addresses the utilization problem
For deepest pipelines, overall performance