Graph Prefetching Using Data Structure Knowledge
Sam Ainsworth and Timothy M. Jones
Computer Laboratory
Graph500 Search Performance
Current Prefetching Techniques
- Stride
- Software
Exploit Look-ahead!
Work List: 5 4 1 2 3 7 ...
Vertex List: # # 3 5 # # # # #
Edge List: # # # 6 # # # # # # #
Visited: False True True True True True False True
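The chain of dependent accesses behind this slide can be sketched in code. The following is a minimal illustration, assuming a compressed-sparse-row layout in which `vertex_list[v]` and `vertex_list[v + 1]` delimit vertex `v`'s slice of `edge_list`. The function names `bfs_order` and `lookahead_accesses`, and the example graph, are hypothetical and do not come from the slides.

```python
from collections import deque

def bfs_order(vertex_list, edge_list, root):
    """Breadth-first search over an (assumed) CSR graph; returns visit order."""
    visited = [False] * (len(vertex_list) - 1)
    visited[root] = True
    work_list = deque([root])
    order = []
    while work_list:
        v = work_list.popleft()                  # work list access
        order.append(v)
        # vertex list access: find v's slice of the edge list
        for e in range(vertex_list[v], vertex_list[v + 1]):
            n = edge_list[e]                     # edge list access
            if not visited[n]:                   # visited list access
                visited[n] = True
                work_list.append(n)
    return order

def lookahead_accesses(work_list, i, distance, vertex_list, edge_list):
    """Indices a look-ahead prefetch would touch for work_list[i + distance].

    Returns None when the look-ahead runs past the end of the work list.
    """
    j = i + distance
    if j >= len(work_list):
        return None
    v = work_list[j]
    edges = list(range(vertex_list[v], vertex_list[v + 1]))
    return {
        "work": j,
        "vertex": v,
        "edges": edges,
        "visited": [edge_list[e] for e in edges],
    }

# Hypothetical 4-vertex graph: 0 -> 1, 2; 1 -> 3; 2 -> 3
vertex_list = [0, 2, 3, 4, 4]
edge_list = [1, 2, 3, 3]
order = bfs_order(vertex_list, edge_list, 0)
ahead = lookahead_accesses(order, 0, 2, vertex_list, edge_list)
```

Each access depends on the value loaded by the previous one, which is why a stride prefetcher alone cannot follow the chain but a look-ahead along the work list can.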
Problems
- Need address bounds of data structures
- Need to schedule prefetches
- Need to react to variable latency loads
- Configure them in software!
- Use observation hardware – EWMAs.
- React to arrival of prefetches, not loads!
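One way to picture the EWMA-based scheduling is the sketch below. It assumes the look-ahead distance is set from the ratio of a smoothed data-fetch time to a smoothed per-item work-list time. That formula, the smoothing factor `alpha`, the cap `max_distance`, and the names `Ewma` and `lookahead_distance` are illustrative assumptions, not details taken from the slides.

```python
import math

class Ewma:
    """Exponentially weighted moving average of observed times."""

    def __init__(self, alpha=0.125):   # alpha is an assumed smoothing factor
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = float(sample)             # first sample seeds the average
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value

def lookahead_distance(data_time, work_list_time, max_distance=64):
    """Assumed policy: stay far enough ahead to cover one data fetch."""
    if work_list_time <= 0:
        return 1
    ratio = data_time / work_list_time
    return max(1, min(max_distance, math.ceil(ratio)))

data_time = Ewma(alpha=0.5)
data_time.update(100)    # first observed prefetch latency (cycles, hypothetical)
data_time.update(200)    # a slower fetch pulls the average up
distance = lookahead_distance(data_time.value, 20.0)
```

Updating the averages when prefetched data arrives, rather than when a demand load completes, keeps the estimate tied to the latency the prefetcher itself experiences.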
Graph Prefetcher
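[Figure: block diagram. Components: Core, Dcache, DTLB, L2 Cache, Main Memory (holding the Work List, Vertex List, Edge List and Visited List), and the Prefetcher (EWMA Calculator, Address Generator, Request Queue). Labelled connections: Config, Snoops, Prefetch Reqs, Prefetched Data, To / From L2 Cache.]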
Work List: 5 4 1 2 3 7 ...
Vertex List: # # 3 5 7 # # # #
Edge List: # # # 6 1 # # # # #
Visited: False True True True True True False True
Graph Prefetcher: Microarchitecture
- Address Bounds Registers: Work List Start, Work List End, Vertex List Start, Vertex List End, Edge List Start, Edge List End, Visited List Start, Visited List End
- Address Filter
- EWMA Unit: Work List Time EWMA, Data Time EWMA, Ratio Register
- Prefetch Address Generator
- Prefetch Request Queue (To DTLB & L1 Cache)
- Snoops & Prefetched Data From L1 Cache
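The role of the address bounds registers and the address filter can be sketched as a range check on each snooped or prefetched address. This is a rough model only: the exclusive end bound, the `NEXT_STAGE` chain and all names and addresses below are assumptions made for illustration.

```python
from typing import NamedTuple, Optional

class Bounds(NamedTuple):
    start: int
    end: int   # assumed exclusive

# Assumed chain of dependent structures for breadth-first search.
NEXT_STAGE = {"work": "vertex", "vertex": "edge", "edge": "visited"}

class AddressFilter:
    """Matches an address against the configured data-structure bounds."""

    def __init__(self, work, vertex, edge, visited):
        self.regs = {"work": work, "vertex": vertex,
                     "edge": edge, "visited": visited}

    def classify(self, addr: int) -> Optional[str]:
        for name, bounds in self.regs.items():
            if bounds.start <= addr < bounds.end:
                return name
        return None            # not part of any tracked structure

    def next_stage(self, addr: int) -> Optional[str]:
        """Structure the address generator would target next, if any."""
        return NEXT_STAGE.get(self.classify(addr))

# Hypothetical address ranges, as software might configure them.
address_filter = AddressFilter(
    work=Bounds(0x1000, 0x2000),
    vertex=Bounds(0x2000, 0x4000),
    edge=Bounds(0x4000, 0x8000),
    visited=Bounds(0x8000, 0x8800),
)
```

In this model, data arriving from the work list range triggers a vertex list prefetch, and so on down the chain until the visited list is reached.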
Results – Graph500
Results – Boost Graph Library
Results – Sequential Iteration
Generalized Prefetching - Databases
Key: 12 62 43
Hash Table:
Bucket (43, 2, ptr)
Bucket (12, 43, ptr)
Bucket (13, 87, null)
Lookahead by striding in the key list
Hash(43) = 3
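The look-ahead for hash-table lookups can be sketched as follows. The slide gives only `Hash(43) = 3`. The modulo-8 hash below is a stand-in chosen purely because it reproduces that value, and is not the hash used in the talk (the programmable prefetcher slide lists "Hash XOR Shift Amount" among its registers). The bucket size, table address and function names are likewise hypothetical.

```python
def bucket_index(key, num_buckets=8):
    # Hypothetical hash: modulo chosen only so that bucket_index(43) == 3.
    return key % num_buckets

def prefetch_target(key_list, i, stride, table_start,
                    bucket_bytes=16, num_buckets=8):
    """Bucket address to prefetch for the key `stride` entries ahead of i.

    Returns (key, bucket index, bucket address), or None when the
    look-ahead runs past the end of the key list.
    """
    j = i + stride
    if j >= len(key_list):
        return None
    key = key_list[j]                         # strided read of the key list
    idx = bucket_index(key, num_buckets)      # hash the key that was read
    return key, idx, table_start + idx * bucket_bytes

keys = [12, 62, 43]                           # key list from the slide
target = prefetch_target(keys, 0, 2, table_start=0x4000)
```

The key list itself is read sequentially, so striding through it is cheap. The hashed bucket address is the irregular access that benefits from being fetched early.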
Programmable Prefetcher
- Programmable Registers: Hash XOR, Shift Amount, Hash Table Start, Hash Table End, Key List Start, Key List End, Other Data, Other Data
- Address Filter
- EWMA Unit: Work List Time EWMA, Data Time EWMA, Ratio Register
- Prefetch Request Queue (To DTLB & L1 Cache)
- Programmable Units: CPU, CPU, CPU, CPU, CPU, CPU
- Snoops & Prefetched Data From L1 Cache
Graph Prefetching Using Data Structure Knowledge
Sam Ainsworth and Timothy M. Jones
sam.ainsworth@cl.cam.ac.uk timothy.jones@cl.cam.ac.uk