Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism
MARK C. JEFFREY, VICTOR A. YING, SUVINAY SUBRAMANIAN, HYUN RYONG LEE, JOEL EMER, DANIEL SANCHEZ
MICRO 2018
SPECULATIVE PARALLELIZATION: Simplifies parallel programming; Uncovers abundant parallelism
NON-SPECULATIVE PARALLELIZATION: Lower overheads; Parallel irrevocable actions
Espresso
Capsules
e.g. memory allocator that improves performance up to 69x
THE NEED FOR SPECULATIVE AND NON-SPECULATIVE PARALLELISM
Finds shortest path tree on a graph with weighted edges
[Figure: input graph with vertices A to E, edge weights 3, 2, 2, 4, 1, 3, 3, and a marked source vertex. Task graph: tasks ordered by distance from the source node (time axis 1 to 8), in the order A, C, B, B, D, D, E, E. Legend: first to visit vertex, vertex already visited, to be processed.]
[Figure: task graph ordered by distance from source node (tasks A, C, B, B, D, D, E, E; time axis 1 to 8), annotated with data dependences, next to a valid schedule of the same tasks over time.]
[Plot: Dijkstra performance, non-speculative, Dijkstra on USA-E; speedup (axis 1 to 512) vs. cores (1c, 128c, 256c).]
void dijkstraTask(Timestamp dist, Vertex* v) {
  if (v->distance == UNSET) {
    v->distance = dist;
    for (Vertex* n : v->neighbors) {
      Timestamp nDist = dist + weight(v, n);
      swarm::enqueue(dijkstraTask, nDist, n);
    }
  }
}

swarm::enqueue(dijkstraTask, 0, sourceVertex);
swarm::run();

swarm::enqueue takes: function pointer, timestamp, arguments
Implicit parallelism
No explicit synchronization
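To make the timestamp-ordered task model concrete, here is a minimal sequential sketch. It is not the Swarm API: the `ordered` namespace, its `enqueue` and `run` functions, the `edges` field, and the five-vertex graph are all assumptions made for illustration. Tasks run one at a time in timestamp order, which is the behavior a parallel execution has to preserve.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical software stand-in for an ordered task queue (not the Swarm
// API): tasks run one at a time, lowest timestamp first.
namespace ordered {
using Timestamp = std::uint64_t;

struct Task {
    Timestamp ts;
    std::uint64_t seq;            // insertion order, breaks timestamp ties
    std::function<void()> body;
};

// Comparator for a min-heap on (timestamp, insertion order).
struct RunsLater {
    bool operator()(const Task& a, const Task& b) const {
        return a.ts != b.ts ? a.ts > b.ts : a.seq > b.seq;
    }
};

static std::priority_queue<Task, std::vector<Task>, RunsLater> pending;
static std::uint64_t nextSeq = 0;

// Mirrors the slide's call shape: function pointer, timestamp, arguments.
template <typename Fn, typename... Args>
void enqueue(Fn fn, Timestamp ts, Args... args) {
    pending.push(Task{ts, nextSeq++, [=] { fn(ts, args...); }});
}

// Drains the queue in timestamp order; tasks may enqueue more tasks.
inline void run() {
    while (!pending.empty()) {
        Task next = pending.top();
        pending.pop();
        next.body();
    }
}
}  // namespace ordered

constexpr ordered::Timestamp UNSET = UINT64_MAX;

struct Vertex {
    ordered::Timestamp distance = UNSET;
    // (neighbor, edge weight); stands in for v->neighbors and weight(v, n).
    std::vector<std::pair<Vertex*, ordered::Timestamp>> edges;
};

void dijkstraTask(ordered::Timestamp dist, Vertex* v) {
    if (v->distance == UNSET) {          // first task to visit this vertex
        v->distance = dist;
        for (const auto& e : v->edges)
            ordered::enqueue(dijkstraTask, dist + e.second, e.first);
    }
}

// Runs the tasks on an assumed five-vertex graph (A=0, B=1, C=2, D=3, E=4)
// and returns the resulting distance of each vertex.
std::vector<ordered::Timestamp> shortestPathsExample() {
    std::vector<Vertex> g(5);
    auto edge = [&](int from, int to, ordered::Timestamp w) {
        g[from].edges.push_back({&g[to], w});
    };
    edge(0, 1, 3); edge(0, 2, 2); edge(2, 1, 2); edge(2, 3, 4);
    edge(1, 3, 1); edge(1, 4, 3); edge(3, 4, 3);
    ordered::enqueue(dijkstraTask, 0, &g[0]);
    ordered::run();
    std::vector<ordered::Timestamp> out;
    for (const Vertex& v : g) out.push_back(v.distance);
    return out;
}
```

On this assumed graph eight tasks run in total. Three of them find their vertex already visited and do nothing, which is the kind of work a speculative system can overlap with useful tasks.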
Large hardware task queues
Scalable ordered speculation
Scalable ordered commits
[Figure: 64-tile, 256-core chip with four Mem / IO blocks. Tile organization: four cores, each with L1I/D caches, an L2, an L3 slice, a router, and a task unit.]
Swarm executes all tasks speculatively and out of order
Dijkstra performance
[Figure: task graph ordered by distance from source node (tasks A, C, B, B, D, D, E, E; time axis 1 to 8), with its execution over time.]
[Plots: speedup vs. cores (1c, 128c, 256c) for non-speculative and all-speculative [MICRO’15] execution; Dijkstra on USA-E (speedup axis 1 to 512) and Dijkstra on cage14 (speedup axis 1 to 256). Annotation: 20%.]
Run tasks non-speculatively when possible
Keep cores busy with speculative ordered parallelism
[Figure: task graph ordered by distance from source node (time axis 1 to 7). Legend: running non-speculatively, running speculatively, to be processed, finished.]
[Plots: Dijkstra on USA and Dijkstra on cage14; speedup vs. cores (1c, 128c, 256c) for non-speculative, all-speculative, and Espresso.]
COORDINATING SPECULATIVE AND NON-SPECULATIVE PARALLELISM
Programs consist of tasks that run speculatively or non-speculatively
void dijkstraTask(Timestamp dist, Vertex* v) {
  if (v->distance == UNSET) {
    v->distance = dist;
    for (Vertex* n : v->neighbors)
      espresso::create(dijkstraTask, dist + weight(v, n), n->id, n);
  }
}

espresso::create takes: function pointer, timestamp, locale, arguments

            Non-Spec.   Spec.
Timestamp   barrier     commits
Locale      mutex       reduce conflicts
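The slide pairs timestamps with a barrier and locales with a mutex for non-speculative tasks. One possible reading is sketched below; it is an interpretation, not the Espresso hardware policy. The assumed rules are that a non-speculative task may start only if it holds the smallest pending timestamp, and that at most one task per locale runs at a time. `PendingTask` and `runnableNonSpec` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical description of a pending non-speculative task.
struct PendingTask {
    std::uint64_t timestamp;
    std::uint64_t locale;
    int id;
};

// Returns the ids of tasks that may start now under the assumed rules:
//  - timestamp as barrier: only tasks holding the smallest pending timestamp;
//  - locale as mutex: at most one task per locale at a time.
// Simplification: no task is already running when this is called.
std::vector<int> runnableNonSpec(const std::vector<PendingTask>& pending) {
    std::vector<int> runnable;
    if (pending.empty()) return runnable;

    std::uint64_t minTs = pending.front().timestamp;
    for (const PendingTask& t : pending) minTs = std::min(minTs, t.timestamp);

    std::set<std::uint64_t> busyLocales;
    for (const PendingTask& t : pending) {
        if (t.timestamp != minTs) continue;                  // held by barrier
        if (!busyLocales.insert(t.locale).second) continue;  // locale taken
        runnable.push_back(t.id);
    }
    return runnable;
}
```

For example, with pending tasks (timestamp 4, locale 1), (4, 3), (4, 1) and (6, 4), the first two may start together, the third waits for its locale, and the last waits for the timestamp barrier.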
Espresso supports three task types that control speculation
void dijkstraTask(Timestamp dist, Vertex* v) {
  if (v->distance == UNSET) {
    v->distance = dist;
    for (Vertex* n : v->neighbors)
      espresso::create<type>(dijkstraTask, dist + weight(v, n), n->id, n);
  }
}

Task types: SPEC, NONSPEC, MAYSPEC
[Figure: a tile with two cores and its dispatch candidates (timestamps 7, 9, 10, …), shown for three cases: 7 SPEC, 9 SPEC, 10 SPEC; 7 NONSPEC, 9 SPEC, 10 NONSPEC; 7 MAYSPEC, 9 SPEC, 10 NONSPEC.]
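A rough sketch of how the three task types could steer dispatch follows. It assumes that SPEC tasks always run speculatively, that NONSPEC tasks wait until running them non-speculatively is safe, and that MAYSPEC tasks run non-speculatively when safe and speculatively otherwise. `chooseDispatch` and the `safeToRunNonSpec` flag are hypothetical; how the hardware decides safety is not shown on the slide.

```cpp
#include <cassert>

enum class TaskType { SPEC, NONSPEC, MAYSPEC };
enum class Dispatch { Speculative, NonSpeculative, Wait };

// Chooses how to dispatch one candidate task. safeToRunNonSpec is an assumed
// input: true when the task could run without risk of being aborted.
Dispatch chooseDispatch(TaskType type, bool safeToRunNonSpec) {
    switch (type) {
        case TaskType::SPEC:
            return Dispatch::Speculative;
        case TaskType::NONSPEC:
            return safeToRunNonSpec ? Dispatch::NonSpeculative : Dispatch::Wait;
        case TaskType::MAYSPEC:
            return safeToRunNonSpec ? Dispatch::NonSpeculative
                                    : Dispatch::Speculative;
    }
    return Dispatch::Wait;  // not reached for valid enum values
}
```

Under these assumptions, if only the earliest candidate is safe, the candidates 7 MAYSPEC, 9 SPEC, 10 NONSPEC would be dispatched non-speculatively, dispatched speculatively, and held back, respectively.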
MAYSPEC allows programmers to exploit the best of speculative and non-speculative parallelism
[Plots: speedup vs. cores (1c, 128c, 256c) for NONSPEC, Swarm, and MAYSPEC on sssp-cage, sssp-usa, cf, triangle, genome, kmeans, color, bfs, mis, astar, and des. Plot annotation: 2.5x.]
Gmean speedup: NONSPEC: 29x, Swarm: 162x, MAYSPEC: 198x (22% over Swarm, 6.9x over NONSPEC)
Microarchitectural details
Interactions between speculative and non-speculative tasks:
Espresso exception model
Additional results analysis
ENABLING SOFTWARE-MANAGED SPECULATION WITH ORDERED PARALLELISM
Discrete event simulation (DES) needs speculation to scale
DES also allocates memory within tasks
[Figure: two cores run tasks that read and write a free list in memory; blocks A, B, C, D.]
[Plot: DES speedup vs. number of cores (1c to 256c), TCMalloc vs. ideal allocator.]
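One plausible reading of the free-list figure is that every allocation and deallocation touches the same list head, so tasks that allocate conflict with each other under speculation even when their real work is independent. The sketch below illustrates that access pattern with a hypothetical single-list allocator. FreeList, Block, and the headWrites counter are names invented for this example; it is not TCMalloc or the allocator used in the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-list allocator; all names are illustrative only.
struct Block {
    Block* next = nullptr;
};

class FreeList {
public:
    explicit FreeList(std::size_t n) : storage_(n) {
        for (auto& b : storage_) release(&b);  // thread every block onto the list
    }

    // Pops a block. Reads the shared head and, if non-empty, writes it.
    Block* allocate() {
        Block* b = head_;
        if (b) {
            head_ = b->next;
            ++headWrites_;
        }
        return b;
    }

    // Pushes a block back. Reads and writes the shared head.
    void release(Block* b) {
        b->next = head_;
        head_ = b;
        ++headWrites_;
    }

    // Counts writes to the head, i.e. the accesses that would conflict.
    std::size_t headWrites() const { return headWrites_; }

private:
    std::vector<Block> storage_;
    Block* head_ = nullptr;
    std::size_t headWrites_ = 0;
};
```

Because each call writes the head, any two speculative tasks that allocate have a read-write dependence through it, which is consistent with the gap between TCMalloc and the ideal allocator in the plot.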
Speculative tasks can access data written by earlier, uncommitted tasks
Critical for ordered parallelism
Can cause tasks to lose integrity!
[Plot: DES speedup vs. number of cores (1c to 256c), no forwarding vs. with forwarding; callout: 5x.]
[Figure: two cores and the free list in memory, with blocks A, B, C, D; one access is marked "Unchecked".]
Untracked memory: protected from tasks that lose integrity
Unversioned, no conflict checks
Only accessible by
Holds the capsule call vector
Vectored call interface: guarantees control-flow integrity in a capsule
[Figure: two cores; memory split into tracked and untracked regions; labels include the free list, the call vector entries &malloc, &calloc, …, and blocks A, B, C, D; one access is marked "Unchecked".]
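The idea behind a vectored call interface can be sketched in software: the caller names an entry by index, and control only ever transfers to an address stored in a fixed table. This is a loose analogy under stated assumptions, not the hardware mechanism from the talk. capsuleCall, kCallVector, capMalloc, and capCalloc are hypothetical names, and a plain const array stands in for the call vector that the slide places in untracked memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical software analogy of a capsule call vector.
using CapsuleFn = void* (*)(std::size_t);

void* capMalloc(std::size_t n) { return std::malloc(n); }
void* capCalloc(std::size_t n) { return std::calloc(1, n); }

// Fixed table of entry points, mirroring "&malloc &calloc …" in the figure.
static const std::array<CapsuleFn, 2> kCallVector = {{&capMalloc, &capCalloc}};

// Entry point: the caller supplies an index, never a raw address, so even a
// caller holding garbage values cannot redirect control outside the table.
void* capsuleCall(std::size_t index, std::size_t arg) {
    if (index >= kCallVector.size()) return nullptr;  // reject out-of-range entries
    return kCallVector[index](arg);
}
```

For example, capsuleCall(0, 16) allocates 16 bytes through the first entry, while capsuleCall(7, 16) is rejected because no such entry exists.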
Capsule-based allocator
malloc, etc. are capsule functions
Metadata resides in untracked memory
Only gmean 30% slower than ideal
[Plot: speedup vs. number of cores (1c to 256c) for TCMalloc, ideal allocator, and capalloc on genome, des, nocsim, and silo.]
Speculative systems should support non-speculative execution to improve efficiency, ease programmability, and enable new capabilities
Espresso: an execution model for speculative and non-speculative tasks
Capsules: speculative tasks safely invoke software-managed speculation