Scalable Multi-Core Model Checking
Alfons Laarman (alfons@laarman.com), Theory
Joint work with Jaco van de Pol and Tom van Dijk.
Multi-Core Model Checking
Research questions
- Can model checking scale (linearly, ideally) on modern multi-cores?
- Are our parallel solutions compatible with other techniques?
Speedup: $S_P = T_{seq} / T_P$. Ideal: $S_P = P$. Linear: $S_P = P / c$.
[Figure: speedup versus number of threads (both axes from 10 to 50) for the models dfsfifo, garp, giop2.nomig, i-protocol2 and leader5]
Other techniques:
- Compression techniques
- Symbolic exploration
- Partial-order reduction
Related work
- “compiler optimizations diminish the benefits of multi-core processing” [Holzmann 07]
- “no silver bullet, that would solve all the scalability issues” [Barnat et al. 08]
Challenges
Difficulties of parallelism
- Steep memory hierarchies
- Cache coherence protocol (false sharing)
- Fine-grained operations in model checking (e.g. no subsumption)

Sequential counter:

    #define B (1024 * 1024 * 1024)

    int main (void) {
        int result = 0;
        for (int i = 0; i < B; i++)
            result++;
        return result;
    }

Parallel counter with P = 16 threads, one counter per thread:

    #include <pthread.h>
    #include <stddef.h>

    #define B (1024 * 1024 * 1024)
    #define P 16

    /* Each counter gets its own cache line (64 bytes assumed). Aligning only
       the array would still pack all 16 ints into a single line. */
    typedef struct { int value; } __attribute__ ((aligned (64))) counter_t;

    static void *count (void *arg) {
        int *counter = (int *) arg;
        for (int i = 0; i < B / P; i++)
            (*counter)++;
        return NULL;
    }

    int main (void) {
        pthread_t thread[P];
        counter_t counters[P] = {{0}};
        for (int i = 0; i < P; i++)
            pthread_create (&thread[i], NULL, count, &counters[i].value);
        int result = 0;
        for (int i = 0; i < P; i++) {
            pthread_join (thread[i], NULL);
            result += counters[i].value;
        }
        return result;
    }

Timings:
- Sequential: T1 = 27 sec
- 16 threads, plain int counters[P] (counters share a cache line): T16 = 32 sec
- 16 threads, cache-line-aligned counters: T16 = 1.8 sec
(Explicit-State) Model Checking
    global x = 7; global y = 3;

    Process 1:                       Process 2:
    1  for (int a = 1 .. 10)         1  int b = y + x;
    2      x += y;                   2  y = 2 * b;

State vector S: ⟨x, y, a, b, pc1, pc2⟩
s0 = ⟨7, 3, 0, 0, 1, 1⟩
next_state(⟨7,3,0,0,1,1⟩) = {⟨7,3,1,0,2,1⟩, ⟨7,3,0,10,1,2⟩}
Problem: check all states reachable from $s_0 \in S$ using $\mathit{next\_state}: S \to 2^S$ with $S = \mathbb{N}^k$ ((implicit) graph search).
Basis for checking LTL/CTL and timed/probabilistic systems!
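The successor function for the two-process example can be sketched in C as follows. This is a minimal illustration, not the deck's implementation: the struct layout, the function signature and the convention that a program counter value of 3 means "process finished" are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* State vector <x, y, a, b, pc1, pc2> of the two-process example. */
typedef struct { int x, y, a, b, pc1, pc2; } state_t;

/* Writes the successors of s into out (room for 2) and returns their count.
   Assumption of this sketch: pc value 3 means the process has finished. */
static size_t next_state(state_t s, state_t out[2]) {
    size_t n = 0;

    /* Process 1:  1: for (a = 1 .. 10)   2: x += y */
    if (s.pc1 == 1) {
        state_t t = s;
        if (s.a < 10) { t.a = s.a + 1; t.pc1 = 2; }  /* enter the loop body */
        else          { t.pc1 = 3; }                 /* loop exhausted */
        out[n++] = t;
    } else if (s.pc1 == 2) {
        state_t t = s;
        t.x = s.x + s.y;
        t.pc1 = 1;
        out[n++] = t;
    }

    /* Process 2:  1: b = y + x   2: y = 2 * b */
    if (s.pc2 == 1) {
        state_t t = s;
        t.b = s.y + s.x;
        t.pc2 = 2;
        out[n++] = t;
    } else if (s.pc2 == 2) {
        state_t t = s;
        t.y = 2 * s.b;
        t.pc2 = 3;
        out[n++] = t;
    }
    return n;
}
```

Called on the initial state ⟨7,3,0,0,1,1⟩, the function yields one successor per process, which matches the two states listed on the slide.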
Overview
1. Reachability with Shared Hash Table
2. Tree Compression
3. Symbolic Reachability with Decision Diagrams
Static partitioning or shared hash table
Static partitioning: Workers 1 to 4, each with its own queue and its own store
✗ On-the-fly (BFS) ± Scalability (queue contention)
Shared hash table: Workers 1 to 4, each with its own queue, sharing one store and a load balancer
✓ (Pseudo) DFS & BFS ? Scalability
Shared hash table
    procedure search(p)
        while balance(Q) do                ⊲ with termination detection
            s := some s ∈ Qp; Qp := Qp \ {s}
            for all s′ ∈ next_state(s) do
                if s′ ∉ V then             ⊲ test and insert form one atomic
                    V := V ∪ {s′}          ⊲ operation: V.find-or-put(s′)
                    Qp := Qp ∪ {s′}

    procedure reach(s0, P)
        V := {s0}
        Q1 := {s0}
        search(1) ∥ ... ∥ search(P)
Lockless Hash Table: Design
Laarman, van de Pol, Weber [fmcad10]
Main bottlenecks
- State store: concurrent access
- Graph traversal: random memory access (bandwidth / latency)
Design
- Open addressing
- Hash memoization
- Walking the Line
- In-situ locking
[Figure: table layout showing buckets and data, annotated with the state size |state| and the cache line size |cache line|]
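A heavily simplified sketch of an atomic find-or-put on an open-addressing table, using C11 atomics. It is not the LTSmin table: keys are assumed to fit in one non-zero 64-bit word, so a single compare-and-swap can claim a bucket and no hash memoization or in-situ lock bits are needed. The names `table`, `mix` and `find_or_put` and the table size are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u   /* hypothetical capacity, must be a power of two */
#define EMPTY 0u           /* keys are assumed to be non-zero */

static _Atomic uint64_t table[TABLE_SIZE];   /* zero-initialised: all EMPTY */

/* Simple multiplicative hash, chosen for this sketch only. */
static uint64_t mix(uint64_t key) {
    uint64_t h = key * 0x9E3779B97F4A7C15ULL;
    return h ^ (h >> 32);
}

/* Returns 1 if key was already present, 0 if this call inserted it,
   and -1 if the table is full. Safe to call from several threads. */
static int find_or_put(uint64_t key) {
    uint64_t h = mix(key);
    for (uint64_t i = 0; i < TABLE_SIZE; i++) {
        size_t idx = (size_t)((h + i) & (TABLE_SIZE - 1));  /* linear probing */
        uint64_t cur = atomic_load(&table[idx]);
        if (cur == EMPTY) {
            uint64_t expected = EMPTY;
            if (atomic_compare_exchange_strong(&table[idx], &expected, key))
                return 0;            /* we claimed the empty bucket */
            cur = expected;          /* another thread won; inspect its key */
        }
        if (cur == key)
            return 1;                /* someone (maybe us earlier) stored it */
    }
    return -1;
}
```

Linear probing keeps consecutive probes in neighbouring memory, which is the property the "walking the line" bullet relies on. States wider than one machine word cannot be written with a single compare-and-swap, which is where a per-bucket lock bit would come in.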
Experiments from 2010 (BEEM database)
SPIN 5.2.4 (NASA/JPL) DiVinE 2.2 (Brno,CZ) LTSmin (shared hash table)
http://fmt.cs.utwente.nl/tools/ltsmin/
Conclusions
- Scalability comes from limiting memory usage
- Distribute counters!
- No compression, but states are often very similar due to locality, e.g. ⟨3,5,5,4,1,3⟩ and ⟨3,5,9,3,1,3⟩
Recursive indexing
[holzmann 97][blom et al. 08]
[Figure: a table of state vectors of length K (hash table $H_K$) next to its recursively indexed form, a tree of $(K - 1) \times H_2$ pair tables; table sizes are labelled $N$, $\sqrt{N}$, $\sqrt{N}$]
✓ Combinatorial ⟹ balanced tree ($N + 2\sqrt{N} + 4\sqrt[4]{N} + \cdots \approx N$)
✓ Compresses states of length K to almost 2!
✗ Hard to parallelize (flatliners)
Parallel Tree Compression
Laarman, van de Pol, Weber [spin11]
Solution
- Temporary binary tree structure on stack
- Reuse lockless hash table (merge tables)
- Incremental updates: (K − 1) → log2(K − 1) lookups
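[Figure: the state 3,5,5,4,1,3 is split recursively (3,5,5 and 4,1,3, then 3,5 and 4,1) and each pair is stored in the shared table; for the successor 3,5,9,4,1,3 only the nodes on the path from the changed slot to the root are looked up again]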
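The folding of a state vector into table indices can be sketched as below. This is a sequential illustration under simplifying assumptions: the pair table is a plain array with linear search instead of the lockless hash table, and the incremental update that skips unchanged subtrees is left out. The names `pair_put` and `tree_put` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAIRS 1024   /* hypothetical capacity */

typedef struct { uint32_t left, right; } pair_t;

static pair_t pairs[MAX_PAIRS];   /* one merged table for all tree levels */
static uint32_t n_pairs = 0;

/* Returns the index of (left, right), inserting the pair if it is new.
   Linear search keeps the sketch short; a hash table would be used in practice. */
static uint32_t pair_put(uint32_t left, uint32_t right) {
    for (uint32_t i = 0; i < n_pairs; i++)
        if (pairs[i].left == left && pairs[i].right == right)
            return i;
    assert(n_pairs < MAX_PAIRS);
    pairs[n_pairs].left = left;
    pairs[n_pairs].right = right;
    return n_pairs++;
}

/* Folds v[0..len-1] (len >= 1) into a single index: split the vector in two,
   fold each half, then store the pair of sub-results. */
static uint32_t tree_put(const uint32_t *v, size_t len) {
    if (len == 1)
        return v[0];                    /* a leaf is the slot value itself */
    size_t half = (len + 1) / 2;        /* left part gets the extra slot */
    uint32_t left = tree_put(v, half);
    uint32_t right = tree_put(v + half, len - half);
    return pair_put(left, right);
}
```

In this sketch, storing 3,5,5,4,1,3 creates five pairs, and storing 3,5,9,4,1,3 afterwards adds only two more, because the subtrees for 3,5 and 4,1,3 are shared.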
Experiments from 2011 [±300 BEEM models]
Laarman, van de Pol, Weber [spin11]
[Figure: compression factor versus state length (bytes) for the hash table and for tree compression, with reference curves for optimal compression (4 byte / state) and mean compression (4.8 byte / state)]
[Figure: second plot, labels not recoverable from the extraction]
Summary
- Lossless compression up to 4 bytes / state [memics11]
- Mean compression of 4.8 bytes / state
- Still same performance (absolute & scalability)
Parallel tree compression is for free!
Multi-valued Decision Diagrams
[Figure: a table of state vectors and the multi-valued decision diagram (MDD) representing the same set of states]
- Every path in the MDD represents a concrete state vector
- Potential compression: exponential (here: 54 → 15)
- Symbolic Reachability: explore sets of states stored as MDDs
Earlier work: disappointing speedups
Earlier work in parallel BDDs
- early ’90s: vector machines, massive SIMD (not unlike GPU)
- late ’90s: virtual SMP, distributed BDDs (BDDnow)
Earlier work in parallel symbolic model checking
- ’00 onwards: Grumberg et al.: vertical splitting of BDDs (distributed)
- ’00 onwards: Ciardo et al.: horizontal splitting of BDDs (distributed)
- CAV’07: Lüttgen et al.: parallelisation of saturation with Cilk
- PDMC’09: Ciardo: difficult, but what is the alternative?
Recent developments
- 2012: Tom van Dijk, Alfons Laarman, Jaco van de Pol:
Sylvan, a library for multi-core BDD operations (PDMC’12)
Symbolic Model Checking based on BDDs
Bryant [IEEE Trans. Comp. ’86], Burch, Clarke, McMillan [LICS ’90]

    procedure search(s0)
        V := ∅
        Q := {s0}
        while Q ≠ ∅ do
            Q := RelProd(R, Q)
            Q := Q Minus V
            V := V Or Q
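The shape of this fixpoint loop can be tried out with a stand-in set representation. The sketch below swaps BDDs for 8-bit bitmasks over a hypothetical state space of eight states, so `rel_prod` is an explicit image computation rather than a real relational product. It also starts V at {s0} instead of ∅ so that the initial state is counted, which is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define N_STATES 8        /* hypothetical tiny state space: states 0..7 */

typedef uint8_t set_t;    /* bit i set <=> state i is in the set */

/* Image of q under the relation r, where r[i] is the successor set of
   state i. Stands in for RelProd(R, Q). */
static set_t rel_prod(const set_t r[N_STATES], set_t q) {
    set_t image = 0;
    for (int i = 0; i < N_STATES; i++)
        if (q & (1u << i))
            image |= r[i];
    return image;
}

/* Set-based breadth-first reachability: each iteration handles a whole
   frontier at once instead of one state at a time. */
static set_t search(const set_t r[N_STATES], int s0) {
    set_t v = (set_t)(1u << s0);   /* visited set */
    set_t q = v;                   /* current frontier */
    while (q != 0) {
        q = rel_prod(r, q);        /* Q := RelProd(R, Q) */
        q = (set_t)(q & ~v);       /* Q := Q Minus V */
        v |= q;                    /* V := V Or Q */
    }
    return v;
}
```

With a relation 0 → 1 → 2 → 0 and 3 → 4, the search from state 0 returns the set {0, 1, 2} and the search from state 3 returns {3, 4}.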
BDD data structures
- Unique Table (to store BDD nodes)
- Computed Cache (dynamic programming)
    procedure Or(A, B)
        if ∃C : (Or, A, B, C) ∈ ComputedT then return C
        A′, B′ := Reduce A and B one step
        A′, B′ := Call recursively Or(A′) and Or(B′)
        C := Create new node (A′, B′) in UniqueT
        Store (Or, A, B, C) in ComputedT
        return C

[Figure: example BDD over the variables X, Y, Z with terminal 1]
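The interplay of the two tables can be sketched in sequential C as follows. This is an illustration under assumptions, not Sylvan's code: the unique table is a plain array searched linearly, the computed cache is a small direct-mapped array that may overwrite entries, and all names (`make_node`, `bdd_or`, `FALSE_ID`, `TRUE_ID`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 1024    /* hypothetical capacities */
#define CACHE_SIZE 1024

enum { FALSE_ID = 0, TRUE_ID = 1 };   /* the two terminal nodes */

typedef struct { uint32_t var, low, high; } node_t;

/* Unique table: slots 0 and 1 hold the terminals. */
static node_t nodes[MAX_NODES] = { {UINT32_MAX, 0, 0}, {UINT32_MAX, 1, 1} };
static uint32_t n_nodes = 2;

/* Returns the unique node (var, low, high), creating it if needed. */
static uint32_t make_node(uint32_t var, uint32_t low, uint32_t high) {
    if (low == high)
        return low;                    /* redundant test: skip the node */
    for (uint32_t i = 2; i < n_nodes; i++)
        if (nodes[i].var == var && nodes[i].low == low && nodes[i].high == high)
            return i;
    assert(n_nodes < MAX_NODES);
    nodes[n_nodes] = (node_t){var, low, high};
    return n_nodes++;
}

/* Computed cache: lossy, direct-mapped memo of earlier Or results. */
typedef struct { uint32_t a, b, result; int valid; } cache_entry;
static cache_entry cache[CACHE_SIZE];

static uint32_t bdd_or(uint32_t a, uint32_t b) {
    if (a == TRUE_ID || b == TRUE_ID) return TRUE_ID;
    if (a == FALSE_ID) return b;
    if (b == FALSE_ID) return a;
    if (a == b) return a;

    cache_entry *e = &cache[(a * 31u + b) % CACHE_SIZE];
    if (e->valid && e->a == a && e->b == b)
        return e->result;              /* hit in the computed cache */

    /* Reduce one step: split on the topmost variable (smallest index). */
    uint32_t va = nodes[a].var, vb = nodes[b].var;
    uint32_t v = va < vb ? va : vb;
    uint32_t a0 = (va == v) ? nodes[a].low : a, a1 = (va == v) ? nodes[a].high : a;
    uint32_t b0 = (vb == v) ? nodes[b].low : b, b1 = (vb == v) ? nodes[b].high : b;

    uint32_t low = bdd_or(a0, b0);     /* these two calls are independent, */
    uint32_t high = bdd_or(a1, b1);    /* so they could run as parallel tasks */
    uint32_t c = make_node(v, low, high);

    *e = (cache_entry){a, b, c, 1};    /* store the result for later calls */
    return c;
}
```

In a multi-core setting both arrays would be replaced by concurrent tables and the two recursive calls would be spawned as tasks.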
Multi-core Binary Decision Diagrams
Tom van Dijk, Alfons Laarman, Jaco van de Pol [pdmc’12]
Multi-core BDDs
- Use shared hashtable for Unique Table, Operations Cache
- Parallelize computation tree of recursive operations
Complications for Parallelism
- BDD nodes can be removed
- Irregular task graph
- Fine-grained parallelism
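Solutions
- Garbage collection
  - Either using tombstones
  - Or using rehashing
- Work-stealing
[Figure: bucket state machine with the states EMPTY, WAIT(h), DONE(h,count) and TOMBSTONE; the transitions (write data, count +/−, delete) are performed with compare-and-swap (cas)]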
Parallelizing the BDD Operations
BDD operations Become Tasks
- Organize recursive calls to RelProd, \ and ∪ in a task dependency graph
- Same task might be created several times: store result in the shared Computed Table
- Fine-grained task-parallelism:
  - Parent spawns children for subtasks, and waits upon their completion
  - Load balancing by work-stealing; use e.g. Cilk [Blumofe ’95] or Wool [Faxén ’08] or Lace [Van Dijk ’14]
Split double-ended queue in public and private part
[Figure: task deque with markers t, s and h, divided into stolen, stealable and worker-private tasks]
Results: speedup of BDD operations for model checking
[Figure: speedup versus number of workers for the models bakery.4, bakery.8, collision.5, iprotocol.7, lifts.4, lifts.7, schedule_world.2 and schedule_world.3]
Experiments
- BEEM benchmarks, again
- On 4 × 12 = 48 core NUMA
- Speedup up to 32 (=66.7%)
- Small models don’t scale (time spent in work stealing)
http://fmt.cs.utwente.nl/tools/sylvan/
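The 66.7% in the list above is consistent with reading it as parallel efficiency, that is, speedup divided by the number of cores. That reading is an assumption, since the slide does not name the quantity; with the speedup notation $S_P$ from the research questions it works out as:

```latex
E_P = \frac{S_P}{P} = \frac{32}{48} \approx 0.667
```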
Conclusion
Recap
- Concurrent hash table [fmcad10]
- Tree compression [spin11]
- DDs (set-based) [pdmc12]
The Trick
- (Re)design data structures
- Think about memory accesses
Example
- DiVinE: Parallel distributed model checker. Barnat et al. [pdmc10]
- Parallelizing the SPIN model checker. Holzmann [spin12]
- Scalable multi-core model checking fairness enhanced systems. Yang et al. [fmse09]
References
- Scalable Multi-Core Model Checking (PhD thesis). Laarman [UTwente]
- Sylvan: Multi-Core Decision Diagram (PhD thesis). van Dijk [UTwente]
- Boosting Multi-Core Reachability with Shared Hash Tables. Laarman et al. [fmcad10]
- Parallel Recursive State Compression for Free. Laarman et al. [spin11]
- Multi-Core BDD Operations for Symbolic Reachability. van Dijk, Laarman, van de Pol [pdmc12]