SLIDE 1

Scalable Multi-Core Model Checking

Alfons Laarman (alfons@laarman.com), Theory

joint work with Jaco van de Pol and Tom van Dijk.

SLIDE 4

Multi-Core Model Checking

Research questions

Can model checking scale (linearly, ideally) on modern multi-cores?
Are our parallel solutions compatible with other techniques?

Speedup: $S_P = T_{seq} / T_P$, where $T_{seq}$ is the sequential run time and $T_P$ the run time on P cores
Ideal: $S_P = P$
Linear: $S_P = P / c$ (for a constant c)

[Figure: speedup versus number of threads, both axes from 10 to 50, for the models dfsfifo, garp, giop2.nomig, i-protocol2 and leader5.]

Techniques to combine with:

Compression techniques
Symbolic exploration
Partial-order reduction

Related work

“compiler optimizations diminish the benefits of multi-core processing” [Holzmann 07]
“no silver bullet, that would solve all the scalability issues” [Barnat et al. 08]

SLIDE 9

Challenges

Difficulties of parallelism

Steep memory hierarchies
Cache coherence protocol

Sequential counter:

    #define B (1024 * 1024 * 1024)

    int main (void) {
        int result = 0;
        for (int i = 0; i < B; i++)
            result++;
        return result;
    }

Parallel counter with P = 16 threads:

    #include <pthread.h>
    #include <stddef.h>

    #define B (1024 * 1024 * 1024)
    #define P 16

    /* Each thread increments its own counter B / P times. */
    static void *count (void *arg) {
        int *counter = (int *) arg;
        for (int i = 0; i < B / P; i++)
            (*counter)++;
        return NULL;
    }

    int main (void) {
        pthread_t thread[P];
        int counters[P] = {0};   /* adjacent ints share cache lines */
        for (int i = 0; i < P; i++)
            pthread_create (&thread[i], NULL, count, &counters[i]);
        int result = 0;
        for (int i = 0; i < P; i++) {
            pthread_join (thread[i], NULL);
            result += counters[i];
        }
        return result;
    }

T1 = 27 sec, T16 = 32 sec

SLIDE 11

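Challenges

Difficulties of parallelism

Steep memory hierarchies
Cache coherence protocol (false sharing)
Fine-grained operations in model checking (e.g. no subsumption)

Sequential counter:

    #define B (1024 * 1024 * 1024)

    int main (void) {
        int result = 0;
        for (int i = 0; i < B; i++)
            result++;
        return result;
    }

Parallel counter with P = 16 threads, one counter per cache line:

    #include <pthread.h>
    #include <stddef.h>

    #define B (1024 * 1024 * 1024)
    #define P 16

    /* Aligning the struct to 64 bytes pads every counter to a full cache line.
       An aligned attribute on the array alone would only align its first element. */
    struct counter {
        int value;
    } __attribute__ ((aligned (64)));

    static void *count (void *arg) {
        int *counter = (int *) arg;
        for (int i = 0; i < B / P; i++)
            (*counter)++;
        return NULL;
    }

    int main (void) {
        pthread_t thread[P];
        struct counter counters[P] = {{0}};
        for (int i = 0; i < P; i++)
            pthread_create (&thread[i], NULL, count, &counters[i].value);
        int result = 0;
        for (int i = 0; i < P; i++) {
            pthread_join (thread[i], NULL);
            result += counters[i].value;
        }
        return result;
    }

T1 = 27 sec
T16 = 32 sec (counters sharing cache lines)
T16 = 1.8 sec (cache-line-aligned counters)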
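The gap between the two T16 timings comes down to memory layout. The following is a minimal C11 sketch of that layout difference, not code from the talk: the names `CACHE_LINE`, `padded_counter` and `same_cache_line` are illustrative, and a 64-byte cache line is assumed.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define P 16

/* Plain layout: 16 ints of 4 bytes each fit into one 64-byte line. */
static alignas(CACHE_LINE) int plain_counters[P];

/* Padded layout: alignas on the member rounds the struct size up to
   64 bytes, so every array element starts on its own cache line. */
struct padded_counter {
    alignas(CACHE_LINE) int value;
};
static struct padded_counter padded_counters[P];

/* True if both addresses fall into the same cache line. */
static bool same_cache_line(const void *a, const void *b) {
    return (uintptr_t) a / CACHE_LINE == (uintptr_t) b / CACHE_LINE;
}
```

With the plain array, every increment by one thread invalidates the line that all other threads are writing to, even though no counter is logically shared. With the padded struct, `same_cache_line` is false for any two distinct counters.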

SLIDE 15

(Explicit-State) Model Checking

global x = 7; global y = 3;

Process 1:
    1  for (int a = 1 .. 10)
    2      x += y;

Process 2, running in parallel with process 1:
    1  int b = y + x;
    2  y = 2 * b;

State vector S: ⟨x, y, a, b, pc1, pc2⟩
s0 = ⟨7, 3, 0, 0, 1, 1⟩
next_state(⟨7, 3, 0, 0, 1, 1⟩) = {⟨7, 3, 1, 0, 2, 1⟩, ⟨7, 3, 0, 10, 1, 2⟩}

Problem: check all states reachable from s0 ∈ S using next_state: S → 2^S, where S is a set of k-tuples (state vectors of length k). This is an (implicit) graph search.

Basis for checking LTL/CTL and timed/probabilistic systems!
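The successor function of the two-process example can be sketched in C as follows. The slide only fixes the state vector and the successors of s0, so the rest is an assumption: program counter value 3 is taken to mean "terminated", and `state_t` and the control flow of the loop are illustrative choices.

```c
#include <stddef.h>

/* State vector <x, y, a, b, pc1, pc2> from the slide. */
typedef struct { int x, y, a, b, pc1, pc2; } state_t;

/* Writes the successors of s into succ (room for 2) and returns how many
   there are. Assumed semantics: a pc value of 3 means "terminated". */
static size_t next_state(state_t s, state_t succ[2]) {
    size_t n = 0;

    /* Process 1: for (int a = 1 .. 10) x += y; */
    if (s.pc1 == 1 && s.a < 10) {          /* start the next iteration */
        state_t t = s; t.a++; t.pc1 = 2; succ[n++] = t;
    } else if (s.pc1 == 1) {               /* loop finished */
        state_t t = s; t.pc1 = 3; succ[n++] = t;
    } else if (s.pc1 == 2) {               /* loop body */
        state_t t = s; t.x += s.y; t.pc1 = 1; succ[n++] = t;
    }

    /* Process 2: int b = y + x; y = 2 * b; */
    if (s.pc2 == 1) {
        state_t t = s; t.b = s.y + s.x; t.pc2 = 2; succ[n++] = t;
    } else if (s.pc2 == 2) {
        state_t t = s; t.y = 2 * s.b; t.pc2 = 3; succ[n++] = t;
    }
    return n;
}
```

Each process contributes at most one successor, which is why the interleaving of two processes yields the two successors of s0 listed on the slide.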

SLIDE 16

Overview

1. Reachability with Shared Hash Table
2. Tree Compression
3. Symbolic Reachability with Decision Diagrams

SLIDE 18

Static partitioning or shared hash table

Static partitioning

[Figure: workers 1 to 4, each with its own queue and its own store.]

✗ On-the-fly (BFS)
± Scalability (queue contention)

Shared hash table

[Figure: workers 1 to 4, each with its own queue, connected by a load balancer and sharing a single store.]

✓ (Pseudo) DFS & BFS
? Scalability

SLIDE 20

Shared hash table

procedure search(p)
    while balance(Q) do                  ⊲ with termination detection
        s := some s ∈ Q_p; Q_p := Q_p \ {s}
        for all s′ ∈ next_state(s) do
            if s′ ∉ V then               ⊲ test and insert as one atomic V.find-or-put(s′)
                V := V ∪ {s′}
                Q_p := Q_p ∪ {s′}

procedure reach(s0, P)
    V := {s0}
    Q_1 := {s0}
    search(1) ∥ ... ∥ search(P)
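For a single worker the search procedure reduces to the loop below. This is a sequential sketch only: load balancing and termination detection are left out, the visited set V is a plain boolean array, and `MAX_STATES`, `next_state_fn` and the example graph `ring_steps` are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_STATES 1024   /* hypothetical bound on the state space */

/* Successor function: writes the successors of s into out, returns their number. */
typedef size_t (*next_state_fn)(unsigned s, unsigned out[2]);

static bool visited[MAX_STATES];          /* the set V */

/* Sequential stand-in for the atomic V.find-or-put(s'):
   returns true if s was already in V, otherwise inserts it. */
static bool find_or_put(unsigned s) {
    bool seen = visited[s];
    visited[s] = true;
    return seen;
}

/* Returns the number of states reachable from s0 (all states < MAX_STATES). */
static size_t reach(unsigned s0, next_state_fn next) {
    static unsigned queue[MAX_STATES];    /* the queue Q_1 */
    size_t head = 0, tail = 0;
    for (size_t i = 0; i < MAX_STATES; i++)
        visited[i] = false;
    find_or_put(s0);
    queue[tail++] = s0;
    while (head < tail) {
        unsigned s = queue[head++];
        unsigned succ[2];
        size_t n = next(s, succ);
        for (size_t i = 0; i < n; i++)
            if (!find_or_put(succ[i]))    /* new state: enqueue it */
                queue[tail++] = succ[i];
    }
    return tail;   /* every visited state was enqueued exactly once */
}

/* Example graph: ten states on a ring, with steps of 2 and 4. */
static size_t ring_steps(unsigned s, unsigned out[2]) {
    out[0] = (s + 2) % 10;
    out[1] = (s + 4) % 10;
    return 2;
}
```

Starting from state 0, `reach(0, ring_steps)` visits only the five even states. In the multi-core setting the check and the insert in `find_or_put` must happen as one atomic step, otherwise two workers could both enqueue the same state.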

SLIDE 23

Lockless Hash Table: Design

Laarman, van de Pol, Weber [fmcad10]

Main bottlenecks

State store: concurrent access
Graph traversal: random memory access (bandwidth / latency)

Design

Open addressing
Hash memoization
Walking the Line
In-situ locking

[Figure: hash table layout showing the bucket array and the data array, with the widths |state| and |cache line| marked.]
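A much simplified find-or-put with open addressing can be sketched with C11 atomics. This is not the LTSmin implementation: it assumes that a state fits into one non-zero 64-bit word, so a single compare-and-swap claims a bucket and neither hash memoization nor in-situ locking is needed. `TABLE_SIZE` and `hash64` are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 1024u          /* illustrative; must be a power of two */
#define EMPTY 0u                  /* bucket value reserved for "free" */

static _Atomic uint64_t table[TABLE_SIZE];   /* zero-initialised: all EMPTY */

/* Illustrative 64-bit mixer (multiplicative hashing). */
static uint64_t hash64(uint64_t k) {
    k *= 0x9E3779B97F4A7C15ull;
    return k ^ (k >> 32);
}

/* Returns true if key was already present, false if this call inserted it.
   Assumption: keys are non-zero. */
static bool find_or_put(uint64_t key) {
    size_t i = hash64(key) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        uint64_t cur = atomic_load(&table[i]);
        if (cur == EMPTY) {
            uint64_t expected = EMPTY;
            if (atomic_compare_exchange_strong(&table[i], &expected, key))
                return false;            /* this thread claimed the bucket */
            cur = expected;              /* another thread filled it first */
        }
        if (cur == key)
            return true;
        i = (i + 1) & (TABLE_SIZE - 1);  /* linear probing to the next bucket */
    }
    abort();                             /* table full: not handled in this sketch */
}
```

Linear probing visits neighbouring buckets first, so most probe sequences stay within one cache line, which is the idea behind "walking the line". States wider than one word need the separate data array together with hash memoization and in-situ locking from the design list.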

SLIDE 26

Experiments from 2010 (BEEM database)

SPIN 5.2.4 (NASA/JPL)
DiVinE 2.2 (Brno, CZ)
LTSmin (shared hash table)

http://fmt.cs.utwente.nl/tools/ltsmin/

Conclusions

Scalability comes from limiting memory usage
Distribute counters!
No compression, but states are often very similar due to locality: ⟨3,5,5,4,1,3⟩ → ⟨3,5,9,3,1,3⟩

SLIDE 29

Recursive indexing

[holzmann 97][blom et al. 08]

[Figure: a table of state vectors stored as a tree of index tables. One hash table of states of length K ($H_K$) is replaced by (K − 1) tables of pairs ($(K - 1) \times H_2$); the table sizes are N at the root and √N for each of its two children.]

✓ Combinatorial ⟹ balanced tree ($N + 2\sqrt{N} + 4\sqrt[4]{N} + \cdots \approx N$)
✓ Compresses states of length K to almost 2!
✗ Hard to parallelize (flatliners)

SLIDE 35

Parallel Tree Compression

Laarman, van de Pol, Weber [spin11]

Solution

Temporary binary tree structure on stack
Reuse lockless hash table (merge tables)
Incremental updates: (K − 1) → log2(K − 1) lookups

[Figure: the state vector 3,5,5,4,1,3 is split recursively into 3,5,5 and 4,1,3, then into 3,5 and 4,1, and each pair is stored in the merged hash table. For the successor 3,5,9,4,1,3 only the pairs on the path from the changed slot to the root are looked up again.]
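The recursive folding of a state vector into pairs can be sketched sequentially in C. This is a simplification: one merged table holds the pairs of all tree nodes, a linear scan stands in for the lockless hash table, and incremental updates are left out. The names `pair_t`, `pair_find_or_put`, `tree_compress` and `tree_decompress` are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_PAIRS 4096               /* illustrative capacity */

typedef struct { int left, right; } pair_t;

static pair_t pairs[MAX_PAIRS];      /* merged table for all tree nodes */
static int    n_pairs = 0;

/* Returns the index of (left, right), inserting the pair if it is new.
   A linear scan stands in for the hash table lookup. */
static int pair_find_or_put(int left, int right) {
    for (int i = 0; i < n_pairs; i++)
        if (pairs[i].left == left && pairs[i].right == right)
            return i;
    if (n_pairs == MAX_PAIRS)
        abort();                     /* table full: not handled in this sketch */
    pairs[n_pairs].left = left;
    pairs[n_pairs].right = right;
    return n_pairs++;
}

/* Folds a state vector of length len >= 1 into a single index. */
static int tree_compress(const int *v, size_t len) {
    if (len == 1)
        return v[0];                 /* a leaf is the slot value itself */
    size_t mid = (len + 1) / 2;      /* 6 slots split into 3 + 3, 3 into 2 + 1 */
    int left  = tree_compress(v, mid);
    int right = tree_compress(v + mid, len - mid);
    return pair_find_or_put(left, right);
}

/* Inverse of tree_compress for a vector of the same length. */
static void tree_decompress(int index, int *out, size_t len) {
    if (len == 1) {
        out[0] = index;
        return;
    }
    size_t mid = (len + 1) / 2;
    tree_decompress(pairs[index].left, out, mid);
    tree_decompress(pairs[index].right, out + mid, len - mid);
}
```

Compressing 3,5,5,4,1,3 into an empty table creates K − 1 = 5 pairs. Compressing 3,5,9,4,1,3 afterwards adds only two more, because the subtrees for 3,5 and 4,1,3 are already present.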

SLIDE 39

Experiments from 2011 [±300 BEEM models]

Laarman, van de Pol, Weber [spin11]

[Figure: compression factor (10 to 70) versus state length in bytes (50 to 300). Series: hash table, tree compression, optimal compression (4 byte / state), mean compression (4.8 byte / state).]

Summary

Lossless compression down to 4 bytes / state [memics11]
Mean compression of 4.8 bytes / state
Still same performance (absolute & scalability)

Parallel tree compression is for free!

SLIDE 41

Multi-valued Decision Diagrams

[Figure: a table of state vectors next to the MDD that represents the same set of states.]

Every path in the MDD represents a concrete state vector
Potential compression: exponential (here: 54 → 15)
Symbolic Reachability: explore sets of states stored as MDDs

SLIDE 44

Earlier work: disappointing speedups

Earlier work in parallel BDDs

early ’90s: vector machines, massive SIMD (not unlike GPU)
late ’90s: virtual SMP, distributed BDDs (BDDnow)

Earlier work in parallel symbolic model checking

’00 onward: Grumberg et al.: vertical splitting of BDDs (distributed)
’00 onward: Ciardo et al.: horizontal splitting of BDDs (distributed)
CAV’07: Lüttgen et al.: parallelisation of saturation with Cilk
PDMC’09: Ciardo: Difficult, but what is the alternative?

Recent developments

2012: Tom van Dijk, Alfons Laarman, Jaco van de Pol: Sylvan, a library for multi-core BDD operations (PDMC’12)

SLIDE 49

Symbolic Model Checking based on BDDs

Bryant [IEEE Trans. Comp.’86], Burch, Clarke, McMillan [LICS’90]

procedure search(s0)
    V := ∅
    Q := {s0}
    while Q ≠ ∅ do
        Q := RelProd(R, Q)
        Q := Q Minus V
        V := V Or Q

BDD data structures

Unique Table (to store BDD nodes)
Computed Cache (dynamic programming)

procedure Or(A, B)
    if ∃C : (Or, A, B, C) ∈ ComputedT then return C
    A′, B′ := Reduce A and B one step
    A′, B′ := Call recursively Or(A′) and Or(B′)
    C := Create new node (A′, B′) in UniqueT
    Store (Or, A, B, C) in ComputedT
    return C

[Figure: a BDD over the variables X, Y and Z with terminal node 1.]
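The interplay of the Or procedure with the two tables can be made concrete in a small sequential C sketch. It is an illustration under simplifying assumptions, not Sylvan code: the unique table is searched linearly, the computed cache is a small direct-mapped array, and all names (`make_node`, `bdd_or`, `bdd_var`, `bdd_eval`) are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NODES  4096
#define CACHE_SIZE 1024

/* Node ids 0 and 1 are the terminals false and true. */
typedef struct { int var, low, high; } node_t;
static node_t unique_table[MAX_NODES];
static int    n_nodes = 2;                  /* slots 0 and 1 are reserved */

typedef struct { int a, b, result; bool valid; } cache_entry_t;
static cache_entry_t computed_cache[CACHE_SIZE];

/* Unique Table: returns the one node for (var, low, high), creating it if needed. */
static int make_node(int var, int low, int high) {
    if (low == high)
        return low;                         /* redundant test, no node needed */
    for (int i = 2; i < n_nodes; i++)
        if (unique_table[i].var == var && unique_table[i].low == low
            && unique_table[i].high == high)
            return i;
    if (n_nodes == MAX_NODES)
        abort();                            /* no garbage collection in this sketch */
    unique_table[n_nodes] = (node_t){var, low, high};
    return n_nodes++;
}

static int bdd_var(int var) { return make_node(var, 0, 1); }

static int bdd_or(int a, int b) {
    if (a == 1 || b == 1) return 1;         /* terminal cases */
    if (a == 0) return b;
    if (b == 0 || a == b) return a;
    if (a > b) { int t = a; a = b; b = t; } /* Or is commutative: normalise the key */

    /* Computed Cache lookup */
    cache_entry_t *e = &computed_cache[((unsigned) a * 31u + (unsigned) b) % CACHE_SIZE];
    if (e->valid && e->a == a && e->b == b)
        return e->result;

    /* "Reduce A and B one step": cofactors with respect to the top variable */
    int va = unique_table[a].var, vb = unique_table[b].var;
    int v  = va < vb ? va : vb;
    int a0 = va == v ? unique_table[a].low  : a;
    int a1 = va == v ? unique_table[a].high : a;
    int b0 = vb == v ? unique_table[b].low  : b;
    int b1 = vb == v ? unique_table[b].high : b;

    int c = make_node(v, bdd_or(a0, b0), bdd_or(a1, b1));
    *e = (cache_entry_t){a, b, c, true};    /* store the result for reuse */
    return c;
}

/* Follows one path through the BDD for a given variable assignment. */
static bool bdd_eval(int n, const bool *assignment) {
    while (n > 1)
        n = assignment[unique_table[n].var] ? unique_table[n].high
                                            : unique_table[n].low;
    return n == 1;
}
```

Because the unique table returns the same node for the same triple, `bdd_or(x, y)` and `bdd_or(y, x)` yield the same node id. In a multi-core setting both tables are shared between workers, which is why they have to be concurrent hash tables.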

SLIDE 54

Multi-core Binary Decision Diagrams

Tom van Dijk, Alfons Laarman, Jaco van de Pol [pdmc’12]

Multi-core BDDs

Use shared hashtable for Unique Table, Operations Cache
Parallelize computation tree of recursive operations

Complications for Parallelism

BDD nodes can be removed
Irregular task graph
Fine-grained parallelism

Solutions

Garbage collection, either using tombstones or using rehashing
Work-stealing

[Figure: bucket state machine with the states EMPTY, WAIT(h), DONE(h,count) and TOMBSTONE; the transitions are labelled cas, write data, cas +/− and cas delete.]

SLIDE 57

Parallelizing the BDD Operations

BDD operations Become Tasks

Organize recursive calls to RelProd, \ and ∪ in a task dependency graph
Same task might be created several times: store result in the shared Computed Table

Fine-grained task-parallelism:
Parent spawns children for subtasks, and waits upon their completion
Load balancing by work-stealing; use e.g. Cilk [Blumofe ’95] or Wool [Faxén ’08] or Lace [Van Dijk ’14]
Split double-ended queue in public and private part

[Figure: a task deque with the indices t, s and h delimiting three regions: stolen, stealable and worker-private.]
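The index bookkeeping of such a split deque can be modelled in a few lines of C. This is a single-threaded model only, so it shows which region each operation touches but none of the synchronisation a real work-stealing deque needs. All names are hypothetical, and the policy of exposing half of the private tasks is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEQUE_CAPACITY 256          /* illustrative bound */

typedef struct {
    int    tasks[DEQUE_CAPACITY];   /* task ids stand in for task descriptors */
    size_t tail;                    /* t: tasks below this index were stolen */
    size_t split;                   /* s: tasks in [tail, split) are stealable */
    size_t head;                    /* h: tasks in [split, head) are worker-private */
} split_deque_t;

/* Owner: push a new task onto the private end. */
static bool deque_push(split_deque_t *d, int task) {
    if (d->head == DEQUE_CAPACITY)
        return false;
    d->tasks[d->head++] = task;
    return true;
}

/* Owner: move the split point so that half of the private tasks become stealable. */
static void deque_expose(split_deque_t *d) {
    d->split += (d->head - d->split + 1) / 2;
}

/* Owner: pop the newest task; takes back a stealable task if the
   private part is empty. */
static bool deque_pop(split_deque_t *d, int *task) {
    if (d->head == d->split) {
        if (d->split == d->tail)
            return false;           /* nothing left: all tasks done or stolen */
        d->split--;                 /* shrink the public part by one task */
    }
    *task = d->tasks[--d->head];
    return true;
}

/* Thief: take the oldest stealable task, if any. */
static bool deque_steal(split_deque_t *d, int *task) {
    if (d->tail == d->split)
        return false;
    *task = d->tasks[d->tail++];
    return true;
}
```

The owner pushes and pops at h without touching the public part, so the common case needs no coordination with thieves. Only moving s, or stealing at t, involves the shared region.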

SLIDE 58

Results: speedup of BDD operations for model checking

[Figure: speedup (5 to 30) versus number of workers (10 to 40) for the models bakery.4, bakery.8, collision.5, iprotocol.7, lifts.4, lifts.7, schedule_world.2 and schedule_world.3.]

Experiments

BEEM benchmarks, again
On 4 × 12 = 48 core NUMA
Speedup up to 32 (= 66.7%)
Small models don’t scale (time spent in work stealing)

http://fmt.cs.utwente.nl/tools/sylvan/
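The percentage is the parallel efficiency, i.e. the speedup divided by the number of cores. A small C sketch of the arithmetic, with illustrative function names:

```c
/* Speedup S_P = T_seq / T_P. */
static double speedup(double t_seq, double t_par) {
    return t_seq / t_par;
}

/* Parallel efficiency S_P / P; a value of 1.0 corresponds to ideal speedup. */
static double efficiency(double s, int workers) {
    return s / workers;
}
```

A speedup of 32 on 48 cores gives `efficiency(32.0, 48)`, roughly 0.667. For the counter example earlier in the talk, `speedup(27.0, 1.8)` is about 15 on 16 threads, while the false-sharing variant gives `speedup(27.0, 32.0)`, which is below 1.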

SLIDE 61

Conclusion

Recap

Concurrent hash table [fmcad10]
Tree compression [spin11]
DDs (set-based) [pdmc12]

The Trick

(Re)design data structures
Think about memory accesses

Example

Divine: Parallel distributed model checker, Barnat et al. [pdmc10]
Parallelizing the SPIN model checker, Holzmann [spin12]
Scalable multi-core model checking fairness enhanced systems, Yang et al. [fmse09]

References

Scalable Multi-Core Model Checking (PhD thesis), Laarman [UTwente]
Sylvan: Multi-Core Decision Diagram (PhD thesis), van Dijk [UTwente]
Boosting Multi-Core Reachability with Shared Hash Tables, Laarman et al. [fmcad10]
Parallel Recursive State Compression for Free, Laarman et al. [spin11]
Multi-Core BDD Operations for Symbolic Reachability, van Dijk, Laarman, van de Pol [pdmc12]