SLIDE 1

Scalable Multi-Core Model Checking

Alfons Laarman (alfons@laarman.com), Theory

joint work with Jaco van de Pol and Tom van Dijk.

SLIDE 4

Multi-Core Model Checking

Research questions

  • Can model checking scale (linearly, ideally) on modern multi-cores?
  • Are our parallel solutions compatible with other techniques?

Speedup on P cores: $S_P = T_{seq} / T_P$. Ideal: $S_P = P$. Linear: $S_P = P / c$ for some constant c.

[Figure: speedup versus number of threads (both axes 10 to 50) for the models dfsfifo, garp, giop2.nomig, i-protocol2 and leader5.]

Techniques to combine with parallelism:

  • Compression techniques
  • Symbolic exploration
  • Partial-order reduction

Related work

  • “compiler optimizations diminish the benefits of multi-core processing” [Holzmann 07]
  • “no silver bullet, that would solve all the scalability issues” [Barnat et al. 08]
SLIDE 11

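Challenges

Difficulties of parallelism

  • Steep memory hierarchies
  • Cache coherence protocol (false sharing)
  • Fine-grained operations in model checking (e.g. no subsumption)

Sequential counting loop:

    #define B (1024 * 1024 * 1024)

    int main (void) {
        int result = 0;
        for (int i = 0; i < B; i++)
            result++;
        return result;
    }

Parallel version with P = 16 threads, each incrementing its own counter:

    #include <pthread.h>

    #define B (1024 * 1024 * 1024)
    #define P 16

    /* Each thread increments the counter it was handed, B / P times. */
    static void *count (void *arg) {
        int *counter = (int *) arg;
        for (int i = 0; i < B / P; i++)
            (*counter)++;
        return NULL;
    }

    int main (void) {
        pthread_t thread[P];
        int __attribute__ ((aligned (64))) counters[P] = { 0 };
        for (int i = 0; i < P; i++)
            pthread_create (&thread[i], NULL, count, &counters[i]);
        /* Join every thread before reading its counter. */
        int result = 0;
        for (int i = 0; i < P; i++) {
            pthread_join (thread[i], NULL);
            result += counters[i];
        }
        return result;
    }

Timings: T1 = 27 sec for the sequential loop. With plain adjacent counters (the same code without the alignment attribute), T16 = 32 sec: the counters share a cache line, so every increment invalidates that line in the other cores' caches (false sharing). With the counters kept on separate 64-byte cache lines, T16 = 1.8 sec. Note that the attribute as written aligns the start of the array; to give every counter its own line, each element must be padded to 64 bytes.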
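A per-counter layout that avoids false sharing can be sketched as follows. This is a minimal illustration, assuming C11 and a 64-byte cache line; the names `padded_counter`, `CACHE_LINE` and `sum_counters` are hypothetical and do not appear on the slides. Each counter is wrapped in a struct aligned to one cache line, so neighbouring array elements can never share a line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define P 16            /* number of worker threads, as on the slide */

/* One counter per cache line: alignas on the member pads the struct to 64 bytes. */
struct padded_counter {
    alignas(CACHE_LINE) int value;
};

/* Every element of an array of padded_counter starts on its own line. */
static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each counter should occupy exactly one cache line");

/* Sum the per-thread counters after all workers have been joined. */
static int sum_counters(const struct padded_counter *c, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += c[i].value;
    return total;
}
```

With this layout a worker thread would be handed `&counters[i].value` instead of a pointer into a dense `int` array; the rest of the counting loop stays the same.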

SLIDE 15

(Explicit-State) Model Checking

Example program: two threads sharing x and y (the leading numbers are program-counter locations):

    global x = 7;
    global y = 3;

    Thread 1:                     Thread 2:
    1  for (int a = 1 .. 10)      1  int b = y + x;
    2      x += y;                2  y = 2 * b;

State vector: S : ⟨x, y, a, b, pc1, pc2⟩
Initial state: s0 = ⟨7, 3, 0, 0, 1, 1⟩
Successors: next-state(⟨7, 3, 0, 0, 1, 1⟩) = {⟨7, 3, 1, 0, 2, 1⟩, ⟨7, 3, 0, 10, 1, 2⟩}

Problem: check all states reachable from s0 ∈ S using next-state: S → 2^S, where every state in S is a vector of k slots. This is an (implicit) graph search.

Basis for checking LTL/CTL and timed/probabilistic systems!
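The next-state function for the two-thread example can be sketched in C as below. This is an illustrative reading of the slide, not the tool's implementation: the type `state_t`, the bound `MAX_SUCC` and the exit location 3 for both threads are assumptions, since the slide only shows locations 1 and 2.

```c
#include <assert.h>
#include <stddef.h>

/* State vector <x, y, a, b, pc1, pc2> from the slide. */
typedef struct { int x, y, a, b, pc1, pc2; } state_t;

#define MAX_SUCC 2   /* at most one step per thread */

/* Writes the successors of s into out and returns how many there are. */
static size_t next_state(state_t s, state_t out[MAX_SUCC]) {
    size_t n = 0;

    /* Thread 1: location 1 is the loop head, location 2 the loop body. */
    if (s.pc1 == 1) {
        state_t t = s;
        if (t.a < 10) { t.a++; t.pc1 = 2; }   /* enter the next iteration */
        else          { t.pc1 = 3; }          /* assumed exit location */
        out[n++] = t;
    } else if (s.pc1 == 2) {
        state_t t = s;
        t.x += t.y;
        t.pc1 = 1;
        out[n++] = t;
    }

    /* Thread 2: two straight-line statements. */
    if (s.pc2 == 1) {
        state_t t = s;
        t.b = t.y + t.x;
        t.pc2 = 2;
        out[n++] = t;
    } else if (s.pc2 == 2) {
        state_t t = s;
        t.y = 2 * t.b;
        t.pc2 = 3;                            /* assumed exit location */
        out[n++] = t;
    }
    return n;
}
```

Applied to s0 = ⟨7, 3, 0, 0, 1, 1⟩, the function yields the two successors listed on the slide: one where thread 1 steps and one where thread 2 steps.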

SLIDE 16

Overview

  1. Reachability with Shared Hash Table
  2. Tree Compression
  3. Symbolic Reachability with Decision Diagrams
SLIDE 18

Static partitioning or shared hash table

Static partitioning

[Diagram: Workers 1 to 4, each with its own queue and its own store.]

  ✗ On-the-fly (BFS)
  ± Scalability (queue contention)

Shared hash table

[Diagram: Workers 1 to 4, each with its own queue, connected by a load balancer and sharing a single store.]

  ✓ (Pseudo) DFS & BFS
  ? Scalability

SLIDE 20

Shared hash table

    procedure reach(s0, P)
        V := {s0}
        Q1 := {s0}
        search(1) ∥ ... ∥ search(P)        ⊲ the P workers run in parallel

    procedure search(p)
        while balance(Q) do                ⊲ with termination detection
            s := some s ∈ Qp; Qp := Qp \ {s}
            for all s′ ∈ next-state(s) do
                if s′ ∉ V then             ⊲ test and insert must be one atomic step:
                    V := V ∪ {s′}          ⊲ V.find-or-put(s′)
                    Qp := Qp ∪ {s′}

SLIDE 23

Lockless Hash Table: Design

Laarman, van de Pol, Weber [fmcad10]

Main bottlenecks

  • State store: concurrent access
  • Graph traversal: random memory access (bandwidth / latency)

Design

  • Open addressing
  • Hash memoization
  • Walking the Line
  • In-situ locking

[Diagram: a data array with slots of size |state| and a bucket array laid out in units of |cache line|.]
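The design points above can be sketched with C11 atomics as follows. This is a simplified illustration rather than the LTSmin code: states are reduced to a single 64-bit word, the names `find_or_put`, `hash64`, `bucket` and `data` and the table size are assumptions, and probing is plain linear probing instead of a walk that stays inside one cache line. It does show open addressing, a memoized hash in each bucket, and a lock bit that lives in the bucket itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 1024u   /* power of two, chosen for the sketch */
#define EMPTY 0u
#define WRITE 1u           /* bucket claimed, data not yet visible */
#define DONE  2u           /* data written and readable */
#define FLAGS 3u

static _Atomic uint32_t bucket[TABLE_SIZE];  /* memoized hash + lock bits */
static uint64_t data[TABLE_SIZE];            /* the stored states */

/* Hypothetical mixing function; any reasonable hash would do. */
static uint32_t hash64(uint64_t s) {
    s ^= s >> 33; s *= 0xff51afd7ed558ccdULL; s ^= s >> 33;
    return (uint32_t)s;
}

/* Returns true if s was already present, false if this call inserted it. */
static bool find_or_put(uint64_t s) {
    uint32_t h = hash64(s);
    uint32_t memo = h & ~FLAGS;              /* hash bits kept in the bucket */
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        uint32_t idx = (h + i) & (TABLE_SIZE - 1);   /* open addressing */
        uint32_t expected = EMPTY;
        /* Try to claim an empty bucket by locking it in place. */
        if (atomic_compare_exchange_strong(&bucket[idx], &expected, memo | WRITE)) {
            data[idx] = s;
            atomic_store(&bucket[idx], memo | DONE); /* publish and unlock */
            return false;
        }
        /* Bucket taken: only touch the data if the memoized hash matches. */
        if ((expected & ~FLAGS) == memo) {
            while ((atomic_load(&bucket[idx]) & FLAGS) == WRITE)
                ;                            /* wait for the writer to finish */
            if (data[idx] == s)
                return true;
        }
    }
    abort();                                 /* table full: not handled here */
}
```

Comparing the memoized hash first means most probes are decided by the bucket array alone, so the larger data array is read only for likely matches.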

SLIDE 26

Experiments from 2010 (BEEM database)

Tools compared: SPIN 5.2.4 (NASA/JPL), DiVinE 2.2 (Brno, CZ), LTSmin (shared hash table)

http://fmt.cs.utwente.nl/tools/ltsmin/

Conclusions

  • Scalability comes from limiting memory usage
  • Distribute counters!
  • No compression, but states are often very similar due to locality, e.g. ⟨3,5,5,4,1,3⟩ and ⟨3,5,9,3,1,3⟩

SLIDE 29

Recursive indexing

[holzmann 97][blom et al. 08]

[Figure: a table of full state vectors of length K (labelled H^K) next to its recursive index of (K − 1) tables of pairs (labelled H^2); a table with N entries sits above two tables with √N entries each.]

  ✓ Combinatorial ⇒ balanced tree ($N + 2\sqrt{N} + 4\sqrt[4]{N} + \cdots \approx N$)
  ✓ Compresses states of length K to almost 2!
  ✗ Hard to parallelize (flatliners)

SLIDE 35

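Parallel Tree Compression

Laarman, van de Pol, Weber [spin11]

Solution

  • Temporary binary tree structure on stack
  • Reuse lockless hash table (merge tables)
  • Incremental updates: (K − 1) → log2(K − 1) lookups

[Figure: the state ⟨3,5,5,4,1,3⟩ is split into ⟨3,5,5⟩ and ⟨4,1,3⟩, then into ⟨3,5⟩ and ⟨4,1⟩, and each pair is stored once in the shared table. The successor ⟨3,5,9,4,1,3⟩ differs in one slot, so only the pairs on the path from that slot to the root have to be looked up again.]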
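The folding of a state vector into pairs can be sketched sequentially as below. This is a minimal illustration under stated assumptions: a linear scan over a small array stands in for the lockless hash table, roots are kept in a separate table, and the names `pair_table`, `pair_put`, `fold` and `tree_find_or_put` are hypothetical. It performs all K − 1 lookups for every state and does not implement the incremental update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 1024   /* capacity chosen for the sketch */

/* A table of (left, right) pairs; the index of a pair is its compressed name.
   A linear scan stands in for the lockless hash table. */
typedef struct {
    int left[MAX_NODES], right[MAX_NODES];
    int count;
} pair_table;

static pair_table nodes;   /* internal tree nodes, shared between states */
static pair_table roots;   /* one entry per distinct state */

/* Returns the index of (l, r), inserting it if needed; *is_new reports which. */
static int pair_put(pair_table *t, int l, int r, bool *is_new) {
    for (int i = 0; i < t->count; i++)
        if (t->left[i] == l && t->right[i] == r) { *is_new = false; return i; }
    assert(t->count < MAX_NODES);
    t->left[t->count] = l;
    t->right[t->count] = r;
    *is_new = true;
    return t->count++;
}

/* Folds v[0..n-1] into a single index by recursively pairing its two halves. */
static int fold(const int *v, int n) {
    if (n == 1) return v[0];
    int half = (n + 1) / 2;
    bool ignored;
    int l = fold(v, half);
    int r = fold(v + half, n - half);
    return pair_put(&nodes, l, r, &ignored);
}

/* Inserts a state of length k >= 2; returns true if it was seen before. */
static bool tree_find_or_put(const int *v, int k) {
    int half = (k + 1) / 2;
    bool is_new;
    int l = fold(v, half);
    int r = fold(v + half, k - half);
    pair_put(&roots, l, r, &is_new);
    return !is_new;
}
```

Inserting ⟨3,5,5,4,1,3⟩ creates four internal pairs plus one root; inserting ⟨3,5,9,4,1,3⟩ afterwards adds only one new internal pair and one new root, because the pairs for ⟨3,5⟩ and ⟨4,1,3⟩ are shared.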

SLIDE 39

Experiments from 2011 [±300 BEEM models]

Laarman, van de Pol, Weber [spin11]

[Figure: compression factor (10 to 70) versus state length in bytes (50 to 300). Series: hash table, tree compression, optimal compression (4 bytes / state), mean compression (4.8 bytes / state).]

Summary

  • Lossless compression down to 4 bytes / state [memics11]
  • Mean compression of 4.8 bytes / state
  • Still same performance (absolute & scalability)

Parallel tree compression is for free!

SLIDE 41

Multi-valued Decision Diagrams

[Figure: a table of state vectors next to the MDD that encodes the same set of states.]

  • Every path in the MDD represents a concrete state vector
  • Potential compression: exponential (here: 54 → 15)
  • Symbolic Reachability: explore sets of states stored as MDDs
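Exploring whole sets of states at a time, rather than one state after another, can be sketched as below. The sketch makes a loud simplification: a 64-bit mask stands in for the decision diagram, so it shows only the shape of the set-based loop and none of the compression. The names `set_t`, `rel_prod` and `reach` are assumptions, and the visited set is seeded with s0 so that the initial state is counted.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STATES 8   /* tiny example state space */

typedef uint64_t set_t;   /* bit i set means state i is in the set */

/* succ[i] is the set of successors of state i (the transition relation R). */
static set_t rel_prod(const set_t succ[NUM_STATES], set_t q) {
    set_t image = 0;
    for (int i = 0; i < NUM_STATES; i++)
        if (q & ((set_t)1 << i))
            image |= succ[i];
    return image;
}

/* Returns the set of states reachable from s0, one BFS level per iteration. */
static set_t reach(const set_t succ[NUM_STATES], int s0) {
    set_t v = (set_t)1 << s0;   /* visited; seeded with s0 so it is counted */
    set_t q = v;                /* current level */
    while (q != 0) {
        q = rel_prod(succ, q);  /* image of the current level under R */
        q &= ~v;                /* keep only states not seen before */
        v |= q;                 /* add the new level to the visited set */
    }
    return v;
}
```

Each loop iteration handles an entire BFS level with three set operations; with decision diagrams the same three operations work on shared graph structure instead of machine words.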

Scalable Multi-Core Model Checking

14 / 19

Earlier work: disappointing speedups

Earlier work in parallel BDDs

  • early ’90s: vector machines, massive SIMD (not unlike GPU)
  • late ’90s: virtual SMP, distributed BDDs (BDDnow)

Earlier work in parallel symbolic model checking

  • ’00vv: Grumberg et al: vertical splitting of BDDs (distributed)
  • ’00vv: Ciardo et al: horizontal splitting of BDDs (distributed)
  • CAV’07: L¨

uttgen et al: - parallelisation of saturation with Cilk

  • PDMC’09: Ciardo - Difficult, but what is the alternative?

Recent developments

  • 2012: Tom van Dijk, Alfons Laarman, Jaco van de Pol:

Sylvan, a library for multi-core BDD operations (PDMC’12)

SLIDE 49

Symbolic Model Checking based on BDDs

Bryant [IEEE Trans. Comp.’86], Burch, Clarke, McMillan [LICS’90]

    procedure search(s0)
        V := ∅
        Q := {s0}
        while Q ≠ ∅ do
            Q := RelProd(R, Q)
            Q := Q Minus V
            V := V Or Q

BDD data structures

  • Unique Table (to store BDD nodes)
  • Computed Cache (dynamic programming)

    procedure Or(A, B)
        if ∃C : (Or, A, B, C) ∈ ComputedT then return C
        A′, B′ := Reduce A and B one step
        A′, B′ := Call recursively Or(A′) and Or(B′)
        C := Create new node (A′, B′) in UniqueT
        Store (Or, A, B, C) in ComputedT
        return C

[Figure: example BDD over the variables X, Y and Z with terminal 1.]
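How the Unique Table and the Computed Cache cooperate in Or can be sketched sequentially as below. This is a minimal illustration, not Sylvan's implementation: nodes live in plain arrays, the unique table is searched linearly, the cache is a dense matrix, and the names `make_node`, `bdd_or` and `bdd_eval` are assumptions. Variables are ordered by their index, smallest on top.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 256
#define FALSE_NODE 0
#define TRUE_NODE  1
#define TERMINAL_VAR 1000   /* terminals sort below every real variable */

/* Unique table: node i tests variable var[i]; lo/hi are its 0- and 1-children. */
static int var[MAX_NODES] = { TERMINAL_VAR, TERMINAL_VAR };
static int lo[MAX_NODES], hi[MAX_NODES];
static int node_count = 2;

/* Computed cache for Or, indexed by both operands; -1 means "not computed". */
static int or_cache[MAX_NODES][MAX_NODES];
static bool cache_ready = false;

/* Returns the unique node (v, l, h), creating it only if it does not exist. */
static int make_node(int v, int l, int h) {
    if (l == h) return l;                       /* redundant test: skip node */
    for (int i = 2; i < node_count; i++)
        if (var[i] == v && lo[i] == l && hi[i] == h) return i;
    assert(node_count < MAX_NODES);
    var[node_count] = v; lo[node_count] = l; hi[node_count] = h;
    return node_count++;
}

static int bdd_or(int a, int b) {
    if (!cache_ready) {                         /* lazily clear the cache */
        for (int i = 0; i < MAX_NODES; i++)
            for (int j = 0; j < MAX_NODES; j++) or_cache[i][j] = -1;
        cache_ready = true;
    }
    if (a == TRUE_NODE || b == TRUE_NODE) return TRUE_NODE;
    if (a == FALSE_NODE) return b;
    if (b == FALSE_NODE) return a;
    if (or_cache[a][b] >= 0) return or_cache[a][b];   /* cache hit */

    /* Split both operands on the topmost variable (one reduction step). */
    int v = var[a] < var[b] ? var[a] : var[b];
    int a0 = var[a] == v ? lo[a] : a, a1 = var[a] == v ? hi[a] : a;
    int b0 = var[b] == v ? lo[b] : b, b1 = var[b] == v ? hi[b] : b;

    int c0 = bdd_or(a0, b0);                    /* the two recursive calls */
    int c1 = bdd_or(a1, b1);
    int c = make_node(v, c0, c1);               /* look up or create in UniqueT */
    or_cache[a][b] = c;                         /* remember the result */
    return c;
}

/* Evaluates node n under an assignment (bit i of bits is variable i). */
static bool bdd_eval(int n, unsigned bits) {
    while (n > TRUE_NODE)
        n = ((bits >> var[n]) & 1u) ? hi[n] : lo[n];
    return n == TRUE_NODE;
}
```

The two recursive calls are independent of each other, which is what makes the operation a natural candidate for task parallelism once both tables are safe for concurrent access.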

SLIDE 54

Multi-core Binary Decision Diagrams

Tom van Dijk, Alfons Laarman, Jaco van de Pol [pdmc’12]

Multi-core BDDs

  • Use shared hashtable for Unique Table, Operations Cache
  • Parallelize computation tree of recursive operations

Complications for Parallelism

  • BDD nodes can be removed
  • Irregular task graph
  • Fine-grained parallelism

Solutions

  • Garbage collection
  • Either using tombstones
  • Or using rehashing
  • Work-stealing

[Diagram: bucket states EMPTY, WAIT(h), DONE(h, count) and TOMBSTONE, with transitions labelled cas, write data, cas +/−, cas delete and cas.]

SLIDE 57

Parallelizing the BDD Operations

BDD operations Become Tasks

  • Organize recursive calls to RelProd, \ and ∪ in a task dependency graph
  • Same task might be created several times: store result in the shared Computed Table
  • Fine-grained task-parallelism:
  • Parent spawns children for subtasks, and waits upon their completion
  • Load balancing by work-stealing; use e.g. Cilk [Blumofe ’95] or Wool [Faxén ’08] or Lace [Van Dijk ’14]

Split double-ended queue in public and private part

[Diagram: a task deque with markers t, s and h, divided into stolen, stealable and worker-private tasks.]

SLIDE 58

Results: speedup of BDD operations for model checking

[Figure: speedup (5 to 30) versus number of workers (10 to 40) for the models bakery.4, bakery.8, collision.5, iprotocol.7, lifts.4, lifts.7, schedule_world.2 and schedule_world.3.]

Experiments

  • BEEM benchmarks, again
  • On 4 × 12 = 48 core NUMA
  • Speedup up to 32 (= 66.7% efficiency)
  • Small models don’t scale (time spent in work stealing)

http://fmt.cs.utwente.nl/tools/sylvan/
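The percentage follows from dividing the speedup by the number of cores. As a worked check, writing $E_P$ for parallel efficiency (a symbol introduced here, not used on the slides):

```latex
\[
E_P = \frac{S_P}{P}, \qquad E_{48} = \frac{32}{48} \approx 0.667 = 66.7\%
\]
```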

SLIDE 61

Conclusion

Recap

  • Concurrent hash table [fmcad10]
  • Tree compression [spin11]
  • DDs (set-based) [pdmc12]

The Trick

  • (Re)design data structures
  • Think about memory accesses

Example

  • Divine: Parallel distributed model checker. Barnat et al. [pdmc10]
  • Parallelizing the spin model checker. Holzmann [spin12]
  • Scalable multi-core model checking fairness enhanced systems. Yang et al. [fmse09]

References

  • Scalable Multi-Core Model Checking (PhD thesis). Laarman [UTwente]
  • Sylvan: Multi-Core Decision Diagram (PhD thesis). van Dijk [UTwente]
  • Boosting Multi-Core Reachability with Shared Hash Tables. Laarman et al. [fmcad10]
  • Parallel Recursive State Compression for Free. Laarman et al. [spin11]
  • Multi-Core BDD Operations for Symbolic Reachability. van Dijk, Laarman, van de Pol [pdmc12]