Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken, PLDI, Santa Barbara, June 16, 2016


SLIDE 1

Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken

PLDI, Santa Barbara, June 16, 2016

SLIDE 2

Automatically Reason about Programs

  • Symbolic execution
  • Program verification
  • Program equivalence
  • …

SLIDE 3

Automatically reasoning about programs requires

SLIDE 4

        testq %rdi, %rdi
        je .L1
        xorq %rax, %rax
    .L0:
        movq %rdi, %rdx
        andq $0x1, %rdx
        addq %rdx, %rax
        shrq $0x1, %rdi
        jne .L0
        cltq
        retq
    .L1:
        xorq %rax, %rax
        retq

SLIDE 5

addq $0x1, %rax
    rax ← rax +_64 1_64

  • +_64: 64-bit bit-vector addition
  • 1_64: 64-bit constant
  • rax on the right-hand side: previous value of rax

SLIDE 6

addq $0x1, %rax
    rax ← rax +_64 1_64
addb $0x1, %al
    al ← al +_8 1_8

SLIDE 7

addq $0x1, %rax
    rax ← rax +_64 1_64
addb $0x1, %al
    al ← al +_8 1_8

Register layout: rax (64 bits) contains eax (32 bits), which contains ax (16 bits); ax consists of ah (8 bits) and al (8 bits).

SLIDE 8

addq $0x1, %rax
    rax ← rax +_64 1_64
addb $0x1, %al
    al ← al +_8 1_8
    rax ← rax[63:8] ∘ (rax[7:0] +_8 1_8)

Register layout: rax (64 bits) contains eax (32 bits), which contains ax (16 bits); ax consists of ah (8 bits) and al (8 bits).

SLIDE 9

addb $0x1, %al
    rax ← rax[63:8] ∘ (rax[7:0] +_8 1_8)
addw $0x1, %ax
    rax ← rax[63:16] ∘ (rax[15:0] +_16 1_16)
addl $0x1, %eax
    rax ← rax[63:32] ∘ (rax[31:0] +_32 1_32)
addq $0x1, %rax
    rax ← rax +_64 1_64

SLIDE 10

addb $0x1, %al
    rax ← rax[63:8] ∘ (rax[7:0] +_8 1_8)
addw $0x1, %ax
    rax ← rax[63:16] ∘ (rax[15:0] +_16 1_16)
addl $0x1, %eax
    rax ← 0_32 ∘ (rax[31:0] +_32 1_32)
addq $0x1, %rax
    rax ← rax +_64 1_64
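The four cases on this slide can be replayed on plain integers. The following is a minimal Python sketch, not the authors' tooling: `bits` and `add_imm` are hypothetical helper names, and a 64-bit register is modelled as a non-negative Python int.

```python
def bits(value, hi, lo):
    """Return value[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def add_imm(rax, imm, width):
    """Model `add $imm` on the width-bit view of rax (al, ax, eax or rax)."""
    mask = (1 << width) - 1
    low = ((rax & mask) + imm) & mask          # width-bit bit-vector addition
    if width in (32, 64):
        # 64-bit writes replace the register; 32-bit writes zero bits 63..32.
        return low
    # 8- and 16-bit writes keep the untouched upper bits of rax.
    return (bits(rax, 63, width) << width) | low


if __name__ == "__main__":
    rax = 0xFFFFFFFFFFFFFFFF
    for width in (8, 16, 32, 64):
        print(width, hex(add_imm(rax, 1, width)))
```

The 32-bit case is the one that is easy to get wrong by analogy with the 8- and 16-bit cases, which is the point the slide makes.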

SLIDE 11

addb $0x1, %al
    rax ← rax[63:8] ∘ (rax[7:0] +_8 1_8)
addw $0x1, %ax
    rax ← rax[63:16] ∘ (rax[15:0] +_16 1_16)
addl $0x1, %eax
    rax ← 0_32 ∘ (rax[31:0] +_32 1_32)
    zf ← 0_32 = (eax +_32 1_32)
    cf ← ((0_1 ∘ eax) +_33 1_33)[32,32]
    sf ← (eax +_32 1_32)[31,31]
    of ← ¬eax[31,31] ∧ (eax +_32 1_32)[31,31]
    pf ← (eax +_32 1_32)[0,0] ⊕ (eax +_32 1_32)[1,1] ⊕ (eax +_32 1_32)[2,2] ⊕ (eax +_32 1_32)[3,3] ⊕ (eax +_32 1_32)[4,4] ⊕ (eax +_32 1_32)[5,5] ⊕ (eax +_32 1_32)[6,6] ⊕ (eax +_32 1_32)[7,7]
addq $0x1, %rax
    rax ← rax +_64 1_64
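As an illustration of how much state a single instruction touches, the zf, cf, sf and of formulas for `addl $0x1, %eax` can be evaluated on integers. This is a hedged sketch: `addl_one_flags` is a hypothetical name, and the parity flag is left out to keep the example short.

```python
def addl_one_flags(eax):
    """Evaluate the zf, cf, sf and of formulas for `addl $0x1, %eax`."""
    eax &= 0xFFFFFFFF
    result = (eax + 1) & 0xFFFFFFFF       # eax +_32 1_32
    wide = eax + 1                        # (0_1 . eax) +_33 1_33, at most 33 bits
    old_sign = (eax >> 31) & 1
    new_sign = (result >> 31) & 1
    return {
        "zf": int(result == 0),           # 0_32 = (eax +_32 1_32)
        "cf": (wide >> 32) & 1,           # bit 32 of the 33-bit sum
        "sf": new_sign,                   # bit 31 of the 32-bit sum
        "of": (1 - old_sign) & new_sign,  # not eax[31] and result[31]
    }


if __name__ == "__main__":
    for value in (0x00000000, 0x7FFFFFFF, 0xFFFFFFFF):
        print(hex(value), addl_one_flags(value))
```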

SLIDE 12
  • Manual partial specifications
    – CompCert [CACM’09], BAP [CAV’11], BitBlaze [ICISS’08], Codesurfer/x86 [ETAPS’05], McVeto [CAV’10], STOKE [ASPLOS’13], Jakstab [CAV’08], many others
  • Taly/Godefroid [PLDI’12]
    – Automatically synthesize specification from templates
    – Only 534 instructions

SLIDE 13

Bit-vector formulas of input-output behavior

SLIDE 14

All instructions
  • Base set: specify manually
  • Remaining instructions: learn specification automatically

SLIDE 15

Instruction i → [synthesize] → Program p → [combine base formulas] → Formula φ

Formal guarantee? i ≡ φ

How do we synthesize programs?

SLIDE 16

How do we synthesize programs?

  • Randomized search
  • Guided by cost function
  • Based on test-cases
  • Using STOKE [ASPLOS’13]

Instruction i → [synthesize] → Program p → [combine base formulas] → Formula φ
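To make "randomized search guided by a cost function on test cases" concrete, here is a toy Python sketch. It is not STOKE: the four-operation instruction set `OPS`, the 8-bit target `inc`, the Hamming-distance cost and the Metropolis-style acceptance rule with parameter `beta` are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical 8-bit "base instructions" the search may combine.
OPS = {
    "not": lambda x: ~x & 0xFF,
    "neg": lambda x: -x & 0xFF,
    "shl": lambda x: (x << 1) & 0xFF,
    "nop": lambda x: x,
}


def run(program, x):
    """Execute a straight-line program (a list of op names) on input x."""
    for op in program:
        x = OPS[op](x)
    return x


def cost(program, target, tests):
    """Sum over all tests of the Hamming distance between the two outputs."""
    return sum(bin(run(program, x) ^ target(x)).count("1") for x in tests)


def search(target, tests, length=2, steps=2000, beta=1.0, seed=0):
    """Randomly mutate one slot per step and return the best program seen."""
    rng = random.Random(seed)
    names = sorted(OPS)
    current = ["nop"] * length
    current_cost = cost(current, target, tests)
    best, best_cost = list(current), current_cost
    for _ in range(steps):
        if best_cost == 0:
            break
        proposal = list(current)
        proposal[rng.randrange(length)] = rng.choice(names)
        proposal_cost = cost(proposal, target, tests)
        delta = proposal_cost - current_cost
        # Always accept improvements; accept regressions with small probability.
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


def inc(x):
    """Target behaviour: 8-bit increment."""
    return (x + 1) & 0xFF


TESTS = [0x00, 0x01, 0x7F, 0x80, 0xFF]

if __name__ == "__main__":
    print(search(inc, TESTS))
```

A cost of zero only means the program agrees with the target on the test cases, which is why the later slides add an equivalence check between independently found programs.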

SLIDE 17

Instruction i → [synthesize] → Program p → [combine base formulas] → Formula φ

Formal guarantee? i ≡ φ

p ≡ φ

SLIDE 18

Instruction i → [synthesize] → Program p → [combine base formulas] → Formula φ

Formal guarantee? i ≡ φ

i ≡ p ≡ φ

SLIDE 19

Instruction i → [synthesize] → Program p → [combine base formulas] → Candidate formula φ

Formal guarantee? i ≡ φ

i ≡ p ≡ φ

SLIDE 20

Instruction i → [synthesize] → Program p, Program p′, … → [combine base formulas] → Candidate formula φ, Candidate formula φ′, Candidate formula φ″

φ ⟺ φ′ ?
  • yes: ✔ increase confidence
  • no: add counter-example, remove wrong program(s)

SLIDE 21

φ ⟺ φ′ ?

  • Increase confidence
  • Remove incorrect program(s)
  • No information about equivalence

SLIDE 22

φ ⟺ φ′ ?

  • Increase confidence
  • Remove incorrect program(s)
  • No information about equivalence

SLIDE 23

φ ⟺ φ′ ?

  • Increase confidence
  • Remove incorrect program(s)
  • No information about equivalence: Equivalence class 1, Equivalence class 2
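The three outcomes of the check can be sketched as a small bookkeeping routine. This is an illustrative model under stated assumptions, not the authors' implementation: programs are Python functions on 8-bit values, `exhaustive_check` stands in for a solver query, `target` stands in for running the real instruction, and `add_program` is a hypothetical name.

```python
def exhaustive_check(p, q):
    """Stand-in for a solver query: compare p and q on every 8-bit input."""
    for x in range(256):
        if p(x) != q(x):
            return "different", x
    return "equal", None


def add_program(classes, tests, target, new, check=exhaustive_check):
    """Insert `new` into `classes`, a list of lists of equivalent programs."""
    for cls in classes:
        verdict, cex = check(cls[0], new)
        if verdict == "equal":
            cls.append(new)                 # agreement increases confidence
            return "equivalent"
        if verdict == "different":
            tests.append(cex)               # keep the counter-example as a test
            expected = target(cex)
            # Remove every program that disagrees with the instruction on cex.
            kept = [[p for p in c if p(cex) == expected] for c in classes]
            classes[:] = [c for c in kept if c]
            if new(cex) == expected:
                return add_program(classes, tests, target, new, check)
            return "counter-example"
        # verdict == "unknown": no information, try the next class
    classes.append([new])
    return "new class"
```

The recursion terminates because each "different" verdict either discards `new` or removes at least one stored program before retrying.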

SLIDE 24

  • Prefer programs whose formulas are
    – Precise (fewest uninterpreted functions)
    – Fast (fewest non-linear arithmetic operations)
    – Simple (fewest nodes)

Equivalence class 1, Equivalence class 2, Equivalence class 3

SLIDE 25

  • Prefer programs whose formulas are
    – Precise (fewest uninterpreted functions)
    – Fast (fewest non-linear arithmetic operations)
    – Simple (fewest nodes)

Equivalence class 1, Equivalence class 2, Equivalence class 3
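One way to read the three preferences is as a lexicographic ordering. That ordering is an assumption of this sketch (the slide only lists the criteria), and `FormulaStats`, `preference_key` and `pick_best` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormulaStats:
    name: str
    uninterpreted_functions: int   # precision: fewer is better
    nonlinear_ops: int             # solver speed: fewer is better
    nodes: int                     # size: fewer is better


def preference_key(stats):
    """Assumed order: precise first, then fast, then simple."""
    return (stats.uninterpreted_functions, stats.nonlinear_ops, stats.nodes)


def pick_best(candidates):
    """Return the candidate with the smallest preference key."""
    return min(candidates, key=preference_key)


if __name__ == "__main__":
    candidates = [
        FormulaStats("a", 0, 2, 10),
        FormulaStats("b", 0, 1, 30),
        FormulaStats("c", 1, 0, 5),
    ]
    print(pick_best(candidates).name)
```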

SLIDE 26

synthesize

SLIDE 27

SLIDE 28

Learn:
    addw %ax, %dx       dx ← dx +_16 ax
Rename:
    addw %cx, %bx       bx ← bx +_16 cx        ✔
    addw (%rsp), %dx    dx ← dx +_16 M[rsp]    ✔
    addw $0x5, %dx      dx ← dx +_16 5_16      ✔

SLIDE 29

1. Learn formula for register-only instructions
2. Generalize formulas
    – To other types of operands
3. Check on test inputs
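The learn, generalize and check steps can be sketched for the `addw` example. This is a simplified model under assumptions: machine state is a dict with `regs` and `mem`, memory maps an address directly to a 16-bit value, and `learned_addw`, `read_operand`, `generalized` and `agrees_on_tests` are hypothetical names.

```python
def learned_addw(dst, src):
    """Formula learned for the register-only variant: dst <- dst +_16 src."""
    return (dst + src) & 0xFFFF


def read_operand(state, operand):
    """Fetch the 16-bit value of a register, memory or immediate operand."""
    kind, value = operand
    if kind == "reg":
        return state["regs"][value]
    if kind == "mem":                      # value names the address register
        return state["mem"][state["regs"][value]]
    if kind == "imm":
        return value & 0xFFFF
    raise ValueError(kind)


def generalized(formula, dst_reg, src_operand):
    """Instantiate a register-only formula for another source operand."""
    def step(state):
        regs = dict(state["regs"])         # leave the input state untouched
        regs[dst_reg] = formula(regs[dst_reg], read_operand(state, src_operand))
        return {"regs": regs, "mem": state["mem"]}
    return step


def agrees_on_tests(candidate, reference, states):
    """Check the generalized formula against a reference on test states."""
    return all(candidate(s) == reference(s) for s in states)


state = {"regs": {"dx": 0xFFFF, "bx": 1, "cx": 2, "rsp": 0x1000},
         "mem": {0x1000: 2}}
add_mem = generalized(learned_addw, "dx", ("mem", "rsp"))

if __name__ == "__main__":
    print(hex(add_mem(state)["regs"]["dx"]))
```

In the setting of the talk, the reference passed to a check like `agrees_on_tests` would be the instruction itself executed on test inputs; here it is whatever function the caller supplies.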

SLIDE 30

shufps $0xb3, %xmm0, %xmm1

Problem: No corresponding register-only variant
Solution: Brute force a formula for every constant

SLIDE 31
  • Base set (51 instructions)
    – Integer, bitwise and float operations
    – Data movement (including conditional move)
    – Conversion operations
  • Pseudo instructions (11 templates)
    – Split and combine registers
    – Changing status flags

SLIDE 32
  • Total instructions: 3,684
  • Out-of-scope
    – System instructions: 302 (e.g., invpcid, jle)
    – Crypto instructions: 35 (e.g., aeskeygenassist)
    – Deprecated instructions: 332 (e.g., fadd)
    – String instructions: 97 (e.g., scasq)
  • Goal instructions: 2,918

SLIDE 33
  • Base set: 51
  • Pseudo instructions: 11
  • Register-only instructions learned: 692
  • Generalized: 984
  • 8-bit constant instructions learned: 119.42
  • Total formulas learned: 1,795.42

SLIDE 34

Compare with handwritten formulas (from STOKE)

  • Available for comparison: 1,431.91
  • Automatically proven equivalent: 1,377.91
  • Equivalent with additional lemma: 4

SLIDE 35

Compare with handwritten formulas (from STOKE)

  • Available for comparison: 1,431.91
  • Automatically proven equivalent: 1,377.91
  • Equivalent with additional lemma: 4
    – fadd(a, b) = fadd(b, a)

SLIDE 36

Compare with handwritten formulas (from STOKE)

  • Available for comparison: 1,431.91
  • Automatically proven equivalent: 1,377.91
  • Equivalent with additional lemma: 4
  • Semantically different: 50
    – Handwritten formula correct
    – Learned formula correct: 50

SLIDE 37

\[
\mathrm{stratum}(i) =
\begin{cases}
0 & \text{if } i \in \mathrm{baseset} \\
1 + \max_{i' \in M(i)} \mathrm{stratum}(i') & \text{otherwise}
\end{cases}
\]

Stratum 0 (base set), Stratum 1, Stratum 2, Stratum 3
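The stratum definition is a straightforward recursion. The sketch below assumes that M(i) is the set of instructions used by the program learned for i, represented as a dict `uses`, and that this set is never empty; `make_stratum` and the instruction names in the usage example are hypothetical.

```python
from functools import lru_cache


def make_stratum(base_set, uses):
    """Build stratum(i) from a base set and a map i -> instructions in M(i)."""
    @lru_cache(maxsize=None)
    def stratum(instr):
        if instr in base_set:
            return 0
        # One more than the highest stratum among the instructions it relies on.
        return 1 + max(stratum(dep) for dep in uses[instr])
    return stratum


if __name__ == "__main__":
    base = frozenset({"base_add", "base_mov"})
    uses = {
        "learned_a": ("base_add",),
        "learned_b": ("learned_a", "base_mov"),
        "learned_c": ("learned_b",),
    }
    stratum = make_stratum(base, uses)
    print({name: stratum(name) for name in uses})
```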

SLIDE 38

\[
\mathrm{stratum}(i) =
\begin{cases}
0 & \text{if } i \in \mathrm{baseset} \\
1 + \max_{i' \in M(i)} \mathrm{stratum}(i') & \text{otherwise}
\end{cases}
\]

SLIDE 39

[Figure: number of formulas learned over wall-clock time elapsed [hours]; series: stratification, without stratification]

SLIDE 40

[Figure: number of nodes in learned formula / number of nodes in handwritten formula]

Fully inlined: 3526 instructions

SLIDE 41

1. Automatically learned 1,795 formulas
2. Stratification key to scale program synthesis
3. Compare to hand-written specification
    – More correct, equally precise, same size

Source code, formulas, experimental results: https://github.com/StanfordPL/strata/

SLIDE 42

SLIDE 43

1. Missing base instructions
    – Some integer and floating point operations are missing
2. Program synthesis limits
    – Shortest known program is long and out of reach, e.g., byte-vectorized operations
3. Cost function limitation
    – For one bit of output, the cost function does not give enough signal
4. Crazy instructions

SLIDE 44
  • Total decisions: 7,075
  • Equivalent: 6,669 (94.26%)
  • New equivalence class: 356 (5.03%)
  • Counter-examples: 50 (0.71%)
  • Timeouts (45 seconds): 3

SLIDE 45
  • Intel Xeon E5-2697 (28 cores) at 2.6 GHz
    – 268.86 hours (register-only)
    – 159.12 hours (8-bit constants)
  • Total of 11,983.37 core hours

SLIDE 46
  • Random inputs (random machine state)
  • “Interesting” bit-patterns
    – 0, 1, −1, 2^n, NaN, Infinity
  • Test cases learned from counter-examples

SLIDE 47
  • Formulas are simplified
    – Constant propagation: 2_64 ∗_64 4_64 ≡ 8_64
    – Move bit-selection over concatenation: (0_64 ∘ rax)[63,0] ≡ rax
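Both simplifications can be illustrated on a tiny expression tree. The tuple encoding below and the names `width` and `simplify` are assumptions for this sketch, not the format used by the authors' tools.

```python
# Expressions are tuples:
#   ("const", width, value)   ("var", width, name)
#   ("mul", a, b)             ("concat", high, low)
#   ("extract", hi, lo, e)    selects bits hi..lo of e (inclusive)

def width(e):
    """Bit width of an expression."""
    tag = e[0]
    if tag in ("const", "var"):
        return e[1]
    if tag == "mul":
        return width(e[1])
    if tag == "concat":
        return width(e[1]) + width(e[2])
    if tag == "extract":
        return e[1] - e[2] + 1
    raise ValueError(tag)


def simplify(e):
    """Apply constant propagation and push bit-selection into concatenation."""
    tag = e[0]
    if tag == "mul":
        a, b = simplify(e[1]), simplify(e[2])
        if a[0] == "const" and b[0] == "const":      # constant propagation
            return ("const", a[1], (a[2] * b[2]) % (1 << a[1]))
        return ("mul", a, b)
    if tag == "concat":
        return ("concat", simplify(e[1]), simplify(e[2]))
    if tag == "extract":
        hi, lo, inner = e[1], e[2], simplify(e[3])
        if inner[0] == "concat":
            low_width = width(inner[2])
            if hi < low_width:                       # entirely in the low part
                return simplify(("extract", hi, lo, inner[2]))
            if lo >= low_width:                      # entirely in the high part
                shifted = ("extract", hi - low_width, lo - low_width, inner[1])
                return simplify(shifted)
        if lo == 0 and hi == width(inner) - 1:       # selecting every bit
            return inner
        return ("extract", hi, lo, inner)
    return e


if __name__ == "__main__":
    rax = ("var", 64, "rax")
    print(simplify(("mul", ("const", 64, 2), ("const", 64, 4))))
    print(simplify(("extract", 63, 0, ("concat", ("const", 64, 0), rax))))
```

A selection that straddles the boundary between the two concatenated parts is left unchanged in this sketch.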

SLIDE 48
  • Formula precision (number of uninterpreted functions)
    – Learned formulas equally precise in all but 4 cases
  • Formula quality (number of non-linear operations)
    – Learned formulas contain same number of non-linear operations, except for 11 cases