

slide-1
SLIDE 1

AUTOMATIC PROGRAM REPAIR USING GENETIC PROGRAMMING


SLIDES BY CLAIRE LE GOUES (MOSTLY)

BUT ALSO SOME BY MAHSA VARSHOSAZ & ANDRZEJ WASOWSKI

Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically finding patches using genetic programming. In ICSE '09. IEEE Computer

slide-2
SLIDE 2


slide-3
SLIDE 3

PROBLEM: BUGGY SOFTWARE

“Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.”

– Mozilla Developer, 2005

Annual cost of software errors in the US: $59.5 billion (0.6% of GDP).

90%: Maintenance; 10%: Everything Else

Average time to fix a security-critical error: 28 days.

http://www.clairelegoues.com

slide-4
SLIDE 4

HOW DO HUMANS FIX NEW BUGS?


slide-5
SLIDE 5

Mike

(developer)


slide-6
SLIDE 6


(Mike’s project)


??!

slide-7
SLIDE 7


printf transformer


slide-8
SLIDE 8


Input:

2 5 6 1 3 4 8 7 9 1 1 1 1 2


slide-9
SLIDE 9


Input:

2 5 6 1 3 4 8 7 9 1 1 1 1 2

Legend: Likely faulty; Maybe faulty; Not faulty

slide-10
SLIDE 10
SECRET SAUCES

  • Test cases scalably inform about program behavior
  • Use test cases to evaluate candidate repairs
  • Existing program code contains the seeds of many repairs
  • Better to reuse existing developer expertise than to invent new code

slide-11
SLIDE 11

APPROACH

Given a program and a set of test cases, conduct a biased, random search for a set of edits to the program that fixes a given bug.

slide-12
SLIDE 12

GENETIC PROGRAMMING: the application of evolutionary or genetic algorithms to program source code.


slide-13
SLIDE 13

[Flowchart with the stages: INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, MUTATE, OUTPUT]

slide-15
SLIDE 15

GENETIC SEARCH


  • Fig. courtesy of Hossam Faris. https://www.researchgate.net/figure/Flow-chart-of-the-genetic-programming-approach_fig2_253458069
slide-16
SLIDE 16

INDIVIDUAL CANDIDATES (INITIAL POPULATION)

An individual is a candidate patch: a set of changes to the input program. A patch is a series of statement-level edits:

  • delete X
  • replace X with Y
  • insert Y after X

Replace/insert: pick Y from somewhere else in the program. We are not touching the tests.

Reduces the search space by at least 2x to 10x.
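The three edit kinds can be pictured as a small data structure. The sketch below is illustrative only: it assumes a toy encoding in which a program is a flat list of statement ids, and the names Edit, Program, and apply_edit are hypothetical, not taken from GenProg.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STMTS 64

/* The three statement-level edit kinds named on the slide. */
typedef enum { EDIT_DELETE, EDIT_REPLACE, EDIT_INSERT_AFTER } EditKind;

typedef struct {
    EditKind kind;
    size_t x;  /* position of the statement being edited */
    size_t y;  /* position of the donor statement (ignored for delete) */
} Edit;

/* Hypothetical toy encoding: a program is a flat list of statement ids. */
typedef struct {
    int stmts[MAX_STMTS];
    size_t len;
} Program;

/* Apply one edit to variant p. The donor statement Y is always read from
 * the original program, so code is only ever copied, never invented.
 * Returns 0 on success and -1 if the edit does not apply. */
int apply_edit(Program *p, const Program *orig, Edit e) {
    assert(p != NULL && orig != NULL);
    if (e.x >= p->len) return -1;
    size_t tail = p->len - e.x - 1;  /* number of statements after x */
    switch (e.kind) {
    case EDIT_DELETE:
        memmove(&p->stmts[e.x], &p->stmts[e.x + 1], tail * sizeof p->stmts[0]);
        p->len--;
        return 0;
    case EDIT_REPLACE:
        if (e.y >= orig->len) return -1;
        p->stmts[e.x] = orig->stmts[e.y];
        return 0;
    case EDIT_INSERT_AFTER:
        if (e.y >= orig->len || p->len >= MAX_STMTS) return -1;
        memmove(&p->stmts[e.x + 2], &p->stmts[e.x + 1], tail * sizeof p->stmts[0]);
        p->stmts[e.x + 1] = orig->stmts[e.y];
        p->len++;
        return 0;
    }
    return -1;
}
```

A patch is then just an ordered list of such edits, applied one after another to a copy of the original program.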

slide-17
SLIDE 17

MUTATION: HOW

To mutate an individual, we add new random edits to a given patch.

  • (or we generate a new individual by generating a couple of random edits to make a new patch)
  • We are not touching the tests
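A minimal sketch of this mutation step, assuming a patch is a bounded list of edits. The type and function names are hypothetical, and the statement positions are drawn uniformly here for brevity; the approach described in this deck biases that choice using fault localization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_EDITS 32

typedef enum { EDIT_DELETE, EDIT_REPLACE, EDIT_INSERT_AFTER } EditKind;
typedef struct { EditKind kind; size_t x, y; } Edit;
typedef struct { Edit edits[MAX_EDITS]; size_t len; } Patch;

/* Mutate a patch by appending one random edit over a program with
 * n_stmts statements. Only the patch changes; the tests are never touched.
 * Returns 0 on success, -1 if the patch is full or the program is empty. */
int mutate(Patch *patch, size_t n_stmts) {
    assert(patch != NULL);
    if (n_stmts == 0 || patch->len >= MAX_EDITS) return -1;
    Edit e;
    e.kind = (EditKind)(rand() % 3);      /* delete, replace, or insert */
    e.x = (size_t)rand() % n_stmts;       /* statement to edit (uniform here) */
    e.y = (size_t)rand() % n_stmts;       /* donor from the same program */
    patch->edits[patch->len++] = e;
    return 0;
}
```

Starting from an empty patch and calling mutate a couple of times gives the "new individual" variant mentioned in the bullet.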

slide-18
SLIDE 18

SEARCH SPACE: FAULT LOCALIZATION

Hypothesis: statements executed only by the failing test case(s) should be weighted more heavily than those also executed by the passing test cases.

slide-19
SLIDE 19
FAULT LOCALIZATION

  • Instrument the program to record lines visited during tests
  • The positive test case gcd(1071,1029) visits lines 2–3 and 6–13
  • The negative test case gcd(0,55) visits lines 2–5, 6–7, and 9–10

When selecting portions of the program to modify, we favor those that:

  • Were visited during the negative test case
  • Were not also visited during the positive one

In this example, repairs are focused on lines 4–5.

This particular fault localization heuristic (custom for this paper) turned out not to be very good in the long run. We return to this later.
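The weighting rule can be sketched as follows. This is an illustration under stated assumptions: coverage is given as one boolean per line, and the reduced weight for lines visited by both kinds of test (w_shared) is a free parameter whose value is not given on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assign each line a modification weight from test coverage:
 *   visited only by the negative test      -> 1.0
 *   visited by negative and positive tests -> w_shared (assumed parameter)
 *   not visited by the negative test       -> 0.0 (never modified) */
void localize(const bool *neg, const bool *pos, size_t n,
              double w_shared, double *weight) {
    assert(w_shared >= 0.0 && w_shared <= 1.0);
    for (size_t i = 0; i < n; i++) {
        if (!neg[i])      weight[i] = 0.0;
        else if (pos[i])  weight[i] = w_shared;
        else              weight[i] = 1.0;
    }
}
```

Fed with the line sets from the slide, only lines 4 and 5 receive the full weight, which matches the statement that repairs are focused there.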

slide-20
SLIDE 20

 1 void gcd(int a, int b) {
 2   if (a == 0) {
 3     printf("%d", b);
 4   }
 5   while (b > 0) {
 6     if (a > b)
 7       a = a - b;
 8     else
 9       b = b - a;
10   }
11   printf("%d", a);
12   return;
13 }

>

slide-21
SLIDE 21

(gcd listing repeated from slide 20)

> gcd(4,2)
> 2
>
> gcd(1071,1029)
> 21
>
> gcd(0,55)
> 55
(looping forever)

slide-22
SLIDE 22

(gcd listing repeated from slide 20)

Trace of gcd(0,55):

  (a=0; b=55)  if (a == 0): true, prints 55
  (a=0; b=55)  while (b > 0): true
               if (a > b): false
               b = 55 - 0, so b never changes and the loop never exits (!)
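To make the non-termination observable in a test harness, one option is to cap the number of loop iterations. The sketch below is the buggy gcd with two changes of our own: it returns the result instead of printing it, and it gives up after max_iters iterations. The name gcd_capped and the cap are assumptions for illustration.

```c
#include <assert.h>

/* Buggy gcd from the slides, returning the value instead of printing it.
 * Returns -1 if the loop exceeds max_iters iterations (our addition),
 * which stands in for a test timeout. */
int gcd_capped(int a, int b, int max_iters) {
    assert(max_iters > 0);
    /* In the original, the a == 0 branch prints b but does not return,
     * so execution falls through into the loop below. */
    int iters = 0;
    while (b > 0) {
        if (++iters > max_iters) return -1;  /* treated as "loops forever" */
        if (a > b)
            a = a - b;
        else
            b = b - a;  /* with a == 0 this leaves b unchanged */
    }
    return a;
}
```

With this harness, gcd_capped(4, 2, 1000) and gcd_capped(1071, 1029, 1000) behave as positive test cases, while gcd_capped(0, 55, 1000) hits the cap and plays the role of the negative test case.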

slide-23
SLIDE 23

[Figure: the gcd program as a tree of statement nodes: if(a==0) {printf(b)}, while (b>0) {if(a>b) {a = a - b} else {b = b - a}}, printf(a), return. Nodes are shaded by change probability.]

Legend: High change probability; Low change probability; Not changed

slide-24
SLIDE 24

[Figure: the gcd program as a tree of statement nodes.]

An edit is:

  • Insert statement X after statement Y
  • Replace statement X with statement Y
  • Delete statement X


slide-26
SLIDE 26

[Figure: the statement tree after one example edit; the statements involved are return and printf(b).]

An edit is:

  • Insert statement X after statement Y
  • Replace statement X with statement Y
  • Delete statement X

slide-28
SLIDE 28

MOTIVATING EXAMPLE (CONT…)

  • Consider the following program variant: gcd_2(1071,1029) produces 1029 instead of 21
  • Thus, the variants must pass the negative test case while retaining other core functionality
  • This is enforced through positive test cases
slide-30
SLIDE 30
FITNESS FUNCTION

  • The fitness function returns a number indicating the acceptability of the program
  • We first compile the variant’s AST to an executable program
  • Then record which test cases are passed by that executable
  • A program variant that does not compile has fitness zero
  • 32.19% of variants failed to compile in our experiment
  • The weights WPosT and WNegT (applied to passed positive and negative test cases) should be positive values
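A minimal sketch of such a fitness function, assuming it is a weighted sum of passed positive and negative test cases, which is what the reference to WPosT and WNegT suggests. The struct, the function name, and the example weights used below (1 and 10) are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of compiling and testing one variant (hypothetical struct). */
typedef struct {
    bool compiled;      /* did the variant compile? */
    size_t pos_passed;  /* number of positive test cases passed */
    size_t neg_passed;  /* number of negative test cases passed */
} VariantResult;

/* Weighted-sum fitness: w_pos plays the role of WPosT, w_neg of WNegT.
 * A variant that does not compile gets fitness zero. */
double fitness(VariantResult r, double w_pos, double w_neg) {
    assert(w_pos > 0.0 && w_neg > 0.0);
    if (!r.compiled) return 0.0;
    return w_pos * (double)r.pos_passed + w_neg * (double)r.neg_passed;
}
```

For the gcd example with two positive tests and one negative test, the unmodified program scores 2 under weights 1 and 10, while a variant that also passes the negative test scores 12.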

slide-31
SLIDE 31
PATCH MINIMIZATION

  • Exit(0) is inserted correctly
  • a = a - b in line 5 is extraneous
  • Patch minimization (by search, delta-debugging)
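The idea of dropping extraneous edits can be sketched with a simple greedy pass. This is a simplification, not full delta debugging: it tries removing one edit at a time and keeps the removal whenever the test oracle still passes. Edits are represented by integer ids, and the oracle contains_fix (which passes only if the edit with id 7 is present) is a hypothetical stand-in for running the test suite.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Oracle: does the program patched with these edits pass all tests? */
typedef bool (*Oracle)(const int *edits, size_t n);

/* Greedy single pass: drop each edit whose removal keeps the oracle true.
 * Edits are compacted in place; returns the new number of edits. */
size_t minimize(int *edits, size_t n, Oracle passes_all) {
    assert(passes_all != NULL);
    size_t i = 0;
    while (i < n) {
        int removed = edits[i];
        for (size_t j = i + 1; j < n; j++)   /* shift left to drop edit i */
            edits[j - 1] = edits[j];
        if (passes_all(edits, n - 1)) {
            n--;                             /* extraneous: keep it removed */
        } else {
            for (size_t j = n - 1; j > i; j--)  /* needed: shift back */
                edits[j] = edits[j - 1];
            edits[i] = removed;
            i++;
        }
    }
    return n;
}

/* Hypothetical oracle for demonstration: only edit id 7 is required. */
bool contains_fix(const int *edits, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (edits[i] == 7) return true;
    return false;
}
```

In the gcd example, the required edit corresponds to the inserted exit, and the extraneous a = a - b edit is the kind of change this pass would discard.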

slide-32
SLIDE 32

CLAIMS

GenProg can generically fix a variety of bugs in real programs without a priori knowledge.

GenProg is human-competitive in both expressive power and actual cost.

slide-33
SLIDE 33

Program      Description                 LOC     Bug Type                        Time (s)
gcd          example                     22      infinite loop                   153
nullhttpd    webserver                   5575    heap buffer overflow (code)     578
zune         example                     28      infinite loop                   42
uniq         text processing             1146    segmentation fault              34
look-u       dictionary lookup           1169    segmentation fault              45
look-s       dictionary lookup           1363    infinite loop                   55
units        metric conversion           1504    segmentation fault              109
deroff       document processing         2236    segmentation fault              131
indent       code processing             9906    infinite loop                   546
flex         lexical analyzer generator  18774   segmentation fault              230
openldap     directory protocol          292598  non-overflow denial of service  665
ccrypt       encryption utility          7515    segmentation fault              330
lighttpd     webserver                   51895   heap buffer overflow (vars)     394
atris        graphical game              21553   local stack buffer exploit      80
php          scripting language          764489  integer overflow                56
wu-ftpd      FTP server                  67029   format string vulnerability     2256
leukocyte    computational biology       6718    segmentation fault              360
tiff         image processing            84067   segmentation fault              108
imagemagick  image processing            450516  wrong output                    2160

slide-34
SLIDE 34

CONCLUSIONS

GenProg: scalable, generic, expressive automatic bug repair

  • Genetic programming search for a patch that addresses a given bug.
  • Render the search tractable by restricting the search space intelligently.

It works!

  • Fixes a variety of bugs in a variety of programs.
  • Repaired 60 of 105 bugs for < $8 each, on average.

Benchmarks/results/source code/VM images available:

  • http://genprog.cs.virginia.edu

slide-35
SLIDE 35
WHAT COULD’VE GONE WRONG?

  • What if we write a new test case? What do we do about that?
  • Machine learning folks have known for years that minimization does not affect quality positively: model size can be independent of the degree of overfitting. How could we evaluate overfitting?

slide-36
SLIDE 36


slide-37
SLIDE 37

SEMFIX: PROGRAM REPAIR VIA SEMANTIC ANALYSIS


SemFix: program repair via semantic analysis. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13)

slide-38
SLIDE 38

REPAIRING PROGRAMS WITH SEMANTIC CODE SEARCH

Yalin Ke (Iowa State), Kathryn T. Stolee (Iowa State), Claire Le Goues (Carnegie Mellon), Yuriy Brun (UMass Amherst)

  • Y. Ke, K. T. Stolee, C. Le Goues, and Y. Brun. Repairing Programs with Semantic Code Search. In ASE’15

slide-39
SLIDE 39

OVERFITTING

Does the patch generalize beyond the test cases used to create it?

Edward K. Smith, Earl Barr, Claire Le Goues, and Yuriy Brun. Is the Cure Worse than the Disease? Overfitting in Automated Program Repair. ESEC/FSE 2015.

slide-40
SLIDE 40

[Diagram: SearchRepair. Input: buggy program, tests. Intermediate: potential patches. Output: fixed program.]

Performance on withheld tests!

slide-41
SLIDE 41

COMPUTE THE MEDIAN OF THREE NUMBERS


slide-42
SLIDE 42

int median(int a, int b, int c) {
  int result;
  if ((b<=a && a<=c) || (c<=a && a<=b))
    result = a;
  if ((a<b && b<=c) || (c<=b && b<a))
    result = b;
  if ((a<c && c<b) || (b<c && c<a))
    result = c;
  return result;
}

slide-43
SLIDE 43

int median(int a, int b, int c) {
  int result = 0;
  if ((b<=a && a<=c) || (c<=a && a<=b))
    result = a;
  if ((a<b && b<=c) || (c<=b && b<a))
    result = b;
  if ((a<c && c<b) || (b<c && c<a))
    result = c;
  return result;
}


slide-50
SLIDE 50

int med_broken(int a, int b, int c) {
  int result;
  if ((a==b) || (a==c) || (b<a && a<c) || (c<a && a<b))
    result = a;
  else if ((b==c) || (a<b && b<c) || (c<b && b<a))
    result = b;
  else if (a<c && c<b)
    result = c;
  return result;
}

slide-51
SLIDE 51

WHAT IF…

Instead of trying to make small changes, we replaced buggy regions with code that correctly captures the overall desired logic?

Principle: using human-written code to fix code at a higher granularity level leads to better quality repairs.

slide-52
SLIDE 52
SearchRepair: THE PLAN

  1. Localize bug to a region.
  2. Create input/output examples that show what the code should do.
  3. Use semantic code search to find snippets that do the right thing.
  4. Construct and test candidate patches for each result from the search.

slide-53
SLIDE 53

[Pipeline diagram with the stages: fault localization + analysis; Profile/Queries; Snippet DB encoding; Results; patch construction]

slide-54
SLIDE 54

MODIFIED SB-FAULT LOCALIZATION

(med_broken listing repeated from slide 50)

Input   Expected  Pass?
6,2,8   6         ✓
6,8,2   6         ✓
8,2,6   6         X
8,6,2   6         ✓

James A. Jones, Mary Jean Harrold, and John Stasko. Visualization of test information to assist fault localization. ICSE 2002.

M. Gabel and Z. Su. A study of the uniqueness of source code. FSE, 2010.
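For orientation, the spectrum-based score from the cited Jones et al. paper (Tarantula) can be computed as below. The slide does not spell out how SearchRepair modifies it, so this sketch shows only the standard formula; the function name and the handling of never-executed statements are our assumptions.

```c
#include <assert.h>

/* Standard Tarantula suspiciousness of a statement s:
 *   (failed(s)/total_failed) / (passed(s)/total_passed + failed(s)/total_failed)
 * failed_s / passed_s: number of failing / passing tests that execute s. */
double tarantula(int failed_s, int passed_s, int total_failed, int total_passed) {
    assert(total_failed > 0 && total_passed > 0);
    double f = (double)failed_s / total_failed;
    double p = (double)passed_s / total_passed;
    if (f + p == 0.0) return 0.0;  /* statement never executed (assumed 0) */
    return f / (f + p);
}
```

With the four tests in the table (one failing, three passing), a statement executed only by the failing test scores 1.0, one executed by every test scores 0.5, and one executed only by passing tests scores 0.0.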
slide-56
SLIDE 56

SEARCHREPAIR: HIGH-QUALITY AUTOMATED BUG REPAIR USING SEMANTIC SEARCH

slide-57
SLIDE 57

SEMANTIC CODE SEARCH

Keyword: “C median three numbers”

Semantic:

Input   Expected
2,6,8   6
2,8,6   6
6,2,8   6
6,8,2   6
8,6,2   6
9,9,9   9

  • K. T. Stolee, S. Elbaum, and M. B. Dwyer. "Code search with input/output queries: Generalizing, ranking, and assessment". JSS 2015.
  • K. T. Stolee, S. Elbaum, and D. Dobos. "Solving the Search for Source Code". TOSEM 2014.
  • Steven P. Reiss. Semantics-based code search. ICSE, 2009.

slide-58
SLIDE 58

[Diagram: a Query (2,6,8 → 6) is sent to the Code Search Engine, which draws on the Repository and returns Results.]

slide-59
SLIDE 59

[Diagram: the Code Search Engine with Indexing, Matching, and Ranking stages, connected to the Repository, the Query, and the Results.]

slide-60
SLIDE 60
SEMANTIC CODE SEARCH

  1. Store candidate snippets as symbolic constraints.
  2. Search using input/output examples that show what the desired code should do.
  3. See which symbolic constraints are co-satisfiable with the input/output example constraints (Z3).
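The matching step can be illustrated without an SMT solver. The sketch below swaps in a much simpler technique: instead of encoding snippets as symbolic constraints and asking Z3 for co-satisfiability, it executes each candidate on the input/output examples. The two candidate snippets and all names are hypothetical; only the examples come from the deck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int in[3]; int out; } Example;
typedef int (*Snippet)(int, int, int);

/* Two candidate snippets as they might sit in a repository (hypothetical). */
int snip_median(int x, int y, int z) {
    if ((x <= y && x >= z) || (x >= y && x <= z)) return x;
    else if ((y <= x && y >= z) || (y >= x && y <= z)) return y;
    else return z;
}

int snip_max(int x, int y, int z) {
    int m = x > y ? x : y;
    return m > z ? m : z;
}

/* A snippet matches the query if it reproduces every example.
 * This execution-based check stands in for the solver query. */
bool matches(Snippet s, const Example *ex, size_t n) {
    assert(s != NULL);
    for (size_t i = 0; i < n; i++)
        if (s(ex[i].in[0], ex[i].in[1], ex[i].in[2]) != ex[i].out)
            return false;
    return true;
}
```

Run against the six median examples from the semantic query, snip_median matches and snip_max is rejected on the first example. The constraint-based encoding additionally lets the solver work out how snippet variables map to variables in the buggy code, which plain execution does not do.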

slide-61
SLIDE 61

Dynamic analysis captures the types and values of variables before and after the buggy region on the passing test cases.

slide-62
SLIDE 62

(med_broken listing repeated from slide 50)

Input   Expected  Pass?
6,2,8   6         ✓
6,8,2   6         ✓
8,2,6   6         X
8,6,2   6         ✓

Input: a=6, b=2, c=8, result=*
Output: a=6, b=2, c=8, result=6

slide-63
SLIDE 63

Code search engine, candidate snippet from the repository:

if ((x<=y && x>=z) || (x>=y && x<=z))
  m = x;
else if ((y<=x && y>=z) || (y>=x && y<=z))
  m = y;
else
  m = z;

Input: a=6, b=2, c=8, result=*
Output: a=6, b=2, c=8, result=6

Match! (Eliding encoding details, but note that SMT solvers provide satisfying models; we use the model to establish the mapping between the snippet and the buggy context.)


slide-67
SLIDE 67

int med_broken(int a, int b, int c) {
  int result;
  if ((x<=y && x>=z) || (x>=y && x<=z))
    m = x;
  else if ((y<=x && y>=z) || (y>=x && y<=z))
    m = y;
  else
    m = z;
  return result;
}

slide-68
SLIDE 68

int med_broken(int a, int b, int c) {
  int result;
  if ((a<=b && a>=c) || (a>=b && a<=c))
    result = a;
  else if ((b<=a && b>=c) || (b>=a && b<=c))
    result = b;
  else
    result = c;
  return result;
}


slide-70
SLIDE 70

(patched med_broken listing repeated from slide 68)

Input   Expected  Pass?
6,2,8   6         ✓
6,8,2   6         ✓
8,2,6   6         ✓
8,6,2   6         ✓

slide-72
SLIDE 72

EVALUATION

RECALL GOAL: FIXING BUGS THIS WAY RESULTS IN HIGHER-QUALITY PATCHES.


slide-73
SLIDE 73

INTROCLASS

Dataset: benchmark of student-written C programs

Program    Versions  Description
checksum   29        check sum of a string
digits     91        digits of a number
grade      226       grade from score
median     168       median of three numbers
smallest   155       smallest of four numbers
syllables  109       count vowels in string
Total      778

Key: two independent test suites. Use one for repair, one for validation of quality claims!

Code DB constructed of other students’ answers.

Le Goues et al., The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs, TSE 2015

slide-74
SLIDE 74

SUCCESS CRITERIA

METRICS

Defects repaired.

Patch quality: percentage of held-out test cases that a patched program passes.

COMPARISON

Previous work:

  • GenProg [1]
  • AE [2]
  • TrpAutoRepair/RSRepair [3, 4]

[1] Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, and Westley Weimer. GenProg: A Generic Method for Automated Software Repair. TSE 2012.
[2] Westley Weimer, Zachary P. Fry, and Stephanie Forrest. Leveraging Program Equivalence for Adaptive Program Repair: Models and First Results. ASE 2013.
[3] Y. Qi, X. Mao, and Y. Lei. Efficient automated program repair through fault-recorded testing prioritization. ICSM 2013.
[4] Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. The strength of random search on automated program repair. ICSE 2014.

slide-75
SLIDE 75

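IS THIS A FAILURE?

Defects repaired per tool (blank cells in the slide are shown as 0, which is the value the column totals imply):

Program    SearchRepair  AE   GenProg  Total
checksum   0             0    8        29
digits     0             17   30       91
grade      5             2    2        226
median     68            58   108      168
smallest   73            71   120      155
syllables  4             11   19       109
Total      150           159  287      778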

slide-76
SLIDE 76

[Venn diagram of defects repaired by each tool. Region labels: AE: 1; SearchRepair: 20; GenProg: 32; RSRepair: 2; further overlap regions labeled 52, 68, 10, and 90. Totals: GenProg 287; RSRepair 247; AE 159; SearchRepair 150.]

slide-77
SLIDE 77

QUALITY

Use the second test suite (from KLEE) to assess the degree to which the patches generalize beyond the tests used to create them.

  • Recall: Patched programs pass all tests used to create them by definition.

Tool                    Quality
SearchRepair            97.2%
GenProg                 68.7%
RSRepair/TRPAutoRepair  72.1%
AE                      64.2%

slide-78
SLIDE 78

TAKEAWAY

SearchRepair uses semantic search to fix bugs by looking for code that does the right thing. Compared to previous work, SearchRepair:

  • Repairs different faults
  • Produces patches of measurably higher quality.

Code at: https://github.com/ProgramRepair/SearchRepair

slide-79
SLIDE 79

QUANTITATIVE REPAIR QUALITY

What makes a high-quality repair?

  • Retains required functionality.
  • Does not introduce new bugs.
  • Addresses the cause, not just the symptom.

How to measure it:

  • Behavior on held-out workloads.
  • Large-scale black-box fuzz testing.
  • Exploit variant fuzzing.