SLIDE 1

Occam: Automated Software Winnowing

Gregory Malecha (1), Ashish Gehani (2), Natarajan Shankar (2)

(1) Harvard, (2) SRI

Malecha, Gehani, Shankar Occam: Automated Software Winnowing 1 / 18

SLIDE 6

A story of success...

Software engineering has been so successful that it’s easy to write incredibly complex pieces of code.

Software engineering makes things deceptively simple.

MiniBlog – “Simple” PHP blogging application
  683 lines of PHP code
  Depends on PHP & MySQL

PHP – Programming language interpreter
  625,000 lines of C
  Depends on LibC

LibC – C standard runtime library
  366,000 lines of C

Where could there possibly be a bug?

SLIDE 7

Winnowing

Outline

1. Reduce the “functionality” of a system
   thttpd is a simple webserver.
   It doesn’t need to be able to listen on arbitrary ports.
   Make configuration options static.

2. Overcome static analysis limitations
   MiniBlog should never send email, so that functionality should not be in the system.
   We need to cut it out, since mail is in the PHP standard library (compiled into the interpreter!).

3. Monitor systems and enforce dynamic policies
   Log function calls as the program runs.
   Check security properties.

SLIDE 8

Winnowing

Reducing Functionality

Program: thttpd
Size: 11,322 lines

Problems
  Uses potentially dangerous functions like listen, connect, etc.
  Reads configuration data from the command line.

Solutions
  Limit the ways that dangerous functions can be called.
  Compile configuration data into the program.
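
To make the two solutions concrete, here is a minimal sketch in C. It is not thttpd code: the function names, the default port 80, and the fixed port 8080 are all assumptions chosen for illustration. It contrasts a port read from the command line with a port compiled into the program, plus a guard that limits which port a dangerous call may use.

```c
#include <stdlib.h>

/* Hypothetical generic variant: the port comes from the command line,
   so any port is possible at run time. The default of 80 is an assumption. */
int port_from_args(int argc, char **argv) {
    return (argc > 1) ? atoi(argv[1]) : 80;
}

/* Hypothetical winnowed variant: the port is fixed at build time.
   The value 8080 is an arbitrary illustration. */
enum { STATIC_PORT = 8080 };

int port_static(void) {
    return STATIC_PORT;
}

/* Guard limiting how a dangerous call (for example listen) may be used:
   only the compiled-in port is accepted. */
int port_is_allowed(int port) {
    return port == STATIC_PORT;
}
```

Once the port is a constant, an optimizer can propagate it into callers, which is what the later winnowing steps rely on.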

SLIDE 9

Winnowing Winnowing a Single Module

Module Winnowing Overview

[Diagram: main.c is (0) compiled to main.bc (LLVM); winnowing applies (1) “partial evaluation”, (2) specialize, and (3) reduction to main.bc; the result is (4) linked into a.out.]

SLIDE 11

Winnowing Winnowing a Single Module

(1) “Partial Evaluation”

Simplify the program as much as possible in order to expose constants.

Before:

  foo(int x, int y) { bar(x, 1 + 2); bar(2*5, y); }
  bar(int a, int b) { ...a...b... }

After:

  foo(int x, int y) { bar(x, 3); bar(10, y); }
  bar(int a, int b) { ...a...b... }

Use LLVM’s -O3.

SLIDE 13

Winnowing Winnowing a Single Module

(2) Specialization

Specialize functions when they take constant arguments.

Before:

  foo(int x, int y) { bar(x, 3); bar(10, y); }
  bar(int a, int b) { ...a...b... }

After:

  foo(int x, int y) { bar'(x); bar''(y); }
  bar'(int a) { ...a...3... }
  bar''(int b) { ...10...b... }
  bar(int a, int b) { ...a...b... }

Duplicate functions and inline constants using a custom LLVM pass.
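
The effect of specialization can be checked by hand on a small compilable example. This is only a sketch of the transformation's result, not the LLVM pass itself: the body of bar is an invented stand-in for the slide's elided body, the names bar_b3 and bar_a10 stand for bar' and bar'', and the functions return values only so that the two versions can be compared.

```c
/* Generic function. The body is an assumption standing in for the
   slide's elided "...a...b...". */
static int bar(int a, int b) {
    return a * b + b;
}

/* Clone of bar with b fixed to 3 (the slide's bar'). */
static int bar_b3(int a) {
    return a * 3 + 3;
}

/* Clone of bar with a fixed to 10 (the slide's bar''). */
static int bar_a10(int b) {
    return 10 * b + b;
}

/* Caller before specialization: both calls pass one constant argument. */
int foo_generic(int x, int y) {
    return bar(x, 3) + bar(10, y);
}

/* Caller after specialization: the constants have moved into the clones. */
int foo_specialized(int x, int y) {
    return bar_b3(x) + bar_a10(y);
}
```

After this step nothing calls the generic bar from foo_specialized, which is what allows the reduction step to remove it.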

SLIDE 15

Winnowing Winnowing a Single Module

(3) Reduction

Eliminate unused code.

Before:

  foo(int x, int y) { bar'(x); bar''(y); }
  bar'(int a) { ...a...3... }
  bar''(int b) { ...10...b... }
  bar(int a, int b) { ...a...b... }

After:

  foo(int x, int y) { bar'(x); bar''(y); }
  bar'(int a) { ...a...3... }
  bar''(int b) { ...10...b... }

LLVM dead-code/global elimination pass.

SLIDE 17

Making Dynamics into Statics

Specializing PHP

Program: MiniBlog & PHP Interpreter
Size: 1,000 lines of PHP & 625,000 lines of C

Problems
  PHP interpreter provides unnecessary functions (dead code).
  Some of these functions are potentially dangerous, e.g. system and mail.

Solutions
  Remove unnecessary functions from the PHP interpreter binary.
  Force PHP to exit when dangerous functions are called.

SLIDE 18

Making Dynamics into Statics

Function Transformations

1. “Statically analyze” the PHP code and determine the functions that it will call.
   For relatively static applications this can be done with a grep-like static analysis.
   MiniBlog requires about 46 PHP functions out of the 1028 functions that a minimal PHP install would have.

2. Implement a transformation that will replace the unused functions with a simple exit(1).
   Winnow the result to remove all the unnecessary code.
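
A grep-like analysis of this kind might look like the following sketch. It is an assumption about what such a scan could do, not Occam's implementation: calls_function and count_used are hypothetical helpers that look for a candidate name followed by an opening parenthesis. The scan is case-sensitive and purely textual, so it ignores comments, string literals, method calls, and dynamic calls, which is why it only suits relatively static applications.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if src contains name as a whole identifier followed by
   optional spaces or tabs and then '('. Purely textual, like grep. */
int calls_function(const char *src, const char *name) {
    size_t n = strlen(name);
    if (n == 0) return 0;
    for (const char *p = strstr(src, name); p != NULL; p = strstr(p + 1, name)) {
        /* Skip matches that are the tail of a longer identifier. */
        if (p > src && (isalnum((unsigned char)p[-1]) || p[-1] == '_')) continue;
        const char *q = p + n;
        /* Skip matches that are the head of a longer identifier. */
        if (isalnum((unsigned char)*q) || *q == '_') continue;
        while (*q == ' ' || *q == '\t') q++;
        if (*q == '(') return 1;
    }
    return 0;
}

/* Marks which candidate names are called in src and returns how many are. */
size_t count_used(const char *src, const char *const *names,
                  size_t count, int *used) {
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        used[i] = calls_function(src, names[i]);
        total += (size_t)used[i];
    }
    return total;
}
```

Every candidate left unmarked is a function that the transformation in step 2 could replace with exit(1).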

SLIDE 19

Making Dynamics into Statics

Specifying Rewrites

We can specify stubs the same way that we refer to specializations.

Remove system Function

  zif_system(?) => fail

fail is a keyword meaning call exit(1).

Question marks specify wildcard arguments; here we stub all calls to zif_system.

Integer constants are also supported, so we can reject some calls but not others.
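
One way to read such a rule is as a pattern over call sites: a function name plus one pattern per argument, where each pattern is either a wildcard or an integer constant. The sketch below encodes that reading in C. The types ArgPattern, StubRule, and CallArg and the function rule_matches are hypothetical, and the matching semantics are an assumption; the slides only show the rule text.

```c
#include <stddef.h>
#include <string.h>

/* One argument position in a rule: a wildcard (?) or an integer constant. */
typedef struct {
    int is_wildcard;
    long value;            /* only meaningful when is_wildcard is 0 */
} ArgPattern;

/* A stub rule such as zif_system(?) => fail. */
typedef struct {
    const char *function;
    const ArgPattern *args;
    size_t nargs;
} StubRule;

/* One argument at a call site: a known integer constant or unknown. */
typedef struct {
    int is_constant;
    long value;            /* only meaningful when is_constant is 1 */
} CallArg;

/* Returns 1 if the rule applies to a call of function with these arguments.
   A wildcard matches anything; a constant pattern matches only a call
   argument known to be that same constant. */
int rule_matches(const StubRule *rule, const char *function,
                 const CallArg *args, size_t nargs) {
    if (strcmp(rule->function, function) != 0 || rule->nargs != nargs) return 0;
    for (size_t i = 0; i < nargs; i++) {
        if (rule->args[i].is_wildcard) continue;
        if (!args[i].is_constant || args[i].value != rule->args[i].value) return 0;
    }
    return 1;
}
```

Under this reading, a rule with a constant in one position rejects only the calls that pass that constant and leaves the other calls alone.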

SLIDE 21

Making Dynamics into Statics

Rewriting Code

Small transformation pass to replace function bodies.

Before:

  zif_system ... ...

  zif_system(char* cmd) { system(cmd); }
  system(char* cmd) { libc code }

After:

  zif_system ... ...

  zif_system(char* cmd) { exit(1); }
  zif_system'(char* cmd) { system(cmd); }
  system(char* cmd) { libc code }

Implemented as a custom LLVM transformation pass.

SLIDE 23

Making Dynamics into Statics

Reusing the Winnowing Hammer

Remove dead code using winnowing.

Before:

  zif_system ... ...

  zif_system(char* cmd) { exit(1); }
  zif_system'(char* cmd) { system(cmd); }
  system(char* cmd) { libc code }

After:

  zif_system ... ...

  zif_system(char* cmd) { exit(1); }

Reduce to an already solved problem!

SLIDE 25

Monitoring

Peering into the Black Box

Program: PHP Interpreter
Size: 625,000 lines of C

Problems
  Not sure if an application is doing something bad.
  Want to enforce safety checks on programs.

Solutions
  Inject calls to a monitor into the binary.
  Implement a monitor to check the desired properties.

SLIDE 26

Monitoring

Approach

Extend the enforcement mechanism.
  Monitor when execution enters/exits a function.
  Support access to function arguments and return values.
  Support conditional monitoring.
    Allows exit(1) on certain parameters.

Monitored binaries can be run without monitors.
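
The entry/exit hooks can be pictured with a small sketch. It is written in plain C to match the deck's other snippets and is not Occam's monitor interface: the Monitor struct, set_monitor, the target function, and the CallLog example are all hypothetical. It shows a wrapped function that reports its argument on entry and its result on exit, a monitor that keeps state between calls, and the same code running with no monitor installed.

```c
#include <stddef.h>

/* Hypothetical monitor interface: optional hooks plus private state. */
typedef struct {
    void (*on_enter)(void *state, int arg);
    void (*on_exit)(void *state, int arg, int result);
    void *state;
} Monitor;

/* NULL means the program runs unmonitored. */
static Monitor *active_monitor = NULL;

void set_monitor(Monitor *m) {
    active_monitor = m;
}

/* Stand-in for the original body of a monitored function. */
static int target_impl(int arg) {
    return arg * 2;
}

/* The function after injection: hooks run around the original body. */
int target(int arg) {
    if (active_monitor && active_monitor->on_enter)
        active_monitor->on_enter(active_monitor->state, arg);
    int result = target_impl(arg);
    if (active_monitor && active_monitor->on_exit)
        active_monitor->on_exit(active_monitor->state, arg, result);
    return result;
}

/* Example monitor state: call count and the last observed result. */
typedef struct {
    int calls;
    int last_result;
} CallLog;

void log_enter(void *state, int arg) {
    (void)arg;
    ((CallLog *)state)->calls++;
}

void log_exit(void *state, int arg, int result) {
    (void)arg;
    ((CallLog *)state)->last_result = result;
}
```

Conditional monitoring would fit in the same shape: an on_enter hook could inspect arg and call exit(1) for rejected values.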

SLIDE 27

Conclusions

Conclusions

Occam is a tool for winnowing
  Program specialization to reduce functionality.
  “Partial-evaluation” through optimization.
  Works well for generic platforms.

Monitoring program execution
  Monitors can be placed around functions and modify both inputs and outputs.
  Monitors can be arbitrary C++ code and can maintain state between calls.
