Automatic intrusion recovery with system-wide history Taesoo Kim - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Automatic intrusion recovery with system-wide history

Taesoo Kim

MIT CSAIL

slide-2
SLIDE 2

2

Current focus of system security: preventing attacks

  • System hardening tools/techniques

– e.g., firewalls, antivirus, …

slide-5
SLIDE 5

5

My work on preventing attacks (proactive security)

  • StealthMem [Security '12]
  • Morula [Oakland '14]
  • UserFS [Security '10]
  • Mbox [ATC '13]
  • VMsec [APSys '13]

Platforms: cloud (Hyper-V), mobile (Android), Linux

slide-6
SLIDE 6

6

Attackers routinely compromise computer systems

slide-11
SLIDE 11

11

Compromises inevitable

  • Programmers write buggy code

– A single bug can lead to system compromises

  • Admins misconfigure policies
  • Users choose weak, guessable passwords

Need both proactive security mechanisms and reactive recovery mechanisms!
Recovering integrity is required to continue operating!

slide-13
SLIDE 13

13

Existing recovery tools are limited

  • Anti-virus tools

– Only repair from predictable attacks

  • Backup tools

– Attack may be detected days or weeks later
– Restoring from backup discards all changes

Admins spend days or weeks manually tracking down all effects of the attack, with no guarantee that everything is cleaned up!

slide-14
SLIDE 14

14

Example: kernel.org

  • A main repository of code for the Linux kernel

– Also hosts open source projects like Git and Android

slide-15
SLIDE 15

15

Example: kernel.org attack

  • Detected that kernel.org had been compromised

– Noticed error messages from a program that administrators never installed themselves

[Timeline, 2011: Aug. ?? · Aug. 28th · Sept. 1st · Oct. 3rd. Highlighted: detected the attack]

slide-16
SLIDE 16

16

Example: kernel.org attack

  • Investigated the attack for three days

– The initial break-in likely happened a month earlier (the trojaned SSHD was modified around that time)

[Timeline, 2011: Aug. ?? · Aug. 28th · Sept. 1st · Oct. 3rd. Highlighted: investigation; likely initial break-in]

slide-18
SLIDE 18

18

Example: kernel.org attack

  • Fully re-installed all servers from the latest backup

– Rollback is the only safe option (too many suspects to clean up)
– Took a month for security experts to fully recover

[Timeline, 2011: Aug. ?? · Aug. 28th · Sept. 1st · Oct. 3rd. Highlighted: site down for recovery!]

The only safe option is rollback: trojaned SSHD → everything is suspicious

slide-21
SLIDE 21

21

Problems in today's repair strategies

[Timeline, 2011: Aug. ?? · Aug. 28th · Sept. 1st · Oct. 3rd. Annotations: 1. manual and time-consuming; 2. lost changes (a month!); 3. no guarantees (safe to roll back?)]

  • Manual analysis & recovery is time-consuming
  • Rollback ends up losing changes
  • No guarantee of complete removal of the attack
slide-22
SLIDE 22

22

Problems in today's repair strategies

[Timeline, 2011: Aug. ?? · Aug. 28th · Sept. 1st · Oct. 3rd. Annotations: 1. manual and time-consuming; 2. lost changes (a month!); 3. no guarantees (safe to roll back?)]

How can we design an automated recovery system that preserves legitimate changes and provides guarantees?

slide-23
SLIDE 23

23

Idea: keep complete history of computations

  • Inputs/outputs on a timeline

[Diagram: inputs and outputs along a time axis]
slide-24
SLIDE 24

24

Idea: keep complete history of computations

  • Represent the computer in fine-grained detail

[Diagram: computation history along a time axis]
slide-25
SLIDE 25

25

Idea: keep complete history of computations

  • Represent objects and dependencies

New opportunities to track down attacks!

[Diagram: objects and dependencies along a time axis, with the attack marked]

slide-28
SLIDE 28

28

Approach: change our past with history of computations

  • Recovery → cancel the initial attack input
  • Reconstruct states as if the attack never happened!

[Diagram: time axis with the attack marked "Cancel?"]

Turn the problem of manual recovery into a problem of manipulating history!

slide-29
SLIDE 29

29

Challenges in real systems

  • Existing systems are not designed for history

– Implicit dependencies and timeline

  • Attacks can be anywhere in the history

– Attacks are often detected days or weeks later

  • History cannot be changed in some cases

– External dependencies: spam sent out

slide-30
SLIDE 30

30

Contribution: built real-world systems

  • Automatic recovery

– Operating system: Retro [OSDI'10]
– Web application: Warp [SOSP'11]
– Distributed web services: Aire [SOSP'13]

  • Automatic detection of attacks

– Web application: Poirot [OSDI'12]

slide-32
SLIDE 32

32

Today's talk

  • Automatic recovery

– Operating system: Retro [OSDI'10]
– Web application: Warp [SOSP'11]
– Distributed web services: Aire [SOSP'13]

  • Automatic detection of attacks

– Web application: Poirot [OSDI'12]

  • Future research agenda
slide-36
SLIDE 36

36

Example attack scenario

Attacker:
  • Adds a new account for himself (→ modifies /etc/passwd)
  • Installs trojaned pdflatex

Admin:
  • Adds a new account for Alice (→ modifies /etc/passwd)

Alice:
  • Logs in via SSH (→ SSHD reads /etc/passwd)
  • Runs trojaned pdflatex

slide-38
SLIDE 38

38

History strawman 1: Taint tracking

  • Track dependencies between processes & files

[Diagram: dependency graph. Nodes: attacker process, passwd file, pdflatex binary, adduser alice, Alice's login, LaTeX process, Alice's shell, Admin's shell, Alice's paper, Alice's PDF file]

slide-39
SLIDE 39

39

History strawman 1: Taint tracking

  • Given an attack, track down all affected files → restore those files from an earlier backup

[Diagram: dependency graph with the attack marked. Nodes: attacker process, passwd file, pdflatex binary, adduser alice, Alice's login, LaTeX process, Alice's shell, Admin's shell, Alice's paper, Alice's PDF file]

slide-42
SLIDE 42

42

Problem with taint tracking: false positives

  • Lost Alice's account and files that were not actually affected by the attacker!

[Diagram: dependency graph. Nodes: attacker process, passwd file, pdflatex binary, adduser alice, Alice's login, LaTeX process, Alice's shell, Admin's shell, Alice's paper, Alice's PDF file. Annotation: lost Alice's account]
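
The false-positive problem can be illustrated with a minimal sketch of taint propagation over a dependency graph. The edge list below is an assumption made up to resemble the slide's diagram (the exact edges are not recoverable from the transcript), and `tainted` is a hypothetical helper, not part of any tool named in the talk.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges (source -> dependent), loosely following the
# example scenario. These are illustrative assumptions, not recorded data.
edges = [
    ("attacker process", "passwd file"),
    ("attacker process", "pdflatex binary"),
    ("passwd file", "adduser alice"),   # adduser reads and rewrites /etc/passwd
    ("admin shell", "adduser alice"),
    ("adduser alice", "passwd file"),
    ("passwd file", "alice login"),
    ("alice login", "alice shell"),
    ("alice shell", "latex process"),
    ("pdflatex binary", "latex process"),
    ("alice paper", "latex process"),
    ("latex process", "alice pdf"),
]


def tainted(edges, source):
    """Return every node reachable from `source` (breadth-first search)."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


affected = tainted(edges, "attacker process")
```

Under these assumed edges, Alice's login, shell, and PDF all end up in `affected` simply because they depend on the passwd file, even though the admin created her account legitimately. Restoring every tainted object from backup would therefore discard her account.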

slide-44
SLIDE 44

44

History strawman 2: VM replay

[Diagram: virtual machine with inputs and outputs along a time axis]

slide-45
SLIDE 45

45

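Periodic VM checkpoints

[Diagram: virtual machine with inputs and outputs along a time axis, with periodic checkpoints]

slide-46
SLIDE 46

46

Step 1: identify attack input

[Diagram: virtual machine with inputs and outputs along a time axis; the attack input is marked]

slide-47
SLIDE 47

47

Step 2: rollback to the latest checkpoint

[Diagram: virtual machine with inputs and outputs along a time axis; the attack input is marked]

slide-48
SLIDE 48

48

Step 3: replay non-attack inputs

[Diagram: virtual machine with inputs and outputs along a time axis; the attack input is crossed out (X)]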

slide-49
SLIDE 49

49

Problems with VM replay

  • VM replay is expensive

– Repairing a week-old attack needs a week of replay

  • Past inputs are meaningless to the new system

– Non-determinism: new SSH crypto keys ...
– Deterministic replay won't work
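
The three VM-replay steps can be sketched as a loop over the logged inputs. This is a toy model under stated assumptions: `vm_replay`, the list-of-accounts state, and the `adduser` input strings are all hypothetical stand-ins for a real VM checkpoint and input log.

```python
def vm_replay(checkpoint, logged_inputs, is_attack, apply_input):
    """Roll back to `checkpoint`, then replay every logged input except attacks."""
    state = list(checkpoint)          # step 2: start from the checkpointed state
    replayed = 0
    for item in logged_inputs:
        if is_attack(item):           # step 3: drop the attack input
            continue
        state = apply_input(state, item)
        replayed += 1                 # every other input is re-run
    return state, replayed


# Hypothetical log: the "VM state" is just a list of account names.
checkpoint = ["root"]
logged_inputs = ["adduser eve", "adduser alice", "adduser bob"]
state, replayed = vm_replay(
    checkpoint,
    logged_inputs,
    is_attack=lambda item: item == "adduser eve",
    apply_input=lambda st, item: st + [item.split()[1]],
)
```

The `replayed` counter makes the cost visible: every non-attack input since the checkpoint is re-executed, which is why repairing a week-old attack takes about a week of replay. The sketch also assumes replay is deterministic, which the slide notes does not hold in practice.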

slide-51
SLIDE 51

51

Retro's approach: Action history graph

  • Represent fine-grained history

– Includes kernel objects, system calls, function calls, …
– Assume tamper-proof kernel and storage

  • Roll back objects directly affected by the attack

– Avoids the false positives of taint tracking

  • Selectively re-execute indirectly affected actions

– Avoids the expensive VM replay

slide-52
SLIDE 52

52

Action history graph: Objects represent files, processes

[Diagram: objects along a time axis: Attacker's process, password file, adduser Alice, Admin's shell]

slide-53
SLIDE 53

53

Action history graph: Actions represent execution (syscall)

[Diagram: objects along a time axis: Attacker's process, password file, adduser Alice, Admin's shell]

slide-56
SLIDE 56

56

Action history graph: Actions have dependencies

[Diagram: action history graph over time. Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-57
SLIDE 57

57

Action history graph: Objects have checkpoints

[Diagram: action history graph over time. Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-58
SLIDE 58

58

Step 1: find attack action

[Diagram: action history graph over time. Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-59
SLIDE 59

59

Step 2: rollback affected objects

[Diagram: action history graph over time. Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-60
SLIDE 60

60

Step 3: skip attack action

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-61
SLIDE 61

61

Step 4: redo non-attack actions

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-62
SLIDE 62

62

Repeat step 2: rollback objects

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-63
SLIDE 63

63

Repeat step 3: redo actions

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

Key advantage over VM replay: re-run only adduser, not the entire VM.

slide-65
SLIDE 65

65

Repeat step 3: redo actions

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

Key advantage over taint tracking: attacker removed, Alice's account preserved
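
The find, rollback, skip, and redo steps can be sketched on a toy history. This is a heavily simplified model, not Retro's implementation: the `Action` class, `execute`, and `repair` are hypothetical names, checkpoints here are whole-state snapshots rather than per-object ones, and the sketch only redoes later actions on the single object the attack wrote to, without following dependencies further.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    name: str
    target: str                               # object the action reads and writes
    run: Callable[[List[str]], List[str]]     # old object state -> new object state
    is_attack: bool = False


def execute(history, initial):
    """Run actions in order, saving a checkpoint of the state before each one."""
    state = {k: list(v) for k, v in initial.items()}
    checkpoints = []
    for action in history:
        checkpoints.append({k: list(v) for k, v in state.items()})
        state[action.target] = action.run(state[action.target])
    return state, checkpoints


def repair(history, checkpoints, state):
    # Step 1: find the attack action.
    first = next(i for i, a in enumerate(history) if a.is_attack)
    target = history[first].target
    repaired = dict(state)
    # Step 2: roll the affected object back to its checkpoint.
    repaired[target] = list(checkpoints[first][target])
    for action in history[first + 1:]:
        if action.is_attack:
            continue                          # Step 3: skip attack actions
        if action.target == target:
            # Step 4: redo legitimate actions on the rolled-back object.
            repaired[target] = action.run(repaired[target])
    return repaired


history = [
    Action("attacker adds account", "passwd", lambda u: u + ["eve"], is_attack=True),
    Action("adduser alice", "passwd", lambda u: u + ["alice"]),
]
initial = {"passwd": ["root"], "paper": ["draft"]}
state, checkpoints = execute(history, initial)
repaired = repair(history, checkpoints, state)
```

In this toy run only `adduser alice` is re-executed. The attacker's entry disappears from `passwd`, Alice's entry survives, and the unrelated `paper` object is never touched.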

slide-66
SLIDE 66

66

Challenge: how to avoid re-executing everything?

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

Exit status affects the shell, which affects sshd, and so on…
Naïve process-level re-execution still re-executes the entire system!

slide-67
SLIDE 67

67

Observation: Admin's shell was not affected

  • "adduser alice" succeeds as before

– This is what Admin wanted to do
– If it failed, we would need to re-execute Admin's shell

slide-68
SLIDE 68

68

Example 1: exit status to shell unchanged

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

slide-69
SLIDE 69

69

Predicates: avoid equivalent re-execution

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, adduser Alice, Admin's shell. Actions: exec(prog, args, ..), exit(status), write(offset, data), read(offset, data)]

Check whether adduser succeeds as before. If so, skip the re-run of Admin's shell.

slide-70
SLIDE 70

70

Example 2: user's password unchanged

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, Alice's SSHD. Actions: write(offset, data), read(offset, data)]

slide-71
SLIDE 71

71

Observation: Alice's SSHD was not affected

  • Alice's SSHD checked only Alice's account

– This is what Alice's SSHD wanted to do
– If Alice's account changed, we would need to re-execute SSHD

slide-73
SLIDE 73

73

Refinement: exploits high-level semantics

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, getpwnam() function, Alice's SSHD. Actions: write(offset, data), read(offset, data), call getpwnam("alice"), return (Alice's password)]

Get username, return passwd entry

slide-75
SLIDE 75

75

Refinement: exploits high-level semantics

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, getpwnam() function, Alice's SSHD. Actions: write(offset, data), read(offset, data), call getpwnam("alice"), return (Alice's password)]

Re-run getpwnam() instead of SSHD

slide-76
SLIDE 76

76

Refinement: exploits high-level semantics

[Diagram: action history graph over time, with the attack action crossed out (X). Objects: Attacker's process, password file, getpwnam() function, Alice's SSHD. Actions: write(offset, data), read(offset, data), call getpwnam("alice"), return (Alice's password)]

Predicate: check whether getpwnam() returns the same passwd entry for Alice. If so, skip the re-run of Alice's SSHD.
slide-77
SLIDE 77

77

Quick summary: Retro's approach

  • Action history graph: represent history in detail
  • Two techniques to minimize re-execution:

– Predicates: skip equivalent computations
– Refinement: re-execute fine-grained actions
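
The two techniques can be combined in a small sketch based on the getpwnam() example. The `getpwnam` function below is a pure-Python stand-in for the libc call, `needs_reexecution` is a hypothetical predicate, and the passwd contents are invented for illustration.

```python
def getpwnam(passwd_text, user):
    """Return the passwd line for `user`, or None (stand-in for libc getpwnam)."""
    for line in passwd_text.splitlines():
        if line.split(":", 1)[0] == user:
            return line
    return None


def needs_reexecution(original_passwd, repaired_passwd, user):
    """Predicate: re-run the caller only if its getpwnam() result changed.

    Refinement: only the fine-grained getpwnam() call is re-run against the
    repaired file, not the whole process that called it.
    """
    return getpwnam(original_passwd, user) != getpwnam(repaired_passwd, user)


# Hypothetical file contents before and after repair.
original = (
    "root:x:0:0::/root:/bin/sh\n"
    "eve:x:0:0::/home/eve:/bin/sh\n"        # attacker's backdoor account
    "alice:x:1001:1001::/home/alice:/bin/sh\n"
)
repaired = (
    "root:x:0:0::/root:/bin/sh\n"
    "alice:x:1001:1001::/home/alice:/bin/sh\n"
)

rerun_alice_sshd = needs_reexecution(original, repaired, "alice")
rerun_eve_login = needs_reexecution(original, repaired, "eve")
```

The whole file changed during repair, so a file-level dependency would force Alice's SSHD to re-run. Comparing at the level of the getpwnam("alice") result shows her entry is identical, so the predicate allows the re-run to be skipped.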

slide-78
SLIDE 78

78

Challenge: external dependencies

  • What if the attack was externally visible?

– Spam sent out ...
– Hard in the general case → ask for the user's decision

  • Help users understand the repaired state

– e.g., notify the user that spam email was sent out ...

slide-79
SLIDE 79

79

Compensating action: notify changes in terminal output

...
[redo] cat ~/.ssh/authorized_keys
...
! --- old
! +++ new
! @@ -1,3 +1,2 @@
! ssh-rsa AAAAB3NzaC1yc2EAAAABIw... vagrant
! -ssh-rsa AAAAB3NzaC1yc2EAAAADAQ... attacker
! ssh-rsa AAAAB3NzaC1yc2EAAAAAao... new pubkey
...

You should not have seen this output!

slide-80
SLIDE 80

80

Retro implementation

[Diagram: Retro architecture across kernel and userspace. Components: Linux kernel, Retro module, processes, file system (checkpoints), action history graph]

Runtime: record the action history graph

slide-81
SLIDE 81

81

Retro implementation

[Diagram: Retro architecture across kernel and userspace. Components: Linux kernel, Retro module, processes, file system (checkpoints), action history graph, repair controller, repair managers (e.g., fs, terminal, ..)]

Recovery: repair logic and managers

slide-82
SLIDE 82

82

Retro implementation

[Diagram: Retro architecture across kernel and userspace. Components: Linux kernel, Retro module, processes, file system (checkpoints), action history graph, repair controller, repair managers (e.g., fs, terminal, ..)]

Application-specific managers using a well-defined API

slide-83
SLIDE 83

83

Demo: recovering from inadvertently installed virus

  • Backtracking tool
  • Selective re-execution
  • Compensating action
slide-84
SLIDE 84

84

Problem: detecting an entry point of attacks is hard

  • How to find a one-month-old attack?
  • Too much information

– Manual analysis is time-consuming

slide-85
SLIDE 85

85

Observation: security patch renders attack harmless

  • Escape URL arguments for Firefox

// slider.c
- sprintf(cmd, "firefox %s", evt->uri);
+ sprintf(cmd, "firefox %s", escape(evt->uri));

[Diagram: unpatched vs. patched execution. Unpatched: slider, sh, firefox, virus. Patched: slider, sh, firefox, with the virus crossed out (x)]
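
The effect of the patch can be illustrated in Python, with `shlex.quote` from the standard library standing in for the slide's `escape()` function (whose implementation is not shown in the talk). The URI and function names below are invented for the example.

```python
import shlex


def build_command_unpatched(uri):
    # Mirrors the unpatched sprintf: the URI is pasted into the shell command as-is.
    return "firefox %s" % uri


def build_command_patched(uri):
    # Mirrors the patched line; shlex.quote is an assumed stand-in for escape().
    return "firefox %s" % shlex.quote(uri)


malicious = "http://example.com; ./virus"   # hypothetical attacker-supplied URI
unpatched = build_command_unpatched(malicious)
patched = build_command_patched(malicious)
```

With the unpatched command, a shell would treat `;` as a command separator and run `./virus` as a second command. After quoting, the entire URI reaches Firefox as one argument, so the same input becomes harmless and the two execution histories diverge.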

slide-86
SLIDE 86

86

Approach: comparing both histories to detect past attacks

  • How can we get the history of the patched execution?

– Replay inputs after applying security patches
– Different history → potential threats

[Diagram: unpatched vs. patched execution. Unpatched: slider, sh, firefox, virus. Patched: slider, sh, firefox, with the virus crossed out (x)]

slide-87
SLIDE 87

87

Approach: comparing both histories to detect past attacks

  • How can we get the history of a 'secure' execution?

– Replay once more after applying security patches
– Different history → potential threats

[Diagram: unpatched vs. patched execution. Unpatched: slider, sh, firefox, virus. Patched: slider, sh, firefox, with the virus crossed out (x)]

Turn the manual effort of auditing into a computational problem! (patch-based auditing)

slide-88
SLIDE 88

88

Challenge: performance

  • Re-executing is costly for a busy computer

– Auditing requests → re-executes all requests again
– Auditing one month → takes another month!

slide-89
SLIDE 89

89

Three techniques developed for partial re-execution

  • Control flow filtering

– Audit only the possibly affected executions

  • Function-level auditing

– Compare function-level executions

  • Memoized re-execution

– Avoid duplicated executions while replaying
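
Patch-based auditing with a simple form of memoization can be sketched as follows. This is a toy model, not Poirot's design: the handlers, the `audit` function, and the request list are hypothetical, and the cache is keyed on whole requests, which is coarser than anything the slides describe.

```python
import html


def render_unpatched(name):
    # Hypothetical vulnerable handler: echoes user input into HTML unescaped.
    return "<p>Hello, %s</p>" % name


def render_patched(name):
    # Hypothetical patched handler: escapes the input first.
    return "<p>Hello, %s</p>" % html.escape(name)


def audit(requests, recorded_outputs, patched_handler):
    """Replay requests on patched code and flag those whose output differs."""
    cache = {}                      # memoize: identical requests run only once
    suspects = []
    for request, recorded in zip(requests, recorded_outputs):
        if request not in cache:
            cache[request] = patched_handler(request)
        if cache[request] != recorded:
            suspects.append(request)
    return suspects, len(cache)


requests = ["alice", "<script>steal()</script>", "alice", "bob"]
recorded = [render_unpatched(r) for r in requests]
suspects, executions = audit(requests, recorded, render_patched)
```

Only the request containing markup produces a different result under the patched handler, so it is the only one flagged as a potential past attack. The repeated `alice` request is served from the cache, so four logged requests need three re-executions.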

slide-94
SLIDE 94

94

Putting it all together: fixing our past & future with a patch

Patch from upstream (fixing a bug in SSHD)

[Timeline, 2011: Aug. ?? · Aug. 28th · Sept. 1st · Oct. 3rd. The earlier annotations (1. manual and time-consuming; 2. lost changes (a month!); 3. no guarantees (safe to roll back?)) are crossed out (x) and replaced by the points below]

  • Automatic detection
  • Preserve changes
  • Strong guarantees

Whenever new patches are released, they not only prevent future attacks but also detect and repair past attacks for free!

slide-98
SLIDE 98

98

Summary of our approach: building real systems

  • Existing systems are not designed for history

– Implicit dependencies and timeline
→ Action history graph & re-execution techniques

  • Attacks can be anywhere in the history

– Attacks are often detected days or weeks later
→ Patch-based auditing

  • History cannot be changed in some cases

– External dependencies: spam sent out
→ (Not solved) compensating actions in some cases (see our recent work in this direction, Aire [SOSP'13])

slide-99
SLIDE 99

99

Evaluation questions

  • Automatic intrusion recovery

– How much better than manual repair?
– How much runtime overhead?

  • Patch-based auditing

– What attacks can be detected?
– How fast is re-execution?

slide-100
SLIDE 100

100

Experimental setup for Retro (automatic recovery)

  • 2.8 GHz Intel Core i7, 8 GB RAM
  • 64-bit Linux 2.6.35
  • Tested with

– 2 real-world attacks from a honeypot
– 8 synthetic attacks

slide-101
SLIDE 101

101

Retro recovers from real-world and synthetic attacks

  • 2 real-world attacks from a honeypot

– Remove log entries, add accounts, run botnet

  • 8 synthetic attacks

– 2 examples: LaTeX and SSHD trojans
– 6 scenarios: file sharing, web servers, ...

slide-102
SLIDE 102

102

Retro's runtime overheads in realistic workloads

Workload                      CPU cost   Storage overhead
HotCRP conference web site    35%        4 GB / day

slide-103
SLIDE 103

103

Retro's runtime overheads in challenging workloads

Workload                      CPU cost   Storage overhead
HotCRP conference web site    35%        4 GB / day
Apache, small static files    127%       100 GB / day
Continuous kernel recompile   89%        150 GB / day

  • Can store 2 weeks of logs on a 2 TB disk ($100), even for worst-case workloads

SLIDE 105

105

Retro imposes acceptable overheads in practice

Workload                      CPU cost   w/ 2nd core   Storage overhead
HotCRP conference web site    35%        2%            4 GB / day
Apache, small static files    127%       33%           100 GB / day
Continuous kernel recompile   89%        18%           150 GB / day

  • Can store 2 weeks of logs on a 2 TB disk ($100), even for worst-case workloads
  • Can offload CPU overhead to an extra core

For systems where recovery is critical, Retro's overheads can be acceptable
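
The log-retention claim can be checked with a short calculation from the storage figures in the table. The only assumption added here is that 2 TB is taken as 2000 GB.

```python
DISK_GB = 2000  # assumption: 2 TB counted as 2000 GB

# Storage overhead per day, taken from the table above.
storage_gb_per_day = {
    "HotCRP conference web site": 4,
    "Apache, small static files": 100,
    "Continuous kernel recompile": 150,
}

# Days of history that fit on the disk for each workload.
retention_days = {
    workload: DISK_GB / gb_per_day
    for workload, gb_per_day in storage_gb_per_day.items()
}
```

Under that assumption the worst-case workload (continuous kernel recompile) fits a little over 13 days of history, which is roughly the two weeks stated on the slide. The HotCRP workload fits about 500 days.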

slide-106
SLIDE 106

106

Experimental setup for Poirot (patch-based auditing)

  • 3.07 GHz Core i7-950, 12GB RAM
  • PHP 5.3.6
  • No application changes required
  • Tested with

– Security patches in Wikipedia and HotCRP
– Under real Wikipedia traces

slide-107
SLIDE 107

107

Poirot efficiently audits attacks

  • 34 real patches in Wikipedia
  • Auditing 3.4 h of executions

– 29 patches → <0.2 sec (rarely executed code)
– 5 patches → ~9.2 min (commonly executed code)

Poirot can re-execute 12-51x faster than the original execution, even for worst-case patches
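
As a rough sanity check, the speedup implied by the two timing figures can be computed directly. This assumes the ~9.2 min figure is the time to audit the full 3.4 h trace for a commonly executed patch, which the slide does not state explicitly.

```python
original_minutes = 3.4 * 60   # 3.4 h of original execution, in minutes
audit_minutes = 9.2           # approximate audit time for commonly executed code

# Speedup of auditing relative to the original execution.
speedup = original_minutes / audit_minutes
```

Under that reading the speedup comes out at roughly 22x, which falls inside the 12-51x range quoted on the slide.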

slide-108
SLIDE 108

108

Poirot detects real attacks

  • Wikipedia: detected 5 different types of attacks (e.g., stored XSS, CSRF, …)
  • HotCRP: detected 4 information-leak vulnerabilities (e.g., accepted papers, ...)

slide-109
SLIDE 109

109

Poirot imposes reasonable runtime overheads

  • Testing with real Wikipedia traces

– 14.1% latency overhead
– 15.3% throughput overhead
– 5.4 KB/req storage overhead

For systems where integrity is critical, Poirot's overheads can be acceptable

slide-110
SLIDE 110

110

Related work

  • Tracking down attacks: BackTracker, IntroVirt

– Not for recovery, but only for analyzing attacks

  • Taint tracking for recovery: Taser, Polygraph

– False positives: recovering too conservatively

  • Selective undo/redo: Undoable mail store

– Fixing configuration errors in email server

slide-111
SLIDE 111

111

Today's talk

  • Automatic recovery

– Operating system: Retro [OSDI'10]
– Web application: Warp [SOSP'11]
– Distributed web services: Aire [SOSP'13]

  • Automatic detection of attacks

– Web application: Poirot [OSDI'12]

  • Future research agenda
slide-112
SLIDE 112

112

Research agenda

Can undoability be part of our daily computing life?

① Undoable OS

– New design of components / interfaces in OS
– Usable / intuitive user interface

Idea: use history for everything

slide-113
SLIDE 113

114

Research agenda

② Haskell Kernel

Idea: protect history from adversaries

Can the kernel be secure by design?

– Track and keep history safe?
– Purely functional → better undo/redo-ability

slide-114
SLIDE 114

116

Research agenda

③ Security Analytics

Idea: connect history of all computers

Can we understand security for larger systems?

– Better understand security with concrete histories
– Leverage recent tools for Big Data

slide-115
SLIDE 115

118

Summary: building secure systems with system-wide history

  • Big step toward “undo computing”
  • Automatic recovery

– Operating system: Retro [OSDI'10]
– Web application: Warp [SOSP'11]
– Distributed web services: Aire [SOSP'13]

  • Automatic detection of attacks

– Web application: Poirot [OSDI'12]

slide-116
SLIDE 116

119

Summary: building secure systems with system-wide history

  • Big step toward “undo computing”
  • Automatic recovery in real-world systems

– Operating system: Retro [OSDI'10]
– Web application: Warp [SOSP'11]
– Distributed web services: Aire [SOSP'13]

  • Patch-based auditing system

– Web application: Poirot [OSDI'12]

Thank you!

Work in collaboration with: Ramesh Chandra, Meelap Shah, Neha Narula, Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek