

slide-1
SLIDE 1

A jump-target identification method for multi-architecture static binary translation

Alessandro Di Federico Giovanni Agosta

Politecnico di Milano

CASES 2016

October 4, 2016

slide-2
SLIDE 2

Index

  • Introduction
  • Our solution
  • Evaluation

slide-3
SLIDE 3

Static binary translation

Static binary translation requires several steps:

  1. Parse an input binary
  2. Identify all the code it contains
  3. Translate it from the input architecture to the target one
  4. Produce an output binary

slide-5
SLIDE 5

Identify the code

  • Binaries are typically divided into segments
  • Certain segments are marked as executable
  • We can be sure that all the code is contained in them
slide-6
SLIDE 6

Translation works at a basic block granularity

slide-7
SLIDE 7

Translation works at a basic block granularity

But where does a basic block start?

slide-8
SLIDE 8

Jump targets

Jump target: any address in an executable segment that execution can jump to. A jump target denotes the beginning of a new basic block.

slide-9
SLIDE 9

The naïve solution

  • We could create a basic block for each executable address
  • However, we would get:
      • a very complex control-flow graph
      • little useful information for the user
      • increased translation time
      • increased output binary size
      • increased execution time
slide-10
SLIDE 10

The dispatcher

  • Typically SBTs use a dispatcher to handle indirect jumps
  • Maps each address to the corresponding translated code

    dispatcher:
      switch (program_counter) {
      case 0x400000:
        goto bb_0x400000;
      case 0x400010:
        goto bb_0x400010;
      /* ... */
      }
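The same address-to-block mapping can be sketched in Python. This is only an illustration, not the presented tool's code: it assumes each translated block is a callable that returns the next program counter (or `None` to stop), and `make_dispatcher` and the toy `blocks` table are hypothetical names.

```python
def make_dispatcher(blocks):
    """Return a function that runs translated blocks by program-counter lookup.

    `blocks` maps an input address to a callable; each callable returns the
    next program counter, or None to stop (a convention assumed for this sketch).
    """
    def dispatch(program_counter, max_steps=1000):
        trace = []
        for _ in range(max_steps):
            if program_counter is None:
                break
            if program_counter not in blocks:
                # No translated code at this address: a missed jump target.
                raise KeyError(f"no basic block at {program_counter:#x}")
            trace.append(program_counter)
            program_counter = blocks[program_counter]()
        return trace
    return dispatch


# Two toy blocks: the first jumps to the second, the second stops.
blocks = {
    0x400000: lambda: 0x400010,
    0x400010: lambda: None,
}
dispatch = make_dispatcher(blocks)
print([hex(pc) for pc in dispatch(0x400000)])
```

The `KeyError` branch shows why jump-target identification matters: an indirect jump to an address that was never identified as a basic block start has nowhere to go.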

slide-11
SLIDE 11

A more principled approach

  • Explore the code from the entry points
  • Follow the control flow
  • Exhaustively collect the control flow graph
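A minimal worklist sketch of this exploration, in Python. The `successors` map is a hypothetical stand-in for a disassembler: it lists the statically known successors of each address and uses `None` for an indirect jump whose target is unknown.

```python
def explore(successors, entry_points):
    """Collect every address reachable from the entry points via known edges."""
    seen = set()
    worklist = list(entry_points)
    while worklist:
        address = worklist.pop()
        if address in seen:
            continue
        seen.add(address)
        # `None` marks an indirect jump: there is no edge we can follow.
        worklist.extend(t for t in successors.get(address, ()) if t is not None)
    return seen


# Toy program: 0x1030 is only reachable through the indirect jump at 0x1010.
toy = {0x1000: [0x1010, 0x1020], 0x1010: [None], 0x1020: [0x1000], 0x1030: []}
reached = explore(toy, [0x1000])
print(sorted(hex(a) for a in reached))
```

In this toy input the block at `0x1030` is never found, which is the gap that indirect control-flow transfers leave in a purely recursive exploration.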
slide-12
SLIDE 12

This is impossible in the general case

slide-13
SLIDE 13

This is impossible in the general case

    typedef void (*fptr)(void);

    int main(int argc, char *argv[]) {
      /* The call target comes from user input, so it cannot be known statically. */
      fptr function_pointer = (fptr) argv[1];
      function_pointer();
    }

slide-14
SLIDE 14

Typical challenging situations

  • Indirect control-flow transfers are challenging
  • We can classify them into the following categories:
      • return instructions
      • calls to function pointers
      • far jumps
      • switch statements
slide-15
SLIDE 15

Far jump

    lui   t9, 0x42
    addiu t9, t9, 0xd188
    jr    t9
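The arithmetic behind this pair of instructions can be sketched as follows. This assumes standard MIPS32 semantics, where `lui` places its immediate in the upper 16 bits and `addiu` sign-extends its 16-bit immediate. If the slide's `0xd188` is meant as a plain unsigned low half, the target would instead be `0x42d188`. The helper names are illustrative.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lui_addiu_target(upper, lower):
    """Value left in a register by `lui r, upper` followed by `addiu r, r, lower`."""
    return ((upper << 16) + sign_extend16(lower)) & 0xFFFFFFFF


target = lui_addiu_target(0x42, 0xD188)
print(hex(target))
```

Either way, the jump target only becomes visible once the two constants are combined, which is why a far jump cannot be resolved by looking at the `jr` instruction alone.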

slide-16
SLIDE 16

Switch statements examples (ARM)

    cmp   r0, #240
    addls pc, pc, r0, lsl #2
    b     21304
    b     21320
    b     21710
    b     212fc

slide-17
SLIDE 17

Switch statements examples (x86-64)

    cmp   eax, 0x21
    ja    400990
    mov   rbx, rdi
    mov   rbp, rsi
    jmp   QWORD PTR [rax*8+0x422e40]
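Resolving such a jump means reading the table the `jmp` indexes. Here the bound check allows indices 0 through 0x21, and each entry is 8 bytes starting at 0x422e40. The Python sketch below reads entries from a byte buffer; the buffer contents, the three-entry toy table, and the function name are made up for illustration, and little-endian 64-bit pointers are assumed.

```python
import struct


def jump_table_targets(data, data_base, table_addr, max_index):
    """Read entries 0..max_index of a jump table stored in `data`.

    `data` holds the bytes of a data segment loaded at `data_base`;
    entries are assumed to be little-endian 64-bit pointers.
    """
    targets = []
    for index in range(max_index + 1):
        offset = table_addr + index * 8 - data_base
        (target,) = struct.unpack_from("<Q", data, offset)
        targets.append(target)
    return targets


# Toy segment containing only a 3-entry table located at 0x422e40.
table = struct.pack("<3Q", 0x400990, 0x4009A0, 0x4009B0)
targets = jump_table_targets(table, 0x422E40, 0x422E40, max_index=2)
print([hex(t) for t in targets])
```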

slide-18
SLIDE 18

How can we handle these situations?

slide-19
SLIDE 19

How can we handle these situations?

Can we do it in an architecture-independent way?

slide-20
SLIDE 20

Index

  • Introduction
  • Our solution
  • Evaluation

slide-21
SLIDE 21

The ingredients

We make heavy use of:

  • QEMU: used as a frontend; it supports ~17 architectures and produces an IR known as tiny code.
  • LLVM: a mature compiler framework, suitable for performing sophisticated analyses and recompiling the translated code.

slide-22
SLIDE 22

System overview

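[Figure: translation pipeline for an example binary. Boxes, in order: md5sum.arm → Collect JTs from global data → Generate tiny code → Translate to LLVM IR → Collect JTs from direct jumps → Collect JTs from indirect jumps → Link runtime functions → md5sum.x86-64. The "Collect JTs from direct jumps" and "Collect JTs from indirect jumps" steps each have outgoing edges labeled "+" and "∅".]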

slide-23
SLIDE 23

Key characteristics

  • All the code is collected in a single LLVM IR function
  • Each input BB is associated with an LLVM BB
  • Indirect jumps go through a dispatcher
  • Each part of the CPU state is mapped to a global variable
slide-24
SLIDE 24

The basic block identification process

  • Global data harvesting
  • Simple Expression Tracker
  • OSR Analysis
slide-25
SLIDE 25

Global data harvesting

  • Parse global data byte-by-byte
  • Interpret pointer-sized integers
  • Is its value a code pointer?
      • Does it point to an executable segment?
      • Does it have an appropriate alignment?
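The checks above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: pointers are taken to be little-endian, the executable segment is a single `[exec_start, exec_end)` range, and the function name and toy data are hypothetical.

```python
import struct


def harvest_code_pointers(data, exec_start, exec_end, pointer_size=4, alignment=4):
    """Scan `data` at every byte offset for values that look like code pointers."""
    fmt = {4: "<I", 8: "<Q"}[pointer_size]  # little-endian is an assumption
    found = set()
    for offset in range(len(data) - pointer_size + 1):
        (value,) = struct.unpack_from(fmt, data, offset)
        in_executable_segment = exec_start <= value < exec_end
        properly_aligned = value % alignment == 0
        if in_executable_segment and properly_aligned:
            found.add(value)
    return sorted(found)


# Toy global data: one plausible pointer, one misaligned value, one out of range.
blob = struct.pack("<3I", 0x00010400, 0x00010402, 0x7FFF0000)
pointers = harvest_code_pointers(blob, exec_start=0x10000, exec_end=0x20000)
print([hex(p) for p in pointers])
```

Scanning at every byte offset rather than only at aligned offsets is deliberately conservative: it may report a few spurious targets, but it does not depend on how the data is laid out.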
slide-26
SLIDE 26

What does it catch?

  • Function pointers stored in global data
  • Virtual tables
  • Jump tables
slide-27
SLIDE 27

Simple Expression Tracker (SET)

  • Consider each store to the CPU state (e.g., a register)
  • Track how the stored value is computed:
      • push each operation on a helper stack
      • stop in case of more than a single non-constant operand
  • Proceed until reaching an operation with no non-constant operands
  • Go through the stack applying the operations
  • Obtain a possible jump target
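The steps above can be sketched on a toy value table. Everything here is an assumption made for illustration: values are named strings, each definition is an `(operation, lhs, rhs)` tuple, only commutative operations are modeled, and loads are ignored. The constants echo the example shown later in the deck, with the initial constant folded into `%5`.

```python
import operator

# Only commutative operations, so operand order does not matter in this sketch.
OPS = {"add": operator.add, "or": operator.or_, "and": operator.and_}


def track(defs, name, bits=32):
    """Walk back from `name`; return the constant it evaluates to, or None."""
    mask = (1 << bits) - 1
    stack = []
    current = name
    while isinstance(current, str):
        op, lhs, rhs = defs[current]
        symbolic = [x for x in (lhs, rhs) if isinstance(x, str)]
        if len(symbolic) > 1:
            return None  # more than one non-constant operand: stop tracking
        if symbolic:
            constant = rhs if isinstance(lhs, str) else lhs
            stack.append((op, constant))  # remember the operation for later
            current = symbolic[0]
        else:
            current = OPS[op](lhs, rhs) & mask  # no non-constant operands
    value = current
    for op, constant in reversed(stack):  # most recently pushed first
        value = OPS[op](value, constant) & mask
    return value


defs = {
    "%5": ("add", 0x880000, 1),
    "%7": ("or", "%5", 0x1234),
    "%8": ("and", "%7", -2),
    "%9": ("add", "%7", "%8"),  # two non-constant operands
}
target = track(defs, "%8")
print(hex(target))
```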
slide-28
SLIDE 28

Load instructions

  1. Load from the CPU state: perform a depth-first visit of all the reaching definitions
  2. Load from standard memory: if the address is in global data, actually read the value
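A sketch of the first case, where a load from the CPU state has several reaching definitions. The data model is hypothetical: `defs` describes each value as a constant, an operation with one constant operand, or a load, and `reaching` lists the values stored by the definitions that reach each load. The toy input mirrors the MIPS example on the following slides. The sketch does not guard against cyclic definitions.

```python
import operator

OPS = {"add": operator.add, "or": operator.or_, "and": operator.and_}
MASK = 0xFFFFFFFF


def track(value, defs, reaching, stack=()):
    """Yield every constant the tracked value can evaluate to (depth-first)."""
    kind = defs[value][0]
    if kind == "const":
        result = defs[value][1] & MASK
        for name, operand in reversed(stack):
            result = OPS[name](result, operand & MASK) & MASK
        yield result
    elif kind == "op":
        _, name, source, operand = defs[value]
        yield from track(source, defs, reaching, stack + ((name, operand),))
    elif kind == "load":
        # Visit each reaching definition with the same pending operations.
        for stored in reaching[value]:
            yield from track(stored, defs, reaching, stack)


defs = {
    "c42": ("const", 0x420000),
    "c88": ("const", 0x880000),
    "%4": ("load", "v0"),
    "%5": ("op", "add", "%4", 1),
    "%6": ("load", "v0"),
    "%7": ("op", "or", "%6", 0x1234),
    "%8": ("op", "and", "%7", -2),
}
reaching = {"%4": ["c88"], "%6": ["%5", "c42"]}
targets = list(track("%8", defs, reaching))
print([hex(t) for t in targets])
```

Passing the stack down as an immutable tuple means each branch of the visit unwinds to the right depth on its own, which corresponds to the stack shrinking when the walk backtracks to the other reaching definition.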

slide-29
SLIDE 29

Example

        lui   $v0, 0x42
        ble   $a0, $t0, do_call
        nop
        lui   $v0, 0x88
        addi  $v0, 1
    do_call:
        ori   $v0, 0x1234
        jal   $v0

slide-30
SLIDE 30

SET example

      store i32 0x420000, i32* @v0
      ; ...
      br i1 %3, label %do_call, label %ft
    ft:
      store i32 0x880000, i32* @v0
      %4 = load i32, i32* @v0
      %5 = add i32 %4, 1
      store i32 %5, i32* @v0
      br label %do_call
    do_call:
      %6 = load i32, i32* @v0
      %7 = or i32 %6, 0x1234
      %8 = and i32 %7, -2
      store i32 %8, i32* @pc
      br label %dispatcher

slide-31
SLIDE 31

SET example

Helper stack (top first):

    and -2

(IR listing unchanged from slide 30.)

slide-32
SLIDE 32

SET example

Helper stack (top first):

    or 4660
    and -2

(IR listing unchanged from slide 30.)

slide-33
SLIDE 33

SET example

Helper stack (top first):

    or 4660
    and -2

(IR listing unchanged from slide 30.)

slide-34
SLIDE 34

SET example

Helper stack (top first):

    or 4660
    and -2

(IR listing unchanged from slide 30.)

slide-35
SLIDE 35

SET example

Helper stack (top first):

    add 1
    or 4660
    and -2

(IR listing unchanged from slide 30.)

slide-36
SLIDE 36

SET example

Helper stack (top first):

    add 1
    or 4660
    and -2

(IR listing unchanged from slide 30.)

slide-37
SLIDE 37

SET example

Helper stack (top first):

    add 1
    or 4660
    and -2

Constant reached: 0x880000

(IR listing unchanged from slide 30.)

slide-38
SLIDE 38

SET example

Helper stack (top first):

    add 1
    or 4660
    and -2

Constant reached: 0x880000

Result after applying the stack: 0x881234

(IR listing unchanged from slide 30.)

slide-39
SLIDE 39

SET example

Helper stack (top first):

    or 4660
    and -2

(IR listing unchanged from slide 30.)

slide-40
SLIDE 40

SET example

Helper stack (top first):

    or 4660
    and -2

Constant reached: 0x420000

(IR listing unchanged from slide 30.)

slide-41
SLIDE 41

SET example

Helper stack (top first):

    or 4660
    and -2

Constant reached: 0x420000

Result after applying the stack: 0x421234

(IR listing unchanged from slide 30.)

slide-42
SLIDE 42

What does it catch?

  • Return addresses
  • Far jumps
  • Function pointers embedded in the code
slide-43
SLIDE 43

OSR Analysis

  • Its main objective is to handle switch statements
  • It considers each SSA value
  • It tracks whether the value can be expressed in terms of another value x, with:
      • an offset a
      • a factor b
  • For each basic block it tracks:
      • the boundaries of x
      • the signedness of x
slide-44
SLIDE 44

An Offset Shifted Range (OSR)

a + b · x, where:

  • x is either inside a range (c ≤ x ≤ d) or outside it (x < c or x > d)
  • x is either signed or unsigned
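One possible way to model such a value in Python is shown below. It is a sketch under simplifying assumptions: only the "inside a range" case is represented, the class and method names are illustrative, and the numbers in the usage line are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OSR:
    """Values of the form offset + factor * x, with low <= x <= high."""
    offset: int
    factor: int
    low: int
    high: int
    signed: bool = False

    def add(self, constant):
        """Adding a constant only moves the offset."""
        return OSR(self.offset + constant, self.factor, self.low, self.high, self.signed)

    def mul(self, constant):
        """Multiplying scales both the offset and the factor."""
        return OSR(self.offset * constant, self.factor * constant,
                   self.low, self.high, self.signed)

    def values(self):
        """Enumerate every value the expression can take."""
        return [self.offset + self.factor * x for x in range(self.low, self.high + 1)]


# x in [0, 3], scaled by 4 and offset by 0x100 (arbitrary illustrative numbers).
scaled = OSR(offset=0, factor=1, low=0, high=3).mul(4).add(0x100)
print([hex(v) for v in scaled.values()])
```

Keeping the range on x rather than on the final expression is what lets additions and multiplications be applied without recomputing bounds at each step.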

slide-45
SLIDE 45

Example: the input

    cmp   r1, #5
    addls pc, pc, r1, lsl #2

slide-46
SLIDE 46

    BB1:
      %1 = load i32, i32* @r1
      %2 = sub i32 %1, 4
      %3 = icmp uge i32 %1, 4
      br i1 %3, label %BB2, label %BB3
    BB2:
      %4 = icmp ne i32 %2, 0
      br i1 %4, label %exit, label %BB3
    BB3:
      %5 = shl i32 %1, 2
      %6 = add i32 113372, %5
      store i32 %6, i32* @pc

slide-47
SLIDE 47

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide:

    %2 = sub i32 %1, 4    ; [x - 4]

slide-48
SLIDE 48

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide:

    %3 = icmp uge i32 %1, 4    ; (x >= 4, unsigned)

slide-49
SLIDE 49

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide:

    BB2:    ; (x >= 4, unsigned)

slide-50
SLIDE 50

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide:

    BB3:    ; <BB1, (x < 4, unsigned)>

slide-51
SLIDE 51

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide:

    %4 = icmp ne i32 %2, 0    ; (x > 4, unsigned)

slide-52
SLIDE 52

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide:

    BB3:    ; <BB1, (x < 4, unsigned)>
            ; <BB2, (x == 4, unsigned)>

slide-53
SLIDE 53

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide, the two incoming constraints of BB3 are merged:

    BB3:    ; (x <= 4, unsigned) = <BB1, (x < 4, unsigned)>
            ;                   || <BB2, (x == 4, unsigned)>

slide-54
SLIDE 54

Listing as on slide 46, with annotations accumulating from slide to slide. New on this slide:

    %5 = shl i32 %1, 2    ; [4 * x]

slide-55
SLIDE 55

    BB1:
      %1 = load i32, i32* @r1
      %2 = sub i32 %1, 4                  ; [x - 4]
      %3 = icmp uge i32 %1, 4             ; (x >= 4, unsigned)
      br i1 %3, label %BB2, label %BB3
    BB2:                                  ; (x >= 4, unsigned)
      %4 = icmp ne i32 %2, 0              ; (x > 4, unsigned)
      br i1 %4, label %exit, label %BB3
    BB3:                                  ; (x <= 4, unsigned) = <BB1, (x < 4, unsigned)>
                                          ;                   || <BB2, (x == 4, unsigned)>
      %5 = shl i32 %1, 2                  ; [4 * x]
      %6 = add i32 113372, %5             ; [113372 + 4 * x]
      store i32 %6, i32* @pc
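With the final annotation in place, the jump targets follow by enumerating x. The Python sketch below takes the two constraints reaching BB3 from the listing, merges them, and evaluates 113372 + 4 * x. The merge used here is a simple convex hull of inclusive ranges, which is an assumption for illustration and may over-approximate when the ranges are not adjacent.

```python
def union(ranges):
    """Merge inclusive (low, high) ranges into the smallest range covering all."""
    lows, highs = zip(*ranges)
    return min(lows), max(highs)


from_bb1 = (0, 3)  # x < 4, unsigned
from_bb2 = (4, 4)  # x == 4
low, high = union([from_bb1, from_bb2])

# Value stored to the program counter in BB3: 113372 + 4 * x.
targets = [113372 + 4 * x for x in range(low, high + 1)]
print(targets)
```

Each resulting address is a candidate basic block start, here one per slot of the branch table that follows the `addls` instruction.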

slide-56
SLIDE 56

Index

  • Introduction
  • Our solution
  • Evaluation

slide-57
SLIDE 57

Implementation overview

  • 7000 C++ SLOCs
  • Using LLVM 3.8 and QEMU 2.5.0
  • We support:
      • ARM: GCC 5.3.0 + uClibc
      • MIPS: GCC 5.3.0 + musl
      • x86-64: GCC 4.9.2 + musl
  • Only static binaries are supported
slide-58
SLIDE 58

Functional testing

  • We considered coreutils (md5sum, ls, base64, ...)
  • We translated the programs and ran the coreutils test suite on them
slide-59
SLIDE 59

Coverage and basic block size

                        Coverage
              Covered   Unused   NOPs    Other    Extra    IPB
    MIPS      95.37%    4.61%    0.00%   0.02%    12.51%   5.17
    ARM       89.56%    8.91%    0.13%   1.40%    14.16%   3.98
    x86-64    94.84%    4.70%    0.46%   0.00%    12.87%   4.22

slide-60
SLIDE 60

Test suite results

                    Tests
              Skip   Pass   Fail   No JT
    MIPS      128    409    43     3
    ARM       132    361    87
    x86-64    127    419    34

slide-61
SLIDE 61

rev.ng

slide-62
SLIDE 62

Thanks for your attention

slide-63
SLIDE 63

License

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.