Binary‐level program analysis: A discussion of x86‐64



slide-1
SLIDE 1

Binary‐level program analysis: A discussion of x86‐64

Gang Tan

CSE 597 Spring 2019 Penn State University


* These slides follow Sec 3.13 of the book CSAPP “Computer Systems: A Programmer’s Perspective”; Figures and slides are borrowed/adapted from that book

slide-2
SLIDE 2

Intel’s 64‐Bit History

  • 2001: Intel Attempts Radical Shift from IA32 to IA64

– Totally different architecture (Itanium)
– Executes IA32 code only as legacy
– Performance disappointing

  • 2003: AMD Steps in with Evolutionary Solution

– x86‐64 (now called “AMD64”)

  • Intel Felt Obligated to Focus on IA64

– Hard to admit mistake or that AMD is better

  • 2004: Intel Announces EM64T extension to IA32

– Extended Memory 64‐bit Technology
– Almost identical to x86‐64!

  • All but low‐end x86 processors support x86‐64

– But, lots of code still runs in 32‐bit mode


slide-3
SLIDE 3

Overview of x86‐64

  • Pointers and long integers are 64 bits long

– Integer arithmetic operations support 8, 16, 32, and 64 bits

  • 16 general‐purpose registers, each 64 bits wide
  • Calling conventions pass more parameters via registers

– System V AMD64 ABI: passes the first 6 integer or pointer parameters in registers
– As a result, some procedures do not need to access the stack at all

  • Conditional operations are implemented using conditional move instructions when possible

– Often faster than branches, because a conditional move cannot be mispredicted

  • Floating‐point operations are implemented using the register‐oriented instruction set in SSE version 2

– Rather than the stack‐based approach in IA32
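A minimal C sketch of two of the points above, assuming an LP64 platform such as Linux on x86‐64. The function name max_long is hypothetical. Whether the compiler actually emits a conditional move for it depends on the compiler and the optimization level, so treat the comment as a typical outcome rather than a guarantee.

```c
/* Assumption: LP64 data model (e.g., Linux on x86-64), where long and
   pointers are both 8 bytes. On other platforms these checks stop the build. */
_Static_assert(sizeof(long) == 8, "long is expected to be 64 bits");
_Static_assert(sizeof(void *) == 8, "pointers are expected to be 64 bits");
_Static_assert(sizeof(int) == 4, "int is expected to stay 32 bits");

/* Hypothetical helper. With optimization enabled, compilers often turn
   this into a compare followed by a conditional move instead of a branch. */
long max_long(long x, long y)
{
    return x > y ? x : y;
}
```

Compiling this with "gcc -O1 -S" and reading the generated assembly is one way to see whether a cmov instruction was chosen.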


slide-4
SLIDE 4

x86‐64 Data Types


Fig 3.34 of CSAPP

slide-5
SLIDE 5

16 64‐bit GP Registers


Fig 3.35 of CSAPP

slide-6
SLIDE 6

Instruction Operands

  • Similar to IA32

– Except that the base and index registers must be 64‐bit registers (the r‐prefixed names, such as rax or rsi)

  • In addition, PC‐relative addressing

– “add rax, 0x200ad1[rip]” accesses memory at address rip + 0x200ad1, where rip holds the address of the next instruction
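PC‐relative addressing shows up most often when code touches global data. The sketch below is only an illustration: the names counter and bump are hypothetical, and the exact instruction chosen depends on the compiler and its flags.

```c
/* Hypothetical global variable; it lives in the data section at some
   fixed distance from the code that uses it. */
long counter = 0;

/* On x86-64, compilers commonly address counter relative to rip,
   for example as counter(%rip) in AT&T syntax. */
long bump(long delta)
{
    counter += delta;
    return counter;
}
```

Because the displacement from the instruction to the variable is fixed, the same code works wherever the program is loaded, which is why this addressing mode is common in position‐independent code.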


slide-7
SLIDE 7

Function Calling: Argument Passing


  • The following slides assume the System V AMD64 ABI
  • Up to the first six integer or pointer arguments are passed to procedures via registers (rdi, rsi, rdx, rcx, r8, r9, in that order)

– This reduces the overhead of storing and retrieving values on the stack
– Any further arguments are passed on the stack

  • callq stores a 64‐bit return address on the stack
slide-8
SLIDE 8

Example of Argument Passing


long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}

* Example from https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
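To make the register assignment concrete, here is the same function annotated with where each argument arrives under the System V AMD64 ABI. The body of utilfunc is a hypothetical stand‐in (the slide does not show it), chosen only so that the sketch compiles on its own.

```c
/* Hypothetical stand-in for utilfunc; the slide does not define it.
   Its three arguments arrive in rdi, rsi, and rdx. */
long utilfunc(long a, long b, long c)
{
    return a + b + c;
}

long myfunc(long a,   /* rdi */
            long b,   /* rsi */
            long c,   /* rdx */
            long d,   /* rcx */
            long e,   /* r8  */
            long f,   /* r9  */
            long g,   /* on the stack: seventh argument */
            long h)   /* on the stack: eighth argument */
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}
```

Only g and h need stack slots set up by the caller; the first six arguments never touch memory unless the callee decides to spill them.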

slide-9
SLIDE 9

Function Calling: Stack Frame

  • A function may not require a stack frame if

– all local variables can be held in registers, and
– it has no array or structure local variables, and
– the address‐of operator (&) is not applied to any local variable, and
– it does not call another function that requires argument passing on the stack, and
– it does not need to save any callee‐saved registers
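The conditions above can be illustrated with two small functions. All names here are hypothetical, and an optimizing compiler may inline or simplify the second case, so this is a sketch of the idea rather than a prediction of the generated code.

```c
/* All locals fit in registers, nothing has its address taken, and no
   other function is called: typically no stack frame is needed. */
long add3(long a, long b, long c)
{
    long t = a + b;
    return t + c;
}

/* Hypothetical helper that modifies a value through a pointer. */
void double_in_place(long *p)
{
    *p *= 2;
}

/* The address-of operator is applied to the local x, so x needs a
   memory location (a stack slot) unless the compiler inlines the call. */
long doubled(long a)
{
    long x = a;
    double_in_place(&x);
    return x;
}
```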


slide-10
SLIDE 10

Function Calling: Red‐Zone Optimization

  • Red‐zone optimization for leaf functions (functions that do not call other functions)

– The 128 bytes below rsp can be used by a leaf function without adjusting the stack pointer
– The red zone is not asynchronously clobbered by signal or interrupt handlers, so it can be used for scratch data
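A leaf function with a small amount of local storage is the typical candidate for the red zone. The function below is a hypothetical example; its 32‐byte array fits well within the 128‐byte limit. Whether the array ends up in the red zone, in registers, or disappears entirely depends on the compiler and optimization level.

```c
/* Leaf function: it calls nothing, so nothing can push below rsp
   while it runs. Its locals may be placed in the red zone. */
long sum_squares4(long a, long b, long c, long d)
{
    long tmp[4] = { a * a, b * b, c * c, d * d };  /* 32 bytes of scratch data */
    long total = 0;
    for (int i = 0; i < 4; i++)
        total += tmp[i];
    return total;
}
```

With gcc, comparing the unoptimized assembly for this function with and without the -mno-red-zone flag shows the difference: with the flag, the function has to subtract from rsp before using its locals.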


slide-11
SLIDE 11

Function Calling: the Base Pointer Optimization

  • Two options for functions that need a stack frame
  • Option 1: the traditional approach (default for gcc without optimizations)

– Function prologue: save the old base pointer; set up the new base pointer
– Function body: references to stack locations are made relative to the base pointer
– Function epilogue: restore the base pointer

  • Option 2: faster (default for gcc with optimizations)

– Do not save/restore the base pointer; rbp is used as a general‐purpose register
– References to stack locations are made relative to the stack pointer
– Stack space is allocated at the beginning; rsp then stays at a fixed position for the rest of the function's execution


slide-12
SLIDE 12

Example


C source code

long int simple_l(long int *xp, long int y)
{
    long int t = *xp + y;
    *xp = t;
    return t;
}

slide-13
SLIDE 13

Example


Optimized x86‐32 Assembly

simple_l:
    pushl %ebp              # Save frame pointer
    movl %esp, %ebp         # New frame pointer
    movl 8(%ebp), %edx      # Retrieve xp
    movl 12(%ebp), %eax     # Retrieve y
    addl (%edx), %eax       # Add *xp to get t
    movl %eax, (%edx)       # Store t at xp
    popl %ebp               # Restore frame pointer
    ret

slide-14
SLIDE 14

Example


Optimized x86‐64 Assembly

simple_l:
    movq %rsi, %rax         # Copy y
    addq (%rdi), %rax       # Add *xp to get t
    movq %rax, (%rdi)       # Store t at xp
    ret

Unoptimized x86‐64 Assembly

simple_l:
    pushq %rbp
    movq %rsp, %rbp
    movq %rdi, -24(%rbp)    # Spill xp to the stack
    movq %rsi, -32(%rbp)    # Spill y to the stack
    movq -24(%rbp), %rax
    movq (%rax), %rax       # Load *xp
    addq -32(%rbp), %rax    # Add y
    movq %rax, -8(%rbp)     # Store t
    movq -24(%rbp), %rax
    movq -8(%rbp), %rdx
    movq %rdx, (%rax)       # *xp = t
    movq -8(%rbp), %rax     # Return value t
    leave
    ret

slide-15
SLIDE 15

Function Calling: Caller/Callee‐Save Registers

  • Callee‐saved regs: rbx, rbp, and r12 to r15
  • Caller‐saved regs: r10 and r11, in addition to rax and the six argument registers (rdi, rsi, rdx, rcx, r8, r9)


slide-16
SLIDE 16

x86‐64 Assembly Code Example

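C source code

long plus(long x, long y);

void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}

Optimized x86‐64 Assembly

sumstore:
    pushq %rbx              # Save callee-saved rbx
    movq %rdx, %rbx         # Keep dest across the call
    call plus               # x and y are already in rdi and rsi
    movq %rax, (%rbx)       # *dest = t
    popq %rbx               # Restore rbx
    ret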
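For experimenting with this example, a self‐contained version needs a definition of plus. The one below is a hypothetical stand‐in, since the slide only declares the function. The comments relate each statement to the register conventions from the previous slide.

```c
/* Hypothetical definition; the slide only declares plus. */
long plus(long x, long y)
{
    return x + y;
}

void sumstore(long x, long y, long *dest)
{
    /* x (rdi) and y (rsi) are already in the right registers for the
       call. dest arrives in rdx, which is caller-saved, so it has to be
       kept somewhere safe (such as callee-saved rbx) across the call. */
    long t = plus(x, y);

    /* The result comes back in rax and is stored through dest. */
    *dest = t;
}
```

If plus is visible in the same file, an optimizing compiler may inline it and drop the call entirely, so keeping it in a separate file is one way to reproduce assembly similar to the listing.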