SLIDE 1

Lawrence Livermore National Laboratory

ROSE-CIRM Detecting C-Style Errors in UPC Code

Peter Pirkelbauer¹, Chunhua Liao¹, Thomas Panas², Daniel Quinlan¹

¹ Lawrence Livermore National Laboratory
² Microsoft Parallel Data Warehouse

LLNL-PRES-504931

Lawrence Livermore National Laboratory, P. O. Box 808, Livermore, CA 94551 This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 This work was funded by the Department of Defense and used elements at the Extreme Scale Systems Center, located at Oak Ridge.

SLIDE 2

Motivation

The cost of software bugs is significant

  • in 2002, 0.6% of the GDP [NIST02]

Error Detection Support

  • RTED Benchmark for Compilers and Runtime Systems [Lue09a] [Lue09b] [RTED]

Bug Detection Tools

  • Static and Dynamic Analysis
  • Source Code and Binary Code

SLIDE 3

Outline

Unified Parallel C and C-Style Errors

Implementation

  • Code Instrumentation and Dynamic Analysis

Evaluation

Conclusion

SLIDE 4

Unified Parallel C (UPC)

Extends C99 with:

  • Partitioned Global Address Space
  • Language constructs for Parallelism

− e.g., shared pointers, parallel for loop, memory consistency models

SLIDE 5

Error Categories

C-Style Errors

  • out-of-bounds accesses, uninitialized variables, dangling pointers

C-Style Errors in UPC's shared memory space

UPC Library Functions

  • upc_memput with wrong length

Parallelism-Related Errors

  • deadlock, livelock, race conditions
SLIDE 6

UPC – Bug Example 1

UPC Code

  int upc_main() {
    shared [] int *ptr;
    if (MYTHREAD == 0) {
      ptr = upc_alloc(…);
    }
    upc_barrier;
    if (MYTHREAD == 1) {
      upc_free(ptr);
    }
  }

SLIDE 7

UPC – Bug Example 1 (cont’d)

UPC Code

  int upc_main() {
    shared [] int *ptr;              // Bug: uninitialized pointer access.
    if (MYTHREAD == 0) {             // Thread 0 allocates local shared memory.
      ptr = upc_alloc(…);            // ptr in Thread 1 remains uninitialized.
    }
    upc_barrier;
    if (MYTHREAD == 1) {             // Thread 1 accesses uninitialized ptr.
      upc_free(ptr);
    }
  }

SLIDE 8

UPC – Bug Example 2

UPC Code

  int upc_main() {
    shared [] int *ptr;
    ptr = upc_all_alloc(…);
    upc_barrier;
    ptr[MYTHREAD] = …;
    if (MYTHREAD == 0) {
      upc_free(ptr);
    }
  }

SLIDE 9

UPC – Bug Example 2 (cont’d)

UPC Code

  int upc_main() {
    shared [] int *ptr;
    ptr = upc_all_alloc(…);          // Collective memory allocation.
    upc_barrier;
    ptr[MYTHREAD] = …;
                                     // Missing barrier: Thread 0 might
    if (MYTHREAD == 0) {             // free the memory early.
      upc_free(ptr);                 // Bug: potential early memory release.
    }
  }

SLIDE 10

Dynamic Analysis

Original Code:

  int upc_main() {
    shared [] int *ptr;
    if (MYTHREAD == 0) {
      ptr = upc_alloc(…);
    }
    upc_barrier;
    if (MYTHREAD == 1) {
      upc_free(ptr);
    }
  }

Instrumented Code:

  int upc_main() {
    shared [] int *ptr;
    if (MYTHREAD == 0) {             // Thread 0 allocates local shared
      ptr = upc_alloc(…);            // memory; leaves ptr in Thread 1
      cirm_CreateHeapPtr(ptr, …);    // uninitialized.
      cirm_InitVariable(&ptr, …);
    }
    cirm_ExitWorkzone();
    upc_barrier;
    cirm_EnterWorkzone();
    if (MYTHREAD == 1) {             // Thread 1 accesses uninitialized ptr.
      cirm_FreeMem(&ptr);
      upc_free(ptr);
    }
  }

SLIDE 11

Dynamic Analysis (Scheme)

Original Code: (as on the previous slide)

Instrumented Code (annotated):

  int upc_main() {
    shared [] int *ptr;
    if (MYTHREAD == 0) {
      ptr = upc_alloc(…);
      cirm_CreateHeapPtr(ptr, …);    // Updates shadow memory and notifies
                                     // other UPC threads about the heap
                                     // allocation.
      cirm_InitVariable(&ptr, …);    // Marks the location of ptr as
    }                                // initialized.  Note: ptr in Thread 0
    cirm_ExitWorkzone();             // != ptr in Thread 1.
    upc_barrier;
    cirm_EnterWorkzone();
    if (MYTHREAD == 1) {
      cirm_FreeMem(&ptr);            // Thread 1 accesses uninitialized ptr.
      upc_free(ptr);
    }
  }

SLIDE 12

The ROSE Compiler Infrastructure

SLIDE 13

ROSE-CIRM Toolchain

ROSE - Code Instrumentation and Runtime Monitor

SLIDE 14

Runtime Architecture (1)

SLIDE 15

Runtime Architecture (2)

Instrumented Code:

  shared[] int *values = upc_all_alloc(…);
  cirm_CreateHeap(values, …);
  cirm_InitVariable(&values);
  if (MYTHREAD == 1) {
    values[1] = 7;
    cirm_InitVariable(&values[1], …);
  }

SLIDE 16

Runtime Monitor Coordination (1)

Concurrent Access

Instrumented Code:

  // shared int val;
  if (MYTHREAD == 0) {
    val = comp(…);
    cirm_InitVariable(&val, …);   // Sends update on initialization
  }                                // to other runtime managers.
  cirm_EnterBarrier();
  upc_barrier;
  cirm_ExitBarrier();              // Messages are processed after
                                   // the barrier.
  cirm_AccessVar(&val, …);
  printf("%d\n", val);             // Test succeeds.

SLIDE 17

Runtime Monitor Coordination (2)

Concurrent Access

If the input program contains race conditions, ROSE-CIRM may spuriously report an error.

Instrumented Code:

  // shared int val;
  if (MYTHREAD == 0) {
    val = comp(…);
    cirm_InitVariable(&val, …);   // Sends update on initialization
  }                                // to other runtime managers.
  // upc_barrier;                  // Missing barrier.
  cirm_AccessVar(&val, …);         // Test fails if messages are not
  printf("%d\n", val);             // processed in time.

SLIDE 18

Coordination – Early Release Problem (1)

Instrumented Code:

  shared[] int *values = upc_all_alloc(…);
  cirm_ArrayAccess(&values[0], &values[idx]);   // Heap-memory access.
  values[idx] = useful_computation(idx);
  cirm_InitVariable(&values[…], …);
  // upc_barrier;                                // Missing barrier.
  if (MYTHREAD == 0) {                           // Thread 0 might free
    cirm_ExitWorkzone();                         // the memory early.
    cirm_FreeMem(&ptr);
    upc_free(ptr);
    cirm_EnterWorkzone();
  }

SLIDE 19

Coordination – Early Release Problem (2)

Isolate Destructive Updates

Instrumented Code:

  shared[] int *values = upc_all_alloc(…);
  cirm_ArrayAccess(&values[0], &values[idx]);
  values[idx] = useful_computation(idx);
  cirm_InitVariable(&values[…], …);
  // upc_barrier;
  if (MYTHREAD == 0) {
    cirm_ExitWorkzone();    // ExitWorkzone/EnterWorkzone bracket the
    cirm_FreeMem(&ptr);     // destructive update, isolating it from
    upc_free(ptr);          // concurrent accesses by other monitors.
    cirm_EnterWorkzone();
  }


SLIDE 24

Address Abstraction

Implementation for GCCUPC

SLIDE 25

Bounds Checking – C/C++

Instrumented Code

  char* ptr = chararr[1];
  cirm_AccessArray(ptr, ptr+2, sizeof(*ptr), cirmWrite, ...);
  ptr[2] = 8;

SLIDE 26

Bounds Checking – Distributed Array

shared[3] char chararr[THREADS][8];

SLIDE 27

Tests - RTED Benchmark Suite

Luecke et al.: RTED Benchmark Suite for UPC [RTED]

  Category                              Tests   Correctly Identified
  Out-of-bounds accesses (indices)        726   685 (94%)
  Out-of-bounds accesses (pointers)       160   150 (94%)
  Uninitialized memory reads               64    62 (97%)
  Dynamic memory handling related          10    10 (100%)

SLIDE 28

Tests - Heat-Conduction Code

El-Ghazawi et al.: Distributed Shared Memory Programming [ElG05]

80 elements per dimension, 8 threads
Intel X5680, 6x2 cores @ 3.3 GHz
24 GByte memory, Red Hat Linux Client 5.6
gccupc 4.5.1.2, g++ 4.1.2

SLIDE 29

Related Tools

UPC Compilers and Runtime Systems

  • GCCUPC, Berkeley UPC, Cray UPC, ...

Tools for C/C++

  • Commercial Software

    − Insure++, Purify

  • Open Source Software

    − Valgrind memory checkers, DMalloc, ...
SLIDE 30

Conclusion

ROSE-CIRM

  • a dynamic analysis tool for UPC code
  • helps programmers find some bugs
  • works in mixed-language projects (C/C++, UPC)
  • performs well on a subset of the RTED benchmark
  • implemented for GCCUPC

SLIDE 31

Future Work

Generality

  • Casts of blocksize
  • Complex array subscript expressions

Scope

  • UPC library, parallelism-related errors

Scalability

  • Runtime monitor design

Performance

  • Elimination of unnecessary checks (ROSE analysis)

SLIDE 32

Thank You!

http://rosecompiler.org


This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 This work was funded by the Department of Defense and used elements at the Extreme Scale Systems Center, located at Oak Ridge.

SLIDE 33

References

[BUPC] Berkeley UPC, http://upc.lbl.gov/
[DMalloc] DMalloc, http://dmalloc.com/
[ElG05] El-Ghazawi et al.: UPC: Distributed Shared-Memory Programming, 2005.
[GCCUPC] GCCUPC, http://www.gccupc.org/
[Insure] Insure++, http://www.parasoft.com/
[Lue09a] Luecke et al.: Evaluating error detection capabilities of UPC run-time systems. PGAS'09.
[Lue09b] Luecke et al.: The importance of run-time error detection. 3rd Parallel Tools Workshop '09.
[NIST02] National Institute of Standards & Technology: The Economic Impacts of Inadequate Infrastructure for Software Testing, May 2002.
[Pin] Pin: A Dynamic Binary Instrumentation Tool.
[Purify] Purify, http://www.ibm.com/software/awdtools/purify/
[RTED] RTED Benchmark Suite, http://rted.public.iastate.edu/
[UPC] UPC Language Specification v1.2, June 2005.
[Valgrind] Valgrind, http://valgrind.org/

SLIDE 34

Appendix

SLIDE 35

Runtime Error Detection (RTED): Introduction

Shadow memory stores:

  • Information on memory state

Instruments source code:

  • Updates shadow memory when memory is allocated, freed, or initialized.
  • Checks memory operations for consistency.

RTED is a tool that detects software flaws and helps pinpoint their origin. RTED consists of a runtime system and a source-to-source transformation system. The runtime system utilizes a shadow memory to keep track of memory state (allocations, initializations, …). The source-to-source transformation adds statements to the original source code that inform the RTED runtime system about memory operations.

SLIDE 36

RTED for Unified Parallel C (UPC)

Shadow memory:

  • 1x per UPC thread
  • Stores state of UPC process

Instrumented Code:

  • Notifies other UPC threads of updates.

In addition to local storage, such as stack and heap, UPC defines a shared memory region, which can be accessed from any UPC thread. In order to safeguard memory operations, each RTED runtime system requires access to the memory state. To do so, each UPC thread keeps a local copy. Any update of the memory state is communicated to all other UPC threads.

SLIDE 37

RTED for UPC: Address Representation

  • UPC Thread ID
  • Relative position (GCCUPC)

    − base: __upc_vm_map
    − local pointers: addr − base
    − shared pointers: GUPCR_PTS_OFFSET

To uniquely identify a memory position, RTED's runtime systems communicate addresses as a tuple containing the thread-id and the relative position to the shared memory base. The thread-id is determined by MYTHREAD for local pointers (they can also point into the shared memory region) and upc_threadof for shared pointers. Finding the relative position is implementation dependent; this slide uses the GCCUPC interface.

SLIDE 38

Runtime Monitor - Coordination Issues?

  // shared int[] values[THREADS];

  Thread 1:

    W: values[idx] = ...;
    B: cirm_InitVariable(&values[…], …);
    upc_barrier;

  Thread 2:

    ...
    upc_barrier;
    ...
    P: cirm_AccessArray(&values[…], …);
    R: sum += values[…];

SLIDE 39

RTED: Runtime Error Detection

Due to performance and other concerns, programming languages / compilers / runtime systems do not (always) guarantee safe execution of code. Undetected software defects are the source for costly problems, such as unstable code, security vulnerabilities, etc. RTED instruments potentially unsafe operations with calls to a runtime checking system, thereby providing a safety envelope for executable code.

Supported Languages:

  • C, C++, UPC

Comparison with other tools (Valgrind):

  + type information
  + higher-level abstractions
  − requires whole program

SLIDE 40

Tests - Heat-Conduction Code

El-Ghazawi et al.: Distributed Shared Memory Programming [ElG05]

80 elements per dimension, 8 threads
Intel X5680, 6x2 cores @ 3.3 GHz
24 GByte memory, Red Hat Linux Client 5.6
gccupc 4.5.1.2, g++ 4.1.2
