
DCS@VT 040309 B.G. Ryder 1

Blended Program Analysis

Collaborators: Bruno Dufour (Rutgers) & Gary Sevitsky (IBM Research); Funded by IBM Open Collaborative Research Program and NSF 08-0811518

Barbara G. Ryder Virginia Tech


Framework-based Applications

  • Application is an iceberg
  • Bulk of the code in libraries and frameworks
  • Genre not commonly addressed by research community
  • E.g., financial planning services, e-commerce sites, online reservation systems, Tomcat-based systems software
  • Programs are not just large, but are more complex in interactions between frameworks
  • Performance problems span multiple layers

[Figure: iceberg diagram with the labels App, Libraries and Frameworks, and Middleware]


Framework-based Applications

  • Software characteristics
  • Not amenable to static analyses
  • Not scalable -- too complex
  • Not amenable to dynamic analysis
  • Too intrusive to execution for production codes
  • Application's main function
  • Often is data transformation
  • Goal: design analyses for performance diagnosis of these systems

[Figure: iceberg diagram with the labels App, Libraries and Frameworks, and Middleware]


Outline

  • Motivation
  • Blended analysis paradigm
  • Blended escape analysis
  • Example
  • Explanations of performance problems
  • Newest empirical results
  • Related work
  • Summary and future work

Initial Goals

  • Devise new analyses to aid performance diagnosis
  • Gather data about the characteristics of these important practical applications
  • To enable code specialization, better benchmark selection, establishment of API ‘best practices’
  • Design initial experiments to test ideas
  • Problem: overuse of temporaries or object churn
  • Q: can we identify object churn through analysis?


Eliminating Object Churn

  • Identify temporary objects
  • Need to approximate “object lifetime”
  • Identify execution contexts with excessive use of temporaries
  • Based on total number of instances
  • Not same as finding often-executed allocation sites
  • Elimination strategies
  • Optimize the use of frameworks and libraries together
  • Introduce caching for temporary data structures
  • Code specialization
  • Can help understand construction of longer-lived data
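The caching strategy listed above can be illustrated with a small sketch. It is not taken from the talk or from any framework under study: the class `DateFormatCache`, its method names, and the date pattern are hypothetical, chosen only to contrast a churn-prone method with one that reuses a cached helper object.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration; class, method names, and pattern are assumptions.
final class DateFormatCache {
    private static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    private static SimpleDateFormat newFormatter() {
        SimpleDateFormat f = new SimpleDateFormat(PATTERN, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    // Churn-prone version: every call allocates a formatter and its internal
    // helper objects, all of which become garbage when the call returns.
    static String formatWithChurn(Date d) {
        return newFormatter().format(d);
    }

    // Cached version: one formatter per thread, reused across calls.
    // SimpleDateFormat is not thread-safe, hence the ThreadLocal.
    private static final ThreadLocal<SimpleDateFormat> CACHED =
            ThreadLocal.withInitial(DateFormatCache::newFormatter);

    static String formatCached(Date d) {
        return CACHED.get().format(d);
    }
}
```

Both methods return the same string; the difference is only in how many short-lived objects each call creates, which is the kind of cost the analysis is meant to expose.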


Current Practice: Jinsight Trace of HoldingDataBean_Ser.serialize()

Tens of thousands of calls

How to find churn locality?


Optimized calling tree of trace from HoldingDataBean_Ser.serialize()

Our analysis will offer something better!


Blended Analysis - Scalability

[Figure: bar chart of the number of calling contexts per benchmark, log scale from 1 to 100000]

Calling contexts                                   Dct-Std   Dct-WS   EJB-Std   EJB-WS
Looking at the entire trace                           1473    18267      8089    25012
Approximating contexts that use temporaries            348     3919      2223     5848
Identifying contexts that truly use temporaries         17      322        71      373

2 orders of magnitude!


Outline

  • Motivation
  • Blended analysis paradigm
  • Blended escape analysis
  • Example
  • Explanations of performance problems
  • Newest empirical results
  • Related work
  • Summary and future work

Method Representation

[Figure: control-flow graph of a method with Entry and Exit nodes and the statements x = new B(), y = D.m(), z = C.m(), w = new A()] (FSE’08)


What type of objects may be created when this method is called?


Blended Analysis Paradigm

[Figure: blended analysis pipeline. Labels, in order: Java Application, Profile, Loaded Classes, Reflection Specification + Templates, Dynamic Calling Structure, Static Analysis, Models of methods]


Pruning Code in Methods

[Figure: control-flow graph of a method with Entry and Exit nodes and the statements x = new B(), y = D.m(), z = C.m(), w = new A()]

Allocated types: {B}

Observed targets: {D.m}

(FSE’08)
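The pruning idea on this slide can be sketched as follows. This is a simplified illustration, not the implementation from the FSE’08 work: the class `PruningSketch`, the `Block` type, and the assumption that each statement sits in its own basic block are all hypothetical. A block is kept only if every type it allocates and every target it calls was observed in the dynamic trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; names and block layout are assumptions.
final class PruningSketch {
    // A basic block reduced to the types it allocates and the targets it calls.
    static final class Block {
        final String label;
        final List<String> allocatedTypes;
        final List<String> callTargets;

        Block(String label, List<String> allocatedTypes, List<String> callTargets) {
            this.label = label;
            this.allocatedTypes = allocatedTypes;
            this.callTargets = callTargets;
        }
    }

    // A block cannot have executed if it allocates a type that was never
    // instantiated, or calls a target that was never observed, in the trace.
    static boolean mayHaveExecuted(Block b, Set<String> types, Set<String> targets) {
        return types.containsAll(b.allocatedTypes) && targets.containsAll(b.callTargets);
    }

    static List<String> prune(List<Block> blocks, Set<String> types, Set<String> targets) {
        List<String> kept = new ArrayList<>();
        for (Block b : blocks) {
            if (mayHaveExecuted(b, types, targets)) {
                kept.add(b.label);
            }
        }
        return kept;
    }

    // The four statements from the slide, each modeled as its own block.
    static List<String> slideExample() {
        List<String> none = Collections.<String>emptyList();
        List<Block> blocks = Arrays.asList(
                new Block("x = new B()", Arrays.asList("B"), none),
                new Block("y = D.m()", none, Arrays.asList("D.m")),
                new Block("z = C.m()", none, Arrays.asList("C.m")),
                new Block("w = new A()", Arrays.asList("A"), none));
        Set<String> allocatedTypes = new HashSet<>(Arrays.asList("B"));
        Set<String> observedTargets = new HashSet<>(Arrays.asList("D.m"));
        return prune(blocks, allocatedTypes, observedTargets);
    }
}
```

With the slide's inputs, only the blocks for `x = new B()` and `y = D.m()` survive, so the static analysis that follows has less code to model. A real pruner would also have to keep the control-flow graph connected after removing blocks, which this sketch ignores.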


Blended Analysis Paradigm

[Figure: blended analysis pipeline. Labels, in order: Java Application, Profile, Loaded Classes, Reflection Specification + Templates, Dynamic Calling Structure, Static Analysis, Pruned models of methods]

Outline

  • Motivation
  • Blended analysis paradigm
  • Blended escape analysis
  • Example
  • Explanations of performance problems
  • Newest empirical results
  • Related work
  • Summary
  • Future work


Escape Analysis

  • Determines escape property of an object (i.e., an allocation site):
  • Captured (not escaping)
  • Arg-escaping (escaping through an argument)
  • Globally escaping
  • Builds connection graph for each method
  • Shows points-to relations between object fields and references
  • Shows escape state of each object

Choi et al., TOPLAS’03


Escape analysis

[Figure: connection graphs for zag(), foo(), bar(), and baz() over the nodes A through G; legend: Captured, Arg-escaping, Globally escaping]

    void bar() {
        a = new A();
        a.x = new B();
    }

    C baz() {
        c = new C();
        c.y = new D();
        c.z = new E();
        return c;
    }

    void foo(F f) {
        c = baz();
        f.w = c.z;
    }

    void zag() {
        F f = new F();
        foo(f);
        G.global = f;
    }

[Figure: disposition of A, B, C, D, E, F, G]
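The escape states in the bar()/baz()/foo()/zag() example can be reproduced with a small propagation sketch. It is a simplification, not the algorithm of Choi et al.: it merges all four methods into one connection graph instead of building and summarizing a graph per method, and the class `EscapeSketch` and its methods are hypothetical. The rule it applies is that any object reachable from an escaping node escapes at least as much as that node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, whole-program simplification; names are assumptions.
final class EscapeSketch {
    // Declared from least to most escaping, so compareTo gives the lattice order.
    enum State { CAPTURED, ARG_ESCAPING, GLOBALLY_ESCAPING }

    private final Map<String, Set<String>> pointsTo = new HashMap<>();
    private final Map<String, State> state = new HashMap<>();

    void node(String name, State initial) {
        state.put(name, initial);
        pointsTo.putIfAbsent(name, new LinkedHashSet<String>());
    }

    void edge(String from, String to) {
        pointsTo.get(from).add(to);
    }

    // Worklist propagation: raise each successor to at least its predecessor's state.
    Map<String, State> solve() {
        Deque<String> work = new ArrayDeque<>(state.keySet());
        while (!work.isEmpty()) {
            String n = work.poll();
            for (String m : pointsTo.get(n)) {
                if (state.get(m).compareTo(state.get(n)) < 0) {
                    state.put(m, state.get(n));
                    work.add(m);
                }
            }
        }
        return state;
    }

    // Merged graph for the slide's example code.
    static Map<String, State> slideExample() {
        EscapeSketch g = new EscapeSketch();
        for (String o : new String[] {"A", "B", "C", "D", "E", "F"}) {
            g.node(o, State.CAPTURED);
        }
        g.node("G.global", State.GLOBALLY_ESCAPING); // static field: a global root
        g.edge("A", "B");        // a.x = new B()
        g.edge("C", "D");        // c.y = new D()
        g.edge("C", "E");        // c.z = new E()
        g.edge("F", "E");        // f.w = c.z
        g.edge("G.global", "F"); // G.global = f
        return g.solve();
    }
}
```

In this merged view F and E end up globally escaping, while A, B, C, and D stay captured. The intermediate arg-escaping state does not appear here because it is relative to a single method: in a per-method analysis, C would be arg-escaping within baz() (it is returned) and E within foo() (it becomes reachable from the parameter f).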


Outline

  • Motivation
  • Blended analysis paradigm
  • Blended escape analysis
  • Example
  • Explanations of performance problems
  • Newest empirical results
  • Related work
  • Summary and future work

Calling Contexts with Lots of Temporaries

[Figure: calling context paths with numeric labels 9, 54, 27, 10, 9, 9, 108, 9, 9, 9, 9]

HoldingDataBean_Ser.serialize(): formats stock holding records into SOAP response

DateSerializer.getValueAsString(): formats data field of record


Reduced Connection Graph for DateSerializer.getValueAsString()

[Figure: reduced connection graph. Node labels include Gregorian calendars (from Calendar.createCalendar()) and int[ ], bool[ ], long[ ], and int[ ] arrays allocated from Calendar() (3 sites and 2 sites) and from GregorianCalendar(); numeric labels 9, 63, 18, 9, 9]

108 captured instances from 8 alloc sites, as many as 6 calls away from the uses!


Visualized Results


DateSerializer.getValueAsString()


Outline

  • Motivation
  • Blended analysis paradigm
  • Blended escape analysis
  • Example
  • Retrieving explanations of performance problems
  • Newest empirical results
  • Related work
  • Summary and future work

Experiments [ISSTA’07, FSE’08]

  • Elude
  • Prototype built in WALA, uses Jinsight traces
  • Benchmarks: Trade 6.0.1; WebSphere Application Server 6.0.0.1; DB2 v8.2.0
  • Traced a single transaction
  • 4 configurations of Trade 6 depend on mode choices
  • Run-time mode (DB): Direct, EJB
  • Access mode: Standard, WebServices
  • Eclipse JDT Compiler 3.1.0
  • Machine: Intel Core Duo 1.8 GHz, 3 GB RAM, Linux 2.6 kernel


Size Comparison for Benchmarks

Benchmark (first 4 rows are Trade)   Allocated Types   Allocated Instances   Methods   Calls       Max Stack Depth
Direct/Std                            30                186                   710       4 484       26
Direct/WS                             166               5 522                 3 308     127 794     53
EJB/Std                               82                1 751                 1 978     60 936      62
EJB/WS                                210               7 088                 4 479     184 288     72
JDT Compiler                          168               53 191                1 411     1 081 927   53


Metrics

Designed new metrics for blended escape analysis

  • Measure effectiveness of pruning
  • Scalability of analysis: % of blocks in methods pruned
  • Precision improvement not observed in disposition metric


Scalability: %blocks pruned

Benchmark     Pruned BBs   Running Time (h:m:s)     Speedup
                           Orig        Pruned
Direct/Std    42.9%        0:00:19     0:00:16      1.2
Direct/WS     36.1%        0:06:38     0:03:17      2.0
EJB/Std       41.1%        0:02:40     0:02:02      1.18
EJB/WS        38.3%        N/A         18:33:13     N/A
Eclipse JDT   25.5%        N/A         6:09:15      N/A
Average       36.8%                                 1.6


Metrics

  • Measure usage of temporaries
  • Disposition: categorizes instances as globally escaping, captured, mixed
  • Concentration: measures locality of temporary usage
  • Capturing depth: # calls between temporary creation and capture
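The concentration metric can be illustrated with a short sketch. The exact definition used in the experiments is not given on the slide, so the following is an assumption: methods are ranked by the number of temporary instances attributed to them, and the metric is the share of all instances accounted for by the top x% of methods. The class `ConcentrationSketch` and the sample counts in the usage note are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper; the ranking-based definition is an assumption.
final class ConcentrationSketch {
    // Returns the percentage (0..100) of all instances that are attributed
    // to the top `topPercent` percent of methods, ranked by instance count.
    static double concentration(long[] instancesPerMethod, double topPercent) {
        long[] sorted = instancesPerMethod.clone();
        Arrays.sort(sorted); // ascending, so the largest counts are at the end
        int n = sorted.length;
        int k = (int) Math.ceil(n * topPercent / 100.0); // number of top methods
        long total = 0;
        long top = 0;
        for (int i = 0; i < n; i++) {
            total += sorted[i];
            if (i >= n - k) {
                top += sorted[i];
            }
        }
        return total == 0 ? 0.0 : 100.0 * top / total;
    }
}
```

For made-up counts `{50, 20, 10, 5, 5, 4, 3, 1, 1, 1}` over ten methods, the top 10% is the single method with 50 of the 100 instances, and the top 20% adds the method with 20. A high value for a small x means the temporaries are concentrated in a few methods, so fixing those methods addresses most of the churn.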


Disposition of Instances

[Figure: stacked bar chart of the percentage of instances in each escape state (escaped, mixed, captured) for Direct/Std, Direct/WS, EJB/Std, EJB/WS, and Eclipse; axis from 0% to 100%]


Concentration of Instances

[Figure: bar chart of the percentage of instances explained by x% of methods, for x = 20%, x = 10%, and x = 5%, across Direct/Std, Direct/WS, EJB/Std, and Eclipse; axis from 0% to 100%]


Metrics

  • Estimate temporary data structure complexity
  • # of types in data structures
  • # of allocating methods for objects in a data structure
  • Height of data structure
  • Maximum capturing distance


# of Types in Data Structures

[Figure: bar chart of the % of data structures (0% to 100%) by # of types (1, 2, 3, 4, 5-11) for Direct/Std, Direct/WS, EJB/Std, EJB/WS, and Eclipse]


Outline

  • Motivation
  • Blended analysis paradigm
  • Blended escape analysis
  • Example
  • Explanations of performance problems
  • Newest empirical results
  • Related work
  • Summary and future work

Related Work on Framework-based Systems

  • Framework-based systems
  • Profiling execution, Ammons et al. [ECOOP’04]
  • Characterizing where execution time is spent, Srinivas & Srinivasan [FSE ’05]
  • Characterizing data structures in Java, Mitchell & Sevitsky [ECOOP ’03], Mitchell [ECOOP ’06], Blackburn et al. [OOPSLA’06], Buytaert et al. [ACES’05]
  • Characterizing data transformations, Mitchell et al. [ECOOP ’06]


Related Work on Analysis

  • Often static analysis is used to direct placement of instrumentation for dynamic analysis, for efficiency
  • Some previous uses of dynamic analysis to “direct” static:
  • Hybrid slicing, Gupta et al. [TOSEM 1997]
  • Optimize model checking, Groce et al. [TACAS’06]
  • Dynamic points-to in slicing, Mock et al. [FSE’02]
  • Parameter mutability analysis, Artzi et al. [MIT TR, 9/2006]


Outline

  • Motivation
  • Blended analysis paradigm
  • Blended escape analysis
  • Example
  • Explanations of performance problems
  • Newest empirical results
  • Related work
  • Summary and future work


Summary

  • New blended analysis paradigm
  • Combines dynamic program info with static analysis
  • Aimed at framework-intensive applications
  • Obtains high precision at reasonable cost
  • Algorithm aimed at greater scalability & precision, and better data structure characterization
  • Problem studied: performance understanding of object churn
  • Novel use of escape analysis
  • Algorithm plus empirical results
  • New metrics to characterize usage of temporaries

Future Work

  • Enhanced tooling
  • Visualization of connection graphs
  • Experiments with more precise calling structure representations
  • Integration into interactive tools

Future Work

  • Explore wider applicability of blended analyses

  • Blended security analyses
  • Permissions
  • Information flow (i.e., taint)
  • Blended value-flow
  • Semantic exploration of specific test executions
  • Help with debugging
