59th CREST Open Workshop, Centre for Research on Evolution, Search and Testing, University College London, London, United Kingdom
Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald C. Gall
Software Evolution and Architecture Lab


SLIDE 1

Carol V. Alexandru, Sebastiano Panichella, Sebastian Proksch, Harald C. Gall
Software Evolution and Architecture Lab, University of Zurich, Switzerland
{alexandru,panichella,proksch,gall}@ifi.uzh.ch

26.03.2018, 59th CREST Open Workshop, Centre for Research on Evolution, Search and Testing, University College London, London, United Kingdom

SLIDES 2-4

The Problem Domain

• Static analysis (e.g. #Attr., McCabe, coupling, ...)
• Many revisions, fine-grained historical data

(release timeline: v0.7.0, v1.0.0, v1.3.0, v2.0.0, v3.0.0, v3.3.0, v3.5.0)
SLIDES 5-9

A Typical Analysis Process

select project → clone (www) → select revision → checkout → apply purpose-built, language-specific tool → store analysis results (Res) → more revisions? → more projects?
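The loop above can be sketched in Python (a hedged sketch: `checkout` and `analyze` are hypothetical callbacks, not any real tool's API). The point is that every revision of every project pays for a full checkout before the tool can even run.

```python
# Naive per-revision analysis loop, as in the slide's pipeline diagram.
# 'checkout' and 'analyze' are hypothetical stand-ins for real tooling.
def analyze_projects(projects, checkout, analyze):
    results = {}
    for project, revisions in projects.items():   # more projects?
        for rev in revisions:                     # more revisions?
            working_dir = checkout(project, rev)  # full checkout to disk
            results[(project, rev)] = analyze(working_dir)
    return results

# Count how often the checkout cost is paid:
checkouts = []
res = analyze_projects(
    {"demo": ["v1", "v2", "v3"]},
    checkout=lambda p, r: checkouts.append((p, r)) or f"/tmp/{p}-{r}",
    analyze=lambda wd: len(wd),
)
print(len(checkouts))  # 3 checkouts for 3 revisions
```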

SLIDES 10-16

Redundancies all over...

Redundancies in historical code analysis:
• Across revisions: few files change; only small parts of a file change; changes may not even affect results.
• Across languages: each language has its own toolchain, yet they share many metrics.

Impact on code study tools:
• Repeated analysis of "known" code
• Storing redundant results
• Re-implementing identical analyses
• Generalizability is expensive

SLIDE 17

#1: Avoid Checkouts

SLIDES 18-21

Avoid checkouts

Baseline: clone → checkout (read + write) → analyze (read).
• For every file: 2 read ops + 1 write op
• Checkout includes irrelevant files
• Need 1 working directory for every revision to be analyzed in parallel

slide-22
SLIDE 22

Avoid checkouts

8

clone analyze

read

slide-23
SLIDE 23

Avoid checkouts

8

clone analyze Only read relevant files in a single read op No write ops No overhead for parallization

read

slide-24
SLIDE 24

Avoid checkouts

8

clone analyze Only read relevant files in a single read op No write ops No overhead for parallization

Git Analysis Tool File Abstraction Layer

read

slide-25
SLIDE 25

Avoid checkouts

8

clone analyze Only read relevant files in a single read op No write ops No overhead for parallization

Git Analysis Tool File Abstraction Layer

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer) extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {

  • verride def getCharContent(): CharSequence = code

} read

slide-26
SLIDE 26

Avoid checkouts

clone analyze Only read relevant files in a single read op No write ops No overhead for parallization

Git Analysis Tool File Abstraction Layer

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer) extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {

  • verride def getCharContent(): CharSequence = code

} read 9
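A Python analogue of the same file-abstraction idea (the `BlobStore` class below is a hypothetical stand-in for a Git object store, not LISA's API): the analyzer receives source text straight from memory, the way a Git library would supply blob contents, so no checkout ever touches disk.

```python
import ast

# Hypothetical in-memory blob store: (revision, path) -> source text.
class BlobStore:
    def __init__(self):
        self._blobs = {}

    def put(self, revision, path, text):
        self._blobs[(revision, path)] = text

    def read(self, revision, path):
        # single read op, no write op, no working directory needed
        return self._blobs[(revision, path)]

def count_defs(source):
    """Analyze source text directly, without any checkout."""
    tree = ast.parse(source)
    return sum(isinstance(n, ast.FunctionDef) for n in ast.walk(tree))

store = BlobStore()
store.put("v1", "demo.py", "def run():\n    pass\n")
print(count_defs(store.read("v1", "demo.py")))  # 1 function definition
```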

SLIDE 27

#2: Use a multi-revision representation of your sources
SLIDES 28-37

Merge ASTs

• The ASTs of rev. 1, rev. 2, rev. 3 and rev. 4 are merged into a single graph, one revision at a time.
• Each node carries the revision range in which it exists, e.g. rev. range [1-4], rev. range [1-2].
• AspectJ (~440k LOC): 1 commit: 2.2M nodes; all >7000 commits: 6.5M nodes.
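A minimal sketch of such a merged representation, under an assumed, simplified data model (the `Node` class below is illustrative, not LISA's actual graph type): identical nodes are stored once and annotated with the revision range in which they exist.

```python
# Illustrative merged multi-revision AST node (not LISA's real data model).
class Node:
    def __init__(self, label, first_rev, last_rev):
        self.label = label
        self.range = (first_rev, last_rev)  # e.g. (1, 4) = revs 1 through 4
        self.children = []

    def exists_in(self, rev):
        return self.range[0] <= rev <= self.range[1]

# A class alive in revs 1-4, with one method deleted after rev. 2:
root = Node("TypeDeclaration", 1, 4)
kept = Node("Method", 1, 4)
dropped = Node("Method", 1, 2)
root.children += [kept, dropped]

def nodes_at(node, rev):
    """Project the single-revision AST size out of the merged graph."""
    if not node.exists_in(rev):
        return 0
    return 1 + sum(nodes_at(c, rev) for c in node.children)

print(nodes_at(root, 1))  # 3 nodes in rev. 1
print(nodes_at(root, 4))  # 2 nodes in rev. 4 (the dropped method is gone)
```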

SLIDE 38

#3: Store AST nodes only if they're needed for analysis

SLIDES 39-44

What's the complexity (1+#forks) and name for each method and class?

public class Demo {
    public void run() {
        for (int i = 1; i < 100; i++) {
            if (i % 3 == 0 || i % 5 == 0) {
                System.out.println(i);
            }
        }
    }
}

• parse: 140 AST nodes (using ANTLR), e.g. CompilationUnit, TypeDeclaration, Modifiers public, Members, Method, Name run, Name Demo, Parameters, ReturnType PrimitiveType VOID, Body, Statements, ...
• filtered parse: 7 AST nodes (using ANTLR): TypeDeclaration, Name Demo, Method, Name run, ForStatement, IfStatement, ConditionalExpression
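The same filtered computation can be sketched with Python's own `ast` module (LISA itself works on ANTLR parse trees; this is only an analogue): the name and the complexity (1 + #forks) fall out of visiting just a handful of node kinds and ignoring everything else.

```python
import ast

# Fork-introducing node kinds -- the only ones the metric needs to see.
FORKS = (ast.For, ast.While, ast.If, ast.BoolOp)

def complexities(source):
    """name -> 1 + #forks, for every class and function in the source."""
    tree = ast.parse(source)
    out = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            forks = sum(isinstance(n, FORKS) for n in ast.walk(node))
            out[node.name] = 1 + forks
    return out

code = (
    "class Demo:\n"
    "    def run(self):\n"
    "        for i in range(1, 100):\n"
    "            if i % 3 == 0 or i % 5 == 0:\n"
    "                print(i)\n"
)
print(complexities(code))  # {'Demo': 4, 'run': 4}: for + if + or = 3 forks
```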

SLIDE 45

#4: Use non-duplicative data structures to store your results

SLIDES 46-49

Instead of one result row per revision (rev. 1, rev. 2, rev. 3, rev. 4), each revision range stores a single row of label / #attr / mcc values, e.g. ranges [1-1], [2-3] and [4-4] for an InnerClass (values 4, 1, 2, 4).
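A sketch of such a non-duplicative store, under the assumption that a range is simply kept open while the metric value is unchanged (illustrative code, not LISA's storage format):

```python
# Compress per-revision metric values into per-range values: a range is
# only closed when the value actually changes (assumed design sketch).
def compress(values_per_revision):
    """[(rev, value), ...] -> [((first, last), value), ...]"""
    ranges = []
    for rev, value in values_per_revision:
        if ranges and ranges[-1][1] == value:
            (first, _), v = ranges[-1]
            ranges[-1] = ((first, rev), v)   # extend the open range
        else:
            ranges.append(((rev, rev), value))
    return ranges

# mcc of a method over four revisions; only revisions 2 and 4 change it:
history = [(1, 1), (2, 4), (3, 4), (4, 2)]
print(compress(history))  # [((1, 1), 1), ((2, 3), 4), ((4, 4), 2)]
```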

SLIDE 50

LISA also does:
• #5: Parallel parsing
• #6: Asynchronous graph computation
• #7: Generic graph computations applying to ASTs from compatible languages

SLIDE 51

A light-weight view on multi-language analysis

SLIDE 52

Typical solutions

• Toolchains / frameworks
  • Integrate language-specific tooling
  • Lots of engineering required
• Meta-models
  • Translate language code to some common representation
  • Significant overhead / rigid models

SLIDE 53

Structure matters most

• Complexity?
• # of functions / attributes etc.
• Coupling between classes
• Call graphs

if (true) { if (true) { } }     # CYCLO: 3
if (true) { } if (true) { }     # CYCLO: 4

SLIDES 54-56

Relative structure is similar

public class Demo {
    public void run() {
        for (int i = 1; i < 100; i++) {
            if (i % 3 == 0 || i % 5 == 0) {
                System.out.println(i);
            }
        }
    }
}

class Demo:
    def run():
        for i in range(1, 100):
            if i % 3 == 0 or i % 5 == 0:
                print(i)

Shared structure in both versions: class → function → for → if → print

SLIDE 57

Entities are different (same Java/Python example):

generic     Java grammar    Python grammar
class       CLASS_DECL      classdef
function    METHOD_DECL     funcdef
for         FOR_STAT        forstat
if          IF_STAT         ifelsestat
print       METHOD_INVOK    atomexpr

SLIDES 58-59

Can filter irrelevant nodes: in either language, only the nodes mapped to class, function, for, if and print are kept.

SLIDE 60

How LISA handles multiple languages

SLIDES 61-66

Analysis formulation

• Signal/Collect (like "Google Pregel" for Scala)
• Graph vertices send information packets (signals) and do something when receiving (collecting) signals.
• Example: computing a func-count on a class vertex. Two function vertices each signal 1 and the class vertex collects them (fc = fc+1+1 = 2); a later signal of 1 is collected as fc = fc+1 = 3.
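The func-count example can be sketched as a toy signal/collect computation (a drastic simplification of the real Signal/Collect framework; the `Vertex` class below is hypothetical):

```python
# Toy signal/collect: vertices emit signals along edges, then fold
# received signals into their own state. Not the real framework's API.
class Vertex:
    def __init__(self, label):
        self.label = label
        self.state = 0     # e.g. func-count on a class vertex
        self.inbox = []
        self.edges = []    # outgoing edges (function -> enclosing class)

    def signal(self):
        for target in self.edges:
            target.inbox.append(1)   # "I am one function"

    def collect(self):
        self.state += sum(self.inbox)
        self.inbox = []

clazz = Vertex("class")
funcs = [Vertex("function") for _ in range(3)]
for f in funcs:
    f.edges.append(clazz)

for f in funcs:     # signal step
    f.signal()
clazz.collect()     # collect step
print(clazz.state)  # func-count = 3
```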

SLIDE 67

Analysis formulation

• Analyses use generic labels for entities


SLIDES 70-73

Light-weight entity mappings

• The generic labels used by analyses are mapped to the labels used by the parser or grammar.
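Such a mapping can be as small as one dictionary per grammar (label names taken from the slides; the helper below is illustrative, not LISA's actual mapping mechanism):

```python
# Map each grammar's node labels onto the generic labels analyses use.
JAVA_MAPPING = {
    "CLASS_DECL": "class", "METHOD_DECL": "function",
    "FOR_STAT": "for", "IF_STAT": "if", "METHOD_INVOK": "print",
}
PYTHON_MAPPING = {
    "classdef": "class", "funcdef": "function",
    "forstat": "for", "ifelsestat": "if", "atomexpr": "print",
}

def to_generic(labels, mapping):
    """Keep only mapped nodes, renamed to their generic label
    (unmapped labels are dropped -- this is the filtering from #3)."""
    return [label for label in (mapping.get(l) for l in labels) if label]

java_ast = ["CLASS_DECL", "MODIFIERS", "METHOD_DECL", "FOR_STAT", "IF_STAT"]
python_ast = ["classdef", "parameters", "funcdef", "forstat", "ifelsestat"]
print(to_generic(java_ast, JAVA_MAPPING))
print(to_generic(java_ast, JAVA_MAPPING) == to_generic(python_ast, PYTHON_MAPPING))  # True
```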

SLIDE 74

Parsing

• AST is kept as the parser supplies it
• ANTLRv4 integration
• Filtering can be enabled
  • Only AST nodes that correspond to a label used in an analysis are kept
  • Reduces graph size by a factor of 10 or more

SLIDE 75

Adding new languages

1. Integrate a parser (or generate one)
   • Graph interface allows adding nodes/edges
2. Write a node mapping
3. Re-use existing analyses on new ASTs


SLIDE 78

To Summarize...

SLIDES 79-82

The LISA Analysis Process

select project → clone (www) → parallel parse into merged graph → async. compute on graph → store analysis results (Res) → more projects?

• The ANTLRv4 grammar generates the parser and is used by the language mappings.
• The language mappings (grammar to analysis) determine which AST nodes are loaded.
• The analysis, formulated as a graph computation, runs on the graph and determines which data is persisted.
SLIDE 83

How well does it work, then?

SLIDE 84

Marginal cost for +1 revision

Average parsing+computation time per revision when analyzing n revisions of AspectJ (10 common metrics):

n revisions   1       10     100    1000   2000   3000   4000   5000   6000   7000
seconds       41.670  4.633  0.525  0.109  0.082  0.071  0.052  0.041  0.032  0.033

SLIDE 85

Overall Performance Stats

                          Java          C#            JavaScript    Python
#Projects                 1000          1000          1000          1000
#Revisions                2 Million     1.4 Million   1.5 Million   2.3 Million
#Files (parsed!)          10 Billion    3 Billion     380 Thousand  1.2 Billion
#Lines (parsed!)          1.6 Trillion  0.6 Trillion  43 Billion    0.3 Trillion
Median RT¹                8.35s         40.5s         14.4s         24.5s
Tot. Avg. RT per Rev.²    97ms          183ms         57ms          83ms
Med. Avg. RT per Rev.²    41ms          88ms          25ms          48ms

Total Runtime (RT)¹: 2d 5h / 3d 23h / 2d 4h (one of the four values is missing)

¹ Including cloning and persisting results
² Excluding cloning and persisting results

SLIDE 86

What's the catch?

(There are a few...)

SLIDES 87-88

The (not so) minor stuff

• Must implement analyses from scratch
  • No help from a compiler
  • Non-file-local analyses need some effort
• Moved files/methods etc. add overhead
  • Uniquely identifying files/entities is hard
  • (No impact on results, though)

SLIDE 89

Parser performance matters

(chart comparing parser performance for JavaScript, C#, Java and Python)

SLIDE 90

LISA is EXTREME

(spectrum from complex / feature-rich / heavyweight to simple / generic / lightweight)

SLIDE 91

Thank you for your attention

59th CREST Open Workshop, London, 26.03.2018

Read the paper: http://t.uzh.ch/Fj (the upcoming EMSE publication is much more detailed)
Try the tool: http://t.uzh.ch/Fk
Get the slides: http://t.uzh.ch/NR
Contact me: alexandru@ifi.uzh.ch