SLIDE 1

Empirical analysis of the relationship between CC and SLOC

in a large corpus of Java methods

Davy Landman Alexander Serebrenik Jurgen Vinju

SLIDE 2

Metrics

  • Lines of Code (SLOC)
  • Cyclomatic Complexity (CC)
  • Popular in practice and research
SLIDE 3

Metrics

  • Lines of Code (SLOC)
  • Cyclomatic Complexity (CC)

    public double sqrt(int n) {
        // Newton-Raphson method
        double r = n / 2.0;
        while (Math.abs(r - (n / r)) > 0.00001) {
            r = 0.5 * (r + (n / r));
        }
        return r;
    }

SLOC = 7, CC = 2
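The two counts for the sqrt example can be reproduced with a rough, token-based sketch. It assumes SLOC means non-blank, non-comment lines and CC means one plus the number of branching constructs; the class name `MethodMetrics` and the exact set of counted tokens are illustrative choices, not the tooling used in the study, and real metric tools differ in what they count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive, token-based approximations of SLOC and CC for one Java method. */
class MethodMetrics {

    // Constructs treated as decision points (an assumption; tools disagree on this set).
    // Known limitation: '?' also matches generic wildcards such as List<?>.
    private static final Pattern DECISION =
        Pattern.compile("\\b(if|while|for|case|catch)\\b|&&|\\|\\||\\?");

    /** Removes block and line comments. Comment markers inside string literals are not handled. */
    static String stripComments(String source) {
        return source
            .replaceAll("(?s)/\\*.*?\\*/", "")
            .replaceAll("//[^\\n]*", "");
    }

    /** Counts the lines that still contain code once comments are stripped. */
    static int sloc(String source) {
        int count = 0;
        for (String line : stripComments(source).split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** CC = 1 + number of decision points found in the comment-free source. */
    static int cc(String source) {
        Matcher m = DECISION.matcher(stripComments(source));
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return 1 + decisions;
    }
}
```

Passing the source text of the sqrt method to `MethodMetrics.sloc` and `MethodMetrics.cc` counts seven code lines (the comment line is skipped) and one `while`, which matches the numbers on the slide.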

SLIDE 4
  • M. Shepperd. "A critique of cyclomatic complexity as a software metric." Software Engineering Journal 3.2 (1988)
SLIDE 5

Citations: 218 total, 90 in the last 5 years

SLIDE 6

CC redundant?

  • Shepperd’s critique was based on 8 papers (1979-1987)
  • 7 papers followed (1991-2013)
  • Fortran, PL/1, Pascal, COBOL, C, C++, and Java
  • SLOC & CC correlate linearly

R² = 0.65-0.95
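As a reminder of what the reported R² measures, here is a minimal sketch of an ordinary least-squares fit of CC against SLOC. The class name `LinearFit` and its fields are hypothetical; the original studies and this talk presumably used statistical packages rather than hand-written code.

```java
/** Ordinary least-squares fit of y = intercept + slope * x, with its R². */
class LinearFit {
    final double slope;
    final double intercept;
    final double rSquared;

    LinearFit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long arrays with at least 2 points");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
            syy += (y[i] - meanY) * (y[i] - meanY);
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
        // For a one-variable linear fit, R² equals the squared Pearson correlation.
        // The result is NaN when x or y is constant.
        rSquared = (sxy * sxy) / (sxx * syy);
    }

    /** Value predicted by the fitted line for a given x. */
    double predict(double x) {
        return intercept + slope * x;
    }
}
```

With SLOC values as `x` and CC values as `y`, `rSquared` is the fraction of the variance in CC that the straight line accounts for.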

SLIDE 7

Our research

  • Identify differences in 15 papers
  • Get data
  • Reproduce!
SLIDE 8

We do not conclude that CC is redundant with SLOC

  • Our result: R² = 0.43
  • Differences from related work:
      • Aggregation
      • Power transform
  • Larger methods correlate even less
  • Differing variance
SLIDE 9

Corpus

[Figure: frequency histograms of SLOC per method and CC per method, both on logarithmic axes]

  • 13K Open Source Java Projects (14GB of Java)
  • 17M methods in 362M SLOC
  • E. Linstead, S. K. Bajracharya, T. C. Ngo, P. Rigor, C. V. Lopes, and P. Baldi, “Sourcerer: mining and searching internet-scale software repositories,” Data Mining and Knowledge Discovery 18.2 (2009).

SLIDE 10

First result

  • Correlation (R²): 0.43
  • Lower than other papers: 0.65-0.95
  • Why?
SLIDE 11
SLIDE 12
  • Correlation (R²): 0.43
  • Lower than other papers: 0.65-0.95

Other explanations

                      Yes   No
  Power transform       4   12
  File level (sum)      9    6

SLIDE 13

Power transform

[Figure: frequency histogram of SLOC per method, shown once on linear axes and once on logarithmic axes]

SLIDE 14

Method level

R² = 0.43 (untransformed), R² = 0.70 (after power transform)
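The effect of the transform can be illustrated with a small sketch. It assumes the power transform amounts to taking log10 of both SLOC and CC before fitting, as the log10 panels later in the deck suggest; the class name `PowerTransform` and the toy numbers in the usage note are made up for illustration.

```java
/** Compares linear correlation before and after a log10 transform. */
class PowerTransform {

    /** Applies log10 element-wise; every value must be > 0 (SLOC and CC are at least 1). */
    static double[] log10(double[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.log10(values[i]);
        }
        return out;
    }

    /** Squared Pearson correlation, which equals R² of a one-variable linear fit. */
    static double rSquared(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
            syy += (y[i] - meanY) * (y[i] - meanY);
        }
        return (sxy * sxy) / (sxx * syy);
    }
}
```

For toy data such as SLOC = {1, 10, 100, 1000} and CC = {1, 2, 4, 8}, the relation is a power law, so `rSquared` on the raw values is noticeably below 1 while `rSquared(log10(sloc), log10(cc))` is essentially 1. This is why a transformed and an untransformed R² are not directly comparable.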

SLIDE 15

File level

  • Example: 1 file, 30 “small” methods.
  • File SLOC = 30 * avg(SLOCm) = 30 * 2.5 = 75
  • File CC = 30 * avg(CCm) = 30 * 2 = 60
  • Volume factor causes high correlation [1]

[1] K. El Emam, S. Benlarbi, N. Goel, S.N. Rai. "The confounding effect of class size on the validity of object-oriented metrics." IEEE Transactions on Software Engineering 27.7 (2001)
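The file-level example can be written out as a short sketch of sum aggregation. The `Method` holder and the `sumPerFile` helper are hypothetical names introduced here; the point is only that both file totals grow with the number of methods in the file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sums method-level SLOC and CC per file. */
class FileAggregation {

    /** One measured method; the field names are illustrative. */
    static final class Method {
        final String file;
        final int sloc;
        final int cc;

        Method(String file, int sloc, int cc) {
            this.file = file;
            this.sloc = sloc;
            this.cc = cc;
        }
    }

    /** Returns file -> {total SLOC, total CC}, in order of first appearance. */
    static Map<String, int[]> sumPerFile(List<Method> methods) {
        Map<String, int[]> totals = new LinkedHashMap<>();
        for (Method m : methods) {
            int[] t = totals.computeIfAbsent(m.file, f -> new int[2]);
            t[0] += m.sloc;
            t[1] += m.cc;
        }
        return totals;
    }
}
```

A file with 30 methods averaging 2.5 SLOC and CC 2 ends up with totals of 75 and 60. A file with twice as many similar methods doubles both totals, so the method count alone pushes the two file-level sums to move together.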
SLIDE 16

File level

R² = 0.87, R² = 0.65

Aggregation causing it?

SLIDE 17

We do not conclude that CC is redundant with SLOC

  • Our result: R² = 0.43
  • Differences from related work:
      • Aggregation
      • Power transform
  • Larger methods correlate even less
  • Differing variance
SLIDE 18

[Figure: frequency histogram of SLOC per method on logarithmic axes, with the 50%, 25%, 10%, 1%, and 0.1% tails marked]

Israel Herraiz and Ahmed E. Hassan, “Beyond lines of code: Do we need more complexity metrics?” Making Software: What Really Works, and Why We Believe It (2010)

SLIDE 19

Tail

  Tail    min. SLOC   # Methods   R²     “power” R²
  100%            1       17.8M   0.43   0.70
  50%             3        8.9M   0.45   0.62
  25%             9        4.5M   0.42   0.44
  10%            20        1.8M   0.38   0.27
  1%             77        179K   0.29   0.05
  0.1%          230         18K   0.21   0.00

Statistics

SLIDE 20

Large Methods

SLIDE 21

We do not conclude that CC is redundant with SLOC

  • Our result: R² = 0.43
  • Differences from related work:
      • Aggregation
      • Power transform
  • Larger methods correlate even less
  • Differing variance
SLIDE 22

Variance

  • R² = 0.43 means 57% of the variance is not explained
  • Residual (unexplained part) = actual CC - predicted CC
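The difference between actual and predicted CC (a residual, in regression terms) can be computed per method and then compared across size ranges. The sketch below assumes a line `intercept + slope * SLOC` has already been fitted elsewhere; the class name `ResidualSpread`, its parameters, and the idea of comparing SLOC ranges are illustrative, not the analysis scripts of the study.

```java
/** Residuals of a fitted line and their spread within a SLOC range. */
class ResidualSpread {

    /** residual[i] = actual CC - CC predicted by intercept + slope * SLOC. */
    static double[] residuals(double[] sloc, double[] cc, double slope, double intercept) {
        double[] out = new double[sloc.length];
        for (int i = 0; i < sloc.length; i++) {
            out[i] = cc[i] - (intercept + slope * sloc[i]);
        }
        return out;
    }

    /**
     * Mean squared residual of the methods whose SLOC lies in [minSloc, maxSloc).
     * Returns NaN when no method falls in that range.
     */
    static double meanSquaredResidual(double[] sloc, double[] residuals,
                                      double minSloc, double maxSloc) {
        double sum = 0;
        int count = 0;
        for (int i = 0; i < sloc.length; i++) {
            if (sloc[i] >= minSloc && sloc[i] < maxSloc) {
                sum += residuals[i] * residuals[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

If the mean squared residual is clearly larger for large methods than for small ones, the spread around the fitted line is not constant, which is the differing variance the talk refers to.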
SLIDE 23

Method level

SLIDE 24

Method level, log10(Method level)

SLIDE 25

Method level, log10(Method level), File level

SLIDE 26

Method level, log10(Method level), File level, log10(File level)

SLIDE 27

Differing variance complicates the interpretation of linear models

SLIDE 28

We do not conclude that CC is redundant with SLOC

  • Our result: R² = 0.43
  • Differences from related work:
      • Aggregation
      • Power transform
  • Larger methods correlate even less
  • Differing variance
SLIDE 29

Summary

  • Method level
  • File level
  • Large methods
  • Differing variance

(data, scripts & preprint: http://is.gd/icsme_cc)