An Empirical Study of Code Clone Genealogies




SLIDE 1

An Empirical Study of Code Clone Genealogies

Miryung Kim, Vibha Sazawal, David Notkin, and Gail Murphy
University of Washington
University of British Columbia
ESEC/FSE, Sept 2005

SLIDE 2

Conventional Wisdom


SLIDE 3

Our Previous Study of Copy and Paste Programming Practices at IBM

  • Even skilled programmers often [...] code clones with clear intent.
    – Programmers cannot refactor clones because of [...].
    – Programmers [...] and [...] until they realize how to abstract the common part of clones.
    – Programmers often [...] to clones.

[Kim et al., ISESE 2004]

SLIDE 4

Research Questions

How do clones evolve over time?

  • consistently changed?
  • long-lived (or short-lived)?
  • easily refactorable?
SLIDE 5

Previous Studies of Code Clones

  • automatic clone detection
  • studies of clone coverage ratio
  • studies of clone coverage change

SLIDE 6

Outline

  • motivation
  • clone genealogy: model and tool
  • study procedure and results

SLIDE 7

Model of Clone Evolution

[Figure: clone evolution diagram; code fragments labeled A, B, C, D]

SLIDE 8
[Figure: example clone genealogy. Labels: "copied, pasted, and modified"; "consistently changed"; "lineage" (twice)]

SLIDE 9

Clone Genealogy Extractor (CGE)

Given multiple versions of a program, V_k for 1 ≤ k ≤ n:

  • find clone groups in each version using CCFinder.
  • find cloning relationships among clone groups of V_i and V_{i+1} using CCFinder.
  • map clones of V_i and V_{i+1} using diff-based algorithm.
  • separate each connected component of cloning relationships (a clone genealogy).
  • identify clone evolution patterns in each genealogy.
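
The step that separates cloning relationships into genealogies amounts to finding connected components in a graph whose nodes are clone groups and whose edges are cloning relationships between consecutive versions. A minimal sketch using union-find follows. The node labels and the `genealogies` helper are hypothetical and are not taken from CGE; the edges are assumed to have been computed already by the earlier steps.

```python
from collections import defaultdict


def genealogies(nodes, edges):
    """Group clone groups into connected components (one per genealogy).

    nodes: iterable of hashable clone-group ids, e.g. (version, label).
    edges: iterable of (a, b) pairs, each a cloning relationship between a
           clone group in version i and one in version i + 1. Every endpoint
           must also appear in nodes.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    components = defaultdict(set)
    for n in parent:
        components[find(n)].add(n)
    return list(components.values())


# Hypothetical input: three versions, clone groups named (version, label).
nodes = [(1, "A"), (2, "A"), (2, "B"), (3, "A"), (3, "C")]
edges = [((1, "A"), (2, "A")), ((1, "A"), (2, "B")), ((2, "A"), (3, "A"))]
result = genealogies(nodes, edges)
```

Here the four groups linked by edges form one genealogy, and `(3, "C")`, which has no cloning relationship to any other group, forms a genealogy of its own.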
SLIDE 10
SLIDE 11

Outline

  • motivation
  • clone genealogy: model and tool
  • study procedure and results

SLIDE 12

Two Java Subject Programs

    versions      224                 37
    time span     5 years 8 months    2 years 2 months
    size (LOC)    5756~21188          7878~23731

  • versions: a set of check-in snapshots that increased or decreased the total lines of code clones
SLIDE 13

Running CGE on Java Programs

  • CCFinder setting
    – minimum token length = 30
    – longest sequence matching
  • CGE setting
    – text similarity threshold = 0.3
  • false positives
    – repetitive field declaration
    – repetitive static method invocation
    – a series of case switch statements
    – etc.
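
How a text similarity threshold such as the CGE setting above might gate the mapping of a clone in one version to a clone in the next can be sketched as follows. The slides do not say how CGE computes text similarity, so this sketch substitutes Python's `difflib.SequenceMatcher.ratio()`. The function names, the sample snippets, and the choice to accept a pair when similarity is at or above the threshold are all assumptions.

```python
from difflib import SequenceMatcher

SIM_THRESHOLD = 0.3  # value from the slide; how CGE applies it is assumed


def text_similarity(old_text, new_text):
    """Similarity in [0, 1] between two code snippets, compared line by line."""
    matcher = SequenceMatcher(None, old_text.splitlines(), new_text.splitlines())
    return matcher.ratio()


def maps_to(old_text, new_text, threshold=SIM_THRESHOLD):
    """Treat new_text as the successor of old_text if they are similar enough."""
    return text_similarity(old_text, new_text) >= threshold


# Hypothetical snippets: one lightly edited successor and one unrelated fragment.
old = "int a = read();\nint b = read();\nprint(a + b);\n"
edited = "int a = read();\nint b = read();\nlog(a + b);\n"
unrelated = "close(socket);\n"

keeps_lineage = maps_to(old, edited)
cuts_lineage = not maps_to(old, unrelated)
```

Under this reading, a clone whose text drifts below the threshold between two versions is no longer mapped, so its lineage ends there.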

SLIDE 14

Consistently Changing Clones

  • A genealogy has a consistently changing pattern iff all lineages include at least one consistent change pattern.
  • We counted genealogies with a consistently changing pattern.
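
The definition above can be read as a predicate over a genealogy's lineages. A minimal sketch, assuming each lineage is represented as a list of evolution-pattern labels: the label strings and the function name are hypothetical, and treating an empty genealogy as not matching is a choice made here, not something the slides state.

```python
CONSISTENT_CHANGE = "consistent_change"  # illustrative label, not CGE's own


def is_consistently_changing(genealogy):
    """True iff the genealogy is non-empty and every lineage contains
    at least one consistent change pattern.

    genealogy: list of lineages, each a list of pattern labels.
    """
    return bool(genealogy) and all(
        CONSISTENT_CHANGE in lineage for lineage in genealogy
    )


# Hypothetical genealogies with two lineages each.
g1 = [["same", "consistent_change"], ["consistent_change", "same"]]
g2 = [["same", "consistent_change"], ["same", "inconsistent_change"]]

g1_matches = is_consistently_changing(g1)
g2_matches = is_consistently_changing(g2)
```

`g1` qualifies because both lineages contain a consistent change; `g2` does not, because its second lineage never changes consistently.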

SLIDE 15

Consistently Changing Clones

  • 38% and 36% of genealogies include a consistently changing pattern.

SLIDE 16

Volatile Clones

  • A genealogy is dead if it does not include clones of the final version.
  • We measured the age (lifespan or length) of dead genealogies.
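
The two bullets above suggest a simple computation: select the genealogies with no clones in the final version, then look at how many versions each one lasted. A minimal sketch under stated assumptions: each genealogy is reduced to the set of version indices in which it has clones, age is taken as the number of version steps spanned, and all names and data are hypothetical.

```python
def is_dead(versions_present, final_version):
    """A genealogy is dead if it has no clones in the final version."""
    return final_version not in versions_present


def age(versions_present):
    """Age as the number of version steps spanned (an assumed definition)."""
    return max(versions_present) - min(versions_present)


def fraction_within(ages, k):
    """Fraction of the given ages that are at most k versions."""
    return sum(a <= k for a in ages) / len(ages)


FINAL_VERSION = 10
# Hypothetical genealogies: name -> versions in which the genealogy has clones.
genealogies = {
    "g1": {1, 2},
    "g2": set(range(3, 11)),  # reaches version 10, so it is still alive
    "g3": {2, 3, 4, 5, 6},
    "g4": set(range(1, 10)),
}

dead_ages = sorted(
    age(v) for v in genealogies.values() if is_dead(v, FINAL_VERSION)
)
within_5 = fraction_within(dead_ages, 5)
```

With this toy data, three genealogies are dead and two of them disappeared within 5 versions.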

SLIDE 17

Volatile Clones

  • 26% and 34% of clone lineages were discontinued because of divergent changes in the clone group.

  • disappeared within (%)

    2 versions     35%    52%
    5 versions     36%    75%
    10 versions    48%    79%
SLIDE 18

How do lineages disappear?

  • reasons (%)

    divergent changes           34%    26%
    cut off by the threshold    21%     7%
    refactoring or removal      45%    67%
SLIDE 19

Locally Unfactorable Clones

  • A clone group is locally unfactorable if
    – programmers cannot use standard refactoring techniques, or
    – programmers must deal with cascading non-local changes, or
    – programmers cannot remove duplication due to programming language limitations.
  • We manually inspected all genealogies and counted locally unfactorable genealogies.

SLIDE 20

Locally Unfactorable Clones

SLIDE 21

Locally Unfactorable Clones

  • 64% and 49% of genealogies are locally unfactorable.
SLIDE 22

Long-Lived Clones

  • We measured cumulative proportion of locally unfactorable and consistently changed genealogies.
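
One way to read the measurement above is: among genealogies older than a given age, what share is both locally unfactorable and consistently changed? The sketch below computes that share for a given cutoff. The slides do not spell out the exact formula, so the `Genealogy` record, the strict "older than" comparison, and the sample data are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Genealogy:
    age: int                     # number of versions the genealogy lasted
    locally_unfactorable: bool   # judged by manual inspection in the study
    consistently_changed: bool


def proportion_among_older(genealogies, age_cutoff):
    """Share of genealogies older than age_cutoff that are both locally
    unfactorable and consistently changed; None if no genealogy qualifies."""
    older = [g for g in genealogies if g.age > age_cutoff]
    if not older:
        return None
    hits = sum(g.locally_unfactorable and g.consistently_changed for g in older)
    return hits / len(older)


# Hypothetical sample, not data from the study.
sample = [
    Genealogy(1, False, False),
    Genealogy(3, True, False),
    Genealogy(6, True, True),
    Genealogy(9, True, True),
    Genealogy(10, False, True),
]

overall = proportion_among_older(sample, 0)
long_lived = proportion_among_older(sample, 5)
```

Sweeping `age_cutoff` over the range of observed ages gives a curve of the share against genealogy age.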

SLIDE 23

Long-Lived Clones

  • 51% and 61% of genealogies that lasted more than half of programs’ lifetime are locally unfactorable and consistently changing.
  • The proportion of locally unfactorable yet consistently changed genealogies increases with the age of genealogies.

SLIDE 24

Study Limitations

  • clone detection techniques
    – CCFinder vs. other clone detection techniques
  • location tracking techniques
    – diff vs. other location tracking techniques
  • subject programs
    – 20 KLOC vs. large-scale projects
  • time granularity
    – versions vs. editing operations
  • language dependency
    – Java vs. other languages

SLIDE 25

Summary

  • We have built a tool that extracts history of code clones from a set of program versions.
  • Our study of clone genealogy contradicts some conventional wisdom about code clones.
    – Immediate and aggressive refactoring may be unnecessary for volatile and diverging clones.
    – Refactoring may not help many long-lived and consistently changing clones.
  • Our study opens up opportunities for complementary clone maintenance tools.