Similar code fragment


SLIDE 1

Department of Computer Science, Graduate School of Information Science & Technology, Osaka University

Similar code fragment

• A code fragment for which a similar fragment exists elsewhere in the source code
• Introduced into source code for various reasons, e.g. “copy-and-paste”
• Makes software maintenance difficult

[Figure: similar code fragments CF1, CF2, and CF3 spread over two source files. If CF1 is defective, it is necessary to check CF2 and CF3.]

SLIDE 2

Similar defects in Linux 2.6.6

for (iter = 0; iter < num_regs; iter++) {
    prom_prom_taken[iter].start_adr = prom_reg_memlist[iter].phys_addr;
    prom_prom_taken[iter].num_bytes = prom_reg_memlist[iter].reg_size;
    prom_prom_taken[iter].theres_more = &prom_phys_total[iter+1]; // should be: &prom_prom_taken[iter+1];
}

for (iter = 0; iter < num_regs; iter++) {
    prom_prom_taken[iter].start_adr = (char *) prom_reg_memlist[iter].phys_addr;
    prom_prom_taken[iter].num_bytes = (unsigned long) prom_reg_memlist[iter].reg_size;
    prom_prom_taken[iter].theres_more = &prom_phys_total[iter+1]; // should be: &prom_prom_taken[iter+1];
}

SLIDE 3

Similar defects in Linux 2.6.6


Type cast operations are inserted in the second fragment of the pair, so clone detection tools cannot treat the two code fragments as a clone pair.

SLIDE 4

An overview of proposed method

[Figure: overview of the proposed method]
Input code fragment (query) -> Lexical analysis -> Input identifier list: Ii[0..ni]
Target source files -> Lexical analysis -> Target identifier lists: It1[0..nt1], It2[0..nt2], ..., Itn[0..ntn]
Input and target identifier lists -> Comparison -> Similar sublists: Is1[0..ns1], Is2[0..ns2], ..., Isn[0..nsn]
Similar sublists -> Ranking -> Similarity ranking

Rank  Start line #  End line #  Similarity
1     Line_s1       Line_e1     Sim_1
2     Line_s2       Line_e2     Sim_2

The method retrieves code fragments similar to an input code fragment.
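
The lexical analysis step can be illustrated with a minimal C sketch. The slides do not describe the lexer, so everything here is an assumption: the function name `extract_identifiers`, the `MAX_ID_LEN` limit, and the short keyword list are hypothetical, and comments, string literals, and preprocessor directives are not handled. The sketch only shows how a code fragment might be reduced to an ordered identifier list.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_ID_LEN 64  /* assumed upper bound on identifier length */

/* Hypothetical helper: a small, incomplete list of C keywords to skip. */
static int is_keyword(const char *word) {
    static const char *const keywords[] = {
        "for", "if", "else", "while", "return", "char", "int",
        "long", "unsigned", "struct", "void", NULL
    };
    for (size_t i = 0; keywords[i] != NULL; i++) {
        if (strcmp(word, keywords[i]) == 0) return 1;
    }
    return 0;
}

/* Collect the identifiers of src, in order of appearance, into out.
 * Returns the number of identifiers stored (at most max_ids).
 * Identifiers longer than MAX_ID_LEN - 1 characters are truncated. */
size_t extract_identifiers(const char *src, char out[][MAX_ID_LEN], size_t max_ids) {
    size_t count = 0;
    const char *p = src;
    while (*p != '\0' && count < max_ids) {
        if (isalpha((unsigned char)*p) || *p == '_') {
            char word[MAX_ID_LEN];
            size_t len = 0;
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (len < MAX_ID_LEN - 1) word[len++] = *p;
                p++;
            }
            word[len] = '\0';
            if (!is_keyword(word)) strcpy(out[count++], word);
        } else if (isdigit((unsigned char)*p)) {
            /* Skip a numeric literal together with any suffix, e.g. 10UL. */
            while (isalnum((unsigned char)*p)) p++;
        } else {
            p++;  /* operators, punctuation, whitespace */
        }
    }
    return count;
}
```

Under these assumptions, an assignment from the Linux 2.6.6 fragments yields the same identifier list with or without the `(unsigned long)` cast, because the cast consists only of keywords.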

SLIDE 5

Comparison

• Scan a target identifier list with a sliding window.
• Compare the identifiers in the sliding window with the input identifier list.
• Extract the code fragment corresponding to the sliding window if the window contains one or more identifiers from the input list.

[Figure: a fixed-length sliding window moving along the target identifier list It[0], It[1], ..., It[n], compared against the input identifier list Ii[0], Ii[1], Ii[2]. An arrow shows the direction of movement of the sliding window.]
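
The sliding-window scan described on this slide might look like the following C sketch. The function name `find_candidate_windows`, the step of one identifier per move, and the caller-supplied window length are assumptions for illustration; the slides state only that the window has a fixed length and that a window is extracted when it contains at least one identifier from the input list.

```c
#include <stddef.h>

#include <string.h>

/* Return 1 if name occurs in list[0..n-1], 0 otherwise. */
static int contains(const char *const *list, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) return 1;
    }
    return 0;
}

/* Slide a window of window_len identifiers over target, one position
 * at a time (assumed step). The start index of every window that
 * shares at least one identifier with input is written to starts.
 * Returns the number of start indices stored (at most max_starts). */
size_t find_candidate_windows(const char *const *target, size_t n_target,
                              const char *const *input, size_t n_input,
                              size_t window_len,
                              size_t *starts, size_t max_starts) {
    size_t found = 0;
    if (window_len == 0 || n_target < window_len) return 0;
    for (size_t start = 0; start + window_len <= n_target; start++) {
        int hit = 0;
        for (size_t k = 0; k < window_len && !hit; k++) {
            hit = contains(input, n_input, target[start + k]);
        }
        if (hit && found < max_starts) starts[found++] = start;
    }
    return found;
}
```

Each stored start index identifies one candidate window; mapping a window back to source line numbers would need position information from the lexer, which this sketch leaves out.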

SLIDE 6

Similarity-based ranking

• The extracted code fragments are sorted by a similarity computed from the following two sets.
  • Si: the set of elements in the input identifier list
  • Sw: the set of elements in the sliding window
• Developers investigate the resulting similarity-based ranking.
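
The ranking step can be sketched in C as follows. The similarity formula itself is not reproduced in the text of this slide, so the sketch substitutes the Jaccard coefficient (size of the intersection of Si and Sw divided by the size of their union) purely as an assumed stand-in for a set-based similarity. The `Candidate` struct and the function names `jaccard` and `rank_candidates` are likewise hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One row of the similarity ranking (field names are assumptions). */
typedef struct {
    size_t start_line;
    size_t end_line;
    double similarity;
} Candidate;

/* Return 1 if name occurs in list[0..n-1], 0 otherwise. */
static int in_list(const char *const *list, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) return 1;
    }
    return 0;
}

/* Jaccard coefficient over the distinct identifiers of si and sw.
 * ASSUMPTION: used here only as a placeholder similarity. */
double jaccard(const char *const *si, size_t n_si,
               const char *const *sw, size_t n_sw) {
    size_t distinct_si = 0, distinct_sw = 0, common = 0;
    for (size_t i = 0; i < n_si; i++) {
        if (in_list(si, i, si[i])) continue;  /* repeat of an earlier element */
        distinct_si++;
        if (in_list(sw, n_sw, si[i])) common++;
    }
    for (size_t j = 0; j < n_sw; j++) {
        if (!in_list(sw, j, sw[j])) distinct_sw++;
    }
    size_t union_size = distinct_si + distinct_sw - common;
    return union_size == 0 ? 0.0 : (double)common / (double)union_size;
}

/* qsort comparator: higher similarity first. */
static int by_similarity_desc(const void *a, const void *b) {
    double sa = ((const Candidate *)a)->similarity;
    double sb = ((const Candidate *)b)->similarity;
    return (sa < sb) - (sa > sb);
}

/* Sort candidates in place so the most similar fragment comes first. */
void rank_candidates(Candidate *candidates, size_t n) {
    qsort(candidates, n, sizeof *candidates, by_similarity_desc);
}
```

Because `qsort` is not guaranteed to be stable, candidates with equal similarity may appear in any order; a tie-breaker on `start_line` could be added if a deterministic ranking is needed.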

SLIDE 7

Case Study

• Target open source software systems
  • arch/ directory in Linux 2.6.6
    • Architecture-specific implementations in the OS
    • 2 incorrect pointer accesses
  • server/ directory in Canna 3.6
    • Japanese input system
    • 19 buffer overflow errors
• Procedure
  • 1. Extract code fragments sharing similar defects.
  • 2. Enter each code fragment into the tool implementing our method.
  • 3. Inspect whether the similarity ranking ranks code fragments involving defects highly.

SLIDE 8

Result

• Linux 2.6.6
  • We used 2 code fragments as queries.
  • Each code fragment involves an incorrect pointer access.
  • For both queries, the 2 code fragments are ranked in the top 2.
• Canna 3.6
  • We used 19 code fragments as queries.
  • Each code fragment involves a buffer overflow error.
  • For every query, 18 or 19 of the code fragments are ranked in the top 30.

In our case studies, we could detect most of the similar defects.

SLIDE 9

Summary & Future work

• We proposed a method to retrieve similar code fragments based on identifier similarity.
  • Sliding window comparison
  • Similarity-based ranking
• We need further case studies.
  • Application to similar defects in other software systems
  • Effects of changing the definition of “similarity”