Query Construction Patterns in PHP David Anderson and Mark Hills - - PowerPoint PPT Presentation

query construction patterns in php
SMART_READER_LITE
LIVE PREVIEW

Query Construction Patterns in PHP David Anderson and Mark Hills - - PowerPoint PPT Presentation

Query Construction Patterns in PHP David Anderson and Mark Hills 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017), ERA Track February 21-24, 2017 Klagenfurt, Austria http://www.rascal-mpl.org 1


slide-1
SLIDE 1

Query Construction Patterns in PHP http://www.rascal-mpl.org

David Anderson and Mark Hills 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2017), ERA Track February 21-24, 2017 Klagenfurt, Austria

1

slide-2
SLIDE 2

Obligatory Cute Kitties!

slide-3
SLIDE 3

Context: PHP and MySQL

  • Original MySQL API introduced in PHP 2
  • Widely used
  • No support for object-oriented language features
  • No support for prepared statements, stored procedures
  • Deprecated in PHP 5.5, dropped in PHP 7

3

slide-4
SLIDE 4

Our goal: program transformation!

  • We want to replace uses of MySQL API with


either MySQL Improved (mysqli) or PDO, based


  • n user preference
  • Want to move more towards features like prepared statements,

provides better protection again SQL injection vulnerabilities (like this one!):

4

$query = mysql_query("
 SELECT title 
 FROM semesters 
 WHERE semesterid = $_POST[semester]
 ");

slide-5
SLIDE 5

So, what’s the problem?

  • Safe transformations challenging in PHP!
  • Dynamic features, inclusion model, heavy use of


strings and implicit type coercions all make this harder

  • So, focus our efforts:
  • Can we exploit common usage patterns?
  • What additional analysis do we need?
  • Do we hit a point of diminishing returns? Where?

5

slide-6
SLIDE 6

Query construction patterns

  • How are queries typically built in PHP scripts?
  • What parts of a query tend to be dynamic?
  • What features are used to build these dynamic query parts?

6

slide-7
SLIDE 7

Corpus & methodology

  • Starting small, see paper for details…
  • Analysis scripted in Rascal for reproducibility

7

slide-8
SLIDE 8

Which patterns were found?

  • Literal query strings (QCP-1)
  • Cascading concatenating assignments (QCP-2)
  • Assignments distributed over control flow (QCP-3)
  • Dynamic query strings (QCP-4)

8

slide-9
SLIDE 9

How often did they appear?

  • Literal strings: surprisingly often
  • Dynamic queries: most common
  • Most dynamic pieces are variables or array lookups
  • Several are function calls or ternary operations, but

comparatively few

  • Almost all used as parameters
  • Other two: not very common (may mean patterns are too specific!)

9

slide-10
SLIDE 10

What does this mean?

  • Assuming reasonable queries being built (an assumption we are

still validating), results are encouraging — many dynamic parts used as parameters, others often variables or arrays versus unusual features

  • But, need more powerful analysis in general, especially to ensure

soundness for transformations

10

slide-11
SLIDE 11

Threats to validity

  • Results could be very specific to the


selected systems

  • Some systems older, no longer


maintained

11

slide-12
SLIDE 12

Threats to validity

  • Results could be very specific to the


selected systems

  • Some systems older, no longer


maintained

  • Mitigation: focus on evolution, so older


systems are normal; have included a
 variety of systems

  • Still an issue, though: a more extensive evaluation is ongoing

12

slide-13
SLIDE 13

What have we learned? What’s left?

  • Queries appear to be built in predictable patterns
  • Dynamic parts are mainly in the “right” places, making a

transformation to prepared statements possible

  • We need a more extensive analysis with more systems and more

precise and sound analysis algorithms

  • We need better models of the queries themselves (current work) for

more precise pattern identification

  • We need to build the transformation!

13

slide-14
SLIDE 14

Questions for the audience

  • What about our results is unexpected, or would 


maybe be invalidated by a more extensive analysis?

  • What analysis are we missing? What should we add?
  • Do you think newer systems would show much different results?
  • Can you think of other applications (e.g., program comprehension)

that we could apply this to?

14

slide-15
SLIDE 15
  • Rascal: http://www.rascal-mpl.org
  • Me: http://www.cs.ecu.edu/hillsma

15

Thank you! Any Questions? Discussion