Staging Parser Combinators for Efficient Data Processing



SLIDE 1

Staging Parser Combinators for Efficient Data Processing

Manohar Jonnalagedda

Parsing @ SLE, 14 September 2014

SLIDE 2

What are they good for?

  • Composable

○ Each combinator builds a new parser from a previous one

  • Context-sensitive

○ We can make decisions based on a specific parse result

  • Easy to Write

○ DSL-style of writing
○ Tight integration with host language

SLIDE 3

Example: HTTP Response

HTTP/1.1 200 OK
Date: Mon, 23 May 2013 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2012 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Content-Type: text/html; charset=UTF-8
Content-Length: 129
Connection: close

... payload ...

SLIDE 4

Example: HTTP Response

Status:
HTTP/1.1 200 OK

Headers:
Date: Mon, 23 May 2013 22:38:34 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Last-Modified: Wed, 08 Jan 2012 23:11:55 GMT
Etag: "3f80f-1b6-3e1cb03b"
Content-Type: text/html; charset=UTF-8
Content-Length: 129
Connection: close

Content:
... payload ...

SLIDE 5

Example: HTTP Response

// Transform parse results on the fly
def status = (
  ("HTTP/" ~ decimalNumber) ~> wholeNumber <~ (text ~ crlf)
) map (_.toInt)

SLIDE 6

Example: HTTP Response

// Transform parse results on the fly
def status = (
  ("HTTP/" ~ decimalNumber) ~> wholeNumber <~ (text ~ crlf)
) map (_.toInt)

// Make decision based on parse result
def header = (headerName <~ ":") flatMap { key =>
  (valueParser(key) <~ crlf) map { value => (key, value) }
}

SLIDE 7

Example: HTTP Response

// Transform parse results on the fly
def status = (
  ("HTTP/" ~ decimalNumber) ~> wholeNumber <~ (text ~ crlf)
) map (_.toInt)

// Make decision based on parse result
def header = (headerName <~ ":") flatMap { key =>
  (valueParser(key) <~ crlf) map { value => (key, value) }
}

// Make decision based on parse result
def respWithPayload = response flatMap { r =>
  body(r.contentLength)
}
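To make the three combinator styles on this slide concrete, here is a minimal, self-contained sketch of unstaged parser combinators. It is not the library used in the talk: `Parser`, `lit`, `takeWhile1`, `take` and the single-header `contentLength` parser are hypothetical names chosen for illustration, and only one header is handled to keep the example short.

```scala
object HttpSketch {
  // A parser reads `in` from position `pos` and either fails (None)
  // or returns a value together with the next position.
  final case class Parser[A](run: (String, Int) => Option[(A, Int)]) {
    // Transform the parse result on the fly.
    def map[B](f: A => B): Parser[B] =
      Parser((in, pos) => run(in, pos).map { case (a, next) => (f(a), next) })

    // Choose the next parser from the previous result (context sensitivity).
    def flatMap[B](f: A => Parser[B]): Parser[B] =
      Parser((in, pos) => run(in, pos).flatMap { case (a, next) => f(a).run(in, next) })

    // Sequence two parsers, keeping only the right or the left result.
    def ~>[B](that: => Parser[B]): Parser[B] = flatMap(_ => that)
    def <~[B](that: => Parser[B]): Parser[A] = flatMap(a => that.map(_ => a))
  }

  // Matches the literal string `s`.
  def lit(s: String): Parser[String] =
    Parser((in, pos) => if (in.startsWith(s, pos)) Some((s, pos + s.length)) else None)

  // Longest non-empty run of characters satisfying `p`.
  def takeWhile1(p: Char => Boolean): Parser[String] = Parser { (in, pos) =>
    var end = pos
    while (end < in.length && p(in.charAt(end))) end += 1
    if (end > pos) Some((in.substring(pos, end), end)) else None
  }

  // Exactly `n` characters.
  def take(n: Int): Parser[String] = Parser { (in, pos) =>
    if (pos + n <= in.length) Some((in.substring(pos, pos + n), pos + n)) else None
  }

  val crlf: Parser[String] = lit("\r\n")
  val wholeNumber: Parser[String] = takeWhile1(_.isDigit)
  val text: Parser[String] = takeWhile1(_ != '\r')
  val version: Parser[String] = takeWhile1(c => c.isDigit || c == '.')

  // "HTTP/1.1 200 OK\r\n" yields 200.
  val status: Parser[Int] =
    (lit("HTTP/") ~> version ~> lit(" ") ~> wholeNumber <~ text <~ crlf).map(_.toInt)

  // "Content-Length: 5\r\n" yields 5.
  val contentLength: Parser[Int] =
    (lit("Content-Length: ") ~> wholeNumber <~ crlf).map(_.toInt)

  // The payload parser depends on the length that was just parsed.
  val respWithPayload: Parser[(Int, String)] =
    status.flatMap { code =>
      contentLength.flatMap { n =>
        (crlf ~> take(n)).map(body => (code, body))
      }
    }
}
```

Calling `HttpSketch.respWithPayload.run(input, 0)` on a response with a five-character body returns the status code and exactly those five characters; the body length is only known after the header has been parsed, which is what `flatMap` expresses.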

SLIDE 8

Parser combinators are slow

Topic of this talk.

[Chart: throughput of Standard Parser Combinators vs. Staged Parser Combinators; annotated 20x]

SLIDE 9

Parser Combinators are slow

def status: Parser[Int] = (
  ("HTTP/" ~ decimalNumber) ~> wholeNumber <~ (text ~ crlf)
) map (_.toInt)

def header = (headerName <~ ":") flatMap { key =>
  (valueParser(key) <~ crlf) map { value => (key, value) }
}

def respWithPayload = response flatMap { r =>
  body(r.contentLength)
}

class Parser[T] extends (Input => ParseResult[T]) ...

SLIDE 10

Parser Combinators are slow

def status: Parser[Int] = (
  ("HTTP/" ~ decimalNumber) ~> wholeNumber <~ (text ~ crlf)
) map (_.toInt)

def header = (headerName <~ ":") flatMap { key =>
  (valueParser(key) <~ crlf) map { value => (key, value) }
}

def respWithPayload = response flatMap { r =>
  body(r.contentLength)
}

class Parser[T] extends (Input => ParseResult[T]) ...
  def ~[U](that: Parser[U]) = new Parser[(T,U)] {
    def apply(i: Input) = ...
  }
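The cost hinted at by this class definition can be made visible with a small sketch. It assumes a simplified `Input` (string plus position) and a hypothetical global counter `applyCalls` that records every dynamic call through `Parser.apply`; neither is part of the library discussed in the talk.

```scala
object OverheadSketch {
  // Assumption: the input is a string plus a current position.
  final case class Input(s: String, pos: Int)

  sealed trait ParseResult[+T]
  final case class Success[T](value: T, rest: Input) extends ParseResult[T]
  case object Failure extends ParseResult[Nothing]

  // Hypothetical instrumentation: counts calls through Parser.apply.
  var applyCalls = 0

  abstract class Parser[T] extends (Input => ParseResult[T]) { self =>
    def parse(in: Input): ParseResult[T]

    def apply(in: Input): ParseResult[T] = {
      applyCalls += 1
      parse(in)
    }

    // Every use of ~ allocates a new Parser object wrapping both operands.
    def ~[U](that: Parser[U]): Parser[(T, U)] = new Parser[(T, U)] {
      def parse(in: Input): ParseResult[(T, U)] = self(in) match {
        case Success(a, rest1) =>
          that(rest1) match {
            case Success(b, rest2) => Success((a, b), rest2)
            case Failure           => Failure
          }
        case Failure => Failure
      }
    }
  }

  // Accepts exactly the character `c`.
  def char(c: Char): Parser[Char] = new Parser[Char] {
    def parse(in: Input): ParseResult[Char] =
      if (in.pos < in.s.length && in.s.charAt(in.pos) == c)
        Success(c, in.copy(pos = in.pos + 1))
      else Failure
  }

  val abc: Parser[((Char, Char), Char)] = char('a') ~ char('b') ~ char('c')
}
```

In this sketch, matching three characters with `abc` goes through five `apply` calls (two for the `~` wrappers, three for the leaves) and allocates intermediate `Input`, `Success` and tuple objects along the way, even though the shape of the composition never changes between runs.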

SLIDE 11
Parser Combinators are slow

  • Prohibitive composition overhead
  • But: composition is mostly static

○ Let us systematically remove it!

SLIDE 12

Staged Parser Combinators

Composition of Parsers

SLIDE 13

Staged Parser Combinators

Composition of Parsers
Composition of Code Generators

SLIDE 14

Staging (LMS)

def add3(a: Int, b: Int, c: Int) = a + b + c

‘Classic’ evaluation:
add3(1, 2, 3)   // 6

SLIDE 15

Staging (LMS)

def add3(a: Int, b: Int, c: Int) = a + b + c

‘Classic’ evaluation:
add3(1, 2, 3)   // 6

Adding Rep types:
def add3(a: Rep[Int], b: Int, c: Int) = a + b + c

a: Rep[Int] is an expression in the next stage
b: Int is executed at staging time, constant in the next stage
c: Int is executed at staging time, constant in the next stage

SLIDE 16

Staging (LMS)

def add3(a: Int, b: Int, c: Int) = a + b + c

‘Classic’ evaluation:
add3(1, 2, 3)   // 6

Adding Rep types:
def add3(a: Rep[Int], b: Int, c: Int) = a + b + c

a: Rep[Int] is an expression in the next stage
b: Int is executed at staging time, constant in the next stage
c: Int is executed at staging time, constant in the next stage

Code generation:
add3(x, 2, 3)
def add$3$2$3(a:Int) = a + 5

Evaluation of generated code:
add$3$2$3(1)
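The effect of the `Rep` annotation can be imitated without LMS. The sketch below is an assumption-laden stand-in: `Rep` is a tiny symbolic expression type rather than LMS's typed IR, the constant folding is a hand-written rule, and the names `add3Staged`, `show` and `generate` are invented for this example.

```scala
object StagingSketch {
  // 'Classic' evaluation: all three arguments are ordinary values.
  def add3(a: Int, b: Int, c: Int): Int = a + b + c

  // Stand-in for Rep[Int]: an expression whose value is known only
  // in the next stage.
  sealed trait Rep {
    // Adding a staging-time constant: fold it into an existing constant.
    def +(c: Int): Rep = this match {
      case Plus(x, c0) => Plus(x, c0 + c)
      case other       => Plus(other, c)
    }
  }
  final case class Sym(name: String) extends Rep
  final case class Plus(lhs: Rep, const: Int) extends Rep

  // Same body as add3, but `a` is staged while `b` and `c` are not.
  def add3Staged(a: Rep, b: Int, c: Int): Rep = a + b + c

  // Pretty-prints an expression as next-stage source code.
  def show(e: Rep): String = e match {
    case Sym(n)     => n
    case Plus(x, c) => s"${show(x)} + $c"
  }

  // Runs `body` at staging time on a symbolic argument and emits a function.
  def generate(name: String)(body: Rep => Rep): String = {
    val code = show(body(Sym("a")))
    s"def $name(a: Int) = $code"
  }
}
```

With these definitions, `StagingSketch.generate("add3_2_3")(a => StagingSketch.add3Staged(a, 2, 3))` produces the string `def add3_2_3(a: Int) = a + 5`: the two constants were added while generating code, and only the addition involving `a` is left for the next stage.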

SLIDE 17

LMS

User-written code, may contain Rep types
LMS runtime code generation
Generated/optimized code

SLIDE 18

Staging Parser Combinators

class Parser[T] extends (Input => ParseResult[T])

Composition of Code Generators

class Parser[T] extends (Rep[Input] => Rep[ParseResult[T]])

static function: application == inlining for free
dynamic inputs
dynamic input/output

SLIDE 19

Staging Parser Combinators

Unstaged: class Parser[T] extends (Input => ParseResult[T])

Composition of Code Generators

Staged:   class Parser[T] extends (Rep[Input] => Rep[ParseResult[T]])

dynamic inputs

Unstaged: def ~[U](that: Parser[U])
Staged:   def ~[U](that: Parser[U])

Unstaged: def map[U](f: T => U): Parser[U]
Staged:   def map[U](f: Rep[T] => Rep[U]): Parser[U]

dynamic input/output
static function: application == inlining for free
still a code generator

SLIDE 20

Staging Parser Combinators

Unstaged: class Parser[T] extends (Input => ParseResult[T])

Composition of Code Generators

Staged:   class Parser[T] extends (Rep[Input] => Rep[ParseResult[T]])

dynamic inputs

Unstaged: def ~[U](that: Parser[U])
Staged:   def ~[U](that: Parser[U])

Unstaged: def map[U](f: T => U): Parser[U]
Staged:   def map[U](f: Rep[T] => Rep[U]): Parser[U]

Unstaged: def flatMap[U](f: T => Parser[U]): Parser[U]
Staged:   def flatMap[U](f: Rep[T] => Parser[U]): Parser[U]

dynamic input/output
static function: application == inlining for free
still a code generator
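A rough way to see why "application == inlining" holds is to let a parser be an ordinary Scala function over code fragments. The sketch below is not LMS: `Rep` is an untyped wrapper around a source string, results are encoded as `Option[(value, nextPos)]`, and `char`, `fresh` and `generate` are hypothetical helpers introduced for this example.

```scala
object StagedParserSketch {
  // Stand-in for Rep[T]: a fragment of next-stage source code.
  final case class Rep(code: String)

  // Fresh-name supply for variables in the generated code.
  private var counter = 0
  private def fresh(): Int = { counter += 1; counter }

  // A staged parser is a static function from position code to result code.
  // Calling `run` at staging time pastes the parser's code into the output.
  final case class Parser(run: Rep => Rep) {
    def ~(that: Parser): Parser = Parser { pos =>
      val i = fresh()
      val first = run(pos).code
      val second = that.run(Rep(s"p$i")).code
      Rep(s"($first match { case Some((a$i, p$i)) => " +
        s"$second.map { case (b$i, q$i) => ((a$i, b$i), q$i) }; case None => None })")
    }

    // `f` is a static function over dynamic values: it is applied here,
    // at staging time, and only its resulting code reaches the output.
    def map(f: Rep => Rep): Parser = Parser { pos =>
      val i = fresh()
      val mapped = f(Rep(s"v$i")).code
      Rep(s"${run(pos).code}.map { case (v$i, r$i) => ($mapped, r$i) }")
    }
  }

  // Emits code that accepts the single character `c` at the given position.
  def char(c: Char): Parser = Parser { pos =>
    val p = pos.code
    Rep(s"(if ($p < in.length && in.charAt($p) == '$c') Some(('$c', $p + 1)) else None)")
  }

  // Emits a next-stage function for parser `p`.
  def generate(name: String, p: Parser): String = {
    counter = 0
    val body = p.run(Rep("pos")).code
    s"def $name(in: String, pos: Int) = $body"
  }
}
```

Generating code for `char('a') ~ char('b')` yields one function whose body tests `in.charAt(pos)` and then `in.charAt(p1)` directly. No `Parser` object or combinator call appears in the emitted text, because every combinator was applied while generating it.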

SLIDE 21

A closer look

User-written parser:

def respWithPayload: Parser[..] = response flatMap { r =>
  body(r.contentLength)
}

Generated code (after code generation):

// code for parsing response
val response = parseHeaders()
val n = response.contentLength
// parsing body
var i = 0
while (i < n) {
  readByte()
  i += 1
}

SLIDE 22

Gotchas

  • Recursion

○ explicit recursion combinator (fix-point like)

  • Diamond control flow

○ code generation blowup

General solution

○ generate staged functions (Rep[Input => ParseResult])
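The recursion problem and its fix can be sketched with string-based code generation. Inlining a recursive grammar at staging time would never terminate, so the hypothetical `rec` combinator below emits a named next-stage function once and turns recursive references into calls to it. `Rep`, `Parser`, `rec` and `manyA` are illustrative names, and the parser is reduced to a recognizer that returns only the next position.

```scala
object RecSketch {
  // Stand-in for Rep[T]: a fragment of next-stage source code.
  final case class Rep(code: String)

  // Recognizer-style staged parser: position code in, next-position code out.
  final case class Parser(run: Rep => Rep)

  // Explicit recursion combinator (fix-point like). Inside `body`, the
  // parser `self` generates a call to the emitted function instead of
  // inlining the body again.
  def rec(name: String)(body: Parser => Parser): String = {
    val self = Parser(pos => Rep(s"$name(in, ${pos.code})"))
    val code = body(self).run(Rep("pos")).code
    s"def $name(in: String, pos: Int): Int = $code"
  }

  // Grammar: manyA ::= 'a' manyA | (empty)
  val manyA: String = rec("manyA") { self =>
    Parser { pos =>
      val p = pos.code
      val recurse = self.run(Rep(s"$p + 1")).code
      Rep(s"if ($p < in.length && in.charAt($p) == 'a') $recurse else $p")
    }
  }
}
```

The value of `RecSketch.manyA` is a single function definition whose body calls `manyA(in, pos + 1)`. The same idea addresses diamond control flow: a continuation that would otherwise be duplicated in each branch can be emitted once as a function and called from both.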

SLIDE 23

Performance: Parsing JSON

  • 20 times faster than Scala’s parser combinators

  • 3 times faster than Parboiled2

SLIDE 24

Performance

[Charts: benchmark results for HTTP Response and CSV parsing]

SLIDE 25

If you want to know more

  • Parser Combinators for Dynamic Programming [OOPSLA ‘14]

○ based on ADP
○ code gen for GPU

  • Using Scala Macros [Scala ‘14]

SLIDE 26

Desirable Parser Properties

                    Hand-written   Parser Generators   Staged Parser Combinators
Composable          X              ✓                   ✓
Customizable        X              X                   ✓
Context-Sensitive   ✓              ~                   ✓
Fast                ✓              ✓
Easy to write       X              ✓                   ✓

SLIDE 27

The people

  • Eric Béguet
  • Thierry Coppey
  • Sandro Stucki
  • Tiark Rompf
  • Martin Odersky

SLIDE 28

Thank you!

Questions?

SLIDE 29

Staging all the way down

  • Staged structs

○ boxing of temporary results eliminated

  • Staged strings

○ substring not computed all the time

SLIDE 30

Optimizing String handling

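class InputWindow[Input](val in: Input, val start: Int, val end: Int) {
  // Two windows are equal when they cover the same range of the same input.
  override def equals(x: Any) = x match {
    // Type arguments are erased at runtime, so match on InputWindow[_].
    case s: InputWindow[_] => s.in == in && s.start == start && s.end == end
    case _ => super.equals(x)
  }
}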

SLIDE 31

Beware!

  • String.substring runs in linear time (since Java 7 update 6; earlier versions shared the backing array).
  • Parsers on Strings are inefficient.
  • Need to use a FastCharSequence which mimics original behaviour of substring.
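One possible shape for such a sequence is sketched below. This is a hypothetical implementation written for illustration, not the `FastCharSequence` used in the benchmarks: it wraps a shared character array with an offset and a length, so that `subSequence` creates a new view in constant time instead of copying.

```scala
object FastSeqSketch {
  // A view onto `data` covering `len` characters starting at `offset`.
  final class FastCharSequence(data: Array[Char], offset: Int, len: Int)
      extends CharSequence {

    // Copies the string once; all later slices share that copy.
    def this(s: String) = this(s.toCharArray, 0, s.length)

    def length(): Int = len

    def charAt(i: Int): Char = {
      if (i < 0 || i >= len) throw new IndexOutOfBoundsException(i.toString)
      data(offset + i)
    }

    // Constant time: shares the backing array and only adjusts the bounds.
    def subSequence(start: Int, end: Int): CharSequence = {
      if (start < 0 || end > len || start > end)
        throw new IndexOutOfBoundsException(s"$start, $end")
      new FastCharSequence(data, offset + start, end - start)
    }

    // Materializes the view as a String; this is the only copying step.
    override def toString(): String = new String(data, offset, len)
  }
}
```

A parser that repeatedly narrows its input, for example `seq.subSequence(9, 12)` followed by further slicing of the result, then pays for the bounds arithmetic only. The trade-off is that a small view keeps the whole backing array reachable.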

Key performance impactors

[Chart, configurations shown:]
Standard Parser Combinators

SLIDE 32

Key performance impactors

[Chart, configurations shown:]
Standard Parser Combinators
Standard Parser Combinators with FastCharSequence

SLIDE 33

Key performance impactors

[Chart, configurations shown with annotated speedups:]
Standard Parser Combinators
Standard Parser Combinators with FastCharSequence
FastParsers with error reporting and without inlining: ~7-8x

SLIDE 34

Key performance impactors

[Chart, configurations shown with annotated speedups:]
Standard Parser Combinators
Standard Parser Combinators with FastCharSequence
FastParsers with error reporting and without inlining: ~7-8x
FastParsers without error reporting and without inlining: ~2x

SLIDE 35

Key performance impactors

[Chart, configurations shown with annotated speedups:]
Standard Parser Combinators
Standard Parser Combinators with FastCharSequence
FastParsers with error reporting and without inlining: ~7-8x
FastParsers without error reporting and without inlining: ~2x
FastParsers without error reporting and with inlining: ~30%