A Transducer-Based XML Query Processor



SLIDE 1

A Transducer-Based XML Query Processor

Bertram Ludäscher, SDSC/CSE UCSD Pratik Mukhopadhyay, CSE UCSD Yannis Papakonstantinou, CSE UCSD

SLIDE 2

Overview

Motivation
Architecture
Framework: Streams + XQuery
XSM (XML Stream Machine)
XSM Networks
Network Composition
Conclusions

SLIDE 3

Efficient Processing of Sequentially Accessed XML Data

Scenario: Web Service Implementations & RMI

[Diagram: XML message → XML Message Transformer → Transformed XML message, exchanged with a Web Service]

SLIDE 4

Efficient Processing of Sequentially Accessed XML Data

Scenario: Web Development

[Diagram: XML file → XML-to-XHTML Transformer → XHTML page → Web Front-End]

SLIDE 5

Efficient Processing of Sequentially Accessed XML Data

Scenario: Archive Transformation & ETL (Extraction, Transformation & Loading) Applications

[Diagram: XML archive file → XML Processor → XML target file]

SLIDE 6

Efficient Processing of Sequentially Accessed XML Data

Scenario: XML Sensor Data Analysis

[Diagram: Stream → Sensor Data Processor → Stream → Acting/Mining Software]

SLIDE 7

Bandwidth & Connectivity will Increase the Amount of Data …

[Diagram: many XML streams feeding an XML Sensor Data Processor]

SLIDE 8

…Hardware Advances do not Favor Conventional Architectures

[Plot: Magnitude vs. Year for CPU Speed, CPU-to-Memory Speed, and Bandwidth]


SLIDE 10

Transducer-Based Processing: On-the-Fly & Minimal Memory

[Diagram: an XML Stream Machine reads from input buffers and writes to output buffers; each transition is labeled Condition | Action]

SLIDE 11

XML Stream Machine (XSM) High-Level Architecture

XQuery (+ Optional Input DTD) → XQuery Compiler → XSM → XSM-to-C Compiler → C program

SLIDE 12

Components of the XQuery Compiler

XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM

Schema Optimization (uses the Optional Input DTD)


SLIDE 14

XQuery Subset

for-where-return Expressions
Path Expressions
Element Construction
Concatenation

Example:
for $X in $R/a return
  for $Y in $X/b return
    <res> $Y, $X </res>
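To make the meaning of the example query concrete, here is a minimal sketch that evaluates it over an in-memory tree rather than a stream. It is only a reference for the expected result, not the XSM approach of the talk. The function name `eval_query` is hypothetical, and it assumes that $R is bound to the root element.

```python
import copy
import xml.etree.ElementTree as ET


def eval_query(r):
    """Reference semantics (not streaming) for:
    for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>
    Assumption: r is the element bound to $R.
    """
    results = []
    for x in r.findall("a"):          # $X ranges over $R/a
        for y in x.findall("b"):      # $Y ranges over $X/b
            res = ET.Element("res")   # element construction
            res.append(copy.deepcopy(y))  # concatenation: $Y first ...
            res.append(copy.deepcopy(x))  # ... then $X
            results.append(res)
    return results


if __name__ == "__main__":
    root = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
    for e in eval_query(root):
        print(ET.tostring(e, encoding="unicode"))
```

Each binding of $Y yields one <res> element that contains that <b> followed by a full copy of the enclosing <a>.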

SLIDE 15

XML Stream: Tags, Data & Control Tokens

Example stream: <r> <a> <b> 5 </b> <b> 1 </b></a>

An XML stream is a sequence of:
Data, Open Tag & Close Tag Tokens
Control Tokens: S$R, E$R
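As an illustration of this token view, the following sketch splits XML text into open-tag, close-tag and data tokens and brackets them with the control tokens. The `tokenize` function and its regular expression are hypothetical helpers, and it is an assumption here that S$R and E$R mark the start and end of the stream bound to $R.

```python
import re

# A tag (anything between < and >) or a run of text that starts with a non-space character.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s][^<]*")


def tokenize(xml_text, var="R"):
    """Turn XML text into a flat token list.
    Assumption: control tokens S$var / E$var bracket the whole stream.
    """
    tokens = [f"S${var}"]
    for match in TOKEN_RE.finditer(xml_text):
        tokens.append(match.group().strip())  # drop padding around data tokens
    tokens.append(f"E${var}")
    return tokens


if __name__ == "__main__":
    print(tokenize("<r> <a> <b> 5 </b> <b> 1 </b></a> </r>"))
```

The sketch ignores attributes, comments and entities; it is only meant to show the shape of the token sequence.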


SLIDE 17

XML Stream Machine (XSM)

States: 1, 2, 3

Transitions (condition | action):
*y=Sy | y++
*x=Sx | w(z,Sz), x++
*y=Ey | y++
*y!=Ey | w(z,*y), y++
*x!=Ex | w(z,*x), x++
*x=Ex | w(z,Ez), x++

C: Concatenation of bindings of Y, X into bindings of Z

Input Buffer X (pointer x): Sx <a> <b> 5 </b> <b> 1 </b> </a> Ex Sx
Input Buffer Y (pointer y): Sy <b> 5 </b> Ey Sy <b> 1 </b> Ey Sy
Output Buffer Z: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez
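The behavior of this concatenation machine can be sketched as a small simulation. The six condition | action pairs are the ones on the slide; which state each transition leaves and enters is not recoverable from the transcript, so the assignment to states 1, 2, 3 below is an assumption, as is the function name `run_concat`.

```python
SX, EX, SY, EY, SZ, EZ = "Sx", "Ex", "Sy", "Ey", "Sz", "Ez"


def run_concat(x_buf, y_buf):
    """Simulate the concatenation XSM: one binding of Y, then one of X, into Z.
    x and y are read pointers; w(z, t) is modeled as z.append(t).
    Assumption: the state assignment of the transitions is a guess.
    """
    x = y = 0
    z = []
    state = 1
    while True:
        cur_x = x_buf[x] if x < len(x_buf) else None
        cur_y = y_buf[y] if y < len(y_buf) else None
        if state == 1 and cur_y == SY:            # *y=Sy | y++
            y += 1
        elif state == 1 and cur_x == SX:          # *x=Sx | w(z,Sz), x++
            z.append(SZ)
            x += 1
            state = 2
        elif state == 2 and cur_y == EY:          # *y=Ey | y++
            y += 1
            state = 3
        elif state == 2 and cur_y is not None:    # *y!=Ey | w(z,*y), y++
            z.append(cur_y)
            y += 1
        elif state == 3 and cur_x == EX:          # *x=Ex | w(z,Ez), x++
            z.append(EZ)
            x += 1
            state = 1
        elif state == 3 and cur_x is not None:    # *x!=Ex | w(z,*x), x++
            z.append(cur_x)
            x += 1
        else:                                     # no transition enabled: wait for input
            break
    return z


if __name__ == "__main__":
    x_buf = [SX, "<a>", "<b>", "5", "</b>", "<b>", "1", "</b>", "</a>", EX]
    y_buf = [SY, "<b>", "5", "</b>", EY]
    print(" ".join(run_concat(x_buf, y_buf)))
```

With one binding in each input buffer, the output buffer ends up holding Sz, the Y binding, the X binding, and Ez, which matches the final frame of the animation.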

SLIDE 18

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: (empty)

SLIDE 19

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: (empty)

SLIDE 20

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: Sz

SLIDE 21

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: Sz <b>

SLIDE 22

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: Sz <b> 5 </b>

SLIDE 23

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: Sz <b> 5 </b>

SLIDE 24

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: Sz <b> 5 </b> <a>

SLIDE 25

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a>

SLIDE 26

XML Stream Machine (XSM)

[Animation frame: machine C and Input Buffers X, Y as on slide 17]

Output Buffer Z: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez

SLIDE 27

Comparison of XSM against State Automata & Transducers

State Automata
  Do not construct
  Do not store intermediate results
  Sufficient for XPath only

Transducers
  Finite alphabets
  State is the only memory
  No reset of input pointers

XSM
  Unbounded alphabet
  Buffers
  Pointer reset


SLIDE 29

XSM Networks: Intermediate Step in Translating Queries to XSMs

XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM

SLIDE 30

XSM Network

Query:
for $X in $R/a return
  for $Y in $X/b return
    <res> $Y, $X </res>

Network:
$R → [$R/a] → $X
$X → [$X/b] → $Y
($Y, $X) → [For $Y: [$Y,$X] → [$Y’,$X’]] → ($Y’, $X’)
($Y’, $X’) → [$Y’,$X’] → $Z
$Z → [<res> $Z </res>] → $O
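The dataflow through such a network can be sketched with Python generators, one per node. This is a simplification under stated assumptions: the nodes pass whole elements rather than token streams, the For $Y node aligns bindings by grouping the $Y bindings per $X binding, and all function names (`path_step`, `for_node`, `concat`, `construct`, `run_network`) are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def path_step(stream, tag):
    """Node such as $R/a or $X/b: one group of children per input binding."""
    for node in stream:
        yield node.findall(tag)


def for_node(y_groups, xs):
    """Node For $Y: [$Y,$X] -> [$Y',$X'], repeating $X once per $Y binding."""
    for ys, x in zip(y_groups, xs):
        for y in ys:
            yield y, x


def concat(pairs):
    """Node $Y',$X': concatenate the two bindings into one binding of $Z."""
    for y, x in pairs:
        yield [y, x]


def construct(zs, tag="res"):
    """Node <res> $Z </res>: wrap each binding of $Z in a new element."""
    for z in zs:
        el = ET.Element(tag)
        el.extend([copy.deepcopy(child) for child in z])
        yield el


def run_network(r):
    xs = [x for group in path_step([r], "a") for x in group]  # $R -> [$R/a] -> $X
    y_groups = path_step(xs, "b")                             # $X -> [$X/b] -> $Y
    return list(construct(concat(for_node(y_groups, xs))))    # ... -> $O


if __name__ == "__main__":
    root = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
    for e in run_network(root):
        print(ET.tostring(e, encoding="unicode"))
```

Because $X feeds both the $X/b node and the For $Y node, the sketch materializes it as a list; a streaming version would need buffering at that point.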

SLIDE 31

From XQueries to XSM Networks: Non-FLWR Expressions

Expression: <res> $Y, $X </res>

As a single node:
($X, $Y) → [<res> $Y, $X </res>] → $O

Decomposed:
($X, $Y) → [$Y,$X] → $Z
$Z → [<res> $Z </res>] → $O

SLIDE 32

From XQueries to XSM Networks: FLWRs without Free Variables

Query: for $X in G return expr($X)

Network:
$R → [G] → $X
$X → [expr($X)] → $O

SLIDE 33

From XQueries to XSM Networks: FLWRs with Free Variables

Query (free variable $X):
for $Y in $X/b return <res> $Y, $X </res>

Network:
$X → [$X/b] → $Y
($Y, $X) → [For $Y: [$Y,$X] → [$Y’,$X’]] → ($Y’, $X’)
($Y’, $X’) → [<res> $Y’, $X’ </res>] → $O


SLIDE 35

Composition Merges Two XSMs Into One

Before composition:
$R → [$R/a] → $X
$X → [$X/b] → $Y
($Y, $X) → [For $Y: [$Y,$X] → [$Y’,$X’]] → ($Y’, $X’)
($Y’, $X’) → [$Y’,$X’] → $Z
$Z → [<res> $Z </res>] → $O

SLIDE 36

Composition Merges Two XSMs into One

After composing the last two nodes:
$R → [$R/a] → $X
$X → [$X/b] → $Y
($Y, $X) → [For $Y: [$Y,$X] → [$Y’,$X’]] → ($Y’, $X’)
($Y’, $X’) → [<res> $Y’, $X’ </res>] → $O

SLIDE 37

XSM Composition: “State Product” Emulates Producer-Consumer

Producer M1 (state q1) → Consumer M2 (state q2)

“State Product” M3 = (M2 o M1), with states (q1, q2)

SLIDE 38

Naive Composition

M1: q1 → q1’ on ϕ1|A1
M2: q2 → q2’ on ϕ2|A2

M3 = (M2 o M1):
M2 step if ψ(q2): (q1, q2) → (q1, q2’) on ψ∧ϕ2|A2
M1 step if ¬ψ(q2): (q1, q2) → (q1’, q2) on ¬ψ∧ϕ1|A1

ψ(q2) = ¬AE(r1) ∧ ... ∧ ¬AE(rn)
      = “no shared read-pointer ri of q2 is At End”
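The scheduling rule of naive composition can be sketched as a loop that tests At-End at run time before every step. This is a simplified model under stated assumptions: there is a single shared read pointer r, the producer is an iterable whose items are the tokens written by one M1 step, and the consumer is a function that maps one token to output tokens. The names `run_naive`, `producer` and `consume` are hypothetical.

```python
def run_naive(producer, consume):
    """Emulate M3 = (M2 o M1): M2 steps while its shared read pointer is not At End,
    otherwise M1 steps. Returns the output of M2 and a trace of who stepped.
    """
    shared = []            # shared buffer written by M1 and read by M2
    r = 0                  # M2's read pointer into the shared buffer
    out, trace = [], []
    steps = iter(producer)
    while True:
        at_end = r == len(shared)          # AE(r), evaluated at run time
        if not at_end:                     # psi holds: M2 step
            out.extend(consume(shared[r]))
            r += 1
            trace.append("M2")
        else:                              # psi fails: M1 step
            try:
                shared.extend(next(steps))
            except StopIteration:
                break                      # producer finished and buffer drained
            trace.append("M1")
    return out, trace


if __name__ == "__main__":
    out, trace = run_naive([["a"], [], ["b", "c"]], lambda t: [t.upper()])
    print(out, trace)
```

Every iteration pays for the At-End test, which is the cost that smart composition tries to avoid whenever the outcome is known when the machines are composed.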

SLIDE 39

Smart Composition

Normalization Assumptions:
  #( read-pointers-into-shared-buffer(q2) ) ≤ 1
  Atomic actions only

Basic idea:
  avoid runtime tests (“At-End”) whenever the outcome can be determined at compile time

Different “modes”:
  go: consumer M2 proceeds (full buffer)
  no: producer M1 proceeds (empty buffer); maybe consumer can follow immediately
  ae: do runtime check AE

SLIDE 40

Smart Composition: no Case (shared buffer is empty)

M1: q1 → q1’ on ϕ1|A1
M2: q2 → q2’ on ϕ2|A2

Case: A1 does not write to the shared buffer
Transition inserted: (q1, q2, no) → (q’1, q2, no) on ϕ1|A1

Case: M2 does not wait on shared buffer
Transition inserted: (q1, q2, no) → (q1, q’2, no) on ϕ2|A2

SLIDE 41

Smart Composition: Producer fills buffer

Case: A1 writes to the shared buffer, but M2 doesn’t advance its read pointer
Transition inserted: (q1, q2, no) → (q’1, q’2, go) on ϕ12|A12

Case: A1 writes token to the shared buffer and M2 consumes token
Transition inserted: (q1, q2, no) → (q’1, q’2, no) on ϕ12|A12

ϕ12: Combination of ϕ1 with ϕ2
A12: Combination of A1 with A2

SLIDE 42

Smart Composition: go - ae - no

in go mode:

if A2 advances the read pointer into shared buffer:
(q1, q2, go) → (q1, q’2, no) on ϕ2|A2

if A2 does not advance read pointer into shared buffer:
(q1, q2, go) → (q1, q’2, go) on ϕ2|A2

SLIDE 43

Smart Composition: go - ae - no

in ae mode: insert transitions for M2 step if possible ...

if ϕ2 ; A2 has no read from the shared buffer:
(q1, q2, ae) → (q1, q’2, ae) on ϕ2|A2

if ϕ2 ; A2 has a read from the shared buffer:
(q1, q2, ae) → (q1, q’2, ae) on ¬AE(r)∧ϕ2|A2

SLIDE 44

Smart Composition: go - ae - no

... AND transitions corresponding to M1 step

if A1 has one write into the shared buffer:
(q1, q2, ae) → (q’1, q2, go) on AE(r)∧ϕ1|A1

if A1 has more than one write into the shared buffer:
(q1, q2, ae) → (q’1, q2, ae) on AE(r)∧ϕ1|A1

if A1 has no write into the shared buffer:
(q1, q2, ae) → (q’1, q2, no) on AE(r)∧ϕ1|A1

SLIDE 45

Performance Datapoint (Transformation Query on DBLP)

Running time (ms) by data size:

Data Size (KB)   Xalan      XSM Java   XSM C
4                663        266        30
5000             7031       2360       312
20000            102710     8266       1156
80000            (no value) 32078      4640

SLIDE 46

Conclusions & Future Work

Novel query processor model
Success in filtering & transformation
To be extended for joins & aggregations
Memory footprint questions
Facilitated by model’s simplicity

SLIDE 47

Related Work

Relational Data Streams & Sequence Data Models
Pipelined Join Operators
Aggregates & Approximations
Fast XPath on streams
Memory requirements of validating XML

SLIDE 48

Smart Composition: go - ae - no

in no mode: execute M1 step ...

if A1 does not advance shared write pointer:
(q1, q2, no) → (q’1, q2, no) on ϕ1|A1

if A1 does advance shared write pointer:
(q1, q2, no) → (q’1, q2, ae) on ϕ1|A1

... AND possibly M2 step (ϕ12|A12: simplified composed ϕ1∧ϕ2 and (A1;A2))

if A2 advances shared read pointer:
(q1, q2, no) → (q’1, q’2, no) on ϕ12|A12

if A2 does not advance shared read pointer:
(q1, q2, no) → (q’1, q’2, go) on ϕ12|A12

q1 q2 no