Take Everything From Me, But Leave Me The Comprehension DBPL - - PowerPoint PPT Presentation

take everything from me but leave me the comprehension
SMART_READER_LITE
LIVE PREVIEW

Take Everything From Me, But Leave Me The Comprehension DBPL - - PowerPoint PPT Presentation

Take Everything From Me, But Leave Me The Comprehension DBPL September 2017 Torsten Grust db.inf.uni-tuebingen.de Apologies, I am only a database person Torsten Grust U Tbingen 2 Apologies, I am only a database person It


slide-1
SLIDE 1

Torsten Grust

db.inf.uni-tuebingen.de

Take Everything From Me, But Leave Me The Comprehension

DBPL — September 2017

slide-2
SLIDE 2

Torsten Grust U Tübingen 2

Apologies, I am only
 a database person

slide-3
SLIDE 3

Torsten Grust U Tübingen 2

It is in this connection worth noticing that in the Comm.ACM the papers

  • n data bases […] are of markedly lower

linguistic quality than the others.

—Edsger Dijkstra (EWD691)

Apologies, I am only
 a database person

slide-4
SLIDE 4

Torsten Grust U Tübingen

The point is that the way in which the database management experts tackle the problems seems to be so grossly inadequate. They seem to form an inbred crowd with very little knowledge of computing science in general, who tackle their problems primarily politically instead of scientifically. Often they seemed to be mentally trapped by the intricacies of early, rather ad hoc solutions to rather accidental problems; as soon as such a technique has received a name, it becomes "a database concept".

3

—Edsger Dijkstra (EWD577)

Torsten Grust U Tübingen

Apologies, I am only
 a database person

slide-5
SLIDE 5

Torsten Grust U Tübingen

so grossly inadequate. They seem to form an inbred crowd with very little knowledge of computing science in general, who tackle their problems primarily politically instead of scientifically. Often they seemed to be mentally trapped by the intricacies of early, rather ad hoc solutions to rather accidental problems; as soon as such a technique has received a name, it becomes "a database concept". And a totally inadequate use of language, sharpening their pencils with a blunt axe. I learned a few things about Databases. I learned —or: had my tentative impression confirmed— that the term "Database Technology", although sometimes used, is immature, for there is hardly any underlying

3

—Edsger Dijkstra (EWD577)

Torsten Grust U Tübingen

Apologies, I am only
 a database person

slide-6
SLIDE 6

Torsten Grust U Tübingen

accidental problems; as soon as such a technique has received a name, it becomes "a database concept". And a totally inadequate use of language, sharpening their pencils with a blunt axe. I learned a few things about Databases. I learned —or: had my tentative impression confirmed— that the term "Database Technology", although sometimes used, is immature, for there is hardly any underlying "science" that could justify the use of the term "technology".

3

—Edsger Dijkstra (EWD577)

Torsten Grust U Tübingen

Apologies, I am only
 a database person

slide-7
SLIDE 7

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

slide-8
SLIDE 8

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

  • 1. Successively draw bindings for x from domain xs,
  • 2. for those bindings that pass filter p,
  • 3. evaluate head h,

head filter generator

slide-9
SLIDE 9

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

  • 1. Successively draw bindings for x from domain xs,
  • 2. for those bindings that pass filter p,
  • 3. evaluate head h,
  • 4. collect results to form a list.

head filter generator

slide-10
SLIDE 10

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

  • 1. Successively draw bindings for x from domain xs,
  • 2. for those bindings that pass filter p,
  • 3. evaluate head h,
  • 4. collect results to form

M

an M.

head filter generator

slide-11
SLIDE 11

Torsten Grust U Tübingen

Extension vs. Intension

5

{ }

slide-12
SLIDE 12

Torsten Grust U Tübingen

Extension vs. Intension

5

I, III, V, VII, IX

{ }

slide-13
SLIDE 13

Torsten Grust U Tübingen

Extension vs. Intension

5

I, III, V, VII, IX [ roman x | x ← [1…10], odd x ]

set

{ }

slide-14
SLIDE 14

Torsten Grust U Tübingen

In the Beginning …

6

Relational Completeness of Data Base Sublanguages

  • E. F. Codd, IBM Research Report RJ987, 1972
slide-15
SLIDE 15

Torsten Grust U Tübingen

In the Beginning …

6

Relational Completeness of Data Base Sublanguages

  • E. F. Codd, IBM Research Report RJ987, 1972

head filter generator

slide-16
SLIDE 16

Torsten Grust U Tübingen

Today’s XQuery 3.0

7

Core of XQuery: versatile FLWOR expression

XQuery 3.0: An XML Query Language

  • D. Chamberlin et al., W3C Recommendation, April 2014
slide-17
SLIDE 17

Torsten Grust U Tübingen

Today’s XQuery 3.0

7

Core of XQuery: versatile FLWOR expression

XQuery 3.0: An XML Query Language

  • D. Chamberlin et al., W3C Recommendation, April 2014

head generator filter

slide-18
SLIDE 18

Torsten Grust U Tübingen

Today’s XQuery 3.0

7

Core of XQuery: versatile FLWOR expression

XQuery 3.0: An XML Query Language

  • D. Chamberlin et al., W3C Recommendation, April 2014

head generator generator head filter

slide-19
SLIDE 19

Torsten Grust U Tübingen

Early XQuery

8

A Data Model and Algebra for XQuery

  • M. Fernandez et al., October 2003
slide-20
SLIDE 20

Torsten Grust U Tübingen

Early XQuery

8

A Data Model and Algebra for XQuery

  • M. Fernandez et al., October 2003

generator head filter

slide-21
SLIDE 21

Torsten Grust U Tübingen 1 module Query where 2 import Prelude hiding (elem,index) 3 4 -- Data Model: Constructors ---------------------------------------------------- 5 6 text :: String -> Node 7 elem :: Tag -> [Node] -> Node 8 ref :: Node -> Node 9 10 year0 :: Node 11 year0 = elem "@year" [ text "1999" ] 12 13 book0 :: Node 14 book0 = elem "book" [ 15 elem "@year" [ text "1999" ], 16 elem "title" [ text "Data on the Web" ], 17 elem "author" [ text "Abiteboul" ], 18 elem "author" [ text "Buneman" ], 19 elem "author" [ text "Suciu" ]] 20 21 bib0 :: Node 22 bib0 = elem "bib" [ 23 elem "book" [ 24 elem "@year" [ text "1999" ], 25 elem "title" [ text "Data on the Web" ],

An XQuery Nucleus

9 Torsten Grust U Tübingen

slide-22
SLIDE 22

Torsten Grust U Tübingen

An XQuery Nucleus

9 Torsten Grust U Tübingen

  • ≈ 430 lines of Haskell (300+ lines of examples)
  • Implements a complete XQuery core, 


including tree construction and traversal

  • List comprehensions express path navigation, FLOWR,

grouping/aggregation, quantification

slide-23
SLIDE 23

Torsten Grust U Tübingen

LINQ

10

Comprehension syntax deeply embedded into C#, with monad-based semantics organized around SelectMany (aka >>=, flatmap)

The World According to LINQ

  • E. Meijer, October 201

1

slide-24
SLIDE 24

Torsten Grust U Tübingen

LINQ

10

Comprehension syntax deeply embedded into C#, with monad-based semantics organized around SelectMany (aka >>=, flatmap)

The World According to LINQ

  • E. Meijer, October 201

1

generator head filter

slide-25
SLIDE 25

Torsten Grust U Tübingen

Emma

1 1

Implicit Parallelism through Deep Language Embedding

  • A. Alexandrov et al., SIGMOD 2015

Deep embedding of comprehensions in Scala, compiles to Apache Flink / Spark

slide-26
SLIDE 26

Torsten Grust U Tübingen

Emma

1 1

Implicit Parallelism through Deep Language Embedding

  • A. Alexandrov et al., SIGMOD 2015

Deep embedding of comprehensions in Scala, compiles to Apache Flink / Spark

generator head filter

slide-27
SLIDE 27

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs

slide-28
SLIDE 28

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs

generator head filter

slide-29
SLIDE 29

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs

generator head filter

slide-30
SLIDE 30

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs Told you so.

generator head filter

slide-31
SLIDE 31

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

slide-32
SLIDE 32

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

generator head filter

slide-33
SLIDE 33

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

generator head filter head generator filter

slide-34
SLIDE 34

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

generator head filter head generator filter generator head head filter generator head filter generator

slide-35
SLIDE 35

Torsten Grust U Tübingen 14

slide-36
SLIDE 36

Torsten Grust U Tübingen 14

SQL

slide-37
SLIDE 37

Torsten Grust U Tübingen

One Way to Teach SQL

15

slide-38
SLIDE 38

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S

slide-39
SLIDE 39

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; ∅ ⨄

slide-40
SLIDE 40

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; ∅ ⨄

slide-41
SLIDE 41

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; ∅ ⨄ –∞ max2

slide-42
SLIDE 42

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S 0 < ALL(SELECT A FROM S) c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; ∅ ⨄ –∞ max2

slide-43
SLIDE 43

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S 0 < ALL(SELECT A FROM S) c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; c ← true; foreach x ∈ S do c ← c ∧ (0 < x.A); return c; ∅ ⨄ –∞ max2 true ∧

slide-44
SLIDE 44

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S 0 < ALL(SELECT A FROM S) c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; c ← true; foreach x ∈ S do c ← c ∧ (0 < x.A); return c;

∅ ⨄

–∞ max2 true

slide-45
SLIDE 45

Torsten Grust U Tübingen

One Program Form for SQL

16

slide-46
SLIDE 46

Torsten Grust U Tübingen

One Program Form for SQL

16

c ← z; foreach x ∈ xs do c ← f (c ,x); return c;

fold(z,f,xs) ≡

slide-47
SLIDE 47

Torsten Grust U Tübingen

One Program Form for SQL

16

M carrier liftM zM ⊕M bag bag t {⋅} ∅ ⊎ set set t {⋅} ∅ ∪ list list t [⋅] [ ] + + all bool id TRUE ∧ some bool id FALSE ∨ sum num id + max t (ordered) id

max2 min t (ordered) id ∞ min2

true false

c ← z; foreach x ∈ xs do c ← f (c ,x); return c;

fold(z,f,xs) ≡

slide-48
SLIDE 48

Torsten Grust U Tübingen

One Program Form for SQL

17

slide-49
SLIDE 49

Torsten Grust U Tübingen

One Program Form for SQL

17

SELECT A FROM S WHERE A > B

fold(∅,⊕,S) with ⊕(c,x) = c ⊎ (if (x.A > x.B) {x.A} else ∅)

slide-50
SLIDE 50

Torsten Grust U Tübingen

One Program Form for SQL

17

SELECT A FROM S WHERE A > B SELECT x.A,y.B FROM R x,S y

fold(∅,⊕,S) with ⊕(c,x) = c ⊎ (if (x.A > x.B) {x.A} else ∅) fold(∅,⊕,R) with ⊕(c,x) = c ⊎ fold(∅,⊗,S) with ⊗(d,y) = d ⊎ {(x.A,y.B)}

slide-51
SLIDE 51

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

slide-52
SLIDE 52

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B)

fold(0,⊕,fold(∅,⊗,R)) with with ⊕(c,_) = c + 1 ⊗(d,x) = d ⊎ if (fold(false,⊙,S)) {x} else ∅ with ⊙(e,y) = e ∨ (x.A = y.B)

slide-53
SLIDE 53

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B)

fold(0,⊕,fold(∅,⊗,R)) with with ⊕(c,_) = c + 1 ⊗(d,x) = d ⊎ if (fold(false,⊙,S)) {x} else ∅ with ⊙(e,y) = e ∨ (x.A = y.B)

Algebraic Wonderland.

slide-54
SLIDE 54

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B)

fold(0,⊕,fold(∅,⊗,R)) with with ⊕(c,_) = c + 1 ⊗(d,x) = d ⊎ if (fold(false,⊙,S)) {x} else ∅ with ⊙(e,y) = e ∨ (x.A = y.B)

Algebraic Wonderland. REJECT!

slide-55
SLIDE 55

Torsten Grust U Tübingen

Comprehension Semantics

19

slide-56
SLIDE 56

Torsten Grust U Tübingen

Comprehension Semantics

19

[ e | v1 ← e1, q ]M [ e | ]M [ e | p, q ]M

slide-57
SLIDE 57

Torsten Grust U Tübingen

Comprehension Semantics

19

[ e | v1 ← e1, q ]M [ e | ]M [ e | p, q ]M liftM(e)

≡ ≡ ≡

slide-58
SLIDE 58

Torsten Grust U Tübingen

Comprehension Semantics

19

[ e | v1 ← e1, q ]M [ e | ]M [ e | p, q ]M liftM(e) if (p) [ e | q ]M else zM fold(zM,⊗,e1) with ⊗(c,v1) = c ⊕M [ e | q ]M

≡ ≡ ≡

slide-59
SLIDE 59

Torsten Grust U Tübingen

Comprehensible SQL

20

slide-60
SLIDE 60

Torsten Grust U Tübingen

Comprehensible SQL

20

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

slide-61
SLIDE 61

Torsten Grust U Tübingen

Comprehensible SQL

20

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag

slide-62
SLIDE 62

Torsten Grust U Tübingen

Comprehensible SQL

20

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag [ true | _ ← ]some

slide-63
SLIDE 63

Torsten Grust U Tübingen

Comprehensible SQL

20

[ 1 | x ← R, ]sum

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag [ true | _ ← ]some

slide-64
SLIDE 64

Torsten Grust U Tübingen

Comprehensible SQL

20

[ 1 | x ← R, [ x.A = y.B | y ← S ]some ]sum [ 1 | x ← R, ]sum

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag [ true | _ ← ]some

slide-65
SLIDE 65

Torsten Grust U Tübingen

Comprehension Unnesting

21

[ e | qs1, v ← [ ]N, qs3 ]M [ e | qs1, v ← [ e2 ]N, qs3 ]M [ e | qs1, v ← [ e2 | qs2 ]N, qs3 ]M [ e | qs1, [ e2 | qs2 ]some, qs3 ]M

slide-66
SLIDE 66

Torsten Grust U Tübingen

Comprehension Unnesting

21

[ e | qs1, v ← [ ]N, qs3 ]M [ e | qs1, v ← [ e2 ]N, qs3 ]M [ e | qs1, v ← [ e2 | qs2 ]N, qs3 ]M [ e | qs1, [ e2 | qs2 ]some, qs3 ]M [ ]M [ e[e2/v] | qs1, qs3[e2/v] ]M [ e[e2/v] | qs1, qs2, qs3[e2/v] ]M [ e | qs1, qs2, e2, qs3 ]M

(⊕M idempotent)

slide-67
SLIDE 67

Torsten Grust U Tübingen 22

On Optimizing an SQL-like Nested Query

  • W. Kim, ACM TODS, 1982

When Syntax Distracts

slide-68
SLIDE 68

Torsten Grust U Tübingen 22

On Optimizing an SQL-like Nested Query

  • W. Kim, ACM TODS, 1982

When Syntax Distracts

slide-69
SLIDE 69

Torsten Grust U Tübingen 22

On Optimizing an SQL-like Nested Query

  • W. Kim, ACM TODS, 1982

When Syntax Distracts

Implemented in most RDBMSs to this day

slide-70
SLIDE 70

Torsten Grust U Tübingen

When Syntax Distracts

23

  • Syntactic classification of nested SQL queries into


types N, Nx, D, J, A, JA, JA(NA), JA(AA), JA(AN), …

  • Classes are associated with their particular


SQL–level unnesting rewrites.

slide-71
SLIDE 71

Torsten Grust U Tübingen

When Syntax Distracts

23

  • Syntactic classification of nested SQL queries into


types N, Nx, D, J, A, JA, JA(NA), JA(AA), JA(AN), …

  • Classes are associated with their particular


SQL–level unnesting rewrites.

slide-72
SLIDE 72

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y))

slide-73
SLIDE 73

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y))

[ f(x) | x ← R, [ p(x) = v | v ← [ g(y)| y ← S, q(x,y) ]bag ]some ]set

slide-74
SLIDE 74

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y))

[ f(x) | x ← R, [ p(x) = g(y) | y ← S, q(x,y) ]some ]set

slide-75
SLIDE 75

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y)) SELECT DISTINCT f(x) FROM R AS x, S AS y WHERE q(x,y) AND p(x) = g(y)

[ f(x) | x ← R, y ← S, q(x,y), p(x) = g(y) ]set

slide-76
SLIDE 76

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

  • f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department
  • f Computer Science, Columbia University
damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particu- lar class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic
  • f a
group query is that it can be executed in a group- by-group
  • fashion. In other words, the underlying
relation(s) can be partitioned (based on some set
  • f attributes)
into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some per- formance results for group queries expressed in standard SQL, comparing a commercial database system with
  • ur optimized
plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title
  • f the publication
and its date appear, and notice is given that copying is by permission
  • f the Very Large Data Base
Endowment. To copy otherwise,
  • r to republish,
requires a fee and/or special permission from the Endowment. Proceedings
  • f the 23rd VLDB
Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))
  • r complex
(e.g. include joins, selections, fur- ther aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places
  • f a complex
piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently
  • n each partition.
Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition
  • f the database, and (c)
taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict
  • urselves to a special syntax.
Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically gener- ated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

  • D. Chatziantoniou, K.A. Ross, VLDB 1997
slide-77
SLIDE 77

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

  • f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department
  • f Computer Science, Columbia University
damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particu- lar class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic
  • f a
group query is that it can be executed in a group- by-group
  • fashion. In other words, the underlying
relation(s) can be partitioned (based on some set
  • f attributes)
into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some per- formance results for group queries expressed in standard SQL, comparing a commercial database system with
  • ur optimized
plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title
  • f the publication
and its date appear, and notice is given that copying is by permission
  • f the Very Large Data Base
Endowment. To copy otherwise,
  • r to republish,
requires a fee and/or special permission from the Endowment. Proceedings
  • f the 23rd VLDB
Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))
  • r complex
(e.g. include joins, selections, fur- ther aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places
  • f a complex
piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently
  • n each partition.
Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition
  • f the database, and (c)
taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict
  • urselves to a special syntax.
Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically gener- ated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

  • D. Chatziantoniou, K.A. Ross, VLDB 1997

SELECT f(x), agg(g(x)) FROM R AS x GROUP BY f(x)

slide-78
SLIDE 78

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

  • f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department
  • f Computer Science, Columbia University
damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particu- lar class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic
  • f a
group query is that it can be executed in a group- by-group
  • fashion. In other words, the underlying
relation(s) can be partitioned (based on some set
  • f attributes)
into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some per- formance results for group queries expressed in standard SQL, comparing a commercial database system with
  • ur optimized
plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title
  • f the publication
and its date appear, and notice is given that copying is by permission
  • f the Very Large Data Base
Endowment. To copy otherwise,
  • r to republish,
requires a fee and/or special permission from the Endowment. Proceedings
  • f the 23rd VLDB
Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))
  • r complex
(e.g. include joins, selections, fur- ther aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places
  • f a complex
piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently
  • n each partition.
Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition
  • f the database, and (c)
taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict
  • urselves to a special syntax.
Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically gener- ated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

  • D. Chatziantoniou, K.A. Ross, VLDB 1997

[ ⟨f(x), [ g(y) | y ← R, f(y) = f(x) ]agg⟩ | x ← R ]set

slide-79
SLIDE 79

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

  • f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department
  • f Computer Science, Columbia University
damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particu- lar class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic
  • f a
group query is that it can be executed in a group- by-group
  • fashion. In other words, the underlying
relation(s) can be partitioned (based on some set
  • f attributes)
into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some per- formance results for group queries expressed in standard SQL, comparing a commercial database system with
  • ur optimized
plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title
  • f the publication
and its date appear, and notice is given that copying is by permission
  • f the Very Large Data Base
Endowment. To copy otherwise,
  • r to republish,
requires a fee and/or special permission from the Endowment. Proceedings
  • f the 23rd VLDB
Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))
  • r complex
(e.g. include joins, selections, fur- ther aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places
  • f a complex
piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently
  • n each partition.
Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition
  • f the database, and (c)
taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict
  • urselves to a special syntax.
Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically gener- ated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

  • D. Chatziantoniou, K.A. Ross, VLDB 1997

Q f g agg R ≡

[ ⟨f(x), [ g(y) | y ← R, f(y) = f(x) ]agg⟩ | x ← R ]set

slide-80
SLIDE 80

Torsten Grust U Tübingen 26

A Zoo of Query Representations

slide-81
SLIDE 81

Torsten Grust U Tübingen 26

A Zoo of Query Representations

slide-82
SLIDE 82

Torsten Grust U Tübingen 26

A Zoo of Query Representations

slide-83
SLIDE 83

Torsten Grust U Tübingen 26

A Zoo of Query Representations

slide-84
SLIDE 84

Torsten Grust U Tübingen 26

A Zoo of Query Representations

slide-85
SLIDE 85

Torsten Grust U Tübingen 26

A Zoo of Query Representations

SELECT agg(g(x)) FROM P AS x

P P P

slide-86
SLIDE 86

Torsten Grust U Tübingen 26

A Zoo of Query Representations

[ g(y) | y ← P ]agg

Q’ g agg P ≡ P P P

slide-87
SLIDE 87

Torsten Grust U Tübingen 26

A Zoo of Query Representations

[ g(y) | y ← P ]agg

Q’ g agg P ≡ P P P

slide-88
SLIDE 88

Torsten Grust U Tübingen 27

A Zoo of Query Representations

slide-89
SLIDE 89

Torsten Grust U Tübingen 27

A Zoo of Query Representations

a p p l i e d s e p a r a t e l y t

  • p

a r t i t i

  • n

s

  • f

F Y I J J G b a s e d i n

i d ,

a n d t h e p a r t i a l r e s u l t s c

  • u

l d b e u n i

  • n

e d t

  • g

e t h e r . Q u e r y Q 3 i s e x p r e s s e d i n s t a n d a r d S Q L a s f

  • l

l

  • w

s . V i e w V l c

  • n

t a i n s t h e a v e r a g e d u r a t i

  • n
  • f

t h e s e c t i

  • n

s r e a d i n t h e d a y s b e f

  • r

e e a c h a c c e s s t

  • t

h e w

  • r

l d s e c t i

  • n

, a n d v i e w V 2 t h e a v e r a g e d u r a t i

  • n
  • f

t h e s e c t i

  • n

s r e a d i n t h e d a y s a f t e r . W e j

  • i

n V l a n d V 2 i n

  • r

d e r t

  • g

e t t h e r e s u l t i n s i n g l e t u p l e s . T h e a c t u a l S Q L f

  • r

m u l a t i

  • n

i s

  • m

i t t e d d u e t

  • l

a c k

  • f

s p a c e a n d a p p e a r s i n [ C h a 9 7 ] . O n c e a g a i n , F Y I

  • L

O G c a n b e p a r t i t i

  • n

e d

  • n

i d

a t t r i b u t e a n d a s i m p l e r q u e r y c a n b e p

  • s

e d s e p a r a t e l y

  • n

e a c h p a r t i t i

  • n

. W e s h a l l d i s c u s s Q u e r y Q 4 i n S e c t i

  • n

2 . 4 .

2 Theoretical Framework

I n t h i s s e c t i

  • n

, w e i n t r

  • d

u c e

  • u

r t e r m i n

  • l
  • g

y a n d d e f i n e w h a t w e m e a n b y a g r

  • u

p q u e r y . W e g i v e a s y n t a c t i c c r i t e r i

  • n

f

  • r

i d e n t i f y i n g g r

  • u

p q u e r i e s a n d p r

  • v

e t h a t t h i s c

  • n

d i t i

  • n

i s s u f f i c i e n t . W e a l s

  • s

h

  • w

t h a t e v e r y g r

  • u

p q u e r y c a n b e e x p r e s s e d i n a f

  • r

m t h a t s a t i s f i e s

  • u

r c r i t e r i

  • n

. 2 . 1 A s s u m p t i

  • n

s a n d T e r m i n

  • l
  • g

y W e a s s u m e t h a t q u e r i e s a r e w r i t t e n i n t e r m s

  • f

v i e w s , w i t h n

  • s

u b q u e r i e s . T h i s i s a v a l i d a s s u m p t i

  • n

s i n c e t h e r e a r e m a n y w a y s t

  • r

e w r i t e a s u b q u e r y a s a j

  • i

n

  • f

t w

  • (
  • r

m

  • r

e ) v i e w s [ K i m 8 2 , D a y 8 7 , S P L S G ] . W e i n i t i a l l y a s s u m e t h a t t h e d a t a b a s e c

  • n

t a i n s a s i n g l e r e l a t i

  • n

R . ( M u l t i p l e r e l a t i

  • n

s w i l l b e c

  • n

s i d e r e d i n S e c t i

  • n

2 . 4 . ) R m a y i t s e l f b e a v i e w

  • r

t h e r e s u l t

  • f

a n

  • t

h e r q u e r y , b u t f r

  • m
  • u

r p

  • i

n t

  • f

v i e w i t i s t r e a t e d a s a n e n c a p s u l a t e d t a b l e . ( I . e . , i f R w a s

a v i e w ,

t h e n w e d

  • n

’ t c

  • n

s i d e r u n f

  • l

d i n g t h e d e f i n i t i

  • n
  • f

R i n t

  • q

u e r i e s

  • v

e r R . ) W e s h a l l d e f i n e b e l

  • w

t h e n

  • t

i

  • n
  • f

a q u e r y g r a p h . A q u e r y g r a p h h a s n

  • d

e s t h a t a r e r e l a t i

  • n

a l

  • p

e r a t i

  • n

s . W e c

  • n

s i d e r t h r e e k i n d s

  • f

r e l a t i

  • n

a l

  • p

e r a t i

  • n

s : B a s i c B l

  • c

k s A b a s i c b l

  • c

k i s s

  • m

e c

  • m

b i n a t i

  • n
  • f

p r

  • j

e c t i

  • n

s a n d s e l e c t i

  • n

s a p p l i e d t

  • a

j

  • i

n

  • f

r e l a t i

  • n

s . I n S Q L s u c h

  • p

e r a t i

  • n

s a r e e x p r e s s e d a s S E L E C T

  • F

R O M

  • W

H E R E q u e r i e s w i t h

  • u

t a g g r e g a t e s

  • r

a t t r i b u t e r e n a m i n g . A b a s e r e l a t i

  • n

i s a l s

  • t

r e a t e d a s a b a s i c b l

  • c

k . A g g r e g a t i

  • n

B l

  • c

k s A n a g g r e g a t i

  • n

b l

  • c

k i s a s i n

  • g

l e a g g r e g a t i

  • n
  • p

e r a t i

  • n

s p e c i f y i n g a s e t

  • f

g r

  • u

p i n g a t t r i b u t e s a n d a l i s t

  • f

a g g r e g a t e f u n c t i

  • n

s t

  • b

e c

  • m
  • p

u t e d

  • v

e r t h e g r

  • u

p s . T h e a g g r e g a t i

  • n

c a n

  • p

t i

  • n

a l l y b e f

  • l

l

  • w

e d b y a s e l e c t i

  • n

. I n S Q L , s u c h

  • p

e r a t i

  • n

s a r e e x p r e s s e d

a s S E L E C T

  • F

R O M

  • G

R O U P B Y

  • H

A V I N G

q u e r i e s w h e r e t h e

S E L E C T

c l a u s e i n c l u d e s a l l g r

  • u

p i n g a t t r i b u t e s ( p l u s s

  • m

e a g g r e g a t e s ) , a n d t h e F R O M c l a u s e c

  • n

t a i n s a s i n g l e r e l a t i

  • n

. S e t B l

  • c

k s S e t b l

  • c

k s e x p r e s s t h e s e t

  • r

i e n t e d

  • p

e r a

  • t

i

  • n

s , n a m e l y u n i

  • n

, i n t e r s e c t i

  • n

a n d d i f f e r e n c e

  • f

r e l a t i

  • n

s w i t h t h e s a m e s c h e m a . T h e c

  • r

r e s p

  • n

d i n g S Q L c

  • n

s t r u c t s

a r e U N I O N , I N T E R S E C T

a n d E X C E P T . A n y r e l a t i

  • n

a l q u e r y c a n b e s p e c i f i e d u s i n g t h e s e t h r e e

  • p

e r a t i

  • n

s . A q u e r y c a n b e s p l i t i n t

  • t

w

  • b

l

  • c

k s i f n e c e s

  • s

a r y , f

  • r

e x a m p l e i f t h e q u e r y h a s b

  • t

h a W B E R E c l a u s e a n d a G R O U P B Y c l a u s e t h e n w e c a n r e w r i t e i t a s a n a g g r e g a t i

  • n

b l

  • c

k a p p l i e d t

  • a

b a s i c b l

  • c

k . N

  • t

e t h a t w e d

  • n
  • t

a l l

  • w

c

  • n

s t a n t r e l a t i

  • n

s i n q u e r i e s , n

  • r

d

  • w

e a l l

  • w

a t t r i b u t e r e n a m i n g . ( I n [ C h a 9 7 ] w e d e s c r i b e t h e e x t e n s i

  • n

s n e e d e d t

  • h

a n d l e t h e r e n a m i n g

  • p

e r a t

  • r

. ) D e f i n i t i

  • n

2 . 1 : ( Q u e r y G r a p h ) S u p p

  • s

e t h a t w e a r e g i v e n a q u e r y Q . T h e n

  • d

e s

  • f

t h e q u e r y g r a p h f

  • r

Q a r e t h e r e l a t i

  • n

a l

  • p

e r a t i

  • n

s ( a s d e f i n e d a b

  • v

e ) u s e d i n Q a n d i t s s u b v i e w s . B a s i c b l

  • c

k s a r e w r i t t e n a s r e c t a n g l e s , a g g r e g a t i

  • n

b l

  • c

k s a s c i r c l e s , a n d s e t b l

  • c

k s a s d i a m

  • n

d s . E v e r y b l

  • c

k i s g i v e n a l a b e l c

  • r

r e s p

  • n

d i n g t

  • t

h e v i e w (

  • r

r e l a t i

  • n

) n a m e

  • f

t h a t b l

  • c

k . T h e r e i s a n e d g e f r

  • m

n

  • d

e l a b e l e d N i t

  • n
  • d

e l a b e l e d N s i f N s i s m e n t i

  • n

e d i n t h e F R O M c l a u s e

  • f

N i ,

  • r

i f N s i s a n a r g u m e n t

  • f

a s e t

  • p

e r a t i

  • n

i n N i . A n e d g e w h

  • s

e s

  • u

r c e i s a n a g g r e g a t i

  • n

b l

  • c

k N i s l a b e l e d w i t h t h e g r

  • u

p i n g a t t r i b u t e s

  • f

N . E d g e s c

  • m

i n g

  • u

t

  • f

a s i n g l e b a s i c b l

  • c

k N a r e l i n k e d t

  • g

e t h e r w i t h a n a r c , a n d j

  • i

n t l y l a b e l e d b y t h e j

  • i

n c

  • n

d i t i

  • n
  • f

N . E d g e s c

  • m

i n g

  • u

t

  • f

s e t b l

  • c

k s a r e n

  • t

l a b e l e d . A q u e r y g r a p h i s a d i r e c t e d a c y c l i c g r a p h w i t h a s i n g l e s

  • u

r c e ( r

  • t

) n

  • d

e r e p r e s e n t i n g t h e q u e r y r e s u l t . W e s h a l l c

  • n

v e n t i

  • n

a l l y d r a w t h e g r a p h w i t h e d g e s “ p

  • i

n t i n g u p . ” F

  • r

s i n g l e

  • r

e l a t i

  • n

d a t a b a s e s , t h e r e w i l l a l w a y s b e a s i n g l e s i n k n

  • d

e a t t h e t

  • p
  • f

t h e p i c t u r e . F i g u r e 1 s h

  • w

s t h e q u e r y g r a p h s f

  • r

q u e r i e s Q l , a n d Q 2 . N

  • t

i c e t h e s e p a r a t i

  • n
  • f

V 2 a n d V 3 f r

  • m

q u e r y Q l i n t

  • b

a s i c b l

  • c

k s ( V 2 B , V 3 B ) a n d a g g r e g a t i

  • n

b l

  • c

k s ( V 2 A , V 3 A ) .

E ) , , O 2 : j

  • i

n c

  • n

d i t i

  • n

s . : j

  • i

n c

  • n

d i t i

  • n

.

(4 query Ql (b) query Q2

F i g u r e 1 : Q u e r y g r a p h s D e f i n i t i

  • n

2 . 2 : ( P a r t i t i

  • n

i n g ) L e t S b e a s e t

  • f

a t t r i b u t e s f r

  • m

t h e u n d e r l y i n g r e l a t i

  • n

R , a n d l e t D r e p r e s e n t t h e d

  • m

a i n

  • v

e r w h i c h t h e t u p l e s

  • f

S v a l u e s r a n g e . F

  • r

e v e r y n

  • d

e N i n t h e q u e r y g r a p h a n d f

  • r

x E D w e w r i t e N ( x ) t

  • r

e p r e s e n t t h e a n s w e r a t n

  • d

e N w h e n t h e e x t e n s i

  • n
  • f

R i s r e p l a c e d b y c r s = = ( R ) . O b s e r v e t h a t t h e a t t r i b u t e s i n S d

  • n
  • t

h a v e t

  • b

e a t

  • t

r i b u t e s

  • f

t h e a n s w e r a t n

  • d

e N f

  • r

N ( x ) t

  • b

e w e l l

  • d

e f i n e d . A l s

  • ,
  • b

s e r v e t h a t R = U z E D R ( z ) . 4 7 8

slide-90
SLIDE 90

Torsten Grust U Tübingen 27

A Zoo of Query Representations

a p p l i e d s e p a r a t e l y t

  • p

a r t i t i

  • n

s

  • f

F Y I J J G b a s e d i n

i d ,

a n d t h e p a r t i a l r e s u l t s c

  • u

l d b e u n i

  • n

e d t

  • g

e t h e r . Q u e r y Q 3 i s e x p r e s s e d i n s t a n d a r d S Q L a s f

  • l

l

  • w

s . V i e w V l c

  • n

t a i n s t h e a v e r a g e d u r a t i

  • n
  • f

t h e s e c t i

  • n

s r e a d i n t h e d a y s b e f

  • r

e e a c h a c c e s s t

  • t

h e w

  • r

l d s e c t i

  • n

, a n d v i e w V 2 t h e a v e r a g e d u r a t i

  • n
  • f

t h e s e c t i

  • n

s r e a d i n t h e d a y s a f t e r . W e j

  • i

n V l a n d V 2 i n

  • r

d e r t

  • g

e t t h e r e s u l t i n s i n g l e t u p l e s . T h e a c t u a l S Q L f

  • r

m u l a t i

  • n

i s

  • m

i t t e d d u e t

  • l

a c k

  • f

s p a c e a n d a p p e a r s i n [ C h a 9 7 ] . O n c e a g a i n , F Y I

  • L

O G c a n b e p a r t i t i

  • n

e d

  • n

i d

a t t r i b u t e a n d a s i m p l e r q u e r y c a n b e p

  • s

e d s e p a r a t e l y

  • n

e a c h p a r t i t i

  • n

. W e s h a l l d i s c u s s Q u e r y Q 4 i n S e c t i

  • n

2 . 4 .

2 Theoretical Framework

I n t h i s s e c t i

  • n

, w e i n t r

  • d

u c e

  • u

r t e r m i n

  • l
  • g

y a n d d e f i n e w h a t w e m e a n b y a g r

  • u

p q u e r y . W e g i v e a s y n t a c t i c c r i t e r i

  • n

f

  • r

i d e n t i f y i n g g r

  • u

p q u e r i e s a n d p r

  • v

e t h a t t h i s c

  • n

d i t i

  • n

i s s u f f i c i e n t . W e a l s

  • s

h

  • w

t h a t e v e r y g r

  • u

p q u e r y c a n b e e x p r e s s e d i n a f

  • r

m t h a t s a t i s f i e s

  • u

r c r i t e r i

  • n

. 2 . 1 A s s u m p t i

  • n

s a n d T e r m i n

  • l
  • g

y W e a s s u m e t h a t q u e r i e s a r e w r i t t e n i n t e r m s

  • f

v i e w s , w i t h n

  • s

u b q u e r i e s . T h i s i s a v a l i d a s s u m p t i

  • n

s i n c e t h e r e a r e m a n y w a y s t

  • r

e w r i t e a s u b q u e r y a s a j

  • i

n

  • f

t w

  • (
  • r

m

  • r

e ) v i e w s [ K i m 8 2 , D a y 8 7 , S P L S G ] . W e i n i t i a l l y a s s u m e t h a t t h e d a t a b a s e c

  • n

t a i n s a s i n g l e r e l a t i

  • n

R . ( M u l t i p l e r e l a t i

  • n

s w i l l b e c

  • n

s i d e r e d i n S e c t i

  • n

2 . 4 . ) R m a y i t s e l f b e a v i e w

  • r

t h e r e s u l t

  • f

a n

  • t

h e r q u e r y , b u t f r

  • m
  • u

r p

  • i

n t

  • f

v i e w i t i s t r e a t e d a s a n e n c a p s u l a t e d t a b l e . ( I . e . , i f R w a s

a v i e w ,

t h e n w e d

  • n

’ t c

  • n

s i d e r u n f

  • l

d i n g t h e d e f i n i t i

  • n
  • f

R i n t

  • q

u e r i e s

  • v

e r R . ) W e s h a l l d e f i n e b e l

  • w

t h e n

  • t

i

  • n
  • f

a q u e r y g r a p h . A q u e r y g r a p h h a s n

  • d

e s t h a t a r e r e l a t i

  • n

a l

  • p

e r a t i

  • n

s . W e c

  • n

s i d e r t h r e e k i n d s

  • f

r e l a t i

  • n

a l

  • p

e r a t i

  • n

s : B a s i c B l

  • c

k s A b a s i c b l

  • c

k i s s

  • m

e c

  • m

b i n a t i

  • n
  • f

p r

  • j

e c t i

  • n

s a n d s e l e c t i

  • n

s a p p l i e d t

  • a

j

  • i

n

  • f

r e l a t i

  • n

s . I n S Q L s u c h

  • p

e r a t i

  • n

s a r e e x p r e s s e d a s S E L E C T

  • F

R O M

  • W

H E R E q u e r i e s w i t h

  • u

t a g g r e g a t e s

  • r

a t t r i b u t e r e n a m i n g . A b a s e r e l a t i

  • n

i s a l s

  • t

r e a t e d a s a b a s i c b l

  • c

k . A g g r e g a t i

  • n

B l

  • c

k s A n a g g r e g a t i

  • n

b l

  • c

k i s a s i n

  • g

l e a g g r e g a t i

  • n
  • p

e r a t i

  • n

s p e c i f y i n g a s e t

  • f

g r

  • u

p i n g a t t r i b u t e s a n d a l i s t

  • f

a g g r e g a t e f u n c t i

  • n

s t

  • b

e c

  • m
  • p

u t e d

  • v

e r t h e g r

  • u

p s . T h e a g g r e g a t i

  • n

c a n

  • p

t i

  • n

a l l y b e f

  • l

l

  • w

e d b y a s e l e c t i

  • n

. I n S Q L , s u c h

  • p

e r a t i

  • n

s a r e e x p r e s s e d

a s S E L E C T

  • F

R O M

  • G

R O U P B Y

  • H

A V I N G

q u e r i e s w h e r e t h e

S E L E C T

c l a u s e i n c l u d e s a l l g r

  • u

p i n g a t t r i b u t e s ( p l u s s

  • m

e a g g r e g a t e s ) , a n d t h e F R O M c l a u s e c

  • n

t a i n s a s i n g l e r e l a t i

  • n

. S e t B l

  • c

k s S e t b l

  • c

k s e x p r e s s t h e s e t

  • r

i e n t e d

  • p

e r a

  • t

i

  • n

s , n a m e l y u n i

  • n

, i n t e r s e c t i

  • n

a n d d i f f e r e n c e

  • f

r e l a t i

  • n

s w i t h t h e s a m e s c h e m a . T h e c

  • r

r e s p

  • n

d i n g S Q L c

  • n

s t r u c t s

a r e U N I O N , I N T E R S E C T

a n d E X C E P T . A n y r e l a t i

  • n

a l q u e r y c a n b e s p e c i f i e d u s i n g t h e s e t h r e e

  • p

e r a t i

  • n

s . A q u e r y c a n b e s p l i t i n t

  • t

w

  • b

l

  • c

k s i f n e c e s

  • s

a r y , f

  • r

e x a m p l e i f t h e q u e r y h a s b

  • t

h a W B E R E c l a u s e a n d a G R O U P B Y c l a u s e t h e n w e c a n r e w r i t e i t a s a n a g g r e g a t i

  • n

b l

  • c

k a p p l i e d t

  • a

b a s i c b l

  • c

k . N

  • t

e t h a t w e d

  • n
  • t

a l l

  • w

c

  • n

s t a n t r e l a t i

  • n

s i n q u e r i e s , n

  • r

d

  • w

e a l l

  • w

a t t r i b u t e r e n a m i n g . ( I n [ C h a 9 7 ] w e d e s c r i b e t h e e x t e n s i

  • n

s n e e d e d t

  • h

a n d l e t h e r e n a m i n g

  • p

e r a t

  • r

. ) D e f i n i t i

  • n

2 . 1 : ( Q u e r y G r a p h ) S u p p

  • s

e t h a t w e a r e g i v e n a q u e r y Q . T h e n

  • d

e s

  • f

t h e q u e r y g r a p h f

  • r

Q a r e t h e r e l a t i

  • n

a l

  • p

e r a t i

  • n

s ( a s d e f i n e d a b

  • v

e ) u s e d i n Q a n d i t s s u b v i e w s . B a s i c b l

  • c

k s a r e w r i t t e n a s r e c t a n g l e s , a g g r e g a t i

  • n

b l

  • c

k s a s c i r c l e s , a n d s e t b l

  • c

k s a s d i a m

  • n

d s . E v e r y b l

  • c

k i s g i v e n a l a b e l c

  • r

r e s p

  • n

d i n g t

  • t

h e v i e w (

  • r

r e l a t i

  • n

) n a m e

  • f

t h a t b l

  • c

k . T h e r e i s a n e d g e f r

  • m

n

  • d

e l a b e l e d N i t

  • n
  • d

e l a b e l e d N s i f N s i s m e n t i

  • n

e d i n t h e F R O M c l a u s e

  • f

N i ,

  • r

i f N s i s a n a r g u m e n t

  • f

a s e t

  • p

e r a t i

  • n

i n N i . A n e d g e w h

  • s

e s

  • u

r c e i s a n a g g r e g a t i

  • n

b l

  • c

k N i s l a b e l e d w i t h t h e g r

  • u

p i n g a t t r i b u t e s

  • f

N . E d g e s c

  • m

i n g

  • u

t

  • f

a s i n g l e b a s i c b l

  • c

k N a r e l i n k e d t

  • g

e t h e r w i t h a n a r c , a n d j

  • i

n t l y l a b e l e d b y t h e j

  • i

n c

  • n

d i t i

  • n
  • f

N . E d g e s c

  • m

i n g

  • u

t

  • f

s e t b l

  • c

k s a r e n

  • t

l a b e l e d . A q u e r y g r a p h i s a d i r e c t e d a c y c l i c g r a p h w i t h a s i n g l e s

  • u

r c e ( r

  • t

) n

  • d

e r e p r e s e n t i n g t h e q u e r y r e s u l t . W e s h a l l c

  • n

v e n t i

  • n

a l l y d r a w t h e g r a p h w i t h e d g e s “ p

  • i

n t i n g u p . ” F

  • r

s i n g l e

  • r

e l a t i

  • n

d a t a b a s e s , t h e r e w i l l a l w a y s b e a s i n g l e s i n k n

  • d

e a t t h e t

  • p
  • f

t h e p i c t u r e . F i g u r e 1 s h

  • w

s t h e q u e r y g r a p h s f

  • r

q u e r i e s Q l , a n d Q 2 . N

  • t

i c e t h e s e p a r a t i

  • n
  • f

V 2 a n d V 3 f r

  • m

q u e r y Q l i n t

  • b

a s i c b l

  • c

k s ( V 2 B , V 3 B ) a n d a g g r e g a t i

  • n

b l

  • c

k s ( V 2 A , V 3 A ) .

E ) , , O 2 : j

  • i

n c

  • n

d i t i

  • n

s . : j

  • i

n c

  • n

d i t i

  • n

.

(4 query Ql (b) query Q2

F i g u r e 1 : Q u e r y g r a p h s D e f i n i t i

  • n

2 . 2 : ( P a r t i t i

  • n

i n g ) L e t S b e a s e t

  • f

a t t r i b u t e s f r

  • m

t h e u n d e r l y i n g r e l a t i

  • n

R , a n d l e t D r e p r e s e n t t h e d

  • m

a i n

  • v

e r w h i c h t h e t u p l e s

  • f

S v a l u e s r a n g e . F

  • r

e v e r y n

  • d

e N i n t h e q u e r y g r a p h a n d f

  • r

x E D w e w r i t e N ( x ) t

  • r

e p r e s e n t t h e a n s w e r a t n

  • d

e N w h e n t h e e x t e n s i

  • n
  • f

R i s r e p l a c e d b y c r s = = ( R ) . O b s e r v e t h a t t h e a t t r i b u t e s i n S d

  • n
  • t

h a v e t

  • b

e a t

  • t

r i b u t e s

  • f

t h e a n s w e r a t n

  • d

e N f

  • r

N ( x ) t

  • b

e w e l l

  • d

e f i n e d . A l s

  • ,
  • b

s e r v e t h a t R = U z E D R ( z ) . 4 7 8

applied separately to partitions

  • f FYIJJG

based in id, and the partial results could be unioned together. Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration

  • f the sections read in the

days before each access to the world section, and view V2 the average duration

  • f the sections read in the days after.

We join Vl and V2 in order to get the result in single

  • tuples. The actual SQL formulation

is omitted due to lack

  • f space and appears in [Cha97].

Once again, FYI-LOG can be partitioned

  • n id attribute

and a simpler query can be posed separately

  • n each partition.

We shall discuss Query Q4 in Section 2.4.

2 Theoretical Framework

In this section, we introduce

  • ur terminology

and define what we mean by a group query. We give a syntactic criterion for identifying group queries and prove that this condition is sufficient. We also show that every group query can be expressed in a form that satisfies our criterion. 2.1 Assumptions and Terminology We assume that queries are written in terms of views, with no subqueries. This is a valid assumption since there are many ways to rewrite a subquery as a join of two (or more) views [Kim82, Day87, SPLSG]. We initially assume that the database contains a single relation

  • R. (Multiple

relations will be considered in Section 2.4.) R may itself be a view

  • r the result of another query, but from our point of view it

is treated as an encapsulated table. (I.e., if R was a view, then we don’t consider unfolding the definition

  • f R into

queries over R.) We shall define below the notion of a query graph. A query graph has nodes that are relational

  • perations.

We consider three kinds of relational

  • perations:

Basic Blocks A basic block is some combination

  • f

projections and selections applied to a join of relations. In SQL such operations are expressed as SELECT-FROM-WHERE queries without aggregates or attribute renaming. A base relation is also treated as a basic block. Aggregation Blocks An aggregation block is a sin- gle aggregation

  • peration

specifying a set of grouping attributes and a list of aggregate functions to be com- puted over the groups. The aggregation can optionally be followed by a selection. In SQL, such operations are expressed as SELECT-FROM-GROUPBY-HAVING queries where the SELECT clause includes all grouping attributes (plus some aggregates), and the FROM clause contains a single relation. Set Blocks Set blocks express the set-oriented

  • pera-

tions, namely union, intersection and difference of relations with the same schema. The corresponding SQL constructs

are UNION, INTERSECT and EXCEPT.

Any relational query can be specified using these three

  • perations.

A query can be split into two blocks if neces- sary, for example if the query has both a WBERE clause and a GROUPBY clause then we can rewrite it as an aggregation block applied to a basic block. Note that we do not allow constant relations in queries, nor do we allow attribute renaming. (In [Cha97] we describe the extensions needed to handle the renaming

  • perator.)

Definition 2.1: (Query Graph) Suppose that we are given a query Q. The nodes of the query graph for Q are the relational

  • perations

(as defined above) used in Q and its subviews. Basic blocks are written as rectangles, aggregation blocks as circles, and set blocks as diamonds. Every block is given a label corresponding to the view (or relation) name of that block. There is an edge from node labeled Ni to node labeled Ns if Ns is mentioned in the FROM clause of Ni , or if Ns is an argument of a set operation in Ni. An edge whose source is an aggregation block N is labeled with the grouping attributes

  • f N. Edges coming
  • ut of a single basic block N are linked together with an

arc, and jointly labeled by the join condition

  • f N. Edges

coming out of set blocks are not labeled. A query graph is a directed acyclic graph with a single source (root) node representing the query result. We shall conventionally draw the graph with edges “pointing up.” For single-relation databases, there will always be a single sink node at the top of the picture. Figure 1 shows the query graphs for queries Ql, and Q2. Notice the separation

  • f V2 and V3 from query Ql into basic blocks (V2B,

V3B) and aggregation blocks (V2A, V3A).

E), , O2 : join conditions. 0 : join condition.

(4 query Ql (b) query Q2

Figure 1: Query graphs Definition 2.2: (Partitioning) Let S be a set of attributes from the underlying relation R, and let D represent the domain over which the tuples of S values range. For every node N in the query graph and for x E D we write N(x) to represent the answer at node N when the extension of R is replaced by crs== (R). 0 Observe that the attributes in S do not have to be at- tributes

  • f the answer at node N for N(x)

to be well-

  • defined. Also, observe that R = UzED R(z).

478

slide-91
SLIDE 91

Torsten Grust U Tübingen 27

A Zoo of Query Representations

a p p l i e d s e p a r a t e l y t

  • p

a r t i t i

  • n

s

  • f

F Y I J J G b a s e d i n

i d ,

a n d t h e p a r t i a l r e s u l t s c

  • u

l d b e u n i

  • n

e d t

  • g

e t h e r . Q u e r y Q 3 i s e x p r e s s e d i n s t a n d a r d S Q L a s f

  • l

l

  • w

s . V i e w V l c

  • n

t a i n s t h e a v e r a g e d u r a t i

  • n
  • f

t h e s e c t i

  • n

s r e a d i n t h e d a y s b e f

  • r

e e a c h a c c e s s t

  • t

h e w

  • r

l d s e c t i

  • n

, a n d v i e w V 2 t h e a v e r a g e d u r a t i

  • n
  • f

t h e s e c t i

  • n

s r e a d i n t h e d a y s a f t e r . W e j

  • i

n V l a n d V 2 i n

  • r

d e r t

  • g

e t t h e r e s u l t i n s i n g l e t u p l e s . T h e a c t u a l S Q L f

  • r

m u l a t i

  • n

i s

  • m

i t t e d d u e t

  • l

a c k

  • f

s p a c e a n d a p p e a r s i n [ C h a 9 7 ] . O n c e a g a i n , F Y I

  • L

O G c a n b e p a r t i t i

  • n

e d

  • n

i d

a t t r i b u t e a n d a s i m p l e r q u e r y c a n b e p

  • s

e d s e p a r a t e l y

  • n

e a c h p a r t i t i

  • n

. W e s h a l l d i s c u s s Q u e r y Q 4 i n S e c t i

  • n

2 . 4 .

2 Theoretical Framework

I n t h i s s e c t i

  • n

, w e i n t r

  • d

u c e

  • u

r t e r m i n

  • l
  • g

y a n d d e f i n e w h a t w e m e a n b y a g r

  • u

p q u e r y . W e g i v e a s y n t a c t i c c r i t e r i

  • n

f

  • r

i d e n t i f y i n g g r

  • u

p q u e r i e s a n d p r

  • v

e t h a t t h i s c

  • n

d i t i

  • n

i s s u f f i c i e n t . W e a l s

  • s

h

  • w

t h a t e v e r y g r

  • u

p q u e r y c a n b e e x p r e s s e d i n a f

  • r

m t h a t s a t i s f i e s

  • u

r c r i t e r i

  • n

. 2 . 1 A s s u m p t i

  • n

s a n d T e r m i n

  • l
  • g

y W e a s s u m e t h a t q u e r i e s a r e w r i t t e n i n t e r m s

  • f

v i e w s , w i t h n

  • s

u b q u e r i e s . T h i s i s a v a l i d a s s u m p t i

  • n

s i n c e t h e r e a r e m a n y w a y s t

  • r

e w r i t e a s u b q u e r y a s a j

  • i

n

  • f

t w

  • (
  • r

m

  • r

e ) v i e w s [ K i m 8 2 , D a y 8 7 , S P L S G ] . W e i n i t i a l l y a s s u m e t h a t t h e d a t a b a s e c

  • n

t a i n s a s i n g l e r e l a t i

  • n

R . ( M u l t i p l e r e l a t i

  • n

s w i l l b e c

  • n

s i d e r e d i n S e c t i

  • n

2 . 4 . ) R m a y i t s e l f b e a v i e w

  • r

t h e r e s u l t

  • f

a n

  • t

h e r q u e r y , b u t f r

  • m
  • u

r p

  • i

n t

  • f

v i e w i t i s t r e a t e d a s a n e n c a p s u l a t e d t a b l e . ( I . e . , i f R w a s

a v i e w ,

t h e n w e d

  • n

’ t c

  • n

s i d e r u n f

  • l

d i n g t h e d e f i n i t i

  • n
  • f

R i n t

  • q

u e r i e s

  • v

e r R . ) W e s h a l l d e f i n e b e l

  • w

t h e n

  • t

i

  • n
  • f

a q u e r y g r a p h . A q u e r y g r a p h h a s n

  • d

e s t h a t a r e r e l a t i

  • n

a l

  • p

e r a t i

  • n

s . W e c

  • n

s i d e r t h r e e k i n d s

  • f

r e l a t i

  • n

a l

  • p

e r a t i

  • n

s : B a s i c B l

  • c

k s A b a s i c b l

  • c

k i s s

  • m

e c

  • m

b i n a t i

  • n
  • f

p r

  • j

e c t i

  • n

s a n d s e l e c t i

  • n

s a p p l i e d t

  • a

j

  • i

n

  • f

r e l a t i

  • n

s . I n S Q L s u c h

  • p

e r a t i

  • n

s a r e e x p r e s s e d a s S E L E C T

  • F

R O M

  • W

H E R E q u e r i e s w i t h

  • u

t a g g r e g a t e s

  • r

a t t r i b u t e r e n a m i n g . A b a s e r e l a t i

  • n

i s a l s

  • t

r e a t e d a s a b a s i c b l

  • c

k . A g g r e g a t i

  • n

B l

  • c

k s A n a g g r e g a t i

  • n

b l

  • c

k i s a s i n

  • g

l e a g g r e g a t i

  • n
  • p

e r a t i

  • n

s p e c i f y i n g a s e t

  • f

g r

  • u

p i n g a t t r i b u t e s a n d a l i s t

  • f

a g g r e g a t e f u n c t i

  • n

s t

  • b

e c

  • m
  • p

u t e d

  • v

e r t h e g r

  • u

p s . T h e a g g r e g a t i

  • n

c a n

  • p

t i

  • n

a l l y b e f

  • l

l

  • w

e d b y a s e l e c t i

  • n

. I n S Q L , s u c h

  • p

e r a t i

  • n

s a r e e x p r e s s e d

a s S E L E C T

  • F

R O M

  • G

R O U P B Y

  • H

A V I N G

q u e r i e s w h e r e t h e

S E L E C T

c l a u s e i n c l u d e s a l l g r

  • u

p i n g a t t r i b u t e s ( p l u s s

  • m

e a g g r e g a t e s ) , a n d t h e F R O M c l a u s e c

  • n

t a i n s a s i n g l e r e l a t i

  • n

. S e t B l

  • c

k s S e t b l

  • c

k s e x p r e s s t h e s e t

  • r

i e n t e d

  • p

e r a

  • t

i

  • n

s , n a m e l y u n i

  • n

, i n t e r s e c t i

  • n

a n d d i f f e r e n c e

  • f

r e l a t i

  • n

s w i t h t h e s a m e s c h e m a . T h e c

  • r

r e s p

  • n

d i n g S Q L c

  • n

s t r u c t s

a r e U N I O N , I N T E R S E C T

a n d E X C E P T . A n y r e l a t i

  • n

a l q u e r y c a n b e s p e c i f i e d u s i n g t h e s e t h r e e

  • p

e r a t i

  • n

s . A q u e r y c a n b e s p l i t i n t

  • t

w

  • b

l

  • c

k s i f n e c e s

  • s

a r y , f

  • r

e x a m p l e i f t h e q u e r y h a s b

  • t

h a W B E R E c l a u s e a n d a G R O U P B Y c l a u s e t h e n w e c a n r e w r i t e i t a s a n a g g r e g a t i

  • n

b l

  • c

k a p p l i e d t

  • a

b a s i c b l

  • c

k . N

  • t

e t h a t w e d

  • n
  • t

a l l

  • w

c

  • n

s t a n t r e l a t i

  • n

s i n q u e r i e s , n

  • r

d

  • w

e a l l

  • w

a t t r i b u t e r e n a m i n g . ( I n [ C h a 9 7 ] w e d e s c r i b e t h e e x t e n s i

  • n

s n e e d e d t

  • h

a n d l e t h e r e n a m i n g

  • p

e r a t

  • r

. ) D e f i n i t i

  • n

2 . 1 : ( Q u e r y G r a p h ) S u p p

  • s

e t h a t w e a r e g i v e n a q u e r y Q . T h e n

  • d

e s

  • f

t h e q u e r y g r a p h f

  • r

Q a r e t h e r e l a t i

  • n

a l

  • p

e r a t i

  • n

s ( a s d e f i n e d a b

  • v

e ) u s e d i n Q a n d i t s s u b v i e w s . B a s i c b l

  • c

k s a r e w r i t t e n a s r e c t a n g l e s , a g g r e g a t i

  • n

b l

  • c

k s a s c i r c l e s , a n d s e t b l

  • c

k s a s d i a m

  • n

d s . E v e r y b l

  • c

k i s g i v e n a l a b e l c

  • r

r e s p

  • n

d i n g t

  • t

h e v i e w (

  • r

r e l a t i

  • n

) n a m e

  • f

t h a t b l

  • c

k . T h e r e i s a n e d g e f r

  • m

n

  • d

e l a b e l e d N i t

  • n
  • d

e l a b e l e d N s i f N s i s m e n t i

  • n

e d i n t h e F R O M c l a u s e

  • f

N i ,

  • r

i f N s i s a n a r g u m e n t

  • f

a s e t

  • p

e r a t i

  • n

i n N i . A n e d g e w h

  • s

e s

  • u

r c e i s a n a g g r e g a t i

  • n

b l

  • c

k N i s l a b e l e d w i t h t h e g r

  • u

p i n g a t t r i b u t e s

  • f

N . E d g e s c

  • m

i n g

  • u

t

  • f

a s i n g l e b a s i c b l

  • c

k N a r e l i n k e d t

  • g

e t h e r w i t h a n a r c , a n d j

  • i

n t l y l a b e l e d b y t h e j

  • i

n c

  • n

d i t i

  • n
  • f

N . E d g e s c

  • m

i n g

  • u

t

  • f

s e t b l

  • c

k s a r e n

  • t

l a b e l e d . A q u e r y g r a p h i s a d i r e c t e d a c y c l i c g r a p h w i t h a s i n g l e s

  • u

r c e ( r

  • t

) n

  • d

e r e p r e s e n t i n g t h e q u e r y r e s u l t . W e s h a l l c

  • n

v e n t i

  • n

a l l y d r a w t h e g r a p h w i t h e d g e s “ p

  • i

n t i n g u p . ” F

  • r

s i n g l e

  • r

e l a t i

  • n

d a t a b a s e s , t h e r e w i l l a l w a y s b e a s i n g l e s i n k n

  • d

e a t t h e t

  • p
  • f

t h e p i c t u r e . F i g u r e 1 s h

  • w

s t h e q u e r y g r a p h s f

  • r

q u e r i e s Q l , a n d Q 2 . N

  • t

i c e t h e s e p a r a t i

  • n
  • f

V 2 a n d V 3 f r

  • m

q u e r y Q l i n t

  • b

a s i c b l

  • c

k s ( V 2 B , V 3 B ) a n d a g g r e g a t i

  • n

b l

  • c

k s ( V 2 A , V 3 A ) .

E ) , , O 2 : j

  • i

n c

  • n

d i t i

  • n

s . : j

  • i

n c

  • n

d i t i

  • n

.

(4 query Ql (b) query Q2

F i g u r e 1 : Q u e r y g r a p h s D e f i n i t i

  • n

2 . 2 : ( P a r t i t i

  • n

i n g ) L e t S b e a s e t

  • f

a t t r i b u t e s f r

  • m

t h e u n d e r l y i n g r e l a t i

  • n

R , a n d l e t D r e p r e s e n t t h e d

  • m

a i n

  • v

e r w h i c h t h e t u p l e s

  • f

S v a l u e s r a n g e . F

  • r

e v e r y n

  • d

e N i n t h e q u e r y g r a p h a n d f

  • r

x E D w e w r i t e N ( x ) t

  • r

e p r e s e n t t h e a n s w e r a t n

  • d

e N w h e n t h e e x t e n s i

  • n
  • f

R i s r e p l a c e d b y c r s = = ( R ) . O b s e r v e t h a t t h e a t t r i b u t e s i n S d

  • n
  • t

h a v e t

  • b

e a t

  • t

r i b u t e s

  • f

t h e a n s w e r a t n

  • d

e N f

  • r

N ( x ) t

  • b

e w e l l

  • d

e f i n e d . A l s

  • ,
  • b

s e r v e t h a t R = U z E D R ( z ) . 4 7 8

applied separately to partitions

  • f FYIJJG

based in id, and the partial results could be unioned together. Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration

  • f the sections read in the

days before each access to the world section, and view V2 the average duration

  • f the sections read in the days after.

We join Vl and V2 in order to get the result in single

  • tuples. The actual SQL formulation

is omitted due to lack

  • f space and appears in [Cha97].

Once again, FYI-LOG can be partitioned

  • n id attribute

and a simpler query can be posed separately

  • n each partition.

We shall discuss Query Q4 in Section 2.4.

2 Theoretical Framework

In this section, we introduce

  • ur terminology

and define what we mean by a group query. We give a syntactic criterion for identifying group queries and prove that this condition is sufficient. We also show that every group query can be expressed in a form that satisfies our criterion. 2.1 Assumptions and Terminology We assume that queries are written in terms of views, with no subqueries. This is a valid assumption since there are many ways to rewrite a subquery as a join of two (or more) views [Kim82, Day87, SPLSG]. We initially assume that the database contains a single relation

  • R. (Multiple

relations will be considered in Section 2.4.) R may itself be a view

  • r the result of another query, but from our point of view it

is treated as an encapsulated table. (I.e., if R was a view, then we don’t consider unfolding the definition

  • f R into

queries over R.) We shall define below the notion of a query graph. A query graph has nodes that are relational

  • perations.

We consider three kinds of relational

  • perations:

Basic Blocks A basic block is some combination

  • f

projections and selections applied to a join of relations. In SQL such operations are expressed as SELECT-FROM-WHERE queries without aggregates or attribute renaming. A base relation is also treated as a basic block. Aggregation Blocks An aggregation block is a sin- gle aggregation

  • peration

specifying a set of grouping attributes and a list of aggregate functions to be com- puted over the groups. The aggregation can optionally be followed by a selection. In SQL, such operations are expressed as SELECT-FROM-GROUPBY-HAVING queries where the SELECT clause includes all grouping attributes (plus some aggregates), and the FROM clause contains a single relation. Set Blocks Set blocks express the set-oriented

  • pera-

tions, namely union, intersection and difference of relations with the same schema. The corresponding SQL constructs

are UNION, INTERSECT and EXCEPT.

Any relational query can be specified using these three

  • perations.

A query can be split into two blocks if neces- sary, for example if the query has both a WBERE clause and a GROUPBY clause then we can rewrite it as an aggregation block applied to a basic block. Note that we do not allow constant relations in queries, nor do we allow attribute renaming. (In [Cha97] we describe the extensions needed to handle the renaming

  • perator.)

Definition 2.1: (Query Graph) Suppose that we are given a query Q. The nodes of the query graph for Q are the relational

  • perations

(as defined above) used in Q and its subviews. Basic blocks are written as rectangles, aggregation blocks as circles, and set blocks as diamonds. Every block is given a label corresponding to the view (or relation) name of that block. There is an edge from node labeled Ni to node labeled Ns if Ns is mentioned in the FROM clause of Ni , or if Ns is an argument of a set operation in Ni. An edge whose source is an aggregation block N is labeled with the grouping attributes

  • f N. Edges coming
  • ut of a single basic block N are linked together with an

arc, and jointly labeled by the join condition

  • f N. Edges

coming out of set blocks are not labeled. A query graph is a directed acyclic graph with a single source (root) node representing the query result. We shall conventionally draw the graph with edges “pointing up.” For single-relation databases, there will always be a single sink node at the top of the picture. Figure 1 shows the query graphs for queries Ql, and Q2. Notice the separation

  • f V2 and V3 from query Ql into basic blocks (V2B,

V3B) and aggregation blocks (V2A, V3A).

E), , O2 : join conditions. 0 : join condition.

(4 query Ql (b) query Q2

Figure 1: Query graphs Definition 2.2: (Partitioning) Let S be a set of attributes from the underlying relation R, and let D represent the domain over which the tuples of S values range. For every node N in the query graph and for x E D we write N(x) to represent the answer at node N when the extension of R is replaced by crs== (R). 0 Observe that the attributes in S do not have to be at- tributes

  • f the answer at node N for N(x)

to be well-

  • defined. Also, observe that R = UzED R(z).

478

applied separately to partitions
  • f FYIJJG
based in i d , and the partial results could be unioned together. Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration
  • f the sections read in the
days before each access to the world section, and view V2 the average duration
  • f the sections read in the days after.
We join Vl and V2 in order to get the result in single
  • tuples. The actual SQL formulation
is omitted due to lack
  • f space and appears in [Cha97].
Once again, FYI-LOG can be partitioned
  • n
i d attribute and a simpler query can be posed separately
  • n each partition.
We shall discuss Query Q4 in Section 2.4.

2 Theoretical Framework

In this section, we introduce
  • ur terminology
and define what we mean by a group query. We give a syntactic criterion for identifying group queries and prove that this condition is sufficient. We also show that every group query can be expressed in a form that satisfies our criterion. 2.1 Assumptions and Terminology We assume that queries are written in terms of views, with no subqueries. This is a valid assumption since there are many ways to rewrite a subquery as a join of two (or more) views [Kim82, Day87, SPLSG]. We initially assume that the database contains a single relation
  • R. (Multiple
relations will be considered in Section 2.4.) R may itself be a view
  • r the result of another query, but from our point of view it
is treated as an encapsulated table. (I.e., if R was a v i e w , then we don’t consider unfolding the definition
  • f R into
queries over R.) We shall define below the notion of a query graph. A query graph has nodes that are relational
  • perations.
We consider three kinds of relational
  • perations:
Basic Blocks A basic block is some combination
  • f
projections and selections applied to a join of relations. In SQL such operations are expressed as SELECT-FROM-WHERE queries without aggregates or attribute renaming. A base relation is also treated as a basic block. Aggregation Blocks An aggregation block is a sin- gle aggregation
  • peration
specifying a set of grouping attributes and a list of aggregate functions to be com- puted over the groups. The aggregation can optionally be followed by a selection. In SQL, such operations are expressed a s S E L E C T
  • F
R O M
  • G
R O U P B Y
  • H
A V I N G queries where the S E L E C T clause includes all grouping attributes (plus some aggregates), and the FROM clause contains a single relation. Set Blocks Set blocks express the set-oriented
  • pera-
tions, namely union, intersection and difference of relations with the same schema. The corresponding SQL constructs a r e U N I O N , I N T E R S E C T and EXCEPT. Any relational query can be specified using these three
  • perations.
A query can be split into two blocks if neces- sary, for example if the query has both a WBERE clause and a GROUPBY clause then we can rewrite it as an aggregation block applied to a basic block. Note that we do not allow constant relations in queries, nor do we allow attribute renaming. (In [Cha97] we describe the extensions needed to handle the renaming
  • perator.)
Definition 2.1: (Query Graph) Suppose that we are given a query Q. The nodes of the query graph for Q are the relational
  • perations
(as defined above) used in Q and its subviews. Basic blocks are written as rectangles, aggregation blocks as circles, and set blocks as diamonds. Every block is given a label corresponding to the view (or relation) name of that block. There is an edge from node labeled Ni to node labeled Ns if Ns is mentioned in the FROM clause of Ni , or if Ns is an argument of a set operation in Ni. An edge whose source is an aggregation block N is labeled with the grouping attributes
  • f N. Edges coming
  • ut of a single basic block N are linked together with an
arc, and jointly labeled by the join condition
  • f N. Edges
coming out of set blocks are not labeled. A query graph is a directed acyclic graph with a single source (root) node representing the query result. We shall conventionally draw the graph with edges “pointing up.” For single-relation databases, there will always be a single sink node at the top of the picture. Figure 1 shows the query graphs for queries Ql, and Q2. Notice the separation
  • f V2 and V3 from query Ql into basic blocks (V2B,
V3B) and aggregation blocks (V2A, V3A). E ) , , O 2 : j
  • i
n c
  • n
d i t i
  • n
s . : j
  • i
n c
  • n
d i t i
  • n
.

(4 query Ql (b) query Q2

Figure 1: Query graphs Definition 2.2: (Partitioning) Let S be a set of attributes from the underlying relation R, and let D represent the domain over which the tuples of S values range. For every node N in the query graph and for x E D we write N(x) to represent the answer at node N when the extension of R is replaced by crs== (R). 0 Observe that the attributes in S do not have to be at- tributes
  • f the answer at node N for N(x)
to be well-
  • defined. Also, observe that R = UzED R(z).
478

SQL surface syntax, relational algebra, query graphs + annotations, iteration

slide-92
SLIDE 92

Torsten Grust U Tübingen

A Uniform Query Representation

28

slide-93
SLIDE 93

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

slide-94
SLIDE 94

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

map (λ⟨x,P⟩.⟨x, Q’ g agg P⟩ (partition f xs)

slide-95
SLIDE 95

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

map (λ⟨x,P⟩.⟨x, Q’ g agg P⟩ (partition f xs)

[ ⟨f(x), [ g(y) | y ← R, f(y) = f(x) ]agg⟩ | x ← R ]set

slide-96
SLIDE 96

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

map (λ⟨x,P⟩.⟨x, Q’ g agg P⟩ (partition f xs)

SELECT f(x), agg(g(x)) FROM R AS x GROUP BY f(x)

slide-97
SLIDE 97

Torsten Grust U Tübingen 29

XPath

slide-98
SLIDE 98

Torsten Grust U Tübingen

XPath Comprehensions

30

slide-99
SLIDE 99

Torsten Grust U Tübingen

XPath Comprehensions

30

/descendant::a[following::b]/child::c

slide-100
SLIDE 100

Torsten Grust U Tübingen

XPath Comprehensions

30

/descendant::a[following::b]/child::c

  • 1. Normalize, simplify, flip XPath step expressions
slide-101
SLIDE 101

Torsten Grust U Tübingen

XPath Comprehensions

30

/descendant::a[following::b]/child::c

  • 1. Normalize, simplify, flip XPath step expressions
  • 2. Compile XPath into queries over tabular XML encoding
slide-102
SLIDE 102

Torsten Grust U Tübingen

XPath Comprehensions

30

xpath p (root c) xpath (/p) c = [ n’ | n ← xpath p1 c, n’ ← xpath p2 n ]X xpath (p1/p2) c = [ n | n ← xpath p c, [ true | _ ← xpath q n ]some ]X xpath (p[q]) c = step (ax::t) c xpath (ax::t) c =

/descendant::a[following::b]/child::c

  • 1. Normalize, simplify, flip XPath step expressions
  • 2. Compile XPath into queries over tabular XML encoding
slide-103
SLIDE 103

Torsten Grust U Tübingen

A Tabular XML Encoding

31

<a> <b><c><d/>e</c></b> <f><!--g--> <h><i/><j/></h> </f> </a>

slide-104
SLIDE 104

Torsten Grust U Tübingen

A Tabular XML Encoding

31

a b c d e g i j f h

slide-105
SLIDE 105

Torsten Grust U Tübingen

A Tabular XML Encoding

31

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

slide-106
SLIDE 106

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

slide-107
SLIDE 107

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

slide-108
SLIDE 108

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

[ n | n ← doc, pre c < pre n, post c > post n, tag n = t ]X step (descendant::t) c =

slide-109
SLIDE 109

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

[ n | n ← doc, pre c < pre n, post c > post n, tag n = t ]X step (descendant::t) c =

slide-110
SLIDE 110

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

[ n | n ← doc, pre c < pre n, post c > post n, tag n = t ]X step (descendant::t) c = [ n | n ← doc, pre c > pre n, post c < post n, tag n = t ]X step (ancestor::t) c =

slide-111
SLIDE 111

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

  • D. Olteanu et al., XMLDM (EDBT 2002), March 2002
slide-112
SLIDE 112

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

  • D. Olteanu et al., XMLDM (EDBT 2002), March 2002

a b c d e g i j f h

slide-113
SLIDE 113

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

  • D. Olteanu et al., XMLDM (EDBT 2002), March 2002

/descendant::g/preceding::c

a b c d e g i j f h

slide-114
SLIDE 114

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

  • D. Olteanu et al., XMLDM (EDBT 2002), March 2002

/descendant::g/preceding::c

a b c d e g i j f h

slide-115
SLIDE 115

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

  • D. Olteanu et al., XMLDM (EDBT 2002), March 2002

/descendant::g/preceding::c

a b c d e g i j f h

/descendant::c[following::g]

slide-116
SLIDE 116

Torsten Grust U Tübingen

XPath: Looking Forward

33

slide-117
SLIDE 117

Torsten Grust U Tübingen

XPath: Looking Forward

33

slide-118
SLIDE 118

Torsten Grust U Tübingen

Comprehending XPath

34

slide-119
SLIDE 119

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c

slide-120
SLIDE 120

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c

[ v’ | v ← doc, tag v = ’g’, v’ ← doc, pre v’ < pre v, post v’ < post v, tag v’ = ’c’ ]X

slide-121
SLIDE 121

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c /descendant::c[following::g]

[ v’ | v ← doc, tag v = ’g’, v’ ← doc, pre v’ < pre v, post v’ < post v, tag v’ = ’c’ ]X

slide-122
SLIDE 122

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c

SELECT DISTINCT v’ FROM doc v, doc v’ WHERE tag v = ’g’ AND tag v’ = ’c’ AND pre v’ < pre v AND post v’ < post v ORDER BY pre v’

[ v’ | v ← doc, tag v = ’g’, v’ ← doc, pre v’ < pre v, post v’ < post v, tag v’ = ’c’ ]X

slide-123
SLIDE 123

Torsten Grust U Tübingen 35

BRING BACK MONAD COMPREHENSIONS

slide-124
SLIDE 124

Torsten Grust U Tübingen

Comprehensions in Haskell

36

Haskell

slide-125
SLIDE 125

Torsten Grust U Tübingen

Comprehensions in Haskell

36

Haskell

slide-126
SLIDE 126

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011

Haskell

slide-127
SLIDE 127

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011

Haskell

slide-128
SLIDE 128

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011 Comprehending Monads

Haskell

slide-129
SLIDE 129

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011 Comprehending Monads

Haskell

27

slide-130
SLIDE 130

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4

Haskell

27

slide-131
SLIDE 131

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98

Haskell

27

slide-132
SLIDE 132

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98

GHC

27

slide-133
SLIDE 133

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98 Comprehensive
 Comprehensions

GHC

27

slide-134
SLIDE 134

Torsten Grust U Tübingen

Comprehensions in Haskell

36

1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98 Comprehensive
 Comprehensions Bringing Back
 Monad Comprehensions

GHC

27

slide-135
SLIDE 135

Torsten Grust U Tübingen 36

Movie Plot Line

5 min 120 min Meeting Inciting Incident Turning Point Climax Pinch

slide-136
SLIDE 136

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

  • P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007
slide-137
SLIDE 137

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

  • P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007

[ (the dept, maximum salary) | (name, dept, salary) <- employees , then group by dept using groupWith , length dept > 10 , then sortWith by Down (sum salary) , then take 5 ]

slide-138
SLIDE 138

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

  • P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007

[ (the dept, maximum salary) | (name, dept, salary) <- employees , then group by dept using groupWith , length dept > 10 , then sortWith by Down (sum salary) , then take 5 ]

LIMIT ORDER BY ASC/DESC AGGR FROM HAVING GROUP BY

slide-139
SLIDE 139

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

  • P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007

[ sum salary | (name, “MS”, salary) <- employees , then group using runs 3 , then take 5 ]

OVER

row patterns!

Not shown: set operations, joins, WITH…RECURSIVE, …

slide-140
SLIDE 140

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA DATA

slide-141
SLIDE 141

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA DATA

slide-142
SLIDE 142

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA Haskell d DATA

slide-143
SLIDE 143

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS Haskell DATA

slide-144
SLIDE 144

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL

slide-145
SLIDE 145

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL

slide-146
SLIDE 146

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL d

slide-147
SLIDE 147

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL d d

slide-148
SLIDE 148

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL d d haskell

slide-149
SLIDE 149

Torsten Grust U Tübingen 39

Database–Supported Haskell

slide-150
SLIDE 150

Torsten Grust U Tübingen 39

Database–Supported Haskell

  • — rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

  • — margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

  • — our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

  • — best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ] Q Q Q Q Q Q Q Q

slide-151
SLIDE 151

Torsten Grust U Tübingen 39

Database–Supported Haskell

  • — rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

  • — margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

  • — our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

  • — best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ]

i d t s d a y p r i c e A C M E 1 7 / 1 / 1 5 3 . A C M E 2 7 / 1 / 1 5 4 . A C M E 3 7 / 1 / 1 5 1 . A C M E 4 7 / 1 / 1 5 7 . ⠇ ⠇ ⠇ ⠇

Trades

Q Q Q Q Q Q Q Q

slide-152
SLIDE 152

Torsten Grust U Tübingen 39

Database–Supported Haskell

  • — rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

  • — margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

  • — our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

  • — best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ] Q Q Q Q Q Q Q Q

slide-153
SLIDE 153

Torsten Grust U Tübingen 39

Database–Supported Haskell

  • — rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

  • — margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

  • — our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

  • — best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ]

Q Q Q Q Q Q Q Q

slide-154
SLIDE 154

Torsten Grust U Tübingen 39

Database–Supported Haskell

—- SQL code generated from Haskell source SELECT MAX(margins.price - margins.min) FROM (SELECT t.price, MIN(t.price) OVER (ORDER BY t.ts ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM trades AS t WHERE t.id = ‘ACME’ AND t.day = ’07/01/2015’ ) AS margins(price,min)

slide-155
SLIDE 155

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

slide-156
SLIDE 156

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

slide-157
SLIDE 157

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

slide-158
SLIDE 158

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

1 :: [a] → [b]

f

1

slide-159
SLIDE 159

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

1 :: [a] → [b]

f

2 :: [[a]] → [[b]]

f

2

slide-160
SLIDE 160

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

f :: a → b f

1 :: [a] → [b]

f

2 :: [[a]] → [[b]]

f

2 [ g x | x ← xs ]

slide-161
SLIDE 161

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

f :: a → b f

1 :: [a] → [b]

f

2 :: [[a]] → [[b]]

f

2 (g1 xs)

slide-162
SLIDE 162

Torsten Grust U Tübingen

Comprehensions Yield Independent Work


40

[ f n e | x ← xs ] ⇝ f n+1 [ e | x ← xs ]

slide-163
SLIDE 163

Torsten Grust U Tübingen 40

[ f n e | x ← xs ] ⇝ f n+1 [ e | x ← xs ]

Nested Data Parallelism

Implementation of a Portable Nested Data-Parallel Language

  • G. E. Blelloch et al., ACM PPoPP, May 1993
slide-164
SLIDE 164

Torsten Grust U Tübingen

The Flatter, the Better

41

slide-165
SLIDE 165

Torsten Grust U Tübingen

The Flatter, the Better

41

xss +2 yss

slide-166
SLIDE 166

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , , 3 , 1 1 , 1 , 7 ]
 + 
 [ , 4 , 1 2 , 1 3 , 2 , 3 ] [ ] [ ] [ ] 
 
 [ ] [ ] [ ]

2

xss +2 yss

slide-167
SLIDE 167

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , , 3 , 1 1 , 1 , 7 ]
 + 
 [ , 4 , 1 2 , 1 3 , 2 , 3 ] [ ] [ ] [ ] [ ] [ ] [ ]

2

xss +2 yss

slide-168
SLIDE 168

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , , 3 , 1 1 , 1 , 7 ]
 + 
 [ , 4 , 1 2 , 1 3 , 2 , 3 ]

1

forget

[ ] [ ] [ ]

xss +2 yss

slide-169
SLIDE 169

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ]

forget

[ ] [ ] [ ]

xss +2 yss

slide-170
SLIDE 170

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ]

forget imprint

[ ] [ ] [ ]

xss +2 yss

slide-171
SLIDE 171

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ]

forget imprint

[ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

slide-172
SLIDE 172

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

slide-173
SLIDE 173

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

slide-174
SLIDE 174

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

slide-175
SLIDE 175

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

slide-176
SLIDE 176

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42

slide-177
SLIDE 177

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42

+1

slide-178
SLIDE 178

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42 seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1

slide-179
SLIDE 179

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42 seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 πsum: x+y( )

slide-180
SLIDE 180

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42 seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( )

slide-181
SLIDE 181

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( )

slide-182
SLIDE 182

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( ) ⋉

p

slide-183
SLIDE 183

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( ) ⋉

p∧ seg1 = seg2

slide-184
SLIDE 184

Torsten Grust U Tübingen

Database Systems:
 Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( ) ⋉

p∧ seg1 = seg2

slide-185
SLIDE 185

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[(Int,[Str],[(Bool,[(Int,Int)])])]

slide-186
SLIDE 186

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[(Int,[Str],[(Bool,[(Int,Int)])])]

slide-187
SLIDE 187

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[[][[]]]

slide-188
SLIDE 188

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[[][[]]]

Query Plan Bundle

slide-189
SLIDE 189

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[[][[]]]

Query Plan Bundle

slide-190
SLIDE 190

Torsten Grust U Tübingen 44