[PPT] - Take Everything From Me, But Leave Me The Comprehension DBPL PowerPoint Presentation

SLIDE 1

Torsten Grust

db.inf.uni-tuebingen.de

Take Everything From Me, But Leave Me The Comprehension

DBPL — September 2017

SLIDE 2

Torsten Grust U Tübingen 2

Apologies, I am only  a database person

SLIDE 3

Torsten Grust U Tübingen 2

It is in this connection worth noticing that in the Comm.ACM the papers

n data bases […] are of markedly lower

linguistic quality than the others.

—Edsger Dijkstra (EWD691)

Apologies, I am only  a database person

SLIDE 4

Torsten Grust U Tübingen

The point is that the way in which the database management experts tackle the problems seems to be so grossly inadequate. They seem to form an inbred crowd with very little knowledge of computing science in general, who tackle their problems primarily politically instead of scientifically. Often they seemed to be mentally trapped by the intricacies of early, rather ad hoc solutions to rather accidental problems; as soon as such a technique has received a name, it becomes "a database concept".

3

—Edsger Dijkstra (EWD577)

Torsten Grust U Tübingen

Apologies, I am only  a database person

SLIDE 5

Torsten Grust U Tübingen

so grossly inadequate. They seem to form an inbred crowd with very little knowledge of computing science in general, who tackle their problems primarily politically instead of scientifically. Often they seemed to be mentally trapped by the intricacies of early, rather ad hoc solutions to rather accidental problems; as soon as such a technique has received a name, it becomes "a database concept". And a totally inadequate use of language, sharpening their pencils with a blunt axe. I learned a few things about Databases. I learned —or: had my tentative impression confirmed— that the term "Database Technology", although sometimes used, is immature, for there is hardly any underlying

3

—Edsger Dijkstra (EWD577)

Torsten Grust U Tübingen

Apologies, I am only  a database person

SLIDE 6

Torsten Grust U Tübingen

accidental problems; as soon as such a technique has received a name, it becomes "a database concept". And a totally inadequate use of language, sharpening their pencils with a blunt axe. I learned a few things about Databases. I learned —or: had my tentative impression confirmed— that the term "Database Technology", although sometimes used, is immature, for there is hardly any underlying "science" that could justify the use of the term "technology".

3

—Edsger Dijkstra (EWD577)

Torsten Grust U Tübingen

Apologies, I am only  a database person

SLIDE 7

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

SLIDE 8

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

1. Successively draw bindings for x from domain xs,
2. for those bindings that pass filter p,
3. evaluate head h,

head filter generator

SLIDE 9

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

1. Successively draw bindings for x from domain xs,
2. for those bindings that pass filter p,
3. evaluate head h,
4. collect results to form a list.

head filter generator

SLIDE 10

Torsten Grust U Tübingen

Comprehension Syntax

4

[ h x | x ← xs, p x ]

1. Successively draw bindings for x from domain xs,
2. for those bindings that pass filter p,
3. evaluate head h,
4. collect results to form

M

an M.

head filter generator

SLIDE 11

Torsten Grust U Tübingen

Extension vs. Intension

5

{ }

SLIDE 12

Torsten Grust U Tübingen

Extension vs. Intension

5

I, III, V, VII, IX

{ }

SLIDE 13

Torsten Grust U Tübingen

Extension vs. Intension

5

I, III, V, VII, IX [ roman x | x ← [1…10], odd x ]

set

{ }

SLIDE 14

Torsten Grust U Tübingen

In the Beginning …

6

Relational Completeness of Data Base Sublanguages

E. F. Codd, IBM Research Report RJ987, 1972

SLIDE 15

Torsten Grust U Tübingen

In the Beginning …

6

Relational Completeness of Data Base Sublanguages

E. F. Codd, IBM Research Report RJ987, 1972

head filter generator

SLIDE 16

Torsten Grust U Tübingen

Today’s XQuery 3.0

7

Core of XQuery: versatile FLWOR expression

XQuery 3.0: An XML Query Language

D. Chamberlin et al., W3C Recommendation, April 2014

SLIDE 17

Torsten Grust U Tübingen

Today’s XQuery 3.0

7

Core of XQuery: versatile FLWOR expression

XQuery 3.0: An XML Query Language

D. Chamberlin et al., W3C Recommendation, April 2014

head generator filter

SLIDE 18

Torsten Grust U Tübingen

Today’s XQuery 3.0

7

Core of XQuery: versatile FLWOR expression

XQuery 3.0: An XML Query Language

D. Chamberlin et al., W3C Recommendation, April 2014

head generator generator head filter

SLIDE 19

Torsten Grust U Tübingen

Early XQuery

8

A Data Model and Algebra for XQuery

M. Fernandez et al., October 2003

SLIDE 20

Torsten Grust U Tübingen

Early XQuery

8

A Data Model and Algebra for XQuery

M. Fernandez et al., October 2003

generator head filter

SLIDE 21

Torsten Grust U Tübingen 1 module Query where 2 import Prelude hiding (elem,index) 3 4 -- Data Model: Constructors ---------------------------------------------------- 5 6 text :: String -> Node 7 elem :: Tag -> [Node] -> Node 8 ref :: Node -> Node 9 10 year0 :: Node 11 year0 = elem "@year" [ text "1999" ] 12 13 book0 :: Node 14 book0 = elem "book" [ 15 elem "@year" [ text "1999" ], 16 elem "title" [ text "Data on the Web" ], 17 elem "author" [ text "Abiteboul" ], 18 elem "author" [ text "Buneman" ], 19 elem "author" [ text "Suciu" ]] 20 21 bib0 :: Node 22 bib0 = elem "bib" [ 23 elem "book" [ 24 elem "@year" [ text "1999" ], 25 elem "title" [ text "Data on the Web" ],

An XQuery Nucleus

9 Torsten Grust U Tübingen

SLIDE 22

Torsten Grust U Tübingen

An XQuery Nucleus

9 Torsten Grust U Tübingen

≈ 430 lines of Haskell (300+ lines of examples)
Implements a complete XQuery core,

including tree construction and traversal

List comprehensions express path navigation, FLOWR,

grouping/aggregation, quantification

SLIDE 23

Torsten Grust U Tübingen

LINQ

10

Comprehension syntax deeply embedded into C#, with monad-based semantics organized around SelectMany (aka >>=, flatmap)

The World According to LINQ

E. Meijer, October 201

1

SLIDE 24

Torsten Grust U Tübingen

LINQ

10

Comprehension syntax deeply embedded into C#, with monad-based semantics organized around SelectMany (aka >>=, flatmap)

The World According to LINQ

E. Meijer, October 201

1

generator head filter

SLIDE 25

Torsten Grust U Tübingen

Emma

1 1

Implicit Parallelism through Deep Language Embedding

A. Alexandrov et al., SIGMOD 2015

Deep embedding of comprehensions in Scala, compiles to Apache Flink / Spark

SLIDE 26

Torsten Grust U Tübingen

Emma

1 1

Implicit Parallelism through Deep Language Embedding

A. Alexandrov et al., SIGMOD 2015

Deep embedding of comprehensions in Scala, compiles to Apache Flink / Spark

generator head filter

SLIDE 27

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs

SLIDE 28

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs

generator head filter

SLIDE 29

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs

generator head filter

SLIDE 30

Torsten Grust U Tübingen

Pig Latin

12

Compiles to sequences of Map/Reduce jobs Told you so.

generator head filter

SLIDE 31

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

SLIDE 32

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

generator head filter

SLIDE 33

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

generator head filter head generator filter

SLIDE 34

Torsten Grust U Tübingen

SQL

13

Query Q4 of the TPC-H OLAP benchmark

generator head filter head generator filter generator head head filter generator head filter generator

SLIDE 35

Torsten Grust U Tübingen 14

SLIDE 36

Torsten Grust U Tübingen 14

SQL

SLIDE 37

Torsten Grust U Tübingen

One Way to Teach SQL

15

SLIDE 38

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S

SLIDE 39

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; ∅ ⨄

SLIDE 40

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; ∅ ⨄

SLIDE 41

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; ∅ ⨄ –∞ max2

SLIDE 42

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S 0 < ALL(SELECT A FROM S) c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; ∅ ⨄ –∞ max2

SLIDE 43

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S 0 < ALL(SELECT A FROM S) c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; c ← true; foreach x ∈ S do c ← c ∧ (0 < x.A); return c; ∅ ⨄ –∞ max2 true ∧

SLIDE 44

Torsten Grust U Tübingen

One Way to Teach SQL

15

SELECT A, B FROM S SELECT MAX(A) FROM S 0 < ALL(SELECT A FROM S) c ← ∅; foreach x ∈ S do c ← c ⨄ {(x.A,x.B)}; return c; c ← –∞; foreach x ∈ S do c ← max2(c,x.A); return c; c ← true; foreach x ∈ S do c ← c ∧ (0 < x.A); return c;

∅ ⨄

–∞ max2 true

∧

SLIDE 45

Torsten Grust U Tübingen

One Program Form for SQL

16

SLIDE 46

Torsten Grust U Tübingen

One Program Form for SQL

16

c ← z; foreach x ∈ xs do c ← f (c ,x); return c;

fold(z,f,xs) ≡

SLIDE 47

Torsten Grust U Tübingen

One Program Form for SQL

16

M carrier liftM zM ⊕M bag bag t {⋅} ∅ ⊎ set set t {⋅} ∅ ∪ list list t [⋅] [ ] + + all bool id TRUE ∧ some bool id FALSE ∨ sum num id + max t (ordered) id

∞

max2 min t (ordered) id ∞ min2

true false

c ← z; foreach x ∈ xs do c ← f (c ,x); return c;

fold(z,f,xs) ≡

SLIDE 48

Torsten Grust U Tübingen

One Program Form for SQL

17

SLIDE 49

Torsten Grust U Tübingen

One Program Form for SQL

17

SELECT A FROM S WHERE A > B

fold(∅,⊕,S) with ⊕(c,x) = c ⊎ (if (x.A > x.B) {x.A} else ∅)

SLIDE 50

Torsten Grust U Tübingen

One Program Form for SQL

17

SELECT A FROM S WHERE A > B SELECT x.A,y.B FROM R x,S y

fold(∅,⊕,S) with ⊕(c,x) = c ⊎ (if (x.A > x.B) {x.A} else ∅) fold(∅,⊕,R) with ⊕(c,x) = c ⊎ fold(∅,⊗,S) with ⊗(d,y) = d ⊎ {(x.A,y.B)}

SLIDE 51

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

SLIDE 52

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B)

fold(0,⊕,fold(∅,⊗,R)) with with ⊕(c,_) = c + 1 ⊗(d,x) = d ⊎ if (fold(false,⊙,S)) {x} else ∅ with ⊙(e,y) = e ∨ (x.A = y.B)

SLIDE 53

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B)

fold(0,⊕,fold(∅,⊗,R)) with with ⊕(c,_) = c + 1 ⊗(d,x) = d ⊎ if (fold(false,⊙,S)) {x} else ∅ with ⊙(e,y) = e ∨ (x.A = y.B)

Algebraic Wonderland.

SLIDE 54

Torsten Grust U Tübingen

fold(,,) Gets Ugly Quickly

18

SELECT COUNT(*) FROM R x WHERE EXISTS (SELECT y FROM S y WHERE x.A = y.B)

fold(0,⊕,fold(∅,⊗,R)) with with ⊕(c,_) = c + 1 ⊗(d,x) = d ⊎ if (fold(false,⊙,S)) {x} else ∅ with ⊙(e,y) = e ∨ (x.A = y.B)

Algebraic Wonderland. REJECT!

SLIDE 55

Torsten Grust U Tübingen

Comprehension Semantics

19

SLIDE 56

Torsten Grust U Tübingen

Comprehension Semantics

19

[ e | v1 ← e1, q ]M [ e | ]M [ e | p, q ]M

SLIDE 57

Torsten Grust U Tübingen

Comprehension Semantics

19

[ e | v1 ← e1, q ]M [ e | ]M [ e | p, q ]M liftM(e)

≡ ≡ ≡

SLIDE 58

Torsten Grust U Tübingen

Comprehension Semantics

19

[ e | v1 ← e1, q ]M [ e | ]M [ e | p, q ]M liftM(e) if (p) [ e | q ]M else zM fold(zM,⊗,e1) with ⊗(c,v1) = c ⊕M [ e | q ]M

≡ ≡ ≡

SLIDE 59

Torsten Grust U Tübingen

Comprehensible SQL

20

SLIDE 60

Torsten Grust U Tübingen

Comprehensible SQL

20

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

SLIDE 61

Torsten Grust U Tübingen

Comprehensible SQL

20

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag

SLIDE 62

Torsten Grust U Tübingen

Comprehensible SQL

20

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag [ true | _ ← ]some

SLIDE 63

Torsten Grust U Tübingen

Comprehensible SQL

20

[ 1 | x ← R, ]sum

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag [ true | _ ← ]some

SLIDE 64

Torsten Grust U Tübingen

Comprehensible SQL

20

[ 1 | x ← R, [ x.A = y.B | y ← S ]some ]sum [ 1 | x ← R, ]sum

SELECT y FROM S y WHERE x.A = y.B EXISTS ( ) SELECT COUNT(*) FROM R x WHERE

[ y | y ← S, x.A = y.B ]bag [ true | _ ← ]some

SLIDE 65

Torsten Grust U Tübingen

Comprehension Unnesting

21

SLIDE 66

Torsten Grust U Tübingen

Comprehension Unnesting

21

(⊕M idempotent)

SLIDE 67

Torsten Grust U Tübingen 22

On Optimizing an SQL-like Nested Query

W. Kim, ACM TODS, 1982

When Syntax Distracts

SLIDE 68

Torsten Grust U Tübingen 22

On Optimizing an SQL-like Nested Query

W. Kim, ACM TODS, 1982

When Syntax Distracts

SLIDE 69

Torsten Grust U Tübingen 22

On Optimizing an SQL-like Nested Query

W. Kim, ACM TODS, 1982

When Syntax Distracts

Implemented in most RDBMSs to this day

SLIDE 70

Torsten Grust U Tübingen

When Syntax Distracts

23

Syntactic classification of nested SQL queries into

types N, Nx, D, J, A, JA, JA(NA), JA(AA), JA(AN), …

Classes are associated with their particular

SQL–level unnesting rewrites.

SLIDE 71

Torsten Grust U Tübingen

When Syntax Distracts

23

Syntactic classification of nested SQL queries into

types N, Nx, D, J, A, JA, JA(NA), JA(AA), JA(AN), …

Classes are associated with their particular

SQL–level unnesting rewrites.

SLIDE 72

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y))

SLIDE 73

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y))

[ f(x) | x ← R, [ p(x) = v | v ← [ g(y)| y ← S, q(x,y) ]bag ]some ]set

SLIDE 74

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y))

[ f(x) | x ← R, [ p(x) = g(y) | y ← S, q(x,y) ]some ]set

SLIDE 75

Torsten Grust U Tübingen

When Syntax Distracts

24

SELECT DISTINCT f(x) FROM R AS x WHERE p(x) IN (SELECT g(y) FROM S AS y WHERE q(x,y)) SELECT DISTINCT f(x) FROM R AS x, S AS y WHERE q(x,y) AND p(x) = g(y)

[ f(x) | x ← R, y ← S, q(x,y), p(x) = g(y) ]set

SLIDE 76

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department

f Computer Science, Columbia University

damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particular class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic

f a

group query is that it can be executed in a groupby-group

fashion. In other words, the underlying

relation(s) can be partitioned (based on some set

f attributes)

into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some performance results for group queries expressed in standard SQL, comparing a commercial database system with

ur optimized

plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title

f the publication

and its date appear, and notice is given that copying is by permission

f the Very Large Data Base

Endowment. To copy otherwise,

r to republish,

requires a fee and/or special permission from the Endowment. Proceedings

f the 23rd VLDB

Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))

r complex

(e.g. include joins, selections, further aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places

f a complex

piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently

n each partition.

Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition

f the database, and (c)

taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict

urselves to a special syntax.

Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically generated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

D. Chatziantoniou, K.A. Ross, VLDB 1997

SLIDE 77

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department

f Computer Science, Columbia University

damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particular class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic

f a

group query is that it can be executed in a groupby-group

fashion. In other words, the underlying

relation(s) can be partitioned (based on some set

f attributes)

into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some performance results for group queries expressed in standard SQL, comparing a commercial database system with

ur optimized

plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title

f the publication

and its date appear, and notice is given that copying is by permission

f the Very Large Data Base

Endowment. To copy otherwise,

r to republish,

requires a fee and/or special permission from the Endowment. Proceedings

f the 23rd VLDB

Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))

r complex

(e.g. include joins, selections, further aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places

f a complex

piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently

n each partition.

Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition

f the database, and (c)

taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict

urselves to a special syntax.

Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically generated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

D. Chatziantoniou, K.A. Ross, VLDB 1997

SELECT f(x), agg(g(x)) FROM R AS x GROUP BY f(x)

SLIDE 78

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department

f Computer Science, Columbia University

damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particular class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic

f a

group query is that it can be executed in a groupby-group

fashion. In other words, the underlying

relation(s) can be partitioned (based on some set

f attributes)

into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some performance results for group queries expressed in standard SQL, comparing a commercial database system with

ur optimized

plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title

f the publication

and its date appear, and notice is given that copying is by permission

f the Very Large Data Base

Endowment. To copy otherwise,

r to republish,

requires a fee and/or special permission from the Endowment. Proceedings

f the 23rd VLDB

Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))

r complex

(e.g. include joins, selections, further aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places

f a complex

piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently

n each partition.

Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition

f the database, and (c)

taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict

urselves to a special syntax.

Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically generated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

D. Chatziantoniou, K.A. Ross, VLDB 1997

[ ⟨f(x), [ g(y) | y ← R, f(y) = f(x) ]agg⟩ | x ← R ]set

SLIDE 79

Torsten Grust U Tübingen

A Zoo of Query Representations

25

Groupwise Processing

f Relational

Queries

Damianos Chatziantoniou” Kenneth A. ROSS* Department

f Computer Science, Columbia University

damia,nos,kar@cs.columbia.edu Abstract In this paper, we define and examine a particular class of queries called group queries. Group queries are natural queries in many decision- support applications. The main characteristic

f a

group query is that it can be executed in a groupby-group

fashion. In other words, the underlying

relation(s) can be partitioned (based on some set

f attributes)

into disjoint groups, and each group can be processed separately. We give a syntactic criterion to identify these queries and prove its sufficiency. We also prove the strong result that every group query has an equivalent formulation that satisfies our syntactic criterion. We describe a general evaluation technique for group queries, and demonstrate how an optimizer can determine this plan. We then consider more complex queries whose components are group queries with poten- tially different partitioning attributes. We give two methods to identify group query components within such a query. We also give some performance results for group queries expressed in standard SQL, comparing a commercial database system with

ur optimized

plan on top of the same commercial system. These results indicate that there are significant potential performance improvements. ‘This research was supported by a grant from the AT&T Foundation, by a David and Lucile Packard Foundation Fel- lowship in Science and Engineering, by a Sloan Foundation Fellowship, by NSF CISE grant CDA-9625374, and by an NSF Young Investigator award. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title

f the publication

and its date appear, and notice is given that copying is by permission

f the Very Large Data Base

Endowment. To copy otherwise,

r to republish,

requires a fee and/or special permission from the Endowment. Proceedings

f the 23rd VLDB

Conference Athens, Greece, 1997 1 Introduction With the recent interest in decision support systems and data warehousing has come a demand for techniques to evaluate and optimize very complex relational queries in- volving both aggregation and joins. Current commercial systems do not find good plans for many very complex queries. In this paper we examine a particular class of complex queries, called group queries. Many complex decision support queries describe the following idea: for each value z in a dimension D (e.g. for each customer), evaluate a query Q’. This query Q’ can be something simple (e.g. compute avg(sales))

r complex

(e.g. include joins, selections, further aggregations, etc). To specify this kind of complex query in SQL, one has to embed D within many places

f a complex

piece of SQL code. This may have many drawbacks in terms of performance if the optimizer is not aware of the query’s structure. This particular structure is often amenable to the following evaluation strategy: partition the data along dimension D, and evaluate Q’ independently

n each partition.

Therefore it is important to be able to identify whether an SQL specification has the form mentioned above. In that case, that SQL specification constitutes a group query. Our main contributions are: Group Queries (Section 2) We define the concept of a group query. A query Q is a group query with respect to certain partitioning attributes S if, for all databases, it is possible to answer Q by (a) partitioning the data according to the values for attribute(s) S, (b) evaluating another query Q’ on each partition

f the database, and (c)

taking the union of the results. Common decision support queries require complex operations within groups and not just simple aggregation. We provide a syntactic criterion for identifying group queries and prove its sufficiency. We also prove the surprising result that every group query can be expressed in a way that satisfies our syntactic criterion. We consider arbitrary relational queries expressed in SQL, and do not restrict

urselves to a special syntax.

Execution Plan (Section 3) We demonstrate how a group query can be evaluated by partitioning the data and applying a significantly simpler query to each partition. Further, this evaluation plan can be automatically generated given a query that satisfies the criterion. The main benefits of this approach are that the simpler query is often much easier to optimize and evaluate, and often partitions 476

Groupwise Processing of Relational Queries

D. Chatziantoniou, K.A. Ross, VLDB 1997

Q f g agg R ≡

[ ⟨f(x), [ g(y) | y ← R, f(y) = f(x) ]agg⟩ | x ← R ]set

SLIDE 80

Torsten Grust U Tübingen 26

A Zoo of Query Representations

SLIDE 81

Torsten Grust U Tübingen 26

A Zoo of Query Representations

SLIDE 82

Torsten Grust U Tübingen 26

A Zoo of Query Representations

SLIDE 83

Torsten Grust U Tübingen 26

A Zoo of Query Representations

SLIDE 84

Torsten Grust U Tübingen 26

A Zoo of Query Representations

SLIDE 85

Torsten Grust U Tübingen 26

A Zoo of Query Representations

SELECT agg(g(x)) FROM P AS x

P P P

SLIDE 86

Torsten Grust U Tübingen 26

A Zoo of Query Representations

[ g(y) | y ← P ]agg

Q’ g agg P ≡ P P P

SLIDE 87

Torsten Grust U Tübingen 26

A Zoo of Query Representations

[ g(y) | y ← P ]agg

Q’ g agg P ≡ P P P

SLIDE 88

Torsten Grust U Tübingen 27

A Zoo of Query Representations

SLIDE 89

Torsten Grust U Tübingen 27

A Zoo of Query Representations

a p p l i e d s e p a r a t e l y t

p

a r t i t i

n

s

f

F Y I J J G b a s e d i n

i d ,

a n d t h e p a r t i a l r e s u l t s c

u

l d b e u n i

n

e d t

g

e t h e r . Q u e r y Q 3 i s e x p r e s s e d i n s t a n d a r d S Q L a s f

l

l

w

s . V i e w V l c

n

t a i n s t h e a v e r a g e d u r a t i

n
f

t h e s e c t i

n

s r e a d i n t h e d a y s b e f

r

e e a c h a c c e s s t

t

h e w

r

l d s e c t i

n

, a n d v i e w V 2 t h e a v e r a g e d u r a t i

n
f

t h e s e c t i

n

s r e a d i n t h e d a y s a f t e r . W e j

i

n V l a n d V 2 i n

r

d e r t

g

e t t h e r e s u l t i n s i n g l e t u p l e s . T h e a c t u a l S Q L f

r

m u l a t i

n

i s

m

i t t e d d u e t

l

a c k

f

s p a c e a n d a p p e a r s i n [ C h a 9 7 ] . O n c e a g a i n , F Y I

L

O G c a n b e p a r t i t i

n

e d

n

i d

a t t r i b u t e a n d a s i m p l e r q u e r y c a n b e p

s

e d s e p a r a t e l y

n

e a c h p a r t i t i

n

. W e s h a l l d i s c u s s Q u e r y Q 4 i n S e c t i

n

2 . 4 .

2 Theoretical Framework

I n t h i s s e c t i

n

, w e i n t r

d

u c e

u

r t e r m i n

l
g

y a n d d e f i n e w h a t w e m e a n b y a g r

u

p q u e r y . W e g i v e a s y n t a c t i c c r i t e r i

n

f

r

i d e n t i f y i n g g r

u

p q u e r i e s a n d p r

v

e t h a t t h i s c

n

d i t i

n

i s s u f f i c i e n t . W e a l s

s

h

w

t h a t e v e r y g r

u

p q u e r y c a n b e e x p r e s s e d i n a f

r

m t h a t s a t i s f i e s

u

r c r i t e r i

n

. 2 . 1 A s s u m p t i

n

s a n d T e r m i n

l
g

y W e a s s u m e t h a t q u e r i e s a r e w r i t t e n i n t e r m s

f

v i e w s , w i t h n

s

u b q u e r i e s . T h i s i s a v a l i d a s s u m p t i

n

s i n c e t h e r e a r e m a n y w a y s t

r

e w r i t e a s u b q u e r y a s a j

i

n

f

t w

(
r

m

r

e ) v i e w s [ K i m 8 2 , D a y 8 7 , S P L S G ] . W e i n i t i a l l y a s s u m e t h a t t h e d a t a b a s e c

n

t a i n s a s i n g l e r e l a t i

n

R . ( M u l t i p l e r e l a t i

n

s w i l l b e c

n

s i d e r e d i n S e c t i

n

2 . 4 . ) R m a y i t s e l f b e a v i e w

r

t h e r e s u l t

f

a n

t

h e r q u e r y , b u t f r

m
u

r p

i

n t

f

v i e w i t i s t r e a t e d a s a n e n c a p s u l a t e d t a b l e . ( I . e . , i f R w a s

a v i e w ,

t h e n w e d

n

’ t c

n

s i d e r u n f

l

d i n g t h e d e f i n i t i

n
f

R i n t

q

u e r i e s

v

e r R . ) W e s h a l l d e f i n e b e l

w

t h e n

t

i

n
f

a q u e r y g r a p h . A q u e r y g r a p h h a s n

d

e s t h a t a r e r e l a t i

n

a l

p

e r a t i

n

s . W e c

n

s i d e r t h r e e k i n d s

f

r e l a t i

n

a l

p

e r a t i

n

s : B a s i c B l

c

k s A b a s i c b l

c

k i s s

m

e c

m

b i n a t i

n
f

p r

j

e c t i

n

s a n d s e l e c t i

n

s a p p l i e d t

a

j

i

n

f

r e l a t i

n

s . I n S Q L s u c h

p

e r a t i

n

s a r e e x p r e s s e d a s S E L E C T

F

R O M

W

H E R E q u e r i e s w i t h

u

t a g g r e g a t e s

r

a t t r i b u t e r e n a m i n g . A b a s e r e l a t i

n

i s a l s

t

r e a t e d a s a b a s i c b l

c

k . A g g r e g a t i

n

B l

c

k s A n a g g r e g a t i

n

b l

c

k i s a s i n

g

l e a g g r e g a t i

n
p

e r a t i

n

s p e c i f y i n g a s e t

f

g r

u

p i n g a t t r i b u t e s a n d a l i s t

f

a g g r e g a t e f u n c t i

n

s t

b

e c

m
p

u t e d

v

e r t h e g r

u

p s . T h e a g g r e g a t i

n

c a n

p

t i

n

a l l y b e f

l

l

w

e d b y a s e l e c t i

n

. I n S Q L , s u c h

p

e r a t i

n

s a r e e x p r e s s e d

a s S E L E C T

F

R O M

G

R O U P B Y

H

A V I N G

q u e r i e s w h e r e t h e

S E L E C T

c l a u s e i n c l u d e s a l l g r

u

p i n g a t t r i b u t e s ( p l u s s

m

e a g g r e g a t e s ) , a n d t h e F R O M c l a u s e c

n

t a i n s a s i n g l e r e l a t i

n

. S e t B l

c

k s S e t b l

c

k s e x p r e s s t h e s e t

r

i e n t e d

p

e r a

t

i

n

s , n a m e l y u n i

n

, i n t e r s e c t i

n

a n d d i f f e r e n c e

f

r e l a t i

n

s w i t h t h e s a m e s c h e m a . T h e c

r

r e s p

n

d i n g S Q L c

n

s t r u c t s

a r e U N I O N , I N T E R S E C T

a n d E X C E P T . A n y r e l a t i

n

a l q u e r y c a n b e s p e c i f i e d u s i n g t h e s e t h r e e

p

e r a t i

n

s . A q u e r y c a n b e s p l i t i n t

t

w

b

l

c

k s i f n e c e s

s

a r y , f

r

e x a m p l e i f t h e q u e r y h a s b

t

h a W B E R E c l a u s e a n d a G R O U P B Y c l a u s e t h e n w e c a n r e w r i t e i t a s a n a g g r e g a t i

n

b l

c

k a p p l i e d t

a

b a s i c b l

c

k . N

t

e t h a t w e d

n
t

a l l

w

c

n

s t a n t r e l a t i

n

s i n q u e r i e s , n

r

d

w

e a l l

w

a t t r i b u t e r e n a m i n g . ( I n [ C h a 9 7 ] w e d e s c r i b e t h e e x t e n s i

n

s n e e d e d t

h

a n d l e t h e r e n a m i n g

p

e r a t

r

. ) D e f i n i t i

n

2 . 1 : ( Q u e r y G r a p h ) S u p p

s

e t h a t w e a r e g i v e n a q u e r y Q . T h e n

d

e s

f

t h e q u e r y g r a p h f

r

Q a r e t h e r e l a t i

n

a l

p

e r a t i

n

s ( a s d e f i n e d a b

v

e ) u s e d i n Q a n d i t s s u b v i e w s . B a s i c b l

c

k s a r e w r i t t e n a s r e c t a n g l e s , a g g r e g a t i

n

b l

c

k s a s c i r c l e s , a n d s e t b l

c

k s a s d i a m

n

d s . E v e r y b l

c

k i s g i v e n a l a b e l c

r

r e s p

n

d i n g t

t

h e v i e w (

r

r e l a t i

n

) n a m e

f

t h a t b l

c

k . T h e r e i s a n e d g e f r

m

n

d

e l a b e l e d N i t

n
d

e l a b e l e d N s i f N s i s m e n t i

n

e d i n t h e F R O M c l a u s e

f

N i ,

r

i f N s i s a n a r g u m e n t

f

a s e t

p

e r a t i

n

i n N i . A n e d g e w h

s

e s

u

r c e i s a n a g g r e g a t i

n

b l

c

k N i s l a b e l e d w i t h t h e g r

u

p i n g a t t r i b u t e s

f

N . E d g e s c

m

i n g

u

t

f

a s i n g l e b a s i c b l

c

k N a r e l i n k e d t

g

e t h e r w i t h a n a r c , a n d j

i

n t l y l a b e l e d b y t h e j

i

n c

n

d i t i

n
f

N . E d g e s c

m

i n g

u

t

f

s e t b l

c

k s a r e n

t

l a b e l e d . A q u e r y g r a p h i s a d i r e c t e d a c y c l i c g r a p h w i t h a s i n g l e s

u

r c e ( r

t

) n

d

e r e p r e s e n t i n g t h e q u e r y r e s u l t . W e s h a l l c

n

v e n t i

n

a l l y d r a w t h e g r a p h w i t h e d g e s “ p

i

n t i n g u p . ” F

r

s i n g l e

r

e l a t i

n

d a t a b a s e s , t h e r e w i l l a l w a y s b e a s i n g l e s i n k n

d

e a t t h e t

p
f

t h e p i c t u r e . F i g u r e 1 s h

w

s t h e q u e r y g r a p h s f

r

q u e r i e s Q l , a n d Q 2 . N

t

i c e t h e s e p a r a t i

n
f

V 2 a n d V 3 f r

m

q u e r y Q l i n t

b

a s i c b l

c

k s ( V 2 B , V 3 B ) a n d a g g r e g a t i

n

b l

c

k s ( V 2 A , V 3 A ) .

E ) , , O 2 : j

i

n c

n

d i t i

n

s . : j

i

n c

n

d i t i

n

.

(4 query Ql (b) query Q2

F i g u r e 1 : Q u e r y g r a p h s D e f i n i t i

n

2 . 2 : ( P a r t i t i

n

i n g ) L e t S b e a s e t

f

a t t r i b u t e s f r

m

t h e u n d e r l y i n g r e l a t i

n

R , a n d l e t D r e p r e s e n t t h e d

m

a i n

v

e r w h i c h t h e t u p l e s

f

S v a l u e s r a n g e . F

r

e v e r y n

d

e N i n t h e q u e r y g r a p h a n d f

r

x E D w e w r i t e N ( x ) t

r

e p r e s e n t t h e a n s w e r a t n

d

e N w h e n t h e e x t e n s i

n
f

R i s r e p l a c e d b y c r s = = ( R ) . O b s e r v e t h a t t h e a t t r i b u t e s i n S d

n
t

h a v e t

b

e a t

t

r i b u t e s

f

t h e a n s w e r a t n

d

e N f

r

N ( x ) t

b

e w e l l

d

e f i n e d . A l s

,
b

s e r v e t h a t R = U z E D R ( z ) . 4 7 8

SLIDE 90

Torsten Grust U Tübingen 27

A Zoo of Query Representations

a p p l i e d s e p a r a t e l y t

p

a r t i t i

n

s

f

F Y I J J G b a s e d i n

i d ,

a n d t h e p a r t i a l r e s u l t s c

u

l d b e u n i

n

e d t

g

e t h e r . Q u e r y Q 3 i s e x p r e s s e d i n s t a n d a r d S Q L a s f

l

l

w

s . V i e w V l c

n

t a i n s t h e a v e r a g e d u r a t i

n
f

t h e s e c t i

n

s r e a d i n t h e d a y s b e f

r

e e a c h a c c e s s t

t

h e w

r

l d s e c t i

n

, a n d v i e w V 2 t h e a v e r a g e d u r a t i

n
f

t h e s e c t i

n

s r e a d i n t h e d a y s a f t e r . W e j

i

n V l a n d V 2 i n

r

d e r t

g

e t t h e r e s u l t i n s i n g l e t u p l e s . T h e a c t u a l S Q L f

r

m u l a t i

n

i s

m

i t t e d d u e t

l

a c k

f

s p a c e a n d a p p e a r s i n [ C h a 9 7 ] . O n c e a g a i n , F Y I

L

O G c a n b e p a r t i t i

n

e d

n

i d

a t t r i b u t e a n d a s i m p l e r q u e r y c a n b e p

s

e d s e p a r a t e l y

n

e a c h p a r t i t i

n

. W e s h a l l d i s c u s s Q u e r y Q 4 i n S e c t i

n

2 . 4 .

2 Theoretical Framework

I n t h i s s e c t i

n

, w e i n t r

d

u c e

u

r t e r m i n

l
g

y a n d d e f i n e w h a t w e m e a n b y a g r

u

p q u e r y . W e g i v e a s y n t a c t i c c r i t e r i

n

f

r

i d e n t i f y i n g g r

u

p q u e r i e s a n d p r

v

e t h a t t h i s c

n

d i t i

n

i s s u f f i c i e n t . W e a l s

s

h

w

t h a t e v e r y g r

u

p q u e r y c a n b e e x p r e s s e d i n a f

r

m t h a t s a t i s f i e s

u

r c r i t e r i

n

. 2 . 1 A s s u m p t i

n

s a n d T e r m i n

l
g

y W e a s s u m e t h a t q u e r i e s a r e w r i t t e n i n t e r m s

f

v i e w s , w i t h n

s

u b q u e r i e s . T h i s i s a v a l i d a s s u m p t i

n

s i n c e t h e r e a r e m a n y w a y s t

r

e w r i t e a s u b q u e r y a s a j

i

n

f

t w

(
r

m

r

e ) v i e w s [ K i m 8 2 , D a y 8 7 , S P L S G ] . W e i n i t i a l l y a s s u m e t h a t t h e d a t a b a s e c

n

t a i n s a s i n g l e r e l a t i

n

R . ( M u l t i p l e r e l a t i

n

s w i l l b e c

n

s i d e r e d i n S e c t i

n

2 . 4 . ) R m a y i t s e l f b e a v i e w

r

t h e r e s u l t

f

a n

t

h e r q u e r y , b u t f r

m
u

r p

i

n t

f

v i e w i t i s t r e a t e d a s a n e n c a p s u l a t e d t a b l e . ( I . e . , i f R w a s

a v i e w ,

t h e n w e d

n

’ t c

n

s i d e r u n f

l

d i n g t h e d e f i n i t i

n
f

R i n t

q

u e r i e s

v

e r R . ) W e s h a l l d e f i n e b e l

w

t h e n

t

i

n
f

a q u e r y g r a p h . A q u e r y g r a p h h a s n

d

e s t h a t a r e r e l a t i

n

a l

p

e r a t i

n

s . W e c

n

s i d e r t h r e e k i n d s

f

r e l a t i

n

a l

p

e r a t i

n

s : B a s i c B l

c

k s A b a s i c b l

c

k i s s

m

e c

m

b i n a t i

n
f

p r

j

e c t i

n

s a n d s e l e c t i

n

s a p p l i e d t

a

j

i

n

f

r e l a t i

n

s . I n S Q L s u c h

p

e r a t i

n

s a r e e x p r e s s e d a s S E L E C T

F

R O M

W

H E R E q u e r i e s w i t h

u

t a g g r e g a t e s

r

a t t r i b u t e r e n a m i n g . A b a s e r e l a t i

n

i s a l s

t

r e a t e d a s a b a s i c b l

c

k . A g g r e g a t i

n

B l

c

k s A n a g g r e g a t i

n

b l

c

k i s a s i n

g

l e a g g r e g a t i

n
p

e r a t i

n

s p e c i f y i n g a s e t

f

g r

u

p i n g a t t r i b u t e s a n d a l i s t

f

a g g r e g a t e f u n c t i

n

s t

b

e c

m
p

u t e d

v

e r t h e g r

u

p s . T h e a g g r e g a t i

n

c a n

p

t i

n

a l l y b e f

l

l

w

e d b y a s e l e c t i

n

. I n S Q L , s u c h

p

e r a t i

n

s a r e e x p r e s s e d

a s S E L E C T

F

R O M

G

R O U P B Y

H

A V I N G

q u e r i e s w h e r e t h e

S E L E C T

c l a u s e i n c l u d e s a l l g r

u

p i n g a t t r i b u t e s ( p l u s s

m

e a g g r e g a t e s ) , a n d t h e F R O M c l a u s e c

n

t a i n s a s i n g l e r e l a t i

n

. S e t B l

c

k s S e t b l

c

k s e x p r e s s t h e s e t

r

i e n t e d

p

e r a

t

i

n

s , n a m e l y u n i

n

, i n t e r s e c t i

n

a n d d i f f e r e n c e

f

r e l a t i

n

s w i t h t h e s a m e s c h e m a . T h e c

r

r e s p

n

d i n g S Q L c

n

s t r u c t s

a r e U N I O N , I N T E R S E C T

a n d E X C E P T . A n y r e l a t i

n

a l q u e r y c a n b e s p e c i f i e d u s i n g t h e s e t h r e e

p

e r a t i

n

s . A q u e r y c a n b e s p l i t i n t

t

w

b

l

c

k s i f n e c e s

s

a r y , f

r

e x a m p l e i f t h e q u e r y h a s b

t

h a W B E R E c l a u s e a n d a G R O U P B Y c l a u s e t h e n w e c a n r e w r i t e i t a s a n a g g r e g a t i

n

b l

c

k a p p l i e d t

a

b a s i c b l

c

k . N

t

e t h a t w e d

n
t

a l l

w

c

n

s t a n t r e l a t i

n

s i n q u e r i e s , n

r

d

w

e a l l

w

a t t r i b u t e r e n a m i n g . ( I n [ C h a 9 7 ] w e d e s c r i b e t h e e x t e n s i

n

s n e e d e d t

h

a n d l e t h e r e n a m i n g

p

e r a t

r

. ) D e f i n i t i

n

2 . 1 : ( Q u e r y G r a p h ) S u p p

s

e t h a t w e a r e g i v e n a q u e r y Q . T h e n

d

e s

f

t h e q u e r y g r a p h f

r

Q a r e t h e r e l a t i

n

a l

p

e r a t i

n

s ( a s d e f i n e d a b

v

e ) u s e d i n Q a n d i t s s u b v i e w s . B a s i c b l

c

k s a r e w r i t t e n a s r e c t a n g l e s , a g g r e g a t i

n

b l

c

k s a s c i r c l e s , a n d s e t b l

c

k s a s d i a m

n

d s . E v e r y b l

c

k i s g i v e n a l a b e l c

r

r e s p

n

d i n g t

t

h e v i e w (

r

r e l a t i

n

) n a m e

f

t h a t b l

c

k . T h e r e i s a n e d g e f r

m

n

d

e l a b e l e d N i t

n
d

e l a b e l e d N s i f N s i s m e n t i

n

e d i n t h e F R O M c l a u s e

f

N i ,

r

i f N s i s a n a r g u m e n t

f

a s e t

p

e r a t i

n

i n N i . A n e d g e w h

s

e s

u

r c e i s a n a g g r e g a t i

n

b l

c

k N i s l a b e l e d w i t h t h e g r

u

p i n g a t t r i b u t e s

f

N . E d g e s c

m

i n g

u

t

f

a s i n g l e b a s i c b l

c

k N a r e l i n k e d t

g

e t h e r w i t h a n a r c , a n d j

i

n t l y l a b e l e d b y t h e j

i

n c

n

d i t i

n
f

N . E d g e s c

m

i n g

u

t

f

s e t b l

c

k s a r e n

t

l a b e l e d . A q u e r y g r a p h i s a d i r e c t e d a c y c l i c g r a p h w i t h a s i n g l e s

u

r c e ( r

t

) n

d

e r e p r e s e n t i n g t h e q u e r y r e s u l t . W e s h a l l c

n

v e n t i

n

a l l y d r a w t h e g r a p h w i t h e d g e s “ p

i

n t i n g u p . ” F

r

s i n g l e

r

e l a t i

n

d a t a b a s e s , t h e r e w i l l a l w a y s b e a s i n g l e s i n k n

d

e a t t h e t

p
f

t h e p i c t u r e . F i g u r e 1 s h

w

s t h e q u e r y g r a p h s f

r

q u e r i e s Q l , a n d Q 2 . N

t

i c e t h e s e p a r a t i

n
f

V 2 a n d V 3 f r

m

q u e r y Q l i n t

b

a s i c b l

c

k s ( V 2 B , V 3 B ) a n d a g g r e g a t i

n

b l

c

k s ( V 2 A , V 3 A ) .

E ) , , O 2 : j

i

n c

n

d i t i

n

s . : j

i

n c

n

d i t i

n

.

(4 query Ql (b) query Q2

F i g u r e 1 : Q u e r y g r a p h s D e f i n i t i

n

2 . 2 : ( P a r t i t i

n

i n g ) L e t S b e a s e t

f

a t t r i b u t e s f r

m

t h e u n d e r l y i n g r e l a t i

n

R , a n d l e t D r e p r e s e n t t h e d

m

a i n

v

e r w h i c h t h e t u p l e s

f

S v a l u e s r a n g e . F

r

e v e r y n

d

e N i n t h e q u e r y g r a p h a n d f

r

x E D w e w r i t e N ( x ) t

r

e p r e s e n t t h e a n s w e r a t n

d

e N w h e n t h e e x t e n s i

n
f

R i s r e p l a c e d b y c r s = = ( R ) . O b s e r v e t h a t t h e a t t r i b u t e s i n S d

n
t

h a v e t

b

e a t

t

r i b u t e s

f

t h e a n s w e r a t n

d

e N f

r

N ( x ) t

b

e w e l l

d

e f i n e d . A l s

,
b

s e r v e t h a t R = U z E D R ( z ) . 4 7 8

applied separately to partitions

f FYIJJG

based in id, and the partial results could be unioned together. Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration

f the sections read in the

days before each access to the world section, and view V2 the average duration

f the sections read in the days after.

We join Vl and V2 in order to get the result in single

tuples. The actual SQL formulation

is omitted due to lack

f space and appears in [Cha97].

Once again, FYI-LOG can be partitioned

n id attribute

and a simpler query can be posed separately

n each partition.

We shall discuss Query Q4 in Section 2.4.

2 Theoretical Framework

In this section, we introduce

ur terminology

and define what we mean by a group query. We give a syntactic criterion for identifying group queries and prove that this condition is sufficient. We also show that every group query can be expressed in a form that satisfies our criterion. 2.1 Assumptions and Terminology We assume that queries are written in terms of views, with no subqueries. This is a valid assumption since there are many ways to rewrite a subquery as a join of two (or more) views [Kim82, Day87, SPLSG]. We initially assume that the database contains a single relation

R. (Multiple

relations will be considered in Section 2.4.) R may itself be a view

r the result of another query, but from our point of view it

is treated as an encapsulated table. (I.e., if R was a view, then we don’t consider unfolding the definition

f R into

queries over R.) We shall define below the notion of a query graph. A query graph has nodes that are relational

perations.

We consider three kinds of relational

perations:

Basic Blocks A basic block is some combination

f

projections and selections applied to a join of relations. In SQL such operations are expressed as SELECT-FROM-WHERE queries without aggregates or attribute renaming. A base relation is also treated as a basic block. Aggregation Blocks An aggregation block is a single aggregation

peration

specifying a set of grouping attributes and a list of aggregate functions to be com- puted over the groups. The aggregation can optionally be followed by a selection. In SQL, such operations are expressed as SELECT-FROM-GROUPBY-HAVING queries where the SELECT clause includes all grouping attributes (plus some aggregates), and the FROM clause contains a single relation. Set Blocks Set blocks express the set-oriented

pera-

tions, namely union, intersection and difference of relations with the same schema. The corresponding SQL constructs

are UNION, INTERSECT and EXCEPT.

Any relational query can be specified using these three

perations.

A query can be split into two blocks if neces- sary, for example if the query has both a WBERE clause and a GROUPBY clause then we can rewrite it as an aggregation block applied to a basic block. Note that we do not allow constant relations in queries, nor do we allow attribute renaming. (In [Cha97] we describe the extensions needed to handle the renaming

perator.)

Definition 2.1: (Query Graph) Suppose that we are given a query Q. The nodes of the query graph for Q are the relational

perations

(as defined above) used in Q and its subviews. Basic blocks are written as rectangles, aggregation blocks as circles, and set blocks as diamonds. Every block is given a label corresponding to the view (or relation) name of that block. There is an edge from node labeled Ni to node labeled Ns if Ns is mentioned in the FROM clause of Ni , or if Ns is an argument of a set operation in Ni. An edge whose source is an aggregation block N is labeled with the grouping attributes

f N. Edges coming
ut of a single basic block N are linked together with an

arc, and jointly labeled by the join condition

f N. Edges

coming out of set blocks are not labeled. A query graph is a directed acyclic graph with a single source (root) node representing the query result. We shall conventionally draw the graph with edges “pointing up.” For single-relation databases, there will always be a single sink node at the top of the picture. Figure 1 shows the query graphs for queries Ql, and Q2. Notice the separation

f V2 and V3 from query Ql into basic blocks (V2B,

V3B) and aggregation blocks (V2A, V3A).

E), , O2 : join conditions. 0 : join condition.

(4 query Ql (b) query Q2

Figure 1: Query graphs Definition 2.2: (Partitioning) Let S be a set of attributes from the underlying relation R, and let D represent the domain over which the tuples of S values range. For every node N in the query graph and for x E D we write N(x) to represent the answer at node N when the extension of R is replaced by crs== (R). 0 Observe that the attributes in S do not have to be attributes

f the answer at node N for N(x)

to be well-

defined. Also, observe that R = UzED R(z).

478

SLIDE 91

Torsten Grust U Tübingen 27

A Zoo of Query Representations

a p p l i e d s e p a r a t e l y t

p

a r t i t i

n

s

f

F Y I J J G b a s e d i n

i d ,

a n d t h e p a r t i a l r e s u l t s c

u

l d b e u n i

n

e d t

g

e t h e r . Q u e r y Q 3 i s e x p r e s s e d i n s t a n d a r d S Q L a s f

l

l

w

s . V i e w V l c

n

t a i n s t h e a v e r a g e d u r a t i

n
f

t h e s e c t i

n

s r e a d i n t h e d a y s b e f

r

e e a c h a c c e s s t

t

h e w

r

l d s e c t i

n

, a n d v i e w V 2 t h e a v e r a g e d u r a t i

n
f

t h e s e c t i

n

s r e a d i n t h e d a y s a f t e r . W e j

i

n V l a n d V 2 i n

r

d e r t

g

e t t h e r e s u l t i n s i n g l e t u p l e s . T h e a c t u a l S Q L f

r

m u l a t i

n

i s

m

i t t e d d u e t

l

a c k

f

s p a c e a n d a p p e a r s i n [ C h a 9 7 ] . O n c e a g a i n , F Y I

L

O G c a n b e p a r t i t i

n

e d

n

i d

a t t r i b u t e a n d a s i m p l e r q u e r y c a n b e p

s

e d s e p a r a t e l y

n

e a c h p a r t i t i

n

. W e s h a l l d i s c u s s Q u e r y Q 4 i n S e c t i

n

2 . 4 .

2 Theoretical Framework

I n t h i s s e c t i

n

, w e i n t r

d

u c e

u

r t e r m i n

l
g

y a n d d e f i n e w h a t w e m e a n b y a g r

u

p q u e r y . W e g i v e a s y n t a c t i c c r i t e r i

n

f

r

i d e n t i f y i n g g r

u

p q u e r i e s a n d p r

v

e t h a t t h i s c

n

d i t i

n

i s s u f f i c i e n t . W e a l s

s

h

w

t h a t e v e r y g r

u

p q u e r y c a n b e e x p r e s s e d i n a f

r

m t h a t s a t i s f i e s

u

r c r i t e r i

n

. 2 . 1 A s s u m p t i

n

s a n d T e r m i n

l
g

y W e a s s u m e t h a t q u e r i e s a r e w r i t t e n i n t e r m s

f

v i e w s , w i t h n

s

u b q u e r i e s . T h i s i s a v a l i d a s s u m p t i

n

s i n c e t h e r e a r e m a n y w a y s t

r

e w r i t e a s u b q u e r y a s a j

i

n

f

t w

(
r

m

r

e ) v i e w s [ K i m 8 2 , D a y 8 7 , S P L S G ] . W e i n i t i a l l y a s s u m e t h a t t h e d a t a b a s e c

n

t a i n s a s i n g l e r e l a t i

n

R . ( M u l t i p l e r e l a t i

n

s w i l l b e c

n

s i d e r e d i n S e c t i

n

2 . 4 . ) R m a y i t s e l f b e a v i e w

r

t h e r e s u l t

f

a n

t

h e r q u e r y , b u t f r

m
u

r p

i

n t

f

v i e w i t i s t r e a t e d a s a n e n c a p s u l a t e d t a b l e . ( I . e . , i f R w a s

a v i e w ,

t h e n w e d

n

’ t c

n

s i d e r u n f

l

d i n g t h e d e f i n i t i

n
f

R i n t

q

u e r i e s

v

e r R . ) W e s h a l l d e f i n e b e l

w

t h e n

t

i

n
f

a q u e r y g r a p h . A q u e r y g r a p h h a s n

d

e s t h a t a r e r e l a t i

n

a l

p

e r a t i

n

s . W e c

n

s i d e r t h r e e k i n d s

f

r e l a t i

n

a l

p

e r a t i

n

s : B a s i c B l

c

k s A b a s i c b l

c

k i s s

m

e c

m

b i n a t i

n
f

p r

j

e c t i

n

s a n d s e l e c t i

n

s a p p l i e d t

a

j

i

n

f

r e l a t i

n

s . I n S Q L s u c h

p

e r a t i

n

s a r e e x p r e s s e d a s S E L E C T

F

R O M

W

H E R E q u e r i e s w i t h

u

t a g g r e g a t e s

r

a t t r i b u t e r e n a m i n g . A b a s e r e l a t i

n

i s a l s

t

r e a t e d a s a b a s i c b l

c

k . A g g r e g a t i

n

B l

c

k s A n a g g r e g a t i

n

b l

c

k i s a s i n

g

l e a g g r e g a t i

n
p

e r a t i

n

s p e c i f y i n g a s e t

f

g r

u

p i n g a t t r i b u t e s a n d a l i s t

f

a g g r e g a t e f u n c t i

n

s t

b

e c

m
p

u t e d

v

e r t h e g r

u

p s . T h e a g g r e g a t i

n

c a n

p

t i

n

a l l y b e f

l

l

w

e d b y a s e l e c t i

n

. I n S Q L , s u c h

p

e r a t i

n

s a r e e x p r e s s e d

a s S E L E C T

F

R O M

G

R O U P B Y

H

A V I N G

q u e r i e s w h e r e t h e

S E L E C T

c l a u s e i n c l u d e s a l l g r

u

p i n g a t t r i b u t e s ( p l u s s

m

e a g g r e g a t e s ) , a n d t h e F R O M c l a u s e c

n

t a i n s a s i n g l e r e l a t i

n

. S e t B l

c

k s S e t b l

c

k s e x p r e s s t h e s e t

r

i e n t e d

p

e r a

t

i

n

s , n a m e l y u n i

n

, i n t e r s e c t i

n

a n d d i f f e r e n c e

f

r e l a t i

n

s w i t h t h e s a m e s c h e m a . T h e c

r

r e s p

n

d i n g S Q L c

n

s t r u c t s

a r e U N I O N , I N T E R S E C T

a n d E X C E P T . A n y r e l a t i

n

a l q u e r y c a n b e s p e c i f i e d u s i n g t h e s e t h r e e

p

e r a t i

n

s . A q u e r y c a n b e s p l i t i n t

t

w

b

l

c

k s i f n e c e s

s

a r y , f

r

e x a m p l e i f t h e q u e r y h a s b

t

h a W B E R E c l a u s e a n d a G R O U P B Y c l a u s e t h e n w e c a n r e w r i t e i t a s a n a g g r e g a t i

n

b l

c

k a p p l i e d t

a

b a s i c b l

c

k . N

t

e t h a t w e d

n
t

a l l

w

c

n

s t a n t r e l a t i

n

s i n q u e r i e s , n

r

d

w

e a l l

w

a t t r i b u t e r e n a m i n g . ( I n [ C h a 9 7 ] w e d e s c r i b e t h e e x t e n s i

n

s n e e d e d t

h

a n d l e t h e r e n a m i n g

p

e r a t

r

. ) D e f i n i t i

n

2 . 1 : ( Q u e r y G r a p h ) S u p p

s

e t h a t w e a r e g i v e n a q u e r y Q . T h e n

d

e s

f

t h e q u e r y g r a p h f

r

Q a r e t h e r e l a t i

n

a l

p

e r a t i

n

s ( a s d e f i n e d a b

v

e ) u s e d i n Q a n d i t s s u b v i e w s . B a s i c b l

c

k s a r e w r i t t e n a s r e c t a n g l e s , a g g r e g a t i

n

b l

c

k s a s c i r c l e s , a n d s e t b l

c

k s a s d i a m

n

d s . E v e r y b l

c

k i s g i v e n a l a b e l c

r

r e s p

n

d i n g t

t

h e v i e w (

r

r e l a t i

n

) n a m e

f

t h a t b l

c

k . T h e r e i s a n e d g e f r

m

n

d

e l a b e l e d N i t

n
d

e l a b e l e d N s i f N s i s m e n t i

n

e d i n t h e F R O M c l a u s e

f

N i ,

r

i f N s i s a n a r g u m e n t

f

a s e t

p

e r a t i

n

i n N i . A n e d g e w h

s

e s

u

r c e i s a n a g g r e g a t i

n

b l

c

k N i s l a b e l e d w i t h t h e g r

u

p i n g a t t r i b u t e s

f

N . E d g e s c

m

i n g

u

t

f

a s i n g l e b a s i c b l

c

k N a r e l i n k e d t

g

e t h e r w i t h a n a r c , a n d j

i

n t l y l a b e l e d b y t h e j

i

n c

n

d i t i

n
f

N . E d g e s c

m

i n g

u

t

f

s e t b l

c

k s a r e n

t

l a b e l e d . A q u e r y g r a p h i s a d i r e c t e d a c y c l i c g r a p h w i t h a s i n g l e s

u

r c e ( r

t

) n

d

e r e p r e s e n t i n g t h e q u e r y r e s u l t . W e s h a l l c

n

v e n t i

n

a l l y d r a w t h e g r a p h w i t h e d g e s “ p

i

n t i n g u p . ” F

r

s i n g l e

r

e l a t i

n

d a t a b a s e s , t h e r e w i l l a l w a y s b e a s i n g l e s i n k n

d

e a t t h e t

p
f

t h e p i c t u r e . F i g u r e 1 s h

w

s t h e q u e r y g r a p h s f

r

q u e r i e s Q l , a n d Q 2 . N

t

i c e t h e s e p a r a t i

n
f

V 2 a n d V 3 f r

m

q u e r y Q l i n t

b

a s i c b l

c

k s ( V 2 B , V 3 B ) a n d a g g r e g a t i

n

b l

c

k s ( V 2 A , V 3 A ) .

E ) , , O 2 : j

i

n c

n

d i t i

n

s . : j

i

n c

n

d i t i

n

.

(4 query Ql (b) query Q2

F i g u r e 1 : Q u e r y g r a p h s D e f i n i t i

n

2 . 2 : ( P a r t i t i

n

i n g ) L e t S b e a s e t

f

a t t r i b u t e s f r

m

t h e u n d e r l y i n g r e l a t i

n

R , a n d l e t D r e p r e s e n t t h e d

m

a i n

v

e r w h i c h t h e t u p l e s

f

S v a l u e s r a n g e . F

r

e v e r y n

d

e N i n t h e q u e r y g r a p h a n d f

r

x E D w e w r i t e N ( x ) t

r

e p r e s e n t t h e a n s w e r a t n

d

e N w h e n t h e e x t e n s i

n
f

R i s r e p l a c e d b y c r s = = ( R ) . O b s e r v e t h a t t h e a t t r i b u t e s i n S d

n
t

h a v e t

b

e a t

t

r i b u t e s

f

t h e a n s w e r a t n

d

e N f

r

N ( x ) t

b

e w e l l

d

e f i n e d . A l s

,
b

s e r v e t h a t R = U z E D R ( z ) . 4 7 8

applied separately to partitions

f FYIJJG

based in id, and the partial results could be unioned together. Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration

f the sections read in the

days before each access to the world section, and view V2 the average duration

f the sections read in the days after.

We join Vl and V2 in order to get the result in single

tuples. The actual SQL formulation

is omitted due to lack

f space and appears in [Cha97].

Once again, FYI-LOG can be partitioned

n id attribute

and a simpler query can be posed separately

n each partition.

We shall discuss Query Q4 in Section 2.4.

2 Theoretical Framework

In this section, we introduce

ur terminology

and define what we mean by a group query. We give a syntactic criterion for identifying group queries and prove that this condition is sufficient. We also show that every group query can be expressed in a form that satisfies our criterion. 2.1 Assumptions and Terminology We assume that queries are written in terms of views, with no subqueries. This is a valid assumption since there are many ways to rewrite a subquery as a join of two (or more) views [Kim82, Day87, SPLSG]. We initially assume that the database contains a single relation

R. (Multiple

relations will be considered in Section 2.4.) R may itself be a view

r the result of another query, but from our point of view it

is treated as an encapsulated table. (I.e., if R was a view, then we don’t consider unfolding the definition

f R into

queries over R.) We shall define below the notion of a query graph. A query graph has nodes that are relational

perations.

We consider three kinds of relational

perations:

Basic Blocks A basic block is some combination

f

projections and selections applied to a join of relations. In SQL such operations are expressed as SELECT-FROM-WHERE queries without aggregates or attribute renaming. A base relation is also treated as a basic block. Aggregation Blocks An aggregation block is a single aggregation

peration

specifying a set of grouping attributes and a list of aggregate functions to be com- puted over the groups. The aggregation can optionally be followed by a selection. In SQL, such operations are expressed as SELECT-FROM-GROUPBY-HAVING queries where the SELECT clause includes all grouping attributes (plus some aggregates), and the FROM clause contains a single relation. Set Blocks Set blocks express the set-oriented

pera-

tions, namely union, intersection and difference of relations with the same schema. The corresponding SQL constructs

are UNION, INTERSECT and EXCEPT.

Any relational query can be specified using these three

perations.

A query can be split into two blocks if neces- sary, for example if the query has both a WBERE clause and a GROUPBY clause then we can rewrite it as an aggregation block applied to a basic block. Note that we do not allow constant relations in queries, nor do we allow attribute renaming. (In [Cha97] we describe the extensions needed to handle the renaming

perator.)

Definition 2.1: (Query Graph) Suppose that we are given a query Q. The nodes of the query graph for Q are the relational

perations

(as defined above) used in Q and its subviews. Basic blocks are written as rectangles, aggregation blocks as circles, and set blocks as diamonds. Every block is given a label corresponding to the view (or relation) name of that block. There is an edge from node labeled Ni to node labeled Ns if Ns is mentioned in the FROM clause of Ni , or if Ns is an argument of a set operation in Ni. An edge whose source is an aggregation block N is labeled with the grouping attributes

f N. Edges coming
ut of a single basic block N are linked together with an

arc, and jointly labeled by the join condition

f N. Edges

coming out of set blocks are not labeled. A query graph is a directed acyclic graph with a single source (root) node representing the query result. We shall conventionally draw the graph with edges “pointing up.” For single-relation databases, there will always be a single sink node at the top of the picture. Figure 1 shows the query graphs for queries Ql, and Q2. Notice the separation

f V2 and V3 from query Ql into basic blocks (V2B,

V3B) and aggregation blocks (V2A, V3A).

E), , O2 : join conditions. 0 : join condition.

(4 query Ql (b) query Q2

Figure 1: Query graphs Definition 2.2: (Partitioning) Let S be a set of attributes from the underlying relation R, and let D represent the domain over which the tuples of S values range. For every node N in the query graph and for x E D we write N(x) to represent the answer at node N when the extension of R is replaced by crs== (R). 0 Observe that the attributes in S do not have to be attributes

f the answer at node N for N(x)

to be well-

defined. Also, observe that R = UzED R(z).

478

applied separately to partitions

f FYIJJG

based in i d , and the partial results could be unioned together. Query Q3 is expressed in standard SQL as follows. View Vl contains the average duration

f the sections read in the

days before each access to the world section, and view V2 the average duration

f the sections read in the days after.

We join Vl and V2 in order to get the result in single

tuples. The actual SQL formulation

is omitted due to lack

f space and appears in [Cha97].

Once again, FYI-LOG can be partitioned

n

i d attribute and a simpler query can be posed separately

n each partition.

We shall discuss Query Q4 in Section 2.4.

2 Theoretical Framework

In this section, we introduce

ur terminology

and define what we mean by a group query. We give a syntactic criterion for identifying group queries and prove that this condition is sufficient. We also show that every group query can be expressed in a form that satisfies our criterion. 2.1 Assumptions and Terminology We assume that queries are written in terms of views, with no subqueries. This is a valid assumption since there are many ways to rewrite a subquery as a join of two (or more) views [Kim82, Day87, SPLSG]. We initially assume that the database contains a single relation

R. (Multiple

relations will be considered in Section 2.4.) R may itself be a view

r the result of another query, but from our point of view it

is treated as an encapsulated table. (I.e., if R was a v i e w , then we don’t consider unfolding the definition

f R into

queries over R.) We shall define below the notion of a query graph. A query graph has nodes that are relational

perations.

We consider three kinds of relational

perations:

Basic Blocks A basic block is some combination

f

projections and selections applied to a join of relations. In SQL such operations are expressed as SELECT-FROM-WHERE queries without aggregates or attribute renaming. A base relation is also treated as a basic block. Aggregation Blocks An aggregation block is a single aggregation

peration

specifying a set of grouping attributes and a list of aggregate functions to be com- puted over the groups. The aggregation can optionally be followed by a selection. In SQL, such operations are expressed a s S E L E C T

F

R O M

G

R O U P B Y

H

A V I N G queries where the S E L E C T clause includes all grouping attributes (plus some aggregates), and the FROM clause contains a single relation. Set Blocks Set blocks express the set-oriented

pera-

tions, namely union, intersection and difference of relations with the same schema. The corresponding SQL constructs a r e U N I O N , I N T E R S E C T and EXCEPT. Any relational query can be specified using these three

perations.

A query can be split into two blocks if neces- sary, for example if the query has both a WBERE clause and a GROUPBY clause then we can rewrite it as an aggregation block applied to a basic block. Note that we do not allow constant relations in queries, nor do we allow attribute renaming. (In [Cha97] we describe the extensions needed to handle the renaming

perator.)

Definition 2.1: (Query Graph) Suppose that we are given a query Q. The nodes of the query graph for Q are the relational

perations

(as defined above) used in Q and its subviews. Basic blocks are written as rectangles, aggregation blocks as circles, and set blocks as diamonds. Every block is given a label corresponding to the view (or relation) name of that block. There is an edge from node labeled Ni to node labeled Ns if Ns is mentioned in the FROM clause of Ni , or if Ns is an argument of a set operation in Ni. An edge whose source is an aggregation block N is labeled with the grouping attributes

f N. Edges coming
ut of a single basic block N are linked together with an

arc, and jointly labeled by the join condition

f N. Edges

coming out of set blocks are not labeled. A query graph is a directed acyclic graph with a single source (root) node representing the query result. We shall conventionally draw the graph with edges “pointing up.” For single-relation databases, there will always be a single sink node at the top of the picture. Figure 1 shows the query graphs for queries Ql, and Q2. Notice the separation

f V2 and V3 from query Ql into basic blocks (V2B,

V3B) and aggregation blocks (V2A, V3A). E ) , , O 2 : j

i

n c

n

d i t i

n

s . : j

i

n c

n

d i t i

n

.

(4 query Ql (b) query Q2

Figure 1: Query graphs Definition 2.2: (Partitioning) Let S be a set of attributes from the underlying relation R, and let D represent the domain over which the tuples of S values range. For every node N in the query graph and for x E D we write N(x) to represent the answer at node N when the extension of R is replaced by crs== (R). 0 Observe that the attributes in S do not have to be attributes

f the answer at node N for N(x)

to be well-

defined. Also, observe that R = UzED R(z).

478

SQL surface syntax, relational algebra, query graphs + annotations, iteration

SLIDE 92

Torsten Grust U Tübingen

A Uniform Query Representation

28

SLIDE 93

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

SLIDE 94

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

map (λ⟨x,P⟩.⟨x, Q’ g agg P⟩ (partition f xs)

SLIDE 95

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

map (λ⟨x,P⟩.⟨x, Q’ g agg P⟩ (partition f xs)

[ ⟨f(x), [ g(y) | y ← R, f(y) = f(x) ]agg⟩ | x ← R ]set

SLIDE 96

Torsten Grust U Tübingen

A Uniform Query Representation

28

[ g(y) | y ← P ]agg Q’ g agg P = [ f(x) | x ← xs ]M map f xs = [ ⟨f(x), [ y | y ← xs, f(x) = f(y) ]M⟩ | x ← xs ]set partition f xs =

map (λ⟨x,P⟩.⟨x, Q’ g agg P⟩ (partition f xs)

SELECT f(x), agg(g(x)) FROM R AS x GROUP BY f(x)

SLIDE 97

Torsten Grust U Tübingen 29

XPath

SLIDE 98

Torsten Grust U Tübingen

XPath Comprehensions

30

SLIDE 99

Torsten Grust U Tübingen

XPath Comprehensions

30

/descendant::a[following::b]/child::c

SLIDE 100

Torsten Grust U Tübingen

XPath Comprehensions

30

/descendant::a[following::b]/child::c

1. Normalize, simplify, flip XPath step expressions

SLIDE 101

Torsten Grust U Tübingen

XPath Comprehensions

30

/descendant::a[following::b]/child::c

1. Normalize, simplify, flip XPath step expressions
2. Compile XPath into queries over tabular XML encoding

SLIDE 102

Torsten Grust U Tübingen

XPath Comprehensions

30

xpath p (root c) xpath (/p) c = [ n’ | n ← xpath p1 c, n’ ← xpath p2 n ]X xpath (p1/p2) c = [ n | n ← xpath p c, [ true | _ ← xpath q n ]some ]X xpath (p[q]) c = step (ax::t) c xpath (ax::t) c =

/descendant::a[following::b]/child::c

1. Normalize, simplify, flip XPath step expressions
2. Compile XPath into queries over tabular XML encoding

SLIDE 103

Torsten Grust U Tübingen

A Tabular XML Encoding

31

SLIDE 104

Torsten Grust U Tübingen

A Tabular XML Encoding

31

a b c d e g i j f h

SLIDE 105

Torsten Grust U Tübingen

A Tabular XML Encoding

31

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

SLIDE 106

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

SLIDE 107

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

SLIDE 108

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

[ n | n ← doc, pre c < pre n, post c > post n, tag n = t ]X step (descendant::t) c =

SLIDE 109

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

[ n | n ← doc, pre c < pre n, post c > post n, tag n = t ]X step (descendant::t) c =

SLIDE 110

Torsten Grust U Tübingen

A Tabular XML Encoding

31

5 5

post pre a b c d e f g h i j

a b c d e g i j f h 1b3 0a9 4e1 2c2 3d0 5f8 6g4 8i5 9j6 7h7

[ n | n ← doc, pre c < pre n, post c > post n, tag n = t ]X step (descendant::t) c = [ n | n ← doc, pre c > pre n, post c < post n, tag n = t ]X step (ancestor::t) c =

SLIDE 111

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

D. Olteanu et al., XMLDM (EDBT 2002), March 2002

SLIDE 112

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

D. Olteanu et al., XMLDM (EDBT 2002), March 2002

a b c d e g i j f h

SLIDE 113

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

D. Olteanu et al., XMLDM (EDBT 2002), March 2002

/descendant::g/preceding::c

a b c d e g i j f h

SLIDE 114

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

D. Olteanu et al., XMLDM (EDBT 2002), March 2002

/descendant::g/preceding::c

a b c d e g i j f h

←

SLIDE 115

Torsten Grust U Tübingen

XPath: Looking Forward

32

XPath: Looking Forward

D. Olteanu et al., XMLDM (EDBT 2002), March 2002

/descendant::g/preceding::c

a b c d e g i j f h

/descendant::c[following::g]

≡

←

SLIDE 116

Torsten Grust U Tübingen

XPath: Looking Forward

33

SLIDE 117

Torsten Grust U Tübingen

XPath: Looking Forward

33

SLIDE 118

Torsten Grust U Tübingen

Comprehending XPath

34

SLIDE 119

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c

SLIDE 120

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c

[ v’ | v ← doc, tag v = ’g’, v’ ← doc, pre v’ < pre v, post v’ < post v, tag v’ = ’c’ ]X

SLIDE 121

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c /descendant::c[following::g]

[ v’ | v ← doc, tag v = ’g’, v’ ← doc, pre v’ < pre v, post v’ < post v, tag v’ = ’c’ ]X

SLIDE 122

Torsten Grust U Tübingen

Comprehending XPath

34

/descendant::g/preceding::c

SELECT DISTINCT v’ FROM doc v, doc v’ WHERE tag v = ’g’ AND tag v’ = ’c’ AND pre v’ < pre v AND post v’ < post v ORDER BY pre v’

[ v’ | v ← doc, tag v = ’g’, v’ ← doc, pre v’ < pre v, post v’ < post v, tag v’ = ’c’ ]X

SLIDE 123

Torsten Grust U Tübingen 35

BRING BACK MONAD COMPREHENSIONS

SLIDE 124

Torsten Grust U Tübingen

Comprehensions in Haskell

36

Haskell

SLIDE 125

Torsten Grust U Tübingen

Comprehensions in Haskell

36



Haskell

SLIDE 126

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011

Haskell

SLIDE 127

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011

Haskell

SLIDE 128

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011 Comprehending Monads

Haskell

SLIDE 129

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011 Comprehending Monads

Haskell

27

SLIDE 130

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4

Haskell

27

SLIDE 131

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98

Haskell

27

SLIDE 132

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98

GHC

27

SLIDE 133

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98 Comprehensive  Comprehensions

GHC

27

SLIDE 134

Torsten Grust U Tübingen

Comprehensions in Haskell

36



1990 1997 2003 2007 2011 Comprehending Monads Haskell 1.4 Haskell 98 Comprehensive  Comprehensions Bringing Back  Monad Comprehensions

GHC

27

SLIDE 135

Torsten Grust U Tübingen 36



Movie Plot Line

5 min 120 min Meeting Inciting Incident Turning Point Climax Pinch

SLIDE 136

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007

SLIDE 137

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007

[ (the dept, maximum salary) | (name, dept, salary) <- employees , then group by dept using groupWith , length dept > 10 , then sortWith by Down (sum salary) , then take 5 ]

SLIDE 138

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007

[ (the dept, maximum salary) | (name, dept, salary) <- employees , then group by dept using groupWith , length dept > 10 , then sortWith by Down (sum salary) , then take 5 ]

LIMIT ORDER BY ASC/DESC AGGR FROM HAVING GROUP BY

SLIDE 139

Torsten Grust U Tübingen

Comprehensi{ve, ons}

37

Comprehensive Comprehensions

P. Wadler, S. Peyton-Jones, Haskell Workshop, October 2007

[ sum salary | (name, “MS”, salary) <- employees , then group using runs 3 , then take 5 ]

OVER

row patterns!

Not shown: set operations, joins, WITH…RECURSIVE, …

SLIDE 140

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA DATA

SLIDE 141

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA DATA

SLIDE 142

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA Haskell d DATA

SLIDE 143

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS Haskell DATA

SLIDE 144

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL

SLIDE 145

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL

SLIDE 146

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL d

SLIDE 147

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL d d

SLIDE 148

Torsten Grust U Tübingen

Database–Supported Haskell

38

Haskell Heap DBMS DATA SQL SQL d d haskell

SLIDE 149

Torsten Grust U Tübingen 39

Database–Supported Haskell

SLIDE 150

Torsten Grust U Tübingen 39

Database–Supported Haskell

— rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

— margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

— our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

— best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ] Q Q Q Q Q Q Q Q

SLIDE 151

Torsten Grust U Tübingen 39

Database–Supported Haskell

— rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

— margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

— our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

— best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ]

i d t s d a y p r i c e A C M E 1 7 / 1 / 1 5 3 . A C M E 2 7 / 1 / 1 5 4 . A C M E 3 7 / 1 / 1 5 1 . A C M E 4 7 / 1 / 1 5 7 . ⠇ ⠇ ⠇ ⠇

Trades

Q Q Q Q Q Q Q Q

SLIDE 152

Torsten Grust U Tübingen 39

Database–Supported Haskell

— rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

— margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

— our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

— best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ] Q Q Q Q Q Q Q Q

SLIDE 153

Torsten Grust U Tübingen 39

Database–Supported Haskell

— rolling minimum (mins [3,4,1,7] = [3,3,1,1])

mins :: Ord a => [a] -> [a] mins xs = [ minimum [ y | (y,j) <- #xs, j <= i ] | (_,i) <- #xs ]

— margin: current value - minimum value up to now

margins :: (Ord a, Num a) => [a] -> [a] margins xs = [ x - y | (x,y) <- zip xs (mins xs) ]

— our profit is the maximum margin obtainable

profit :: (Ord a, Num a) => [a] -> [a] profit xs = maximum (margins xs)

— best profit obtainable for stock on given date

bestProfit :: Text -> Date -> [Trade] -> Double bestProfit stock date trades = profit [ price t | t <- sortWith ts trades, id t == stock, day t == date ]

Q Q Q Q Q Q Q Q

SLIDE 154

Torsten Grust U Tübingen 39

Database–Supported Haskell

—- SQL code generated from Haskell source SELECT MAX(margins.price - margins.min) FROM (SELECT t.price, MIN(t.price) OVER (ORDER BY t.ts ROW BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM trades AS t WHERE t.id = ‘ACME’ AND t.day = ’07/01/2015’ ) AS margins(price,min)

SLIDE 155

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

SLIDE 156

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

SLIDE 157

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

SLIDE 158

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

1 :: [a] → [b]

f

1

SLIDE 159

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

[ [ f y | y ← g x ] | x ← xs ] f :: a → b f

1 :: [a] → [b]

f

2 :: [[a]] → [[b]]

f

2

SLIDE 160

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

f :: a → b f

1 :: [a] → [b]

f

2 :: [[a]] → [[b]]

f

2 [ g x | x ← xs ]

SLIDE 161

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

f :: a → b f

1 :: [a] → [b]

f

2 :: [[a]] → [[b]]

f

2 (g1 xs)

SLIDE 162

Torsten Grust U Tübingen

Comprehensions Yield Independent Work 

40

[ f n e | x ← xs ] ⇝ f n+1 [ e | x ← xs ]

SLIDE 163

Torsten Grust U Tübingen 40

[ f n e | x ← xs ] ⇝ f n+1 [ e | x ← xs ]

Nested Data Parallelism

Implementation of a Portable Nested Data-Parallel Language

G. E. Blelloch et al., ACM PPoPP, May 1993

SLIDE 164

Torsten Grust U Tübingen

The Flatter, the Better

41

SLIDE 165

Torsten Grust U Tübingen

The Flatter, the Better

41

xss +2 yss

SLIDE 166

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , , 3 , 1 1 , 1 , 7 ]  +   [ , 4 , 1 2 , 1 3 , 2 , 3 ] [ ] [ ] [ ]     [ ] [ ] [ ]

2

xss +2 yss

SLIDE 167

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , , 3 , 1 1 , 1 , 7 ]  +   [ , 4 , 1 2 , 1 3 , 2 , 3 ] [ ] [ ] [ ] [ ] [ ] [ ]

2

xss +2 yss

SLIDE 168

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , , 3 , 1 1 , 1 , 7 ]  +   [ , 4 , 1 2 , 1 3 , 2 , 3 ]

1

forget

[ ] [ ] [ ]

xss +2 yss

SLIDE 169

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ]

forget

[ ] [ ] [ ]

xss +2 yss

SLIDE 170

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ]

forget imprint

[ ] [ ] [ ]

xss +2 yss

SLIDE 171

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ]

forget imprint

[ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

SLIDE 172

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

SLIDE 173

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

SLIDE 174

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

SLIDE 175

Torsten Grust U Tübingen

The Flatter, the Better

41

[ 1 9 , 4 , 4 2 , 2 4 , 1 2 , 1 ] [ ] [ ] [ ]

f n e ⇝ imprintn-1 (f 1 (forgetn-1 e))

pos sum

1 19 2 4 3 42 4 24 5 12 6 10

seg pos

1 1 1 2 1 3

seg

1 1 2 3 3 3

SLIDE 176

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42

SLIDE 177

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42

+1

SLIDE 178

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42 seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1

SLIDE 179

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42 seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 πsum: x+y( )

SLIDE 180

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42 seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( )

SLIDE 181

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( )

SLIDE 182

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( ) ⋉

p

SLIDE 183

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( ) ⋉

p∧ seg1 = seg2

SLIDE 184

Torsten Grust U Tübingen

Database Systems:  Designed to Implement _

1

42 seg1 ⋯ x

1 19 1 2 30 3 11 3 3 7

seg2 ⋯ y

1 1 4 2 12 3 13 3 2 3 3

seg ⋯ x y

1 19 1 4 2 30 12 3 11 13 3 2 3 7 3

+1 ⋉

p

1

πsum: x+y( ) ⋉

p∧ seg1 = seg2

SLIDE 185

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[(Int,[Str],[(Bool,[(Int,Int)])])]

SLIDE 186

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[(Int,[Str],[(Bool,[(Int,Int)])])]

SLIDE 187

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[[][[]]]

SLIDE 188

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[[][[]]]

Query Plan Bundle

SLIDE 189

Torsten Grust U Tübingen

Plan Bundles Instead of Query Avalanches

43

[[][[]]]

Query Plan Bundle

SLIDE 190

Torsten Grust U Tübingen 44