[PPT] - The script-writers dream: How to write great SQL in your own PowerPoint Presentation

SLIDE 1

The script-writer’s dream: How to write great SQL in your own language and be sure it will succeed.

Ezra Cooper University of Edinburgh August 24, 2009

SLIDE 2

The Problem

SLIDE 3

The Problem

SLIDE 4

The Problem

Problems with SQL:

◮ Embedding is usually a pain ◮ Lacks nested data structures ◮ Lacks abstraction

SLIDE 5

The Problem

Problems with SQL:

◮ Embedding is usually a pain ◮ Lacks nested data structures ◮ Lacks abstraction

The Solution Language-integrated query

SLIDE 6

Example: Little League database

teams: name Marchmont United Bruntsfield Hustlers Newington Numpties Leith Sluggers players: name team age Sam Marchmont United 17 Ezra Bruntsfield Hustlers 15 Angus Newington Numpties 18 Jeremy Newington Numpties 15 Paidi Leith Sluggers 15 Bob Newington Numpties 14 Lucy Leith Sluggers 17 Hugh Bruntsfield Hustlers 13

SLIDE 7

Meet the comprehension

for (x ← src) body

SLIDE 8

Meet the comprehension

for (x ← src) body

Bind x to successive values from src evaluating body for each one; the result is the union of all the collections so produced.

SLIDE 9

Meet the comprehension

for (x ← src) body

Bind x to successive values from src evaluating body for each one; the result is the union of all the collections so produced. Some syntactic sugar:

for (x ← src) where(cond) body = for (x ← src) (if (cond) then body else [])

SLIDE 10

Example: A simple query

var players = table “players” : [(age : int, name : string)]; fun overAgePlayers() { query { for (p ← players) where (p.age > 16) [(name = p.name)] } }

SLIDE 11

Example: A simple query

var players = table “players” : [(age : int, name : string)]; fun overAgePlayers() { query { for (p ← players) where (p.age > 16) [(name = p.name)] } } SQL: select p.name as name from players as p where p.age > 16

SLIDE 12

Example: Detecting non-queryizable operations

fun overAgePlayersReversed() { query { for (p ← players) where (p.age > 16) [(name = reverse(p.name))] # ERROR! } }

SLIDE 13

Example: Abstracting the query

fun selectedPlayers(pred) { query { for (p ← players) where (pred(p)) [(name = p.name)] } }

SLIDE 14

Example: Abstracting the query

fun selectedPlayers(pred) { query { for (p ← players) where (pred(p)) [(name = p.name)] } } SQL ?? select p.name as name from players as p where [ · · · ]

SLIDE 15

Example: Nested query

fun unusablePlayers() { query { var teamRosters : [(roster : [Player])]= for (t ← teams) [(roster = for (p ← players) where (p.team == t.name) [p])]; for (tr ← teamRosters) where (length(tr.roster) < 9) tr.roster } }

SLIDE 16

Example: Nested query, unnested as SQL

SQL: select p.* from players as p, teams as t where p.team = t.name and ((select count(*) from players as p2 where p2.team = t.name) < 9)

SLIDE 17

Example: Nested query, refactored

fun teamRosters() { for (t ← teams) [(name = t.name roster = for (p ← players) where (p.team == t.team)) [p]]; } fun unusablePlayers() { query { for (tr ← teamRosters() ) where (length(tr.roster) < 9) tr.roster } }

SLIDE 18

Example: Nested query, abstracted

fun unusablePlayers(pred) { query { for (t ← teamRosters() ) where (pred(t.roster)) t.roster } }

SLIDE 19

Example: Nested query, abstracted

fun unusablePlayers(pred) { query { for (t ← teamRosters() ) where (pred(t.roster)) t.roster } } Very hard in SQL

SLIDE 20

Example: Nested query, abstracted, attempted in SQL

select p.* from players as p, teams as t where p.team = t.name and ((select count(*) from players as p2 where p2.team = t.name) < 9)

SLIDE 21

The goal

Write queries in an ordinary programming language.

SLIDE 22

The goal

Write queries in an ordinary programming language.

◮ Allow nested data structures.

SLIDE 23

The goal

Write queries in an ordinary programming language.

◮ Allow nested data structures. ◮ Allow abstraction.

SLIDE 24

The goal

Write queries in an ordinary programming language.

◮ Allow nested data structures. ◮ Allow abstraction. ◮ Statically detect unqueryizable expressions.

SLIDE 25

The goal

Write queries in an ordinary programming language.

◮ Allow nested data structures. ◮ Allow abstraction. ◮ Statically detect unqueryizable expressions.

“Language-integrated query”

SLIDE 26

Previous work, language-integrated query

◮ Libkin and Wong (early 90s):

Theory; proof for first-order unnesting, pure setting, total.

◮ Kleisli, LINQ, Links (originally):

Implemented, impure setting, partial.

◮ Ferry (Grust, et al., 2009):

Implemented, pure setting, total, handles nested final results.

◮ Fegaras (1998):

Theory, higher-order, but no proof of termination.

◮ Van den Bussche (2001):

Theory, nested final results.

SLIDE 27

Example: Nested query with untranslatable fragment

fun teamRosters() { query{ for (t ← teams) [(name = reverse(t.name) # ERROR! roster = for (p ← players) where (p.team == t.team)) [p]]; } }

SLIDE 28

Example: Nested query with untranslatable fragment

fun teamRosters() { query{ for (t ← teams) [(name = reverse(t.name) # ERROR! roster = for (p ← players) where (p.team == t.team)) [p]]; } } Original version of Links, like Kleisli, translates this to an iteration

ver a query—one players query for each row of teams.

SLIDE 29

What I want to show you

SLIDE 30

What I want to show you

How to translate any “pure” expression

f relational type

to an equivalent single SQL query.

SLIDE 31

What I want to show you

How to translate any “pure” expression

f relational type

to an equivalent single SQL query. and How to statically detect whether a designated expression is queryizable.

SLIDE 32

The contribution

Add to language-integrated query

◮ Handling higher-order functions. ◮ Separating query-translatable from non-translatable

sublanguages of a general-purpose language.

◮ Providing a complete translation from the translatable

sublanguage.

SLIDE 33

The Solution

SLIDE 34

Plan

Compile time Run time Type-and-effect check Normalize Translate

SLIDE 35

The source language

(types) T ::=

| (−

− → l : T) | [T]| S

e

→ T (base types)

::=

bool | int | string | · · · (terms) B, L, M, N ::= for (x ← L) M | if B then M else N | table s : T | [M]| []| M ⊎ N | (− − − → l = M) | M.l | LM | λx.N | x | c | length(M) | empty(M) | prim(− → M) | query{M} (atomic effects) E ::= noQ (effect sets) e a set of effects E

SLIDE 36

Type-and-effect checking

SLIDE 37

Typing judgment

Γ ⊢ M : T ! e Γ variable typing environment M expression T type of M e effects (a set) “In environment Γ, evaluating expression M has effects contained in e and results in type T.”

SLIDE 38

Typing rules

Γ ⊢ M : T ! ∅ T has the form [(− − → l : o)] Γ ⊢ query{M} : T ! ∅ (T-Query)

SLIDE 39

Typing rules

prim : S1 × · · · × Sn

e

→ T Γ ⊢ Mi : Si ! ei for each 1 ≤ i ≤ n Γ ⊢ prim(− → M) : T ! e ∪

i ei

(T-Prim)

SLIDE 40

Typing rules

prim : S1 × · · · × Sn

e

→ T Γ ⊢ Mi : Si ! ei for each 1 ≤ i ≤ n Γ ⊢ prim(− → M) : T ! e ∪

i ei

(T-Prim) Side condition: Every primitive must have an SQL equivalent or an effect tag.

SLIDE 41

Typing rules

T has the form [(− − → l : o)] Γ ⊢ (table s : T) : T ! ∅ (T-Table)

SLIDE 42

Typing rules

Γ ⊢ M : [S]! e Γ, x : S ⊢ N : [T]! e′ Γ ⊢ for (x ← M) N : [T]! e ∪ e′ (T-For)

SLIDE 43

Typing rules

Γ ⊢ c : Tc ! ∅ (T-Const) Γ, x : T ⊢ x : T ! ∅ (T-Var) Γ, x : S ⊢ N : T ! e′ Γ ⊢ λx.N : S e′ → T ! ∅ (T-Abs) Γ ⊢ L : S

e

→ T ! e′ Γ ⊢ M : S ! e′′ Γ ⊢ LM : T ! e ∪ e′ ∪ e′′ (T-App)

SLIDE 44

Typing rules

Γ ⊢ M : [T] Γ ⊢ empty(M) : bool (T-Empty) Γ ⊢ M : [T] Γ ⊢ length(M) : int (T-Length)

SLIDE 45

Typing rules

Γ ⊢ []: [T]! ∅ (T-Null) Γ ⊢ M : T ! e Γ ⊢ [M]: [T]! e (T-Singleton) Γ ⊢ M : [T]! e Γ ⊢ N : [T]! e′ Γ ⊢ M ⊎ N : [T]! e ∪ e′ (T-Union)

SLIDE 46

Typing rules

Γ ⊢ Mi : Ti ! ei for each Mi, Ti in − − − → M, T Γ ⊢ (− − − → l = M) : (− − → l : T) !

i ei

(T-Record) Γ ⊢ M : (− − → l : T) ! e (l : T) ∈ (− − → l : T) Γ ⊢ M.l : T ! e (T-Project) Γ ⊢ L : bool ! e Γ ⊢ M : T ! e′ Γ ⊢ N : T ! e′′ Γ ⊢ if L then M1 else M2 : T ! e ∪ e′ ∪ e′′ (T-If) Γ ⊢ M : S

e

→ T e ⊂ e′ Γ ⊢ M : S e′ → T (T-Subsump)

SLIDE 47

What the typing gives us

Any well-typed expression query{M}: has relation type and never executes any primitive that lacks an SQL equivalent.

SLIDE 48

What the typing gives us

Any well-typed expression query{M}: has relation type and never executes any primitive that lacks an SQL equivalent. So, at runtime M can be translated to an SQL query.

SLIDE 49

Run Time

SLIDE 50

Translation

SLIDE 51

Normal forms

(normal forms) V , U ::= V ⊎ U | []| F (comprehension NFs) F ::= for (x ← table s : T) F | Z (comprehension bodies) Z ::= if B then Z else []| [(− − − → l = B)]| table s : T (basic expressions) B ::= if B then B′ else B′′ | empty(V ) | length(V ) | prim(− → B ) | x.l | c

SLIDE 52

Target SQL fragment

Q, R ::= Q union all R | S S ::= select − − − → e as l from − − − → t as x where e e ::= case when e then e′ else e′′ end | c | x.l | e ∧ e′ | ¬e | prim(− → e ) | exists(Q) | count(∗)

SLIDE 53

SQL translation of normal forms

V ⊎ U = V union all U for (x ← table s : T) F = select − − − → e as l from s as x, − − − → t as y where B where (select − − − → e as l from − − − → t as y where B) = F if B then Z else [] = select − − − → e as l from t where B′ ∧ B where (select − − − → e as l from t where B′) = Z table s : [( − − → l : o)]

=

select − − − − → s.l as l from s where true [( − − − → l = B)]

=

select − − − − − → B as l from ∅ where true if B then B′ else B′′ = case when B then B′ else B′′ end empty(V ) = ¬exists(V ) length(V ) = select count(∗) from t where B where (select − − − → e as l from t where B) = V prim( − → B ) = prim( − → B) x.l = x.l c = c

SLIDE 54

Example

for (x ← table s)

for (y ← table t) if (x.c > 0) then [(a = x.a, b = y.b)] else []

=

select x.a as a, y.b as b from s as x, t as y where x.c > 0

SLIDE 55

Normalization

SLIDE 56

Rewriting source expressions

Partial evaluation, with some twists.

(λx.N)M : T

N[M/x]

(abs-β) ( − − − → l = M).l : T

Ml

(record-β) for (x ← [M]) N : T

N[M/x]

(for-β)

SLIDE 57

Rewriting

Simplifying list expressions.

if B then []else []: T

[]

(if-zero) for (x ← []) M : T

[]

(for-zero-src) for (x ← N) []: T

[]

(for-zero-body) for (x ← M1 ⊎ M2) N : T

(for-union-src)

(for (x ← M1) N) ⊎ (for (x ← M2) N) for (x ← M) (N1 ⊎ N2) : T

(for-union-body)

(for (x ← M) N1) ⊎ (for (x ← M) N2) if B then M ⊎ N else []: T

(if-union)

if B then M else []⊎ if B then N else []

SLIDE 58

Rewriting

Rearranging comprehensions, unions, and conditionals.

if B then (for (x ← M) N) else []: T

(if-for)

for (x ← M) (if B then N else []) for (x ← if B then M else []) N : T

(for-if-src)

if B then (for (x ← M) N) else [] for (x ← for (y ← L) M) N : T

(for-assoc)

for (y ← L) (for (x ← M) N) if y ∈ fv(N)

SLIDE 59

Rewriting

Eliminating non-base-type conditionals

(if B then L else L′)M : T

if B then LM else L′M

(app-if) if B then M else N : (− − → l : T)

(−

− − → l = L) (if-record) where Ll = if B then M.l else N.l if B then M else N : [T] (if-split) (if B then M else []) ⊎ (if ¬B then N else []) if N = []

SLIDE 60

Rewriting

Special forms.

empty(M) : bool

empty(for (x ← M) [()])

if M : S and S not a flat relation type (empty-flatten) length(M) : bool

length(for (x ← M) [()])

if M : S and S not a flat relation type (length-flatten) query{M} : T

M

(ignore-query)

SLIDE 61

Normal forms

◮ These rules are strongly normalizing.

SLIDE 62

Normal forms

◮ These rules are strongly normalizing. ◮ If query{M} typechecks, and substitution σ closes M, then

the normal form of Mσ lies in the SQL-like sublanguage.

SLIDE 63

Adding Recursion

Add a fixpoint operator letrec which produces a recursive function annotated with an effect rec: Γ, f : S

{rec}∪e

− → T, x : S ⊢ M : T ! e Γ, f : S

{rec}∪e

− → T ⊢ N : U ! e′ Γ ⊢ letrec f x = M in N : U ! e′ (T-LetRec) If a recursive function is applied anywhere within a query expression, type-checking will fail.

SLIDE 64

The whole process

,

SLIDE 65

The message

SQL queries can be written with a natural query construct and integrated with a general-purpose programming language, allowing more flexibility and robustness.

SLIDE 66

The message

SQL queries can be written with a natural query construct and integrated with a general-purpose programming language, allowing more flexibility and robustness.