The script-writers dream: How to write great SQL in your own - - PowerPoint PPT Presentation
The script-writers dream: How to write great SQL in your own - - PowerPoint PPT Presentation
The script-writers dream: How to write great SQL in your own language and be sure it will succeed. Ezra Cooper University of Edinburgh August 24, 2009 The Problem The Problem The Problem Problems with SQL: Embedding is usually a
The Problem
The Problem
The Problem
Problems with SQL:
◮ Embedding is usually a pain ◮ Lacks nested data structures ◮ Lacks abstraction
The Problem
Problems with SQL:
◮ Embedding is usually a pain ◮ Lacks nested data structures ◮ Lacks abstraction
The Solution Language-integrated query
Example: Little League database
teams: name Marchmont United Bruntsfield Hustlers Newington Numpties Leith Sluggers players: name team age Sam Marchmont United 17 Ezra Bruntsfield Hustlers 15 Angus Newington Numpties 18 Jeremy Newington Numpties 15 Paidi Leith Sluggers 15 Bob Newington Numpties 14 Lucy Leith Sluggers 17 Hugh Bruntsfield Hustlers 13
Meet the comprehension
for (x ← src) body
Meet the comprehension
for (x ← src) body
Bind x to successive values from src evaluating body for each one; the result is the union of all the collections so produced.
Meet the comprehension
for (x ← src) body
Bind x to successive values from src evaluating body for each one; the result is the union of all the collections so produced. Some syntactic sugar:
for (x ← src) where(cond) body = for (x ← src) (if (cond) then body else [])
Example: A simple query
var players = table “players” : [(age : int, name : string)]; fun overAgePlayers() { query { for (p ← players) where (p.age > 16) [(name = p.name)] } }
Example: A simple query
var players = table “players” : [(age : int, name : string)]; fun overAgePlayers() { query { for (p ← players) where (p.age > 16) [(name = p.name)] } } SQL: select p.name as name from players as p where p.age > 16
Example: Detecting non-queryizable operations
fun overAgePlayersReversed() { query { for (p ← players) where (p.age > 16) [(name = reverse(p.name))] # ERROR! } }
Example: Abstracting the query
fun selectedPlayers(pred) { query { for (p ← players) where (pred(p)) [(name = p.name)] } }
Example: Abstracting the query
fun selectedPlayers(pred) { query { for (p ← players) where (pred(p)) [(name = p.name)] } } SQL ?? select p.name as name from players as p where [ · · · ]
Example: Nested query
fun unusablePlayers() { query { var teamRosters : [(roster : [Player])]= for (t ← teams) [(roster = for (p ← players) where (p.team == t.name) [p])]; for (tr ← teamRosters) where (length(tr.roster) < 9) tr.roster } }
Example: Nested query, unnested as SQL
SQL: select p.* from players as p, teams as t where p.team = t.name and ((select count(*) from players as p2 where p2.team = t.name) < 9)
Example: Nested query, refactored
fun teamRosters() { for (t ← teams) [(name = t.name roster = for (p ← players) where (p.team == t.team)) [p]]; } fun unusablePlayers() { query { for (tr ← teamRosters() ) where (length(tr.roster) < 9) tr.roster } }
Example: Nested query, abstracted
fun unusablePlayers(pred) { query { for (t ← teamRosters() ) where (pred(t.roster)) t.roster } }
Example: Nested query, abstracted
fun unusablePlayers(pred) { query { for (t ← teamRosters() ) where (pred(t.roster)) t.roster } } Very hard in SQL
Example: Nested query, abstracted, attempted in SQL
select p.* from players as p, teams as t where p.team = t.name and ((select count(*) from players as p2 where p2.team = t.name) < 9)
The goal
Write queries in an ordinary programming language.
The goal
Write queries in an ordinary programming language.
◮ Allow nested data structures.
The goal
Write queries in an ordinary programming language.
◮ Allow nested data structures. ◮ Allow abstraction.
The goal
Write queries in an ordinary programming language.
◮ Allow nested data structures. ◮ Allow abstraction. ◮ Statically detect unqueryizable expressions.
The goal
Write queries in an ordinary programming language.
◮ Allow nested data structures. ◮ Allow abstraction. ◮ Statically detect unqueryizable expressions.
“Language-integrated query”
Previous work, language-integrated query
◮ Libkin and Wong (early 90s):
Theory; proof for first-order unnesting, pure setting, total.
◮ Kleisli, LINQ, Links (originally):
Implemented, impure setting, partial.
◮ Ferry (Grust, et al., 2009):
Implemented, pure setting, total, handles nested final results.
◮ Fegaras (1998):
Theory, higher-order, but no proof of termination.
◮ Van den Bussche (2001):
Theory, nested final results.
Example: Nested query with untranslatable fragment
fun teamRosters() { query{ for (t ← teams) [(name = reverse(t.name) # ERROR! roster = for (p ← players) where (p.team == t.team)) [p]]; } }
Example: Nested query with untranslatable fragment
fun teamRosters() { query{ for (t ← teams) [(name = reverse(t.name) # ERROR! roster = for (p ← players) where (p.team == t.team)) [p]]; } } Original version of Links, like Kleisli, translates this to an iteration
- ver a query—one players query for each row of teams.
What I want to show you
What I want to show you
How to translate any “pure” expression
- f relational type
to an equivalent single SQL query.
What I want to show you
How to translate any “pure” expression
- f relational type
to an equivalent single SQL query. and How to statically detect whether a designated expression is queryizable.
The contribution
Add to language-integrated query
◮ Handling higher-order functions. ◮ Separating query-translatable from non-translatable
sublanguages of a general-purpose language.
◮ Providing a complete translation from the translatable
sublanguage.
The Solution
Plan
Compile time Run time Type-and-effect check Normalize Translate
The source language
(types) T ::=
- | (−
− → l : T) | [T]| S
e
→ T (base types)
- ::=
bool | int | string | · · · (terms) B, L, M, N ::= for (x ← L) M | if B then M else N | table s : T | [M]| []| M ⊎ N | (− − − → l = M) | M.l | LM | λx.N | x | c | length(M) | empty(M) | prim(− → M) | query{M} (atomic effects) E ::= noQ (effect sets) e a set of effects E
Type-and-effect checking
Typing judgment
Γ ⊢ M : T ! e Γ variable typing environment M expression T type of M e effects (a set) “In environment Γ, evaluating expression M has effects contained in e and results in type T.”
Typing rules
Γ ⊢ M : T ! ∅ T has the form [(− − → l : o)] Γ ⊢ query{M} : T ! ∅ (T-Query)
Typing rules
prim : S1 × · · · × Sn
e
→ T Γ ⊢ Mi : Si ! ei for each 1 ≤ i ≤ n Γ ⊢ prim(− → M) : T ! e ∪
i ei
(T-Prim)
Typing rules
prim : S1 × · · · × Sn
e
→ T Γ ⊢ Mi : Si ! ei for each 1 ≤ i ≤ n Γ ⊢ prim(− → M) : T ! e ∪
i ei
(T-Prim) Side condition: Every primitive must have an SQL equivalent or an effect tag.
Typing rules
T has the form [(− − → l : o)] Γ ⊢ (table s : T) : T ! ∅ (T-Table)
Typing rules
Γ ⊢ M : [S]! e Γ, x : S ⊢ N : [T]! e′ Γ ⊢ for (x ← M) N : [T]! e ∪ e′ (T-For)
Typing rules
Γ ⊢ c : Tc ! ∅ (T-Const) Γ, x : T ⊢ x : T ! ∅ (T-Var) Γ, x : S ⊢ N : T ! e′ Γ ⊢ λx.N : S e′ → T ! ∅ (T-Abs) Γ ⊢ L : S
e
→ T ! e′ Γ ⊢ M : S ! e′′ Γ ⊢ LM : T ! e ∪ e′ ∪ e′′ (T-App)
Typing rules
Γ ⊢ M : [T] Γ ⊢ empty(M) : bool (T-Empty) Γ ⊢ M : [T] Γ ⊢ length(M) : int (T-Length)
Typing rules
Γ ⊢ []: [T]! ∅ (T-Null) Γ ⊢ M : T ! e Γ ⊢ [M]: [T]! e (T-Singleton) Γ ⊢ M : [T]! e Γ ⊢ N : [T]! e′ Γ ⊢ M ⊎ N : [T]! e ∪ e′ (T-Union)
Typing rules
Γ ⊢ Mi : Ti ! ei for each Mi, Ti in − − − → M, T Γ ⊢ (− − − → l = M) : (− − → l : T) !
i ei
(T-Record) Γ ⊢ M : (− − → l : T) ! e (l : T) ∈ (− − → l : T) Γ ⊢ M.l : T ! e (T-Project) Γ ⊢ L : bool ! e Γ ⊢ M : T ! e′ Γ ⊢ N : T ! e′′ Γ ⊢ if L then M1 else M2 : T ! e ∪ e′ ∪ e′′ (T-If) Γ ⊢ M : S
e
→ T e ⊂ e′ Γ ⊢ M : S e′ → T (T-Subsump)
What the typing gives us
Any well-typed expression query{M}: has relation type and never executes any primitive that lacks an SQL equivalent.
What the typing gives us
Any well-typed expression query{M}: has relation type and never executes any primitive that lacks an SQL equivalent. So, at runtime M can be translated to an SQL query.
Run Time
Translation
Normal forms
(normal forms) V , U ::= V ⊎ U | []| F (comprehension NFs) F ::= for (x ← table s : T) F | Z (comprehension bodies) Z ::= if B then Z else []| [(− − − → l = B)]| table s : T (basic expressions) B ::= if B then B′ else B′′ | empty(V ) | length(V ) | prim(− → B ) | x.l | c
Target SQL fragment
Q, R ::= Q union all R | S S ::= select − − − → e as l from − − − → t as x where e e ::= case when e then e′ else e′′ end | c | x.l | e ∧ e′ | ¬e | prim(− → e ) | exists(Q) | count(∗)
SQL translation of normal forms
V ⊎ U = V union all U for (x ← table s : T) F = select − − − → e as l from s as x, − − − → t as y where B where (select − − − → e as l from − − − → t as y where B) = F if B then Z else [] = select − − − → e as l from t where B′ ∧ B where (select − − − → e as l from t where B′) = Z table s : [( − − → l : o)]
- =
select − − − − → s.l as l from s where true [( − − − → l = B)]
- =
select − − − − − → B as l from ∅ where true if B then B′ else B′′ = case when B then B′ else B′′ end empty(V ) = ¬exists(V ) length(V ) = select count(∗) from t where B where (select − − − → e as l from t where B) = V prim( − → B ) = prim( − → B) x.l = x.l c = c
Example
- for (x ← table s)
for (y ← table t) if (x.c > 0) then [(a = x.a, b = y.b)] else []
- =
select x.a as a, y.b as b from s as x, t as y where x.c > 0
Normalization
Rewriting source expressions
Partial evaluation, with some twists.
(λx.N)M : T
- N[M/x]
(abs-β) ( − − − → l = M).l : T
- Ml
(record-β) for (x ← [M]) N : T
- N[M/x]
(for-β)
Rewriting
Simplifying list expressions.
if B then []else []: T
- []
(if-zero) for (x ← []) M : T
- []
(for-zero-src) for (x ← N) []: T
- []
(for-zero-body) for (x ← M1 ⊎ M2) N : T
- (for-union-src)
(for (x ← M1) N) ⊎ (for (x ← M2) N) for (x ← M) (N1 ⊎ N2) : T
- (for-union-body)
(for (x ← M) N1) ⊎ (for (x ← M) N2) if B then M ⊎ N else []: T
- (if-union)
if B then M else []⊎ if B then N else []
Rewriting
Rearranging comprehensions, unions, and conditionals.
if B then (for (x ← M) N) else []: T
- (if-for)
for (x ← M) (if B then N else []) for (x ← if B then M else []) N : T
- (for-if-src)
if B then (for (x ← M) N) else [] for (x ← for (y ← L) M) N : T
- (for-assoc)
for (y ← L) (for (x ← M) N) if y ∈ fv(N)
Rewriting
Eliminating non-base-type conditionals
(if B then L else L′)M : T
- if B then LM else L′M
(app-if) if B then M else N : (− − → l : T)
- (−
− − → l = L) (if-record) where Ll = if B then M.l else N.l if B then M else N : [T] (if-split) (if B then M else []) ⊎ (if ¬B then N else []) if N = []
Rewriting
Special forms.
empty(M) : bool
- empty(for (x ← M) [()])
if M : S and S not a flat relation type (empty-flatten) length(M) : bool
- length(for (x ← M) [()])
if M : S and S not a flat relation type (length-flatten) query{M} : T
- M
(ignore-query)
Normal forms
◮ These rules are strongly normalizing.
Normal forms
◮ These rules are strongly normalizing. ◮ If query{M} typechecks, and substitution σ closes M, then
the normal form of Mσ lies in the SQL-like sublanguage.
Adding Recursion
Add a fixpoint operator letrec which produces a recursive function annotated with an effect rec: Γ, f : S
{rec}∪e
− → T, x : S ⊢ M : T ! e Γ, f : S
{rec}∪e