Deciding Equivalences among Aggregate Queries, Werner Nutt



SLIDE 1 Deciding Equivalences among Aggregate Queries

Werner Nutt
German Research Center for AI (DFKI), Saarbrücken, Germany

Yehoshua Sagiv, Sara Shurin
The Hebrew University, Jerusalem, Israel

Deciding Equivalences among Aggregate Queries, PODS, June 1998

SLIDE 2 Acknowledgement

This work originated within the ESPRIT Long Term Research Project "Foundations of Data Warehouse Quality" (DWQ).

SLIDE 3 Motivation

In recent years, there has been increased interest in the optimization of aggregate queries:
  • data warehousing
  • decision support

Aggregate queries are costly:
  • they touch many data items
$\Rightarrow$ need for specialized optimization techniques

Idea: Use previous results to answer new queries
  • exploit redundancy!
  • create redundancy!

To do so, we have to be able to answer the question: "What can be computed from what?" (= the view usability problem)

SLIDE 4 Decision Support Queries

Product        Region    Sales        Growth in Sales   Sales as %    Change in Sales as % of
                         This Month   vs. Last Month    of Category   Category vs. Last Month
Framis         Central   110          12%               31%           3%
Framis         Eastern   179          -3%               28%           -1%
Framis         Western   55           5%                12%           1%
Total Framis             344          6%                33%           1%
Widget         Central   66           2%                18%           2%
Widget         Eastern   102          4%                12%           5%
Widget         Western   39           -9%               9%            -1%
Total Widget             207          1%                13%           4%
Grand Total              551          4%                20%           2%

Example of a business report. Exceptionally high values are marked with an asterisk (*). Exceptionally low values are shown as bold. (The example is taken from R. Kimball, The Data Warehouse Toolkit, Addison Wesley.)

An SQL query for the first column:

    Select   p.product_name as Product, m.region_name as Region,
             sum(f.sales) as Sales_This_Month
    From     sales_fact f, product p, market m, time t
    Where    f.product_key = p.product_key
      and    f.market_key = m.market_key
      and    f.time_key = t.time_key
      and    p.product_name in ('Framis', 'Widget')
      and    t.month = 'May'
      and    t.year = 1996
    Group by p.product_name, m.region_name

SLIDE 5 Aggregate Queries: Abstract Notation

SQL notation:

    Select   p.A, s.B, max(r.C), sum(s.D), count(*)
    From     p, r, s
    Where    p.Z = r.Z and p.A = s.A and r.W = 'Joe' and s.B < 10
    Group by p.A, s.B

Abstract notation:

    $q(A, B, \max(C), \mathrm{sum}(D), \mathrm{count}) \leftarrow s(A, B, D) \,\&\, p(A, Z) \,\&\, r(Z, C, W) \,\&\, W = \mathrm{Joe} \,\&\, B < 10$

In general:

    $q(x_1, \dots, x_m, \alpha_1(y_1), \dots, \alpha_n(y_n)) \leftarrow R \,\&\, C$

Short: $q(\bar{x}, \bar{\alpha}(\bar{y})) \leftarrow R \,\&\, C$, with
  • $R$ a conjunction of relational atoms
  • $C$ a conjunction of comparisons

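As a rough illustration of how the abstract notation can be read, the following Python sketch evaluates the example query over small in-memory relations. The function name `evaluate`, the tuple layouts, and the sample data are assumptions made for this illustration; they do not come from the slides. Each satisfying assignment of the body contributes once to the aggregates of its group, and the comparison `b < 10` follows the SQL version of the query.

```python
from collections import defaultdict


def evaluate(p, r, s):
    """Evaluate q(A, B, max(C), sum(D), count) over sets of tuples.

    Assumed layouts: p holds (A, Z), r holds (Z, C, W), s holds (A, B, D).
    """
    groups = defaultdict(list)
    for (a, b, d) in s:
        for (a2, z) in p:
            for (z2, c, w) in r:
                # Join conditions and comparisons of the query body.
                if a == a2 and z == z2 and w == "Joe" and b < 10:
                    groups[(a, b)].append((c, d))
    # One result tuple per group: (max(C), sum(D), count).
    return {
        key: (max(c for c, _ in rows), sum(d for _, d in rows), len(rows))
        for key, rows in groups.items()
    }


# Hypothetical sample data.
p = {("a1", "z1")}
r = {("z1", 5, "Joe"), ("z1", 8, "Joe"), ("z1", 9, "Ann")}
s = {("a1", 3, 10), ("a1", 3, 20), ("a1", 12, 7)}
result = evaluate(p, r, s)
```

With this data, four assignments satisfy the body, so the single group `("a1", 3)` gets the maximum 8, the sum 60, and the count 4.
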
SLIDE 6 The View Usability Problem

Given views

    $v_i(\bar{x}_i, \bar{\alpha}_i(\bar{y}_i)) \leftarrow R_i \,\&\, C_i$

and a query

    $q(\bar{x}, \bar{\alpha}(\bar{y})) \leftarrow R \,\&\, C$,

is there a query

    $\tilde{q}(\bar{x}, \bar{\alpha}(\bar{y})) \leftarrow \tilde{R} \,\&\, \tilde{C}$

such that
  • $\tilde{R}$ consists of instantiations of the $v_i$
  • $q$ and $\tilde{q}$ are equivalent (i.e., $q$ and $\tilde{q}$ produce the same results over all databases)?

$\Rightarrow$ We need a syntactic characterization of equivalence!

SLIDE 7 Dimensions of the Problem

    $q(\bar{x}, \bar{\alpha}(\bar{y})) \leftarrow R \,\&\, C$

  • Which aggregate functions?
      - min, max
      - count
      - sum
      - count distinct
      - ...
  • Queries
      - without comparisons
      - with comparisons over the rationals or the integers

SLIDE 8 Previous Work

  • Equivalence of conjunctive queries: Chandra/Merlin 1977, Klug 1988
  • View usability for conjunctive queries: Levy et al. 1995
  • Containment and equivalence of conjunctive queries under bag-semantics: Chaudhuri/Vardi 1993
  • Equivalence preserving transformations of aggregate queries: Levy/Mumick 1994, Gupta et al. 1995
  • View usability for aggregate queries (sufficient criteria): Srivastava et al. 1996
  • View usability for data cubes: Harinarayan et al. 1996, Gupta et al. 1997

$\Rightarrow$ (Almost) no complete characterizations for aggregate queries!

SLIDE 9 Decouple Aggregations

Observation: In $q(\mathit{Prod}, \max(\mathit{Sales}), \mathrm{sum}(\mathit{Profit})) \leftarrow R \,\&\, C$, the aggregates $\max(\mathit{Sales})$ and $\mathrm{sum}(\mathit{Profit})$ are functionally dependent on $\mathit{Prod}$.

Definition: $q_j(\bar{x}, \alpha_j(y_j)) \leftarrow R \,\&\, C$ is the $j$-th kernel of $q(\bar{x}, \alpha_1(y_1), \dots, \alpha_n(y_n)) \leftarrow R \,\&\, C$.

SLIDE 10 Divide and Conquer

Theorem: The queries

    $q(\bar{x}, \alpha_1(y_1), \dots, \alpha_n(y_n))$
    $q'(\bar{x}, \alpha_1(y_1), \dots, \alpha_n(y_n))$

are equivalent if and only if their kernels

    $q_j(\bar{x}, \alpha_j(y_j))$
    $q'_j(\bar{x}, \alpha_j(y_j))$

are pairwise equivalent for all $j \in 1..n$.

$\Rightarrow$ It suffices to solve the equivalence problem for queries with a single aggregate term, $q(\bar{x}, \alpha(y)) \leftarrow R \,\&\, C$ (simple aggregate queries).

SLIDE 11 Aggregate Queries and Conjunctive Queries

The core of $q(\bar{x}, \alpha(y)) \leftarrow R \,\&\, C$ is the conjunctive query $\breve{q}(\bar{x}, y) \leftarrow R \,\&\, C$.

Examples:
  • The core of $q(\bar{x}, \mathrm{sum}(y)) \leftarrow R \,\&\, C$ is $\breve{q}(\bar{x}, y) \leftarrow R \,\&\, C$.
  • The core of $q(\bar{x}, \mathrm{count}) \leftarrow R \,\&\, C$ is $\breve{q}(\bar{x}) \leftarrow R \,\&\, C$.

Strategy: Reduce equivalence of simple aggregate queries to properties of their cores.

SLIDE 12 Reminder on Conjunctive Queries

  • Conjunctive queries have the form $q(\bar{x}) \leftarrow R \,\&\, C$, where $q(\bar{x})$ is a relational atom with variables, $R$ is a conjunction of relational atoms, and $C$ is a conjunction of comparisons.
  • $q^D$ := the result of $q$ over database $D$
  • $q$ and $q'$ are equivalent (written $q \equiv q'$) iff $q^D = q'^D$ for all db's $D$
  • $q$ is contained in $q'$ (written $q \subseteq q'$) iff $q^D \subseteq q'^D$ for all db's $D$

$\Rightarrow$ How can we check containment?

SLIDE 13 Query Homomorphisms

A homomorphism from $q'(\bar{x}') \leftarrow R' \,\&\, C'$ to $q(\bar{x}) \leftarrow R \,\&\, C$ is a substitution $\theta$ such that
  • $\theta\bar{x}' = \bar{x}$
  • $\theta R' \subseteq R$
  • $C \models \theta C'$.

Theorem (Chandra/Merlin 77): For relational conjunctive queries:

    $q \subseteq q' \iff$ there is a homomorphism from $q'$ to $q$

Finding a homomorphism is NP-complete!

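The homomorphism test for relational conjunctive queries (no comparisons) can be sketched as a brute-force search over substitutions. This is a minimal illustration under stated assumptions, not the authors' procedure: the names `find_homomorphism` and `contained_in`, the encoding of a query as a `(head, body)` pair of tuples, and the convention that lowercase strings are variables are all choices made for this example. The search is exponential, in line with the NP-completeness remark.

```python
from itertools import product


def is_variable(term):
    # Assumption of this sketch: lowercase strings are variables,
    # everything else is a constant.
    return isinstance(term, str) and term[:1].islower()


def find_homomorphism(src, dst):
    """Return a substitution mapping query src into query dst, or None.

    A query is (head, body): head is a tuple of terms, body is a list of
    atoms written as tuples (predicate, term, ...).
    """
    src_head, src_body = src
    dst_head, dst_body = dst
    src_terms = set(src_head) | {t for atom in src_body for t in atom[1:]}
    src_vars = sorted(t for t in src_terms if is_variable(t))
    dst_terms = sorted(
        set(dst_head) | {t for atom in dst_body for t in atom[1:]}, key=str
    )
    dst_atoms = set(dst_body)
    for image in product(dst_terms, repeat=len(src_vars)):
        theta = dict(zip(src_vars, image))
        mapped_head = tuple(theta.get(t, t) for t in src_head)
        if mapped_head != tuple(dst_head):
            continue  # the head must be mapped onto the head
        mapped_body = (
            (atom[0],) + tuple(theta.get(t, t) for t in atom[1:])
            for atom in src_body
        )
        if all(atom in dst_atoms for atom in mapped_body):
            return theta
    return None


def contained_in(q, q_prime):
    # q is contained in q_prime iff q_prime maps homomorphically into q.
    return find_homomorphism(q_prime, q) is not None


# q(x) <- r(x, y) & r(y, z)   and   q'(x) <- r(x, y)
q = (("x",), [("r", "x", "y"), ("r", "y", "z")])
q_prime = (("x",), [("r", "x", "y")])
```

Here `contained_in(q, q_prime)` holds, because every start of a path of length two is also the start of an edge, while the converse containment fails.
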
SLIDE 14 Containment and Comparisons

Classical example:

    $q' \leftarrow p(u, v) \,\&\, u \le v$
    $q \leftarrow p(y, z) \,\&\, p(z, y)$

We have $q \subseteq q'$, but no homomorphism from $q'$ to $q$.

Idea: replace $q$ with its linear expansion $(q_L)_L$ (case analysis):

    $q_{\{y<z\}} \leftarrow p(y, z) \,\&\, p(z, y) \,\&\, y < z$
    $q_{\{y=z\}} \leftarrow p(y, z) \,\&\, p(z, y) \,\&\, y = z$
    $q_{\{y>z\}} \leftarrow p(y, z) \,\&\, p(z, y) \,\&\, y > z$

Theorem (Klug 88):

    $q \subseteq q' \iff$ for every $q_L$ in $(q_L)_L$, there is a homomorphism from $q'$ to $q_L$

Containment with comparisons is $\Pi^P_2$-complete (van der Meyden).

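The case analysis behind the linear expansion can be sketched in Python for a restricted setting. The sketch assumes Boolean queries (empty head), no constants, and comparisons that are all of the form `a <= b` between variables; the names `linearizations`, `maps_into`, and `contained` are hypothetical. A linearization is represented as a rank for each variable, so a comparison `a <= b` is entailed exactly when the rank of `a` does not exceed the rank of `b`.

```python
from itertools import product


def linearizations(variables):
    """Yield each total preorder of the variables as a dict variable -> rank."""
    n = len(variables)
    for ranks in product(range(n), repeat=n):
        # Keep only rank assignments that use 0..k without gaps, so each
        # ordering (with ties) is produced exactly once.
        if set(ranks) == set(range(max(ranks) + 1)):
            yield dict(zip(variables, ranks))


def maps_into(src_atoms, src_cmps, dst_atoms, rank):
    """Is there a homomorphism from src into the linearized target?"""
    src_vars = sorted({v for _, *args in src_atoms for v in args})
    for image in product(sorted(rank), repeat=len(src_vars)):
        theta = dict(zip(src_vars, image))
        atoms_ok = all(
            (pred, *(theta[a] for a in args)) in dst_atoms
            for pred, *args in src_atoms
        )
        cmps_ok = all(rank[theta[a]] <= rank[theta[b]] for a, b in src_cmps)
        if atoms_ok and cmps_ok:
            return True
    return False


def contained(q_atoms, q_cmps, qp_atoms, qp_cmps):
    """Test q <= q' by checking every linearization of q (at least one variable)."""
    q_vars = sorted({v for _, *args in q_atoms for v in args})
    for rank in linearizations(q_vars):
        if not all(rank[a] <= rank[b] for a, b in q_cmps):
            continue  # this ordering contradicts q's own comparisons
        if not maps_into(qp_atoms, qp_cmps, set(q_atoms), rank):
            return False
    return True


# The classical example: q' <- p(u, v) & u <= v,  q <- p(y, z) & p(z, y)
q_atoms = [("p", "y", "z"), ("p", "z", "y")]
qp_atoms = [("p", "u", "v")]
qp_cmps = [("u", "v")]
```

For the classical example, each of the three orderings of `y` and `z` admits a mapping, so `contained(q_atoms, [], qp_atoms, qp_cmps)` holds, whereas the converse containment is rejected.
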
SLIDE 15 Relational Max-Queries

    $q(\bar{x}, \max(y)) \leftarrow R \;\equiv\; q'(\bar{x}, \max(y)) \leftarrow R'$ ?

Theorem: For relational max-queries:

    $q \equiv q' \iff$ the cores $\breve{q}$ and $\breve{q}'$ are equivalent

Relational queries deliver the same max only if they deliver the same values!

SLIDE 16 Max-Queries with Comparisons

The theorem fails if there are comparisons.

    $q(\max(y)) \leftarrow p(y) \,\&\, p(z) \,\&\, z < y$
    $q'(\max(y)) \leftarrow p(y) \,\&\, p(z_1) \,\&\, p(z_2) \,\&\, z_1 < z_2$

Observations:
  • $\breve{q}$ returns all elements of $p$, but the least
  • $\breve{q}'$ returns all elements
  • $\breve{q}$ and $\breve{q}'$ return answers if $p$ has at least two elements
$\Rightarrow$ the max-queries are equivalent.

Which property of cores entails equivalence of max-queries?

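The two observations can be checked on concrete instances with a small Python sketch. The helper names `core_q`, `core_q_prime`, and `max_query` and the sample relation are assumptions of this illustration; the unary relation `p` is modelled as a set of numbers, and `None` stands for "no answer".

```python
from itertools import product


def core_q(p):
    # Core of q: all y in p for which some z in p satisfies z < y.
    return {y for y, z in product(p, repeat=2) if z < y}


def core_q_prime(p):
    # Core of q': all y in p, provided some pair z1 < z2 exists in p.
    return {y for y, z1, z2 in product(p, repeat=3) if z1 < z2}


def max_query(core_result):
    # A max-query returns nothing when its core returns nothing.
    return max(core_result) if core_result else None


p = {3, 1, 4}
cores_differ = core_q(p) != core_q_prime(p)
same_max = max_query(core_q(p)) == max_query(core_q_prime(p))
```

On `{3, 1, 4}` the cores differ (the least element 1 is missing from the first), yet both max-queries return 4; on a one-element relation both return no answer.
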
SLIDE 17 Dominance

Definition: $\breve{q}$ is dominated by $\breve{q}'$ iff for every db, if $\breve{q}$ returns $(\bar{d}, d)$, then $\breve{q}'$ returns some $(\bar{d}, d')$ such that $d \le d'$.

Proposition: For arbitrary max-queries:

    $q \equiv q' \iff$ the cores $\breve{q}$ and $\breve{q}'$ dominate each other

$\Rightarrow$ How can we check dominance?

SLIDE 18 Dominance Mappings

A dominance mapping $\delta$ from $\breve{q}'(\bar{x}', y') \leftarrow R' \,\&\, C'$ to $\breve{q}(\bar{x}, y) \leftarrow R \,\&\, C$ is like a homomorphism, except that $C \models y \le \delta(y')$ instead of $y = \delta(y')$.

Theorem:

    $q$ is dominated by $q' \iff$ for every $q_L$ in the linear expansion of $q$, there is a dominance mapping from $q'$ to $q_L$

Corollary: Equivalence of max-queries is $\Pi^P_2$-complete.

SLIDE 19 Relational Count-Queries

    $q(\bar{x}, \mathrm{count}) \leftarrow R \;\equiv\; q'(\bar{x}, \mathrm{count}) \leftarrow R'$ ?

This holds iff $\breve{q}(\bar{x})$ and $\breve{q}'(\bar{x})$ return the same results with the same multiplicities over every db, i.e., iff $\breve{q}(\bar{x})$ and $\breve{q}'(\bar{x})$ are bag-set-equivalent.

Theorem (Chaudhuri/Vardi 93): If $\breve{q}$, $\breve{q}'$ are relational queries, then

    $\breve{q}$ and $\breve{q}'$ are bag-set-equivalent $\iff$ $\breve{q}$ and $\breve{q}'$ are isomorphic

I.e., $\breve{q}$ and $\breve{q}'$ are the same, up to renaming of variables.

SLIDE 20 Count-Queries with Comparisons

The theorem fails if there are comparisons.

    $q(\mathrm{count}) \leftarrow p(x) \,\&\, p(y) \,\&\, p(z) \,\&\, x < y \,\&\, x < z$
    $q'(\mathrm{count}) \leftarrow p(x) \,\&\, p(y) \,\&\, p(z) \,\&\, x < z \,\&\, y < z$

Observations:
  • $q$ and $q'$ are not isomorphic
  • $q$ and $q'$ are equivalent

How can we check bag-set-equivalence?

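That the two count-queries agree can be checked on sample instances with the following Python sketch. The function names and test relations are assumptions of this illustration, and agreement on a few instances is only evidence, not a proof of equivalence. The unary relation `p` is modelled as a set of numbers.

```python
from itertools import product


def count_q(p):
    # Number of assignments with x < y and x < z.
    return sum(1 for x, y, z in product(p, repeat=3) if x < y and x < z)


def count_q_prime(p):
    # Number of assignments with x < z and y < z.
    return sum(1 for x, y, z in product(p, repeat=3) if x < z and y < z)


samples = [set(), {5}, {1, 2, 3}, {-2, 0, 7, 9}, {1, 4, 6, 8, 10}]
agree = all(count_q(p) == count_q_prime(p) for p in samples)
```

For a relation with n distinct values, both counts equal 0² + 1² + ... + (n-1)²: the first query sums, over each x, the squared number of larger elements, and the second sums, over each z, the squared number of smaller elements.
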
SLIDE 21 Isomorphic Linear Expansions

Let $(\breve{q}_L)_L$, $(\breve{q}'_M)_M$ be linear expansions of $\breve{q}$, $\breve{q}'$.

Definition: $(q_L)_L$ and $(q'_M)_M$ are isomorphic iff
  • there is a bijection $\mu : (q_L)_L \to (q'_M)_M$ such that
  • $q_L$ and $q'_{\mu(L)}$ are isomorphic, for all $L$.

[Figure: arrows pairing the queries $q_{L_1}, \dots, q_{L_k}$ with the queries $q'_{M_1}, \dots, q'_{M_k}$]

SLIDE 22 Equivalence of General Count-Queries

Theorem: If $\breve{q}$, $\breve{q}'$ are arbitrary conjunctive queries, then

    $\breve{q}$ and $\breve{q}'$ are bag-set-equivalent $\iff$ $\breve{q}$ and $\breve{q}'$ have isomorphic linear expansions

Corollary: The following can be decided in PSPACE:
  • Bag-set-equivalence of conjunctive queries
  • Equivalence of count-queries

SLIDE 23 Sum-Queries

When are

    $q(\bar{x}, \mathrm{sum}(y)) \equiv q'(\bar{x}, \mathrm{sum}(y))$ ?

Bag-set-equivalence of the cores is a sufficient condition. But is it necessary?

SLIDE 24 Sum-Queries: Difficulty 1

The cores of $q$ and $q'$ are not bag-set-equivalent:

    $q(\mathrm{sum}(y)) \leftarrow p(1) \,\&\, p(2) \,\&\, p(3) \,\&\, p(y) \,\&\, 1 \le y \le 3$
    $q'(\mathrm{sum}(y)) \leftarrow p(1) \,\&\, p(2) \,\&\, p(3) \,\&\, p(y) \,\&\, 1 \le y \le 2 \,\&\, p(z) \,\&\, 1 \le z \le 2$

Over the integers, $q$ and $q'$ are equivalent, since $1 + 2 + 3 = 1 + 2 + 1 + 2$.

Over the rationals, they are not.

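The difference between the integers and the rationals can be made concrete with a short Python sketch. The function names and sample relations are assumptions of this illustration; the unary relation `p` is a set of numbers, and `None` stands for "no answer" when the body cannot be satisfied.

```python
from fractions import Fraction
from itertools import product


def sum_q(p):
    if not {1, 2, 3} <= p:
        return None  # the atoms p(1), p(2), p(3) are not all satisfied
    return sum(y for y in p if 1 <= y <= 3)


def sum_q_prime(p):
    if not {1, 2, 3} <= p:
        return None
    # One summand y per satisfying pair (y, z).
    return sum(
        y for y, z in product(p, repeat=2) if 1 <= y <= 2 and 1 <= z <= 2
    )


integer_db = {1, 2, 3, 5}
rational_db = {1, 2, 3, Fraction(3, 2)}
```

On `integer_db` both queries return 6. On `rational_db` the extra value 3/2 falls into both intervals, so the first query returns 15/2 while the second returns 27/2.
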
SLIDE 25 Sum-Queries: Difficulty 2

    $q(\mathrm{sum}(y)) \leftarrow p(y) \,\&\, 0 < y \,\&\, p(z) \,\&\, 0 < z \,\&\, p(w) \,\&\, 0 \le w$
    $q'(\mathrm{sum}(y)) \leftarrow p(y) \,\&\, 0 \le y \,\&\, p(z) \,\&\, 0 < z \,\&\, p(w) \,\&\, 0 \le w$

  • $\breve{q}$ and $\breve{q}'$ return non-zero numbers with the same multiplicity
  • ... but $\breve{q}'$ may return 0, while $\breve{q}$ does not
$\Rightarrow$ $q$ and $q'$ are equivalent, but the cores are not

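The effect described on this slide can be illustrated in Python, under the assumption that the constant missing from the extracted comparisons is 0, which is what the remark "may return 0" suggests. The function names and the sample relation are likewise assumptions of this sketch. Cores are returned as bags (lists), and `None` stands for "no answer".

```python
from itertools import product


def core_q(p):
    # Assumed reading: 0 < y, 0 < z, 0 <= w.
    return [y for y, z, w in product(p, repeat=3) if 0 < y and 0 < z and 0 <= w]


def core_q_prime(p):
    # Assumed reading: 0 <= y, 0 < z, 0 <= w.
    return [y for y, z, w in product(p, repeat=3) if 0 <= y and 0 < z and 0 <= w]


def sum_query(core_bag):
    return sum(core_bag) if core_bag else None


p = {0, 1, 2}
bags_differ = sorted(core_q(p)) != sorted(core_q_prime(p))
sums_agree = sum_query(core_q(p)) == sum_query(core_q_prime(p))
```

On `{0, 1, 2}` the second core additionally returns 0 six times, but zeros do not change a sum, so both sum-queries return 18.
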
SLIDE 26 Equivalence of Sum-Queries

Difficulties arise with comparisons and constants.

Theorem: If $q(\bar{x}, \mathrm{sum}(y))$, $q'(\bar{x}, \mathrm{sum}(y))$ are sum-queries without comparisons or without constants, then

    $q$ and $q'$ are equivalent $\iff$ the cores $\breve{q}$ and $\breve{q}'$ are bag-set-equivalent

In the general case, the characterization is complex.

Theorem: For sum-queries with comparisons over the integers, or over the rationals, equivalence can be decided in PSPACE.

SLIDE 27 Linear Queries

A query is linear if there are no multiple occurrences of the same predicate.

Known: Containment of linear conjunctive queries under set-semantics is PTIME-decidable.

We have generalized this result:

Theorem: For linear queries, the following problems are in PTIME:
  • bag-set-equivalence of conjunctive queries
  • equivalence of max-queries
  • equivalence of count-queries
  • equivalence of sum-queries

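The linearity condition itself is easy to test. The Python sketch below assumes a query body is given as a list of atoms written as tuples `(predicate, term, ...)`; the function name `is_linear` and the two sample bodies are choices made for this illustration.

```python
from collections import Counter


def is_linear(atoms):
    """A body is linear if no predicate occurs in more than one atom."""
    counts = Counter(pred for pred, *_ in atoms)
    return all(n == 1 for n in counts.values())


# Body modelled on the business report query: every predicate occurs once.
report_body = [
    ("sales_fact", "pk", "mk", "tk", "sales"),
    ("product", "pk", "pname"),
    ("market", "mk", "region"),
    ("time", "tk", "month", "year"),
]
# Body of q <- p(y, z) & p(z, y): the predicate p occurs twice.
symmetric_body = [("p", "y", "z"), ("p", "z", "y")]
```

The first body is linear, while the second is not because `p` is used twice.
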
SLIDE 28 Conclusion

  • Complete characterizations for the equivalence of aggregate queries with min, max, count, and sum.
  • Characterization for special cases of queries with count distinct
  • Polynomiality in the case of linear queries
  • Foundation for solving the view usability problem for aggregate queries.