

  • Y. Arvelo, B. Bonet, M. Vidal. AAAI-06. July 18th, 2006.

Compilation of Query-Rewriting Problems into Tractable Fragments of Propositional Logic - p. 1/19

Compilation of Query-Rewriting Problems into Tractable Fragments of Propositional Logic

Yolifé Arvelo Blai Bonet María Esther Vidal Departamento de Computación Universidad Simón Bolívar Caracas, Venezuela


Introduction

■ We consider the problem of rewriting a query using materialized views
■ This problem appears frequently in the context of Data Integration, Web Infrastructures and Query Optimization:
  – [Duschka & Genesereth 1997; Kwok & Weld 1996; Lambrecht, Kambhampati & Gnanaprakasam 1999]
  – [Levy, Rajaraman & Ordille 1996; Zaharioudakis et al. 2000; Mitra 2001]
■ The problem is in general intractable, and existing algorithms do not scale well even in simple cases


Data Integration

■ OBJECTIVE: Given a query Q, retrieve all tuples obtainable from the data sources that satisfy Q
■ Data sources are assumed to be:
  ◆ Independent (i.e. maintained in a distributed manner)
  ◆ Described as views (i.e. the Local As View model)
  ◆ Incomplete


Data Integration: Example

QUERY: Find round-trip flights that start in the US


Query Rewriting Problem: Example

QUERY: Find round-trip flights that start in the US

Q(x, y) :− flight(x, y), flight(y, x), uscity(x)

Data sources modelled as views:

national(x1, y1) :− flight(x1, y1), uscity(x1), uscity(y1)
oneway(x2, y2) :− flight(x2, y2)
onestop(x3, z3) :− flight(x3, y3), flight(y3, z3)

Query Rewriting Problem: Solution

■ ASSUMPTION: Views may be incomplete
■ Then, the solution is the collection of rewritings:

R1(x, y) :− oneway(x, y), oneway(y, x), national(x, w)
R2(x, y) :− oneway(x, y), oneway(y, x), national(w, x)
R3(x, y) :− national(x, y), national(y, x)
R4(x, y) :− oneway(x, y), national(y, x)
R5(x, y) :− national(x, y), oneway(y, x)

■ Observe that there is no rewriting using onestop(x, y)
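
That a candidate rewriting only returns answers of Q can be checked mechanically: unfold each view atom into its definition, then look for a containment mapping from Q into the unfolded body. The following is a minimal Python sketch of that check, not the method of the slides. The tuple encoding of atoms and the helper names `unfold` and `contained_in` are assumptions made for illustration, and the check covers soundness only, not maximality.

```python
from itertools import count

# A conjunctive query is (head_vars, body); an atom is (predicate, args).
Q = (("x", "y"),
     [("flight", ("x", "y")), ("flight", ("y", "x")), ("uscity", ("x",))])

VIEWS = {
    "national": (("x1", "y1"),
                 [("flight", ("x1", "y1")), ("uscity", ("x1",)), ("uscity", ("y1",))]),
    "oneway": (("x2", "y2"), [("flight", ("x2", "y2"))]),
    "onestop": (("x3", "z3"), [("flight", ("x3", "y3")), ("flight", ("y3", "z3"))]),
}

def unfold(rewriting):
    """Replace each view atom by the view body, renaming existential variables apart."""
    head, body = rewriting
    fresh = count()
    expanded = []
    for pred, args in body:
        view_head, view_body = VIEWS[pred]
        sub = dict(zip(view_head, args))
        suffix = f"_{next(fresh)}"  # hypothetical naming scheme for fresh variables
        for p, a in view_body:
            expanded.append((p, tuple(sub.get(v, v + suffix) for v in a)))
    return head, expanded

def contained_in(expansion, query):
    """True if a containment mapping from `query` into `expansion` exists."""
    exp_head, exp_body = expansion
    q_head, q_body = query

    def extend(i, mapping):
        if i == len(q_body):
            return True
        pred, args = q_body[i]
        for e_pred, e_args in exp_body:
            if e_pred != pred or len(e_args) != len(args):
                continue
            new = dict(mapping)
            if all(new.setdefault(a, e) == e for a, e in zip(args, e_args)):
                if extend(i + 1, new):
                    return True
        return False

    # Head variables of the query must map to the head variables of the expansion.
    return extend(0, dict(zip(q_head, exp_head)))

R1 = (("x", "y"), [("oneway", ("x", "y")), ("oneway", ("y", "x")), ("national", ("x", "w"))])
R3 = (("x", "y"), [("national", ("x", "y")), ("national", ("y", "x"))])
ONLY_ONESTOP = (("x", "y"), [("onestop", ("x", "y")), ("onestop", ("y", "x"))])
```

Under these assumptions, `contained_in(unfold(R1), Q)` and `contained_in(unfold(R3), Q)` hold, while the onestop-only candidate fails the check, which matches the observation above.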


Query Rewriting Problem: Formal

■ INPUT: A query Q and a set of views V = {V1, V2, . . . , Vn}
■ TASK: Find a maximally-contained set of rewritings of Q using the views
■ A rewriting is a query-like expression that refers only to the views
■ ASSUMPTION: Q and Vi are conjunctive queries without arithmetic predicates


Related Work: Algorithms

■ Bucket algorithm [Levy & Rajaraman & Ullman 1996]
■ Inverse rules algorithm [Duschka & Genesereth 1997]
■ MiniCon algorithm [Pottinger & Halevy 2001]


The MiniCon Algorithm [Pottinger & Halevy 2001]

■ Exploit independences to decompose the problem into smaller subproblems and then combine their solutions
■ Solutions to subproblems are called MCDs

MCD   View       Mapping               Covered subgoals
M1    national   {X → X1, Y → Y1}      {0}
M2    national   {X → Y1, Y → X1}      {1}
M3    national   {X → X1}              {2}
M4    national   {X → Y1}              {2}
M5    oneway     {X → X2, Y → Y2}      {0}
M6    oneway     {X → Y2, Y → X2}      {1}


The MiniCon Algorithm: How does it work?

■ Generate all MCDs (very expensive, since it performs a blind search)
■ Rewritings are generated greedily as combinations of MCDs that:
  ◆ Cover disjoint subsets of subgoals in the query
  ◆ Cover all subgoals in the query
■ In the example, combining M3, M5, M6 produces the rewriting:

R1(x, y) :− oneway(x, y), oneway(y, x), national(x, w)
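
The combination step can be sketched as a search for sets of MCDs whose covered subgoals partition the subgoals of the query. This is an illustrative Python sketch, not MiniCon's actual implementation. It uses the six MCDs from the table and assumes the subgoals of Q are numbered 0: flight(x, y), 1: flight(y, x), 2: uscity(x), which is inferred from the mappings rather than stated on the slides. The function name `combine` is hypothetical.

```python
# MCD name -> (view, set of covered query subgoals), copied from the MCD table.
# Assumed numbering of subgoals: 0 = flight(x, y), 1 = flight(y, x), 2 = uscity(x).
MCDS = {
    "M1": ("national", frozenset({0})),
    "M2": ("national", frozenset({1})),
    "M3": ("national", frozenset({2})),
    "M4": ("national", frozenset({2})),
    "M5": ("oneway", frozenset({0})),
    "M6": ("oneway", frozenset({1})),
}
ALL_SUBGOALS = frozenset({0, 1, 2})

def combine(mcds, goals):
    """Yield tuples of MCD names whose covered sets are disjoint and cover `goals`.

    Assumes every MCD covers at least one subgoal.
    """
    names = sorted(mcds)

    def search(start, remaining, chosen):
        if not remaining:
            yield tuple(chosen)
            return
        for i in range(start, len(names)):
            covered = mcds[names[i]][1]
            # A subset of the still-uncovered goals is disjoint from the chosen MCDs.
            if covered <= remaining:
                yield from search(i + 1, remaining - covered, chosen + [names[i]])

    yield from search(0, goals, [])

combinations_found = list(combine(MCDS, ALL_SUBGOALS))
```

On this input the sketch yields eight combinations, including (M3, M5, M6). The combinations are not minimized, so some of them, such as (M1, M2, M3), differ from one of the five listed rewritings only by a redundant national atom.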


Our Approach: MCDSAT

■ Given a query Q and a set of views V
■ Build a propositional theory such that its models are in correspondence with the MCDs
■ Generating MCDs is now a problem of model enumeration
■ Model enumeration can be done with modern SAT techniques that implement:
  ◆ Non-chronological backtracking via clause learning
  ◆ Caching of common subproblems
  ◆ Heuristics
■ We also extend the propositional theory such that its models are in correspondence with the rewritings
■ We call our approach MCDSAT!!
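
To make "models in correspondence with solutions" concrete, here is a brute-force model enumerator in Python. The clause set is a toy stand-in and is not the MCDSAT encoding, which these slides do not spell out: variable i is read as "view i is chosen", and the clauses say that exactly one of three views is chosen. The function name `models` is illustrative.

```python
from itertools import product

def models(num_vars, clauses):
    """Yield every satisfying assignment as a tuple of bools (index 0 = variable 1).

    Clauses use DIMACS-style literals: k means variable k is true, -k means false.
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            yield bits

# Toy theory (assumption, for illustration only): exactly one of three variables is true.
EXACTLY_ONE = [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]

toy_models = list(models(3, EXACTLY_ONE))
```

Each of the three models picks one view. The loop visits all 2^n assignments, which is exactly the cost that clause learning, caching and heuristics are meant to avoid.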


Negation Normal Forms (NNF)

■ A formula is in Negation Normal Form (NNF) if constructed from literals using only conjunctions and disjunctions [Barwise 1977]
■ It can be represented as a rooted DAG whose leaves are literals and internal nodes are labeled with conjunction or disjunction

[Figure: an NNF drawn as a rooted DAG; internal nodes are labeled "or" and "and", and the leaves are the literals ~A, ~B, B, C, ~D, D, ~C, A]
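
The DAG view of an NNF maps directly onto a small data structure. The Python sketch below uses class names (`Lit`, `And`, `Or`) and an `evaluate` helper chosen for illustration. The example formula is a two-variable exclusive-or, not the formula in the figure.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Lit:
    var: str
    positive: bool = True

@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]

@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]

Node = Union[Lit, And, Or]

def evaluate(node, assignment):
    """Evaluate an NNF node under a dict mapping variable names to booleans."""
    if isinstance(node, Lit):
        return assignment[node.var] == node.positive
    results = (evaluate(child, assignment) for child in node.children)
    return all(results) if isinstance(node, And) else any(results)

# Leaves are created once and shared, so the structure is a DAG rather than a tree.
A, NOT_A = Lit("A"), Lit("A", False)
B, NOT_B = Lit("B"), Lit("B", False)
XOR = Or((And((A, NOT_B)), And((NOT_A, B))))
```

Negation appears only in the `positive` flag of a leaf, which is what distinguishes NNF from an arbitrary formula tree.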


Deterministic and Decomposable NNFs (d-DNNFs)

■ Introduced by [Darwiche 2001]
■ An NNF is decomposable if the conjuncts of every conjunction share no variables
■ An NNF is deterministic if the disjuncts of every disjunction are pairwise logically inconsistent
■ A d-DNNF supports a number of operations in time linear in its size:
  ◆ satisfiability
  ◆ clause entailment
  ◆ model counting
  ◆ model enumeration (time linear in the size of the output)
  ◆ ...
■ Transformation into d-DNNF is intractable in the worst case, but not necessarily so on average
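
The reason model counting becomes easy can be shown in a few lines: with decomposability, counts multiply at and-nodes, and with determinism they add at or-nodes. The Python sketch below makes one further assumption, that the d-DNNF is smooth (all disjuncts of an or-node mention the same variables). The tuple encoding and function names are illustrative and are not c2d's format.

```python
from math import prod

def lit(var, positive=True):
    return ("lit", var, positive)

def conj(*children):
    return ("and", children)

def disj(*children):
    return ("or", children)

def variables(node):
    """Set of variables mentioned below a node."""
    if node[0] == "lit":
        return frozenset({node[1]})
    return frozenset().union(*(variables(c) for c in node[1]))

def is_decomposable(node):
    """Check that the conjuncts of every and-node share no variables."""
    if node[0] == "lit":
        return True
    if node[0] == "and":
        seen = set()
        for child in node[1]:
            child_vars = variables(child)
            if seen & child_vars:
                return False
            seen |= child_vars
    return all(is_decomposable(c) for c in node[1])

def count_models(node):
    """Count models over variables(node); assumes a smooth d-DNNF."""
    if node[0] == "lit":
        return 1
    counts = [count_models(c) for c in node[1]]
    return prod(counts) if node[0] == "and" else sum(counts)

XOR_AB = disj(conj(lit("A"), lit("B", False)), conj(lit("A", False), lit("B")))
THEORY = conj(XOR_AB, disj(lit("C"), lit("C", False)))
```

Here `count_models(THEORY)` is 4: two models of the exclusive-or times two values of C. The sketch recomputes shared subgraphs, so storing one count per node would be needed to reach the linear-time bound on a real DAG.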


Implementation

■ MCDSAT translates the QRP into a propositional theory T
■ T is compiled into d-DNNF using Darwiche’s c2d compiler
■ Models are obtained from the d-DNNF and transformed into MCDs or rewritings
■ c2d and models are off-the-shelf components
■ MCDSAT is written in a scripting language
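
The hand-off between the translation step and an off-the-shelf compiler is typically a plain-text CNF file. The sketch below serializes clauses in the standard DIMACS CNF format, assuming the compiler accepts that format, as SAT tools commonly do. The function name `to_dimacs` is hypothetical, and no compiler is invoked.

```python
def to_dimacs(num_vars, clauses):
    """Serialize clauses (lists of non-zero ints) as DIMACS CNF text."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    # Each clause is a space-separated list of literals terminated by 0.
    lines += [" ".join(str(lit) for lit in clause) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"

# Toy theory (illustration only): exactly one of three variables is true.
dimacs_text = to_dimacs(3, [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]])
```

Writing `dimacs_text` to a file gives an input that a CNF-based tool can read. Mapping the resulting models back to MCDs or rewritings requires keeping a table from variable numbers to their meaning.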


Experimental Study

OBJECTIVE: To study the effect of query size and number of views on the performance of MCDSAT and MiniCon

■ Large benchmark with problems of different sizes and structures
■ Comparison metric: time
■ For lack of space, we report only a few instances


Experimental Results

■ MCD Theory: time to generate MCDs (no combination)
■ Extended Theory: time to generate rewritings
■ Structure: Chain and Star
■ Half distinguished variables
■ Queries of different length
■ Different number of views
■ Each point is average over 10 instances
■ Random instances created with generator of [Afrati, Li & Ullman 2001]


Experimental Results: MCD Theories

[Figure: four plots of time in seconds (log scale) for MiniCon and McdSat, all with half distinguished variables:
  chain queries, 80 views: time vs. number of goals in query (3 to 10)
  star queries, 80 views: time vs. number of goals in query (3 to 10)
  chain queries, 8 subgoals: time vs. number of views (20 to 140)
  star queries, 8 subgoals: time vs. number of views (20 to 140)]


Experimental Results: Extended Theories

[Figure: four plots of time in seconds (log scale) for MiniCon and McdSat, all with half distinguished variables:
  chain queries, 80 views: time vs. number of goals in query (3 to 10)
  star queries, 80 views: time vs. number of goals in query (3 to 10)
  chain queries, 6 subgoals: time vs. number of views (20 to 140)
  star queries, 6 subgoals: time vs. number of views (20 to 140)]


Conclusions

■ Proposed a novel method for QRPs using propositional logic which:
  ◆ Uses off-the-shelf propositional components
  ◆ Is easy to implement
  ◆ Shows improved performance over MiniCon in the reported experiments
■ Thus, the logical approach is not only of scientific interest but practical too!
■ Similar ideas can be applied to other problems!