Distributed Query Processing: Advanced Topics in Database Management


Feb 24, 2023

Distributed Query Processing. Advanced Topics in Database Management (INFSCI 2711). Some materials are from Database System Concepts, Silberschatz, Korth and Sudarshan. Vladimir Zadorozhny, DINS, University of Pittsburgh.

SLIDE 1

Distributed Query Processing

Vladimir Zadorozhny, DINS, University of Pittsburgh

Advanced Topics in Database Management (INFSCI 2711)

Some materials are from Database System Concepts, Silberschatz, Korth and Sudarshan

Banking Example

branch (branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)
account (account_number, branch_name, balance)
loan (loan_number, branch_name, amount)
depositor (customer_name, account_number)
borrower (customer_name, loan_number)

SLIDE 2

Basic Query Processing Architecture

Example schema:

branch(branch-ID, branch-name)
account(acc-number, branch-ID, customer-ID)
depositor(customer-ID, customer-name, customer-addr)

[Figure: query processing architecture. Labels: query, parser, internal representation, optimizer, execution plan, evaluator, output, data, statistics about data]

Find the names of customers having accounts in Brooklyn

select customer-name
from branch, account, depositor
where branch-name = “Brooklyn”
and branch.branch-ID = account.branch-ID
and account.customer-ID = depositor.customer-ID

Π customer-name (σ branch-name = “Brooklyn” (branch ⋈ (account ⋈ depositor)))

Catalog (metadata repository)

Relational Algebra





−

SLIDE 3

Sailors Database

SLIDE 4

Basic Steps in Query Processing

Parsing and translation: translate the query into its internal form, which is then translated into relational algebra. The parser checks syntax and verifies relations.

Evaluation: the query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.

SLIDE 5

Evaluation Plan

An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.

Measures of Query Cost

Cost is generally measured as the total elapsed time for answering a query. Many factors contribute to time cost:

- disk accesses, CPU, or network communication

Typically disk access is the predominant cost in a centralized system, and it is also relatively easy to estimate. It is measured by taking into account:

- number of seeks * average-seek-cost
- number of blocks read * average-block-read-cost
- number of blocks written * average-block-write-cost

The cost to write a block is greater than the cost to read a block:

  - data is read back after being written to ensure that the write was successful

For simplicity we just use the number of block transfers from disk.
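As a rough illustration of the cost components listed above, the following Python sketch simply adds them up. The function name and the default unit costs (in milliseconds) are illustrative assumptions and do not come from the slides.

```python
def disk_cost(seeks, blocks_read, blocks_written,
              avg_seek_cost=4.0, avg_read_cost=0.1, avg_write_cost=0.2):
    """Estimate disk time from seeks and block transfers.

    The default unit costs are assumed placeholder values in milliseconds;
    the write cost is set higher than the read cost, as the slide notes.
    """
    return (seeks * avg_seek_cost
            + blocks_read * avg_read_cost
            + blocks_written * avg_write_cost)


if __name__ == "__main__":
    # One seek followed by a sequential read of 100 blocks.
    print(disk_cost(seeks=1, blocks_read=100, blocks_written=0))
```

With real measurements the three unit costs would be replaced by values for the actual device.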

SLIDE 6

Selection Operation

File scan: search algorithms that locate and retrieve records that fulfill a selection condition.

Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.

- Cost estimate = br block transfers, where br denotes the number of blocks containing records from relation r
- If the selection is on a key attribute, the scan can stop on finding the record: cost = (br / 2) block transfers on average

Linear search can be applied regardless of:

- the selection condition,
- the ordering of records in the file, or
- the availability of indices

Selection Operation (Cont.)

A2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file is ordered.

Assume that the blocks of a relation are stored contiguously. Cost estimate (number of disk blocks to be scanned):

- cost of locating the first tuple by a binary search on the blocks
  - ⌈log2(br)⌉
- if there are multiple records satisfying the selection
  - add the transfer cost of the number of blocks containing records that satisfy the selection condition

Index scan: search algorithms that use an index; the selection condition must be on the search-key of the index.

A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition.

- Cost = (hi + 1), where hi denotes the height of the index
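The three selection estimates (A1, A2, A3) can be written as small functions. This is a minimal sketch; the function names and the `extra_blocks` parameter are assumptions introduced here for illustration.

```python
import math


def linear_search_cost(b_r, on_key=False):
    """A1: scan all b_r blocks, or about half of them for a key lookup."""
    return b_r / 2 if on_key else b_r


def binary_search_cost(b_r, extra_blocks=0):
    """A2: binary search over b_r contiguous blocks.

    extra_blocks (assumed name) is the number of additional blocks that
    hold further records satisfying the selection.
    """
    return math.ceil(math.log2(b_r)) + extra_blocks


def primary_index_cost(h_i):
    """A3: traverse an index of height h_i, then fetch one data block."""
    return h_i + 1


if __name__ == "__main__":
    b_customer = 400  # number of blocks of customer, from the join example
    print(linear_search_cost(b_customer))
    print(linear_search_cost(b_customer, on_key=True))
    print(binary_search_cost(b_customer))
    print(primary_index_cost(3))  # index height 3 is an assumed value
```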

SLIDE 7

Join Operation

Several different algorithms can implement joins:

- Nested-loop join
- Merge-join
- …

The choice is based on cost estimates. Examples use the following information:

- Number of records: customer 10,000; depositor 5,000
- Number of blocks: customer 400; depositor 100

Nested-Loop Join (Cont.)

In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is

nr * bs + br block transfers

where r is the outer relation and s is the inner relation.

If the smaller relation fits entirely in memory, use that as the inner relation. This reduces the cost to br + bs block transfers.

Assuming worst-case memory availability, the cost estimate is:

- with depositor as the outer relation: 5000 * 400 + 100 = 2,000,100 block transfers
- with customer as the outer relation: 10000 * 100 + 400 = 1,000,400 block transfers

If the smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 block transfers.
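The arithmetic above can be reproduced with a short Python sketch. The function names are assumptions made for this illustration; the record and block counts are the ones given for customer and depositor.

```python
def nested_loop_cost(n_outer, b_outer, b_inner):
    """Worst case: one block of each relation fits in memory.

    Every outer tuple triggers a full scan of the inner relation,
    plus one pass over the outer relation itself.
    """
    return n_outer * b_inner + b_outer


def nested_loop_cost_inner_in_memory(b_outer, b_inner):
    """The inner relation fits in memory, so each relation is read once."""
    return b_outer + b_inner


if __name__ == "__main__":
    n_customer, b_customer = 10_000, 400
    n_depositor, b_depositor = 5_000, 100

    # depositor as the outer relation
    print(nested_loop_cost(n_depositor, b_depositor, b_customer))  # 2000100
    # customer as the outer relation
    print(nested_loop_cost(n_customer, b_customer, b_depositor))   # 1000400
    # depositor held entirely in memory as the inner relation
    print(nested_loop_cost_inner_in_memory(b_customer, b_depositor))  # 500
```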

SLIDE 8

Merge-Join

Sort both relations on their join attribute (if not already sorted on the join attributes).

Merge the sorted relations to join them

Join step is similar to the merge stage of the sort-merge algorithm.

Main difference is handling of duplicate values in join attribute — every pair with same value on join attribute must be matched

Merge-Join (Cont.)

Can be used only for equi-joins and natural joins.

Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory).

Thus the cost of merge join is: br + bs block transfers + the cost of sorting if relations are unsorted.
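To make the handling of duplicate join values concrete, here is a minimal in-memory sketch of merge join in Python. It works on lists of tuples rather than disk blocks, and the function signature and sample data are assumptions for illustration only.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join two lists of tuples by sorting and merging.

    key_r and key_s extract the join attribute from a tuple of r and s.
    """
    r = sorted(r, key=key_r)  # sorting step (skipped if already sorted)
    s = sorted(s, key=key_s)
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the group of s tuples sharing this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # Pair every matching r tuple with every tuple in the group.
            while i < len(r) and key_r(r[i]) == kr:
                for t in s[j:j_end]:
                    out.append((r[i], t))
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    # Hypothetical sample tuples, joined on customer_name.
    depositor = [("Johnson", "A-101"), ("Hayes", "A-102"), ("Johnson", "A-201")]
    customer = [("Smith", "North"), ("Hayes", "Main"), ("Johnson", "Alma")]
    for pair in merge_join(depositor, customer, lambda t: t[0], lambda t: t[0]):
        print(pair)
```

The inner group `s[j:j_end]` is the part that must fit in memory for each block to be read only once.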

SLIDE 9

Query Optimization

A relational algebra expression may have many equivalent expressions.

- E.g., σ balance<2500 (Π balance (account)) is equivalent to Π balance (σ balance<2500 (account))

Each relational algebra operation can be evaluated using one of several different algorithms. Correspondingly, a relational-algebra expression can be evaluated in many ways.

An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.

- E.g., can use an index on balance to find accounts with balance < 2500,
- or can perform a complete relation scan and discard accounts with balance ≥ 2500

Query optimization: amongst all equivalent evaluation plans, choose the one with the lowest cost. Cost is estimated using statistical information from the database catalog:

- e.g. number of tuples in each relation, size of tuples, etc.

Pictorial Depiction of Equivalence Rules

SLIDE 10

Multiple Transformations (Cont.)

Join Ordering Example

For all relations r1, r2, and r3: (r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3) (join associativity).

If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose (r1 ⋈ r2) ⋈ r3 so that we compute and store a smaller temporary relation.

SLIDE 11

Join Ordering Example (Cont.)

Consider the expression

Π customer_name ((σ branch_city = “Brooklyn” (branch)) ⋈ (account ⋈ depositor))

We could compute account ⋈ depositor first and join the result with σ branch_city = “Brooklyn” (branch), but account ⋈ depositor is likely to be a large relation.

Only a small fraction of the bank’s customers are likely to have accounts in branches located in Brooklyn, so it is better to compute σ branch_city = “Brooklyn” (branch) ⋈ account first.

Cost Estimation

Cost of each operator is computed using statistics of input relations:

- e.g. number of tuples, sizes of tuples

Inputs can be results of sub-expressions, so we need to estimate statistics of expression results. To do so, we require additional statistics:

- e.g. number of distinct values for an attribute

SLIDE 12

Statistical Information for Cost Estimation

nr: number of tuples in a relation r.
br: number of blocks containing tuples of r.
lr: size of a tuple of r.
fr: blocking factor of r, i.e., the number of tuples of r that fit into one block.
V(A, r): number of distinct values that appear in r for attribute A; same as the size of Π A (r).

If tuples of r are stored together physically in a file, then:

br = ⌈nr / fr⌉
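The block-count formula is a ceiling division, sketched below in Python. The function name is an assumption, and the blocking factors 25 and 50 are not stated in the slides; they are the values implied by the customer and depositor sizes used in the join example.

```python
def blocks_needed(n_r, f_r):
    """Number of blocks for n_r tuples with f_r tuples per block (ceiling)."""
    return (n_r + f_r - 1) // f_r


if __name__ == "__main__":
    print(blocks_needed(10_000, 25))  # customer: 400 blocks
    print(blocks_needed(5_000, 50))   # depositor: 100 blocks
    print(blocks_needed(10_001, 25))  # one extra tuple needs one more block
```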

Histograms

Histogram on attribute age of relation person

SLIDE 13

Evaluation of Expressions

So far we have seen algorithms for individual operations. Alternatives for evaluating an entire expression tree:

- Materialization: generate results of an expression whose inputs are relations or are already computed, materialize (store) it on disk. Repeat.
- Pipelining: pass on tuples to parent operations even as an operation is being executed

Materialization

Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use intermediate results materialized into temporary relations to evaluate next-level operations.

E.g., in the figure below, compute and store σ balance<2500 (account), then compute and store its join with customer, and finally compute the projection on customer-name.

[Figure: expression tree for Π customer-name (σ balance<2500 (account) ⋈ customer)]

SLIDE 14

Materialization (Cont.)

Materialized evaluation is always applicable.

The cost of writing results to disk and reading them back can be quite high. Our cost formulas for operations ignore the cost of writing results to disk, so:

- Overall cost = sum of costs of individual operations + cost of writing intermediate results to disk

Pipelining

Pipelined evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.

E.g., in the previous expression tree, don’t store the result of σ balance<2500 (account); instead, pass tuples directly to the join. Similarly, don’t store the result of the join; pass tuples directly to the projection.

Much cheaper than materialization: no need to store a temporary relation to disk.

Pipelining may not always be possible, e.g., sort.

For pipelining to be effective, evaluation algorithms should generate output tuples even as tuples are received for inputs to the operation.

Pipelines can be executed in two ways: demand driven (lazy, pull) and producer driven (eager, push).
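A demand-driven (pull) pipeline maps naturally onto Python generators: each operator asks its child for the next tuple only when its own parent asks for one. The sketch below is illustrative only. The operator names, the dictionary representation of tuples, and the sample data are assumptions; in particular, account tuples are assumed to carry a customer_name attribute so that the join with customer has a common attribute.

```python
def scan(relation):
    """Leaf operator: yield stored tuples one at a time."""
    for t in relation:
        yield t


def select(predicate, child):
    """Pass on only the tuples that satisfy the predicate."""
    for t in child:
        if predicate(t):
            yield t


def join(child, inner, attr):
    """Join each incoming tuple with an inner relation held in memory."""
    for t in child:
        for u in inner:
            if t[attr] == u[attr]:
                yield {**t, **u}


def project(attrs, child):
    """Keep only the listed attributes of each tuple."""
    for t in child:
        yield {a: t[a] for a in attrs}


def pipelined_plan(account, customer):
    """Projection over join over selection, with no temporary relations."""
    low_balance = select(lambda t: t["balance"] < 2500, scan(account))
    joined = join(low_balance, customer, "customer_name")
    return project(["customer_name"], joined)


if __name__ == "__main__":
    # Hypothetical sample data.
    account = [
        {"account_number": "A-101", "customer_name": "Hayes", "balance": 500},
        {"account_number": "A-215", "customer_name": "Smith", "balance": 7000},
    ]
    customer = [
        {"customer_name": "Hayes", "customer_city": "Harrison"},
        {"customer_name": "Smith", "customer_city": "Rye"},
    ]
    for row in pipelined_plan(account, customer):
        print(row)
```

Nothing is computed until the final loop pulls the first tuple, which is what makes this the lazy variant; a producer-driven pipeline would instead push tuples into buffers as they are generated.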

SLIDE 15

Distributed Database System

A distributed database system consists of loosely coupled sites that share no physical component.

Database systems that run on each site are independent of each other.

Transactions may access data at one or more sites.

Distributed Query Processing

For centralized systems, the primary criterion for measuring the cost of a particular strategy is the number of disk accesses.

In a distributed system, other issues must be taken into account:

- The cost of a data transmission over the network.
- The potential gain in performance from having several sites process parts of the query in parallel.

SLIDE 16

Distributed Data Storage

Assume relational data model.

Replication: the system maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance.

Fragmentation: a relation is partitioned into several fragments stored in distinct sites.

Replication and fragmentation can be combined: a relation is partitioned into several fragments, and the system maintains several identical replicas of each such fragment.

Data Replication

A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites. Full replication of a relation is the case where the relation is stored at all sites. Fully redundant databases are those in which every site contains a copy of the entire database.

SLIDE 17

Data Replication (Cont.)

Advantages of Replication

- Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
- Parallelism: queries on r may be processed by several nodes in parallel.
- Reduced data transfer: relation r is available locally at each site containing a replica of r.

Disadvantages of Replication

- Increased cost of updates: each replica of relation r must be updated.
- Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented.
  - One solution: choose one copy as the primary copy and apply concurrency control operations on the primary copy

Data Fragmentation

Division of relation r into fragments r1, r2, …, rn which contain sufficient information to reconstruct relation r.

Horizontal fragmentation: each tuple of r is assigned to one or more fragments.

Vertical fragmentation: the schema for relation r is split into several smaller schemas.

- All schemas must contain a common candidate key (or superkey) to ensure the lossless join property.
- A special attribute, the tuple-id attribute, may be added to each schema to serve as a candidate key.

Example: relation account with the following schema

Account = (branch_name, account_number, balance)

SLIDE 18

Horizontal Fragmentation of account Relation

branch_name   account_number   balance
Hillside      A-305            500
Hillside      A-226            336
Hillside      A-155            62

account1 = σ branch_name = “Hillside” (account)

branch_name   account_number   balance
Valleyview    A-177            205
Valleyview    A-402            10000
Valleyview    A-408            1123
Valleyview    A-639            750

account2 = σ branch_name = “Valleyview” (account)

Vertical Fragmentation of employee_info Relation

branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Hillside      Kahn            5
Valleyview    Kahn            6
Valleyview    Green           7

deposit1 = Π branch_name, customer_name, tuple_id (employee_info)

account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
A-155            62        5
A-408            1123      6
A-639            750       7

deposit2 = Π account_number, balance, tuple_id (employee_info)
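Both kinds of fragmentation, and the reconstruction of the original relation, can be sketched in Python using the tuples from the tables above. The helper names are assumptions introduced for this illustration.

```python
# (branch_name, customer_name, account_number, balance), in tuple_id order
EMPLOYEE_INFO = [
    ("Hillside", "Lowman", "A-305", 500),
    ("Hillside", "Camp", "A-226", 336),
    ("Valleyview", "Camp", "A-177", 205),
    ("Valleyview", "Kahn", "A-402", 10000),
    ("Hillside", "Kahn", "A-155", 62),
    ("Valleyview", "Kahn", "A-408", 1123),
    ("Valleyview", "Green", "A-639", 750),
]


def horizontal_fragment(relation, predicate):
    """Selection: keep whole tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def vertical_fragments(relation, left_cols, right_cols):
    """Projection onto two column sets, each extended with a tuple_id."""
    left, right = [], []
    for tuple_id, t in enumerate(relation, start=1):
        left.append(tuple(t[i] for i in left_cols) + (tuple_id,))
        right.append(tuple(t[i] for i in right_cols) + (tuple_id,))
    return left, right


def join_on_tuple_id(left, right):
    """Rebuild the original tuples by matching the trailing tuple_id."""
    right_by_id = {t[-1]: t[:-1] for t in right}
    return [l[:-1] + right_by_id[l[-1]] for l in left if l[-1] in right_by_id]


# account = (branch_name, account_number, balance)
account = [(b, a, bal) for (b, _c, a, bal) in EMPLOYEE_INFO]
account1 = horizontal_fragment(account, lambda t: t[0] == "Hillside")
account2 = horizontal_fragment(account, lambda t: t[0] == "Valleyview")
deposit1, deposit2 = vertical_fragments(EMPLOYEE_INFO, (0, 1), (2, 3))

if __name__ == "__main__":
    # Horizontal fragments are recombined by union, vertical ones by join.
    print(sorted(account1 + account2) == sorted(account))
    print(join_on_tuple_id(deposit1, deposit2) == EMPLOYEE_INFO)
```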

SLIDE 19

Advantages of Fragmentation

Horizontal:

- allows parallel processing on fragments of a relation
- allows a relation to be split so that tuples are located where they are most frequently accessed

Vertical:

- allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed
- tuple-id attribute allows efficient joining of vertical fragments
- allows parallel processing on a relation

Vertical and horizontal fragmentation can be mixed. Fragments may be successively fragmented to an arbitrary depth.

Data Transparency

Data transparency: the degree to which a system user may remain unaware of the details of how and where the data items are stored in a distributed system.

Consider transparency issues in relation to:

- Fragmentation transparency
- Replication transparency
- Location transparency

SLIDE 20

Query Transformation

Translating algebraic queries on fragments.

- It must be possible to construct relation r from its fragments
- Replace relation r by the expression to construct relation r from its fragments

Consider the horizontal fragmentation of the account relation into

account1 = σ branch_name = “Hillside” (account)
account2 = σ branch_name = “Valleyview” (account)

The query σ branch_name = “Hillside” (account) becomes

σ branch_name = “Hillside” (account1 ∪ account2)

which is optimized into

σ branch_name = “Hillside” (account1) ∪ σ branch_name = “Hillside” (account2)

Example Query (Cont.)

Since account1 has only tuples pertaining to the Hillside branch, we can eliminate the selection operation.

Apply the definition of account2 to obtain

σ branch_name = “Hillside” (σ branch_name = “Valleyview” (account))

This expression is the empty set regardless of the contents of the account relation.

The final strategy is for the Hillside site to return account1 as the result of the query.
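The pruning argument can be sketched in Python for the simple case where each horizontal fragment is defined by a single equality predicate. The dictionary layout and the function name are assumptions for illustration; a real optimizer reasons over general predicates.

```python
# Each fragment is described by the equality predicate that defines it.
FRAGMENTS = {
    "account1": ("branch_name", "Hillside"),
    "account2": ("branch_name", "Valleyview"),
}


def relevant_fragments(attr, value, fragments):
    """Return the fragments that can contribute to the selection attr = value.

    A fragment defined by the same attribute with a different value
    contradicts the selection, so it is provably empty and is dropped.
    """
    keep = []
    for name, (frag_attr, frag_value) in fragments.items():
        if frag_attr == attr and frag_value != value:
            continue  # selection over this fragment is always empty
        keep.append(name)
    return keep


if __name__ == "__main__":
    print(relevant_fragments("branch_name", "Hillside", FRAGMENTS))
```

A selection on an attribute other than branch_name cannot be pruned this way, so both fragments would be kept.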

SLIDE 21

Simple Join Processing

Consider the following relational algebra expression in which the three relations are neither replicated nor fragmented:

account ⋈ depositor ⋈ branch

- account is stored at site S1
- depositor at S2
- branch at S3

For a query issued at site SI, the system needs to produce the result at site SI.

Possible Query Processing Strategies

- Ship copies of all three relations to site SI and choose a strategy for processing the entire result locally at site SI.
- Ship a copy of the account relation to site S2 and compute temp1 = account ⋈ depositor at S2. Ship temp1 from S2 to S3, and compute temp2 = temp1 ⋈ branch at S3. Ship the result temp2 to SI.
- Devise similar strategies, exchanging the roles of S1, S2, S3.

Must consider the following factors:

- amount of data being shipped
- cost of transmitting a data block between sites
- relative processing speed at each site

SLIDE 22

Semijoin Strategy

Let r1 be a relation with schema R1 stored at site S1.
Let r2 be a relation with schema R2 stored at site S2.
Evaluate the expression r1 ⋈ r2 and obtain the result at S1.

1. Compute temp1 ← Π R1 ∩ R2 (r1) at S1.
2. Ship temp1 from S1 to S2.
3. Compute temp2 ← r2 ⋈ temp1 at S2.
4. Ship temp2 from S2 to S1.
5. Compute r1 ⋈ temp2 at S1. This is the same as r1 ⋈ r2.

Formal Definition

The semijoin of r1 with r2 is denoted by:

r1 ⋉ r2

It is defined by:

Π R1 (r1 ⋈ r2)

Thus, r1 ⋉ r2 selects those tuples of r1 that contributed to r1 ⋈ r2.

In step 3 above, temp2 = r2 ⋉ r1.

For joins of several relations, the above strategy can be extended to a series of semijoin steps.
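The five steps of the semijoin strategy can be simulated in a single process. This is a minimal sketch that represents tuples as dictionaries and ignores the network entirely; the helper names and sample data are assumptions, and both relations are assumed to be non-empty lists whose tuples all have the same attributes.

```python
def project(relation, attrs):
    """Duplicate-eliminating projection onto attrs."""
    seen, out = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r, s):
    """Natural join on the attributes common to r and s."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]


def semijoin_strategy(r1, r2):
    """Compute r1 joined with r2 at S1, shipping only temp1 and temp2."""
    common = [a for a in r1[0] if a in r2[0]]
    temp1 = project(r1, common)        # step 1, at S1; step 2 ships it to S2
    temp2 = natural_join(r2, temp1)    # step 3, at S2; step 4 ships it to S1
    result = natural_join(r1, temp2)   # step 5, at S1
    return result, temp1, temp2


if __name__ == "__main__":
    # Hypothetical sample data: depositor at S1, account at S2.
    r1 = [{"customer_name": "Hayes", "account_number": "A-102"},
          {"customer_name": "Smith", "account_number": "A-215"}]
    r2 = [{"account_number": "A-102", "balance": 400},
          {"account_number": "A-305", "balance": 350}]
    result, temp1, temp2 = semijoin_strategy(r1, r2)
    print(result)
    print(len(temp2), "of", len(r2), "tuples of r2 shipped")
```

The strategy pays off when temp2 is much smaller than r2, that is, when few tuples of r2 contribute to the join.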

SLIDE 23

Join Strategies that Exploit Parallelism

Consider r1 ⋈ r2 ⋈ r3 ⋈ r4 where relation ri is stored at site Si. The result must be presented at site S1.

- r1 is shipped to S2 and r1 ⋈ r2 is computed at S2; simultaneously r3 is shipped to S4 and r3 ⋈ r4 is computed at S4.
- S2 ships tuples of (r1 ⋈ r2) to S1 as they are produced; S4 ships tuples of (r3 ⋈ r4) to S1.
- Once tuples of (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, (r1 ⋈ r2) ⋈ (r3 ⋈ r4) is computed in parallel with the computation of (r1 ⋈ r2) at S2 and the computation of (r3 ⋈ r4) at S4.

Heterogeneous Distributed Databases

Many database applications require data from a variety of preexisting databases located in a heterogeneous collection of hardware and software platforms.

- Data models may differ (hierarchical, relational, etc.)
- Transaction commit protocols may be incompatible
- Concurrency control may be based on different techniques (locking, timestamping, etc.)
- System-level details almost certainly are totally incompatible

A multidatabase system is a software layer on top of existing database systems, which is designed to manipulate information in heterogeneous databases.

- Creates an illusion of logical database integration without any physical database integration

SLIDE 24

Approximate Joins

Q1: relate Precipitation and Population in different states

South& Carolina SC Florida FL … … 1.00 0.06

Approximate Joins (cont.)

Q1: relate Precipitation and Population in different areas
Q2: relate Precipitation and Temperature in different areas

SLIDE 25

Approximate Joins (cont.)

Q1: relate Precipitation and Population in different areas?
Q2: relate Precipitation and Temperature in different areas?

We need more advanced data linkage methods.