1 340151 Big Data & Cloud Computing (P. Baumann)
Evaluation of Join Operations
Ramakrishnan/Gehrke Chapter 14, Part A (Joins)
Evaluation of Join Operations Ramakrishnan/Gehrke Chapter 14, Part A - - PowerPoint PPT Presentation
Evaluation of Join Operations Ramakrishnan/Gehrke Chapter 14, Part A (Joins) 340151 Big Data & Cloud Computing (P. Baumann) 1 Relational Operations: Join Definition Natural join: R S := L ( C ( R S ) ) Where C:
1 340151 Big Data & Cloud Computing (P. Baumann)
Ramakrishnan/Gehrke Chapter 14, Part A (Joins)
2 340151 Big Data & Cloud Computing (P. Baumann)
C( R S ) )
"R.x=S.x and…" in C
3 340151 Big Data & Cloud Computing (P. Baumann)
C( R S )
{ =, , >, , <, }
C
x y
4 340151 Big Data & Cloud Computing (P. Baumann)
5 340151 Big Data & Cloud Computing (P. Baumann)
SELECT * FROM Reserves R, Sailors S WHERE R.sid=S.sid
6 340151 Big Data & Cloud Computing (P. Baumann)
M = 1000; pR = 100
N = 500; pS = 80
7 340151 Big Data & Cloud Computing (P. Baumann)
clustering:
1 I/O for all tuples (typical), unclustered: up to 1 I/O per matching S tuple
8 340151 Big Data & Cloud Computing (P. Baumann)
1.2 I/Os to get data entry in index + 1 I/O to get (the exactly one) matching Sailors tuple
1.2 I/Os to find index page with data entries + cost of retrieving matching Reserves tuples (*)
Cost is 1 or 2.5 I/Os, depending on whether index is clustered
SELECT * FROM Reserves R, Sailors S WHERE R.sid=S.sid
9 340151 Big Data & Cloud Computing (P. Baumann)
R & S
Hash table for block of R (k < B-1 pages) Input buffer for S Output buffer
Join Result
Earlier: block = page Now: block = sequence of pages
10 340151 Big Data & Cloud Computing (P. Baumann)
11 340151 Big Data & Cloud Computing (P. Baumann)
12 340151 Big Data & Cloud Computing (P. Baumann)
using sorting refinement that produces runs of length 2B in Pass 0: #runs of each relation is < B/2
read+write each relation in Pass 0 + read each relation in (only) merging pass (+ writing of result tuples)