A Study on Workload-Aware Wavelet Synopses for Point and Range Sum Queries
Michael Mathioudakis, mathiou@cs.toronto.edu
Dimitris Sacharidis, dsachar@dblab.ntua.gr
Timos Sellis, timos@dblab.ntua.gr
DOLAP 2006
Outline • Introduction • Wavelets • Error Metrics • Algorithms for Point Errors • Algorithms for Range Sum Errors • Experimental Results
Introduction
• Approximate query processing over synopses: an effective approach to managing large data sets (e.g. OLAP queries)
  1. In query optimization: provide highly accurate query selectivity estimates
  2. In place of the actual data: provide quick approximate answers to large queries
• Workload awareness: take user behavior into consideration
  - More accuracy for important data: workload-aware synopses
• Histograms and the wavelet transformation are commonly used synopsis construction techniques
Introduction - Our Contribution
• Focus on wavelet synopsis construction algorithms
• Theoretical presentation of existing algorithms
• Presentation of a novel workload-aware algorithm for range-sum queries
• Experimental study: accuracy vs. time efficiency
Wavelet Preliminaries
• It's a transformation!
  [Figure: labels not recoverable]
• Histograms: construct buckets on the initial data and assign one value per bucket
  [Figure: initial data a1 ... a8 grouped into Bucket 1, Bucket 2, Bucket 3]
Wavelet Preliminaries
Haar wavelet transform (W/T): recursive pairwise calculation of averages and semi-differences (details)

Data:   2   2   0   2   3   5   4   4

Step    Pairwise averages    Pairwise details
1       2, 1, 4, 4           0, -1, -1, 0
2       3/2, 4               1/2, 0
3       11/4                 -5/4

Example: 11/4 = (3/2 + 4)/2 and -5/4 = (3/2 - 4)/2
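The recursive averaging and differencing can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `haar_transform` and the output ordering (overall average first, then details from coarsest to finest) are assumptions made for the example, and exact fractions are used so the result can be compared with the slide by eye.

```python
from fractions import Fraction

def haar_transform(data):
    """Unnormalized Haar transform; returns [average, coarsest detail, ..., finest details]."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [Fraction(x) for x in data]
    details_by_step = []  # finest details are produced first
    while len(current) > 1:
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        details_by_step.append([(a - b) / 2 for a, b in pairs])  # semi-differences
        current = [(a + b) / 2 for a, b in pairs]                # pairwise averages
    coeffs = [current[0]]
    for details in reversed(details_by_step):
        coeffs.extend(details)
    return coeffs

coeffs = haar_transform([2, 2, 0, 2, 3, 5, 4, 4])
print([str(c) for c in coeffs])
# ['11/4', '-5/4', '1/2', '0', '0', '-1', '-1', '0']
```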
Wavelet Preliminaries
• Initial values can be reconstructed in logarithmic time: only O(log N) coefficients are needed per value
• Similar values for nearby data yield small details
• Coefficients near the root are more important, so normalization is needed
[Figure: error tree with root 11/4; detail coefficients -5/4, then 1/2 and 0, then 0, -1, -1, 0; leaves 2 2 0 2 3 5 4 4; edges labeled + and -]
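A small sketch of the logarithmic-time reconstruction, under the assumption that the coefficients are stored in error-tree order (index 0 is the overall average, index 1 the top detail, and node j has children 2j and 2j+1). The function name `reconstruct_point` and this array layout are illustrative choices, not taken from the slides. A value is the overall average plus or minus each detail on its root-to-leaf path: plus when the value lies in the left half of the detail's range, minus otherwise.

```python
from fractions import Fraction as F

def reconstruct_point(coeffs, i):
    """Rebuild data value i (0-based) from the 1 + log2(N) coefficients on its path."""
    n = len(coeffs)
    value = coeffs[0]
    node = n + i  # virtual leaf position below the detail nodes
    while node > 1:
        parent = node // 2
        sign = 1 if node % 2 == 0 else -1  # left child adds, right child subtracts
        value += sign * coeffs[parent]
        node = parent
    return value

# Coefficients of the running example (data 2 2 0 2 3 5 4 4).
coeffs = [F(11, 4), F(-5, 4), F(1, 2), 0, 0, -1, -1, 0]
values = [reconstruct_point(coeffs, i) for i in range(8)]
print([int(v) for v in values])
# [2, 2, 0, 2, 3, 5, 4, 4]
```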
Wavelet Synopses
• Keep B coefficients
  - Dropped coefficients are considered zero
  - Error is introduced into the values of our data
[Figure: the error tree of the running example; original data 2 2 0 2 3 5 4 4, approximate data 2 2 1 1 4 4 4 4; annotations: Point Error = 1, Range Sum Error = 1]
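To make the effect of dropping coefficients concrete, the sketch below zeroes all but three coefficients of the running example and inverts the transform. Keeping 11/4, -5/4 and 1/2 reproduces the approximate values 2 2 1 1 4 4 4 4 shown on the slide. The helper `inverse_haar` and the error-tree ordering of the coefficient list are assumptions made for illustration.

```python
from fractions import Fraction as F

def inverse_haar(coeffs):
    """Invert the unnormalized Haar transform (coefficients in error-tree order)."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(details)
        # each average v splits into a left value v + d and a right value v - d
        values = [v + s * d for v, d in zip(values, details) for s in (1, -1)]
    return values

original = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = [F(11, 4), F(-5, 4), F(1, 2), 0, 0, -1, -1, 0]

kept = {0, 1, 2}  # indices of the retained coefficients; all others count as zero
synopsis = [c if j in kept else 0 for j, c in enumerate(coeffs)]
approx = inverse_haar(synopsis)
errors = [o - a for o, a in zip(original, approx)]

print([int(a) for a in approx])  # [2, 2, 1, 1, 4, 4, 4, 4]
print([int(e) for e in errors])  # [0, 0, -1, 1, -1, 1, 0, 0]
```

Every non-zero point error in this example has magnitude 1.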
Error Metrics
• Weighted error metrics
• For point queries: $L^w_p = \sum_i w[i]\, e[i]^p$
• For range sum queries: $L^w_p = \sum_{i \le j} w[i,j]\, e[i:j]^p$

Initial values:    0   4   2  -2   8   2   3  -1
After synopsis:   -1   3   3  -1   3   3   5   1
Point errors:      1   1  -1  -1   5  -1  -2  -2
Range sum error (2:5) = 4
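The two metrics can be checked against the slide's numbers with a short sketch. Assumptions for illustration: positions are 1-based and ranges inclusive (as in the "2:5" example), errors enter the sums in absolute value, and the workload weights below are hypothetical since the slides give none.

```python
def point_errors(original, approx):
    return [o - a for o, a in zip(original, approx)]

def range_sum_error(errors, i, j):
    """Error of the range sum over positions i..j (1-based, inclusive)."""
    return sum(errors[i - 1:j])

def weighted_point_metric(errors, weights, p):
    return sum(w * abs(e) ** p for w, e in zip(weights, errors))

def weighted_range_metric(errors, workload, p):
    """workload maps a range (i, j) to its weight w[i, j]."""
    return sum(w * abs(range_sum_error(errors, i, j)) ** p
               for (i, j), w in workload.items())

initial = [0, 4, 2, -2, 8, 2, 3, -1]
after = [-1, 3, 3, -1, 3, 3, 5, 1]
errors = point_errors(initial, after)

print(errors)                         # [1, 1, -1, -1, 5, -1, -2, -2]
print(range_sum_error(errors, 2, 5))  # 4

uniform = [1] * len(errors)               # hypothetical uniform point workload
workload = {(2, 5): 1, (1, 8): 2}         # hypothetical range workload
print(weighted_point_metric(errors, uniform, 2))   # 38
print(weighted_range_metric(errors, workload, 2))  # 16
```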
Classic Algorithm
• Minimizes the L2 norm of point errors
• Selects the B largest normalized coefficients, using a heap
• Complexity: O(N) space, O(N + B log N) time
[Figure: error tree of the running example]
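A minimal sketch of the heap-based selection follows. The slide does not state the normalization factor; the sketch assumes the common choice of scaling each unnormalized coefficient by the square root of the number of data values it influences, and assumes coefficients are stored in error-tree order. The names `support_size` and `classic_synopsis` are illustrative.

```python
import heapq
import math

def support_size(j, n):
    """Number of data values influenced by coefficient j (error-tree order)."""
    if j == 0:
        return n
    level = j.bit_length() - 1  # j = 1 is level 0, j = 2..3 level 1, j = 4..7 level 2
    return n >> level

def classic_synopsis(coeffs, budget):
    """Keep the `budget` coefficients with the largest normalized magnitude."""
    n = len(coeffs)
    # negate the keys because heapq implements a min-heap
    heap = [(-abs(c) * math.sqrt(support_size(j, n)), j) for j, c in enumerate(coeffs)]
    heapq.heapify(heap)                   # O(N)
    kept = {}
    for _ in range(min(budget, n)):       # B pops, O(log N) each
        _, j = heapq.heappop(heap)
        kept[j] = coeffs[j]
    return kept

coeffs = [2.75, -1.25, 0.5, 0, 0, -1, -1, 0]  # running example
print(sorted(classic_synopsis(coeffs, 2)))    # [0, 1]
print(sorted(classic_synopsis(coeffs, 4)))    # [0, 1, 5, 6]
```

With a budget of 4, the two fine-level details of value -1 (normalized magnitude about 1.41) are preferred over the coarser detail 1/2 (normalized magnitude 1.0), which shows why the ranking uses normalized rather than raw values.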
Garofalakis-Kumar
• Minimizes weighted error metrics
• Dynamic programming algorithm on the transformation's tree
• Complexity: O(N^2) space, O(N^2 log B) time
[Figure labels: Already Kept Coefficients; B coefficients available; K; B-K; weights]
Matias-Urieli
• Minimizes the weighted L2 metric ($L^w_2$) of point errors
• Applies a modified Haar wavelet transformation, then the classic algorithm
• Complexity: O(N) space, O(N + B log N) time
[Figure labels: Weighted Average; Weighted Difference; w1; w2]
Matias-Urieli
• Minimizes L2
  - Complexity: O(N) space, O(N + B log N) time
• Working with prefix sums has disadvantages: sparse data become dense, and updates are difficult
• Procedure: apply the Haar transformation on the prefix sums, then greedily pick the largest B coefficients

Raw data:      2   0  -2   4  -1   4  -2   0
Prefix sums:   2   2   0   4   3   7   5   5
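The prefix-sum representation and the density drawback can be illustrated briefly. The sketch assumes 1-based, inclusive ranges; the function name `range_sum` and the sparse sample array are made up for the example.

```python
from itertools import accumulate

raw = [2, 0, -2, 4, -1, 4, -2, 0]
prefix = list(accumulate(raw))
print(prefix)  # [2, 2, 0, 4, 3, 7, 5, 5]

def range_sum(prefix, i, j):
    """Sum of raw values at positions i..j (1-based, inclusive) from two prefix sums."""
    return prefix[j - 1] - (prefix[i - 2] if i > 1 else 0)

print(range_sum(prefix, 2, 5))  # 1, i.e. 0 - 2 + 4 - 1

# Drawback noted on the slide: a sparse array becomes dense after prefix summing.
sparse = [0, 0, 5, 0, 0, 0, 0, 0]
dense = list(accumulate(sparse))
print(dense)  # [0, 0, 5, 5, 5, 5, 5, 5]
```

A single update to one raw value also changes every later prefix sum, which is the update difficulty the slide mentions.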
RangeWave
• Minimizes the weighted L_p metric of range sum queries that follow a dyadic hierarchy
• Workload aware; applies to the raw data
[Figure labels: range-sum query workload; Dyadic Ranges Hierarchy; Raw Data]
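As an illustration of a dyadic hierarchy of ranges, the sketch below enumerates the dyadic ranges over N positions and evaluates a weighted error over them. It is not the RangeWave algorithm itself. Assumptions: N is a power of two, ranges are 1-based and inclusive, single-point ranges are included at the bottom of the hierarchy, errors enter in absolute value, and the uniform weights are hypothetical. The error vector is the one obtained by dropping the two finest-level details of value -1 from the running example (data 2 2 0 2 3 5 4 4).

```python
def dyadic_ranges(n):
    """All dyadic ranges over positions 1..n (n a power of two), coarsest first."""
    ranges = []
    length = n
    while length >= 1:
        for start in range(1, n + 1, length):
            ranges.append((start, start + length - 1))
        length //= 2
    return ranges

def weighted_dyadic_error(errors, weights, p):
    """Sum over dyadic ranges of weight * |range-sum error| ** p."""
    total = 0
    for (i, j), w in weights.items():
        total += w * abs(sum(errors[i - 1:j])) ** p
    return total

ranges = dyadic_ranges(8)
print(len(ranges))  # 15
print(ranges[:3])   # [(1, 8), (1, 4), (5, 8)]

errors = [0, 0, -1, 1, -1, 1, 0, 0]
uniform = {r: 1 for r in ranges}  # hypothetical unbiased workload
print(weighted_dyadic_error(errors, uniform, 2))  # 4
```

In this example only the four single-point ranges contribute: the errors caused by a dropped detail cancel over any range that covers the detail's whole support.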
RangeWave
• A dynamic programming algorithm
• Complexity: O(N^2 log B) time, O(N^2) space
[Figure labels: Already Kept Coefficients; B coeffs available; K coeffs; B-K coeffs; Raw Data; "Compute the error for the corresponding dyadic interval i"; Weight W[i]]
Algorithms Summary

Point Query Workload
Algorithm             Time           Space    Optimal
Matias-Urieli         N + B log N    N        Yes
Garofalakis-Kumar     N^2 log B      N^2      Yes
Classic Wavelets      N + B log N    N        No
Classic Histograms    N^2 B          NB       Yes

Dyadic Range Sum Query Workload
Algorithm               Time           Space    Optimal
RangeWave               N^2 log B      N^2      Yes
Koudas-Muthukrishnan    N^7 B^2        N^5 B    Yes
Matias-Urieli           N + B log N    N        Only for uniform workload
Classic                 N + B log N    N        No
Experimental Study: Point-Query Workloads
• Data and point workload follow a Zipfian distribution
• Increasing synopsis size
• Matias-Urieli provides the best trade-off between accuracy (weighted L2 error) and running time
Experimental Study: Unbiased Dyadic Range Sum Query Workload
• RangeWave exhibits significant accuracy gains as the synopsis size increases for this workload
• Classic still performs well
Experimental Study: Biased Dyadic Range Sum Query Workload
• Biased workload: assigns more significance to larger range-sum queries
• The accuracy of RangeWave is orders of magnitude higher
Conclusions
• Point query workloads: you get what you pay for
  - Quadratic algorithms outperform linear ones in accuracy, at a high price
• Range sum query workloads: we can do better
  - Find a linear-time algorithm for all range sum queries
  - Extend RangeWave to a general hierarchy of queries
Thank You