Wentao Wu1,2, Yun Chi2, Hakan Hacigumus2, Jeffrey Naughton1
1Dept of Computer Sciences, University of Wisconsin-Madison 2NEC Laboratories America
1
Background Database as a service (DaaS) User Service Provider - - PowerPoint PPT Presentation
Wentao Wu 1,2 , Yun Chi 2 , Hakan Hacigumus 2 , Jeffrey Naughton 1 1 Dept of Computer Sciences, University of Wisconsin-Madison 2 NEC Laboratories America 1 Background Database as a service (DaaS) User Service Provider Service Level
1Dept of Computer Sciences, University of Wisconsin-Madison 2NEC Laboratories America
1
Admission control, query scheduling, progress monitoring,
2
Database
Service Level Agreement (SLA)
Standalone workloads [ICDE’09, ICDE’12, VLDB’12, ICDE’13] Concurrent but static workloads [EDBT’11, SIGMOD’11]
Dynamic: queries are not known a priori.
3
4
At time ti, predict the (remaining) execution time for each query in the mix.
(a) At time t1 (b) At time t2 (c) At time t3
5
Even if the query is running together with other queries
Cost Unit Value cs: seq_page_cost 1.0 cr: rand_page_cost 4.0 ct: cpu_tuple_cost 0.01 ci: cpu_index_tuple_cost 0.005 co: cpu_operator_cost 0.0025
𝐷 = 𝑜𝑡𝑑𝑡 + 𝑜𝑠𝑑𝑠 + 𝑜𝑢𝑑𝑢 + 𝑜𝑗𝑑𝑗 + 𝑜𝑝𝑑𝑝
Wentao Wu, Yun Chi, Shenghuo Zhu, Junichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton, Predicting query execution time: are optimizer cost models really unusable? In ICDE, 2013.
A phase = an operator? Pipelining of operators => interleaved phases!
6
Time Scan B Scan A Scan B q1 q2 t1 t2 t3 t4 t5
7
multiple stages of mixes of pipelines
Time P31 P32 P21 P22 P23 P11 P12 P13 s0 f21 f11 f22 f12 f31 f23 f32 f13 q1 q2 q3
8
Separate the modeling of CPU and I/O interactions.
If 𝑛 ≥ 𝑜, then 𝑑𝑑𝑞𝑣 = 𝜐 (same as the standalone case). If 𝑛 < 𝑜, then 𝑑𝑑𝑞𝑣 =
𝑜 𝑛 ⋅ 𝜐, assuming fair sharing.
Use machine learning.
9
Assume that all the queries are known beforehand. Run sample mixes and train a regression model. Apply to static workloads (e.g., report generation).
We do not know all the queries to be run.
10
scan type = scan operator + table name (e.g., index scan over orders)
11
NB: Additional I/O’s (e.g., from hash-joins) => Additional scans
Infinitely many unknown queries/query mixes
12
per visit of a customer.
The buffer pool cannot be modeled as a service center.
For the “clock” algorithm used by PostgreSQL
13
14
TPC-H2: 12 templates (Q7, 8, 9 are more expensive) MB1: heavy index scans with different data sharing rate.
15
Predict its time by using the single-query predictor.
16
17
18
need to calibrate the optimizer’s cost model.
needs a training stage (usually 2 days)
needs to evaluate the analytic models (usually < 120 ms)
19
publish a technique to predict query execution times for
workloads that are both concurrent and dynamic;
present a systematic exploration of its performance.
20
21
22
23
The example query plan contains 3 pipelines with the execution order: P1P2P3.
Tables: Students (sid, sname) Enroll (sid, cid, grade) SELECT S.sname, AVG (grade) AS gpa FROM Students S, Enroll E WHERE S.sid = E.sid GROUP BY S.sname GroupAgg MergeJoin Sort Sort SeqScan SeqScan Students Enroll P1 P2 P3
𝑙𝜐𝑙 𝑘≠𝑛
Residence Time Service Time Queueing Time
𝑅𝑙,𝑘 = 𝑊
𝑙,𝑘𝑆𝑙,𝑘
σ𝑗=1
𝐿
𝑊
𝑗,𝑘𝑆𝑗,𝑘
𝑍
𝑙 = 1
𝐷𝑙 𝜍4.464(𝐷𝑙
0.676−1)
𝜍𝑙 = 𝜐𝑙 𝐷𝑙
𝑘=1 𝑁
𝑊
𝑙,𝑘
σ𝑗=1
𝐿
𝑊
𝑗,𝑘𝑆𝑗,𝑘
(Queue Length) (Correction Factor, Yk = 1 if Ck = 1) (Utility)
24
The buffer pages are organized in a circular queue. On a buffer miss, the clock pointer scans the pages and
If a page has a count greater than 0, then the count is
On a buffer hit, the counter of the page is reset to its
25
𝑞=1 𝑄
𝑇𝑞 1 − 1 1 + 𝑜0 𝑛 𝑠
𝑞
𝑇𝑞
𝐽𝑞+1
− 𝐶 = 0
Model the “clock” algorithm by using a Markov chain.
(steady-state condition) 𝑂𝑞 = 𝑇𝑞 1 − 1 1 + 𝑜0 𝑛 𝑠
𝑞
𝑇𝑞
𝐽𝑞+1
ℎ𝑞 = 𝑂𝑞 𝑇𝑞 (# pages in the buffer) (buffer hit rate) 𝑛𝑞 = 1 − ℎ𝑞 = 1 + 𝑜0 𝑛 𝑠
𝑞
𝑇𝑞
𝐽𝑞+1 −1
(buffer miss rate)
expected # accesses to a page in the partition p
26
TPC-H1: 9 light to moderate TPC-H query templates TPC-H2: TPC-H1 + 3 more expensive templates (Q7, 8, 9) Create query mixes with Latin Hypercube Sampling (LHS).
27
MB1: mixes of heavy index scans with different data
MB2: mixes mingled with both sequential scans and
MB3: similar to MB2, but we replace the scans with real
28
29
30
31
32
33
34
35
36