The Glass Half Full
Using Programmable Hardware Accelerators in Analytical Databases
Zsolt István
IMDEA Software Institute
IMDEA Software Institute
16 Faculty in the areas of: Program Analysis and Verification
https://www.statista.com/statistics/810188/worldwide-commercial-database-market-size/
“The first goal is to design it with the capability of handling a very large on-line database of 10^10 bytes or beyond since special-purpose machines are not likely to be cost-effective for small databases.”
Jayanta Banerjee, David K. Hsiao, Krishnamurthi Kannan: DBC - A Database Computer for Very Large Databases. IEEE Trans. Computers 28(6): 414-429 (1979)
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237
[Figure labels: CPU Scaling, Commodity in Cloud, Specialized Hardware Revival]
[Figure labels: ASICs, FPGAs, CPUs]
[Figure: three placements of the accelerator (Accel.) relative to the data: 1) On the side, 2) In data-path, 3) Co-processor]
[Figure: Intel Xeon+FPGA Gen.1 (Socket1: CPU, Socket2: FPGA) and Intel Xeon+FPGA Gen.2 (Socket1: CPU and FPGA)]
[Figure: bar chart comparing Software and With Acceleration, each bar split into Compute and Data Movement; annotated 2x]
▪ Can’t always know the number of keys a priori
[Figure labels: Database Server, IBEX, SSD]
IBEX – An Intelligent Storage Engine with Support for Advanced SQL Off-loading. L. Woods, Z. István and G. Alonso, VLDB’14
→ Larger bandwidth, more IOPS (Samsung YourSQL, MIT BlueDBM)
▪ Opportunity to extend SSDs/Flash with complex offload
[Figure: Samsung “smart” SSD]
[Figure labels: Workers (Compute), Storage]
Caribou: Distributed storage with processing
Zsolt István, David Sidler, Gustavo Alonso: Caribou: Intelligent Distributed Storage. PVLDB 10(11), 2017.
Intel Hyperscan library (Xeon E5-2680 v2) 2.8x
[FCCM16] Runtime Parameterizable Regular Expression Operators for Databases. Z. István, D. Sidler, G. Alonso. FCCM’16
▪ select avg(salary) from employees group by department
[Figure: Ibex with SW-only Group-By. Labels: CPU, Input table, Projection, Selection, Filtered data, Group-by, Final Groups]
▪ select avg(salary) from employees group by department
[Figure: Ibex with HW-only Group-By. Labels: CPU, Input table, Projection, Selection, Filtered data, Group-by, Final Groups]
▪ select avg(salary) from employees group by department
[Figure: Ibex with Hybrid Group-by. Labels: CPU, Input table, Projection, Selection, Group-by, Group-by, Filtered data, Partial Groups, Final Groups]
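The hybrid group-by on this slide can be illustrated with a small software-only sketch. It assumes a first stage with a bounded table (standing in for the hardware group-by) that flushes partial groups whenever it runs out of space, and a second stage (standing in for the CPU) that merges the partials into final groups. The capacity limit, the function names, and the sample rows are assumptions made for illustration; they are not taken from Ibex.

```python
from collections import defaultdict

def partial_group_by(rows, capacity):
    """Stand-in for the accelerator stage: keep (sum, count) per key in a
    table of at most `capacity` keys and flush partial groups when it is full."""
    table = {}
    for dept, salary in rows:
        if dept not in table and len(table) >= capacity:
            # Table full: emit the current contents as partial groups, start over.
            yield from ((k, s, c) for k, (s, c) in table.items())
            table = {}
        s, c = table.get(dept, (0, 0))
        table[dept] = (s + salary, c + 1)
    # Emit whatever is left at the end of the input.
    yield from ((k, s, c) for k, (s, c) in table.items())

def final_group_by(partials):
    """Stand-in for the CPU stage: merge partial (sum, count) pairs and
    compute avg(salary) per department."""
    totals = defaultdict(lambda: [0, 0])
    for dept, s, c in partials:
        totals[dept][0] += s
        totals[dept][1] += c
    return {dept: s / c for dept, (s, c) in totals.items()}

# select avg(salary) from employees group by department
rows = [("eng", 100), ("ops", 60), ("hr", 50),
        ("eng", 140), ("ops", 80), ("hr", 70)]
averages = final_group_by(partial_group_by(rows, capacity=2))
print(averages)
```

Carrying (sum, count) pairs rather than averages is what makes the partial results mergeable: an average of averages would be wrong whenever the partial groups have different sizes.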
[Kara18] Kara et al: ColumnML: Column-Store Machine Learning with On-The-Fly Data Transformation. PVLDB 12(4): 348-361 (2018)
[Owaida18] Owaida et al: Application Partitioning on FPGA Clusters: Inference over Decision Tree Ensembles. FPL 2018: 295-300
CPU FPGA Co-processor
[Figure: Database Engine (MonetDB) combining Software operators and Hardware Operators]
◼ Goal: partition unlabeled data into several clusters, where the number of clusters is the “k” in k-means.
◼ Two steps in each iteration:
  ◼ Assignment: assign each data point to the closest centroid according to a distance metric
  ◼ Centroid update: recalculate each centroid by averaging all the data points within its cluster
◼ Long process if the data set and the number of iterations are large
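The two steps per iteration can be written down in a few lines. This is a minimal sketch in plain Python, assuming squared Euclidean distance as the metric and leaving an empty cluster's centroid unchanged; the function name and the sample data are illustrative only.

```python
def kmeans(points, centroids, iterations):
    """Plain k-means over equal-length tuples of floats."""
    for _ in range(iterations):
        # Assignment: each point goes to the closest centroid
        # (squared Euclidean distance is assumed as the metric).
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Centroid update: average the points of each cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)], iterations=5)
print(centroids)
```

Every iteration reads the whole data set once, which is why the run time grows with both the data size and the iteration count.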
1) Receives K-Means parameters
2) Fetch the initial centroids and the data
3) Calculates the distance between a data point and all the centroids and assigns it to the closest centroid
4) Accumulates data points per cluster and counts how many data points are assigned to each cluster
5) Collect partial results from each pipeline
6) Division for updating new centroid
7) Writes back the final results
Zhenhao He, David Sidler, Zsolt István, Gustavo Alonso: A Flexible K-Means Operator for Hybrid Databases. FPL 2018
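The steps on this slide can be mirrored in software to show why partial results need to be collected. The sketch below runs one iteration and assumes a hypothetical number of parallel pipelines, each processing an interleaved slice of the data and keeping its own per-cluster sums and counts. It illustrates the data flow only and is not the operator's actual hardware design.

```python
def kmeans_iteration(points, centroids, num_pipelines=2):
    """One k-means iteration organised as parallel pipelines plus a merge."""
    k, dims = len(centroids), len(centroids[0])
    partials = []
    for lane in range(num_pipelines):
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        # Each pipeline handles an interleaved slice of the data (an assumption).
        for p in points[lane::num_pipelines]:
            # Distance to all centroids, then assign to the closest one.
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            best = dists.index(min(dists))
            # Accumulate the point and count it for its cluster.
            counts[best] += 1
            for d in range(dims):
                sums[best][d] += p[d]
        partials.append((sums, counts))
    # Collect the partial results from all pipelines.
    total_sums = [[0.0] * dims for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dims):
                total_sums[j][d] += sums[j][d]
    # Division yields the new centroids; empty clusters keep the old centroid.
    return [
        tuple(s / total_counts[j] for s in total_sums[j]) if total_counts[j] else centroids[j]
        for j in range(k)
    ]

points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
new_centroids = kmeans_iteration(points, [(0.0, 0.0), (10.0, 10.0)])
print(new_centroids)
```

Deferring the division until after the merge keeps the per-pipeline state down to sums and counts, so the result does not depend on how the data is split across pipelines.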
K is known / Centroids known
Need to determine K (Elbow method)
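When K is not known, the elbow method runs k-means for a range of K values and looks for the point where adding clusters stops paying off. The sketch below is one simple way to do that, under stated assumptions: k-means is seeded with the first K points, the quality measure is the within-cluster sum of squared distances, and the elbow is taken as the K with the largest second difference of that curve. All names and the sample data are illustrative, and `max_k` needs to be at least 3.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))

def kmeans_inertia(points, k, iterations=10):
    """Run k-means seeded with the first k points and return the
    within-cluster sum of squared distances."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            best = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[best].append(p)
        centroids = [
            tuple(sum(dim) / len(m) for dim in zip(*m)) if m else c
            for m, c in zip(clusters, centroids)
        ]
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def elbow(points, max_k):
    """Pick the K with the largest second difference of the inertia curve
    (a simple heuristic for locating the elbow)."""
    inertia = [kmeans_inertia(points, k) for k in range(1, max_k + 1)]
    bends = [inertia[i - 1] - 2 * inertia[i] + inertia[i + 1]
             for i in range(1, len(inertia) - 1)]
    # bends[0] belongs to K=2, hence the offset of 2.
    return bends.index(max(bends)) + 2, inertia

# Three well-separated groups of two points each.
points = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0),
          (0.0, 1.0), (10.0, 11.0), (20.0, 1.0)]
best_k, inertia = elbow(points, max_k=5)
print(best_k, inertia)
```

Each candidate K requires a full k-means run, so determining K multiplies the cost of the clustering itself.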
For more details, see: The Glass Half Full: Using Programmable Hardware Accelerators in Analytics. Z. István. IEEE Data Engineering Bulletin, March 2019.