5/5/2015
FatMan vs. LittleBoy: Scaling up Linear Algebraic Operations in Scale-out Data Platforms
Luna Xu (Virginia Tech), Seung-Hwan Lim (ORNL), Ali R. Butt (Virginia Tech), Sreenivas R. Sukumar (ORNL), Ramakrishnan Kannan (ORNL)
Source: Stephens, Zachary D. et al. “Big Data: Astronomical or Genomical?” PLoS Biology 13.7 (2015) PMC. Web. 3 Nov. 2016.
Sources: https://www.nextplatform.com/2015/07/13/top-500-supercomputer-list-reflects-shifting-state-of-global-hpc-trends/; Nvidia
Sparse Matrix RDD
Architecture (diagram labels):
  matrix multiplication -> MLlib selector -> Scala implementation, or BLAS enabler
  BLAS enabler -> netlib-java (JNI) -> OpenBLAS (CPU), or NVBLAS / cuBLASXt (GPU)
  Worker layers: JVM, native
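The selector step named in the diagram can be illustrated with a small dispatch function. This is only a sketch under stated assumptions: the slides do not give the selection rule, so the density threshold, the minimum GPU dimension, and the names `MatrixInfo` and `select_backend` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; the slides do not state them.
DENSITY_THRESHOLD = 0.5   # below this, treat the matrix as sparse
MIN_GPU_DIM = 10_000      # below this, assume GPU offload is not worth the transfer


@dataclass
class MatrixInfo:
    rows: int
    cols: int
    density: float  # fraction of non-zero entries, in [0, 1]


def select_backend(m: MatrixInfo, gpu_available: bool) -> str:
    """Pick a multiplication path for matrix m (assumed rule, not the authors')."""
    if m.density < DENSITY_THRESHOLD:
        # Sparse input: stay in the JVM implementation.
        return "scala-impl"
    if gpu_available and min(m.rows, m.cols) >= MIN_GPU_DIM:
        # Large dense input with a GPU present: native BLAS on the GPU.
        return "nvblas"
    # Dense input otherwise: native BLAS on the CPU.
    return "openblas"


if __name__ == "__main__":
    print(select_backend(MatrixInfo(24495, 24495, 1.0), gpu_available=True))
```

A real selector would likely also weigh available GPU memory and the cost of crossing the JNI boundary; those factors are left out here to keep the sketch short.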
Input matrices (square):

# of Rows (Cols)   Density   Raw size (GB)
4873               1         0.34
14684              1         3.1
24495              1         8.4
66331              1         77
4873               0.05      0.104
14684              0.05      2.6
24495              0.05      19
66331              0.05      41
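To relate these dimensions and densities to working-set size, the following sketch estimates the in-memory footprint of a dense square matrix and the expected number of non-zeros at a given density. It assumes 8-byte double-precision values, which the slides do not state; the raw sizes listed above are on-disk sizes and need not match these estimates. The function names are illustrative.

```python
def dense_bytes(n: int, bytes_per_value: int = 8) -> int:
    """In-memory size of an n x n dense matrix (assumes 8-byte doubles by default)."""
    return n * n * bytes_per_value


def expected_nonzeros(n: int, density: float) -> int:
    """Expected count of non-zero entries in an n x n matrix of the given density."""
    return round(n * n * density)


if __name__ == "__main__":
    for n in (4873, 14684, 24495):
        gb = dense_bytes(n) / 1e9
        nnz = expected_nonzeros(n, 0.05)
        print(f"n={n}: dense ~{gb:.2f} GB in memory, ~{nnz} non-zeros at density 0.05")
```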
System:      Rhea GPU node
CPU:         Dual Intel Xeon E5
CPU cores:   28 (56 HT)
Memory:      1 TB
GPU:         Dual NVIDIA K80
GPU cores:   4992 x 2
GPU memory:  24 x 2 GB
CUDA:        7.5
Figure: speedup over the baseline vs. matrix size (4873, 14684, 24495, 66331) for OpenBLAS and NVBLAS. Annotated speedups: 2.2x and 1.5x.
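For reading the speedup axis: a value of 1.0 means parity with the baseline and anything above 1.0 is faster. A minimal sketch of that ratio follows. The timings are made-up placeholders, not measurements from the slides, and `speedup` is an illustrative name.

```python
def speedup(baseline_seconds: float, variant_seconds: float) -> float:
    """Speedup of a variant over the baseline; values above 1.0 mean faster."""
    if variant_seconds <= 0:
        raise ValueError("variant_seconds must be positive")
    return baseline_seconds / variant_seconds


if __name__ == "__main__":
    # Placeholder timings (seconds), not taken from the slides.
    baseline, variant = 220.0, 100.0
    print(f"speedup: {speedup(baseline, variant):.1f}x")
```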
Figure: share of execution time (%) by matrix size (4873, 14684, 24495, 66331), broken down into GC, Shuffle, Compute, and Others.
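A breakdown of this kind can be derived from per-phase wall-clock times. The sketch below normalizes phase timings into percentages. The phase names follow the chart legend, while the timing values and the function name `time_breakdown` are hypothetical.

```python
def time_breakdown(phase_seconds: dict[str, float]) -> dict[str, float]:
    """Convert per-phase times into percentages of the total run time."""
    total = sum(phase_seconds.values())
    if total <= 0:
        raise ValueError("total time must be positive")
    return {phase: 100.0 * t / total for phase, t in phase_seconds.items()}


if __name__ == "__main__":
    # Placeholder timings (seconds), not measurements from the slides.
    sample = {"GC": 10.0, "Shuffle": 30.0, "Compute": 50.0, "Others": 10.0}
    for phase, pct in time_breakdown(sample).items():
        print(f"{phase}: {pct:.1f}%")
```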
Figure: speedup over the baseline vs. matrix size (4873, 24495, 66331, 97708) for OpenBLAS and NVBLAS. Annotation: 10%.
Figure: speedup over the baseline vs. matrix size (4873, 24495, 66331, 97708) for OpenBLAS and NVBLAS. Annotation: 36.7%.
Figure: share of execution time (%) by matrix size (4873, 14684, 24495, 66331), broken down into GC, Shuffle, Compute, and Others.