A Sparse Tensor Format and a Benchmark Suite
Jiajia Li
Pacific Northwest National Laboratory
January 25, 2019 @ MIT
Figure sources: “A brief survey of tensors” by Berton Earnshaw and NVIDIA Tensor Cores
HiCOO: Hierarchical Storage of Sparse Tensors
1 Georgia Institute of Technology 2 Pacific Northwest National Laboratory
The concept of “mode-genericity” is inherited from [Baskaran et al. 2012]: M. Baskaran et al., “Efficient and scalable computations with sparse tensors,” IEEE HPEC 2012.
[Figure: the same example tensor stored three ways — (a) COO with full index columns (i, j, k, val); (b) CSF, which compresses the index columns into a fiber tree; (c) F-COO, which keeps (j, k, val) plus bit-flag arrays (bf, sf) in place of the first index.]
Formats are either mode-generic (COO) or mode-specific (CSF, F-COO); mode-specific formats prefer different representations for different modes.
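As a concrete sketch (the tensor shape and values here are made up for illustration), the mode-generic COO format is simply one index array per mode plus a value array:

```python
import numpy as np

# Hypothetical 4 x 4 x 3 sparse tensor with 4 nonzeros in COO format:
# one index array per mode plus the values, all of length nnz.
i = np.array([0, 0, 1, 3])
j = np.array([0, 1, 2, 3])
k = np.array([0, 2, 1, 2])
val = np.array([1.0, 2.0, 3.0, 4.0])

def coo_to_dense(i, j, k, val, shape):
    """Scatter COO entries into a dense array (for checking only)."""
    dense = np.zeros(shape)
    dense[i, j, k] = val
    return dense

dense = coo_to_dense(i, j, k, val, (4, 4, 3))
print(dense[0, 1, 2])  # -> 2.0
```

Because COO keeps the full coordinates of every nonzero, a kernel in any mode can traverse it, at the cost of storing three full-width indices per entry.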
[Figure: an I × J × K example tensor (i = 1,…,I; j = 1,…,J; k = 1,…,K) with dimensions 4 × 4 × 3, stored in COO sorted three ways: by (i, j, k) for mode 1, by (j, i, k) for mode 2, and by (k, i, j) for mode 3.]
Tensor decomposition runs one kernel per mode (mode 1, mode 2, mode 3). Mode-1-oriented formats (CSF, F-COO) are efficient for their own mode but inefficient for the others; the coordinate format (COO) and HiCOO treat all modes equally.
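Because COO keeps all coordinates, switching to another mode's kernel is just a re-sort of the same arrays, whereas a mode-specific format would need a second copy of the tensor. A minimal sketch (the indices are illustrative):

```python
import numpy as np

i = np.array([0, 3, 1, 0])
j = np.array([1, 3, 2, 0])
k = np.array([2, 2, 1, 0])
val = np.array([2.0, 4.0, 3.0, 1.0])

# np.lexsort lists keys least-significant first, so this sorts the
# nonzeros by (j, i, k) -- the ordering a mode-2 kernel prefers.
order = np.lexsort((k, i, j))
print(j[order])  # -> [0 1 2 3]
```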
HiCOO (Hierarchical COOrdinate) stores the tensor in small blocks: the COO columns (i, j, k, val) split into 32-bit block indices (bi, bj, bk) with a 32-bit block pointer array (bptr), plus 8-bit element indices (ei, ej, ek) locating each nonzero within its block, alongside val. It extends the Compressed Sparse Blocks (CSB) format of Buluç et al. (SPAA 2009) from matrices to tensors. For block size B, each index decomposes as i = bi * B + ei.
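The index split can be sketched as follows (B = 128 is an assumed block size; any B ≤ 256 keeps the element indices within 8 bits):

```python
import numpy as np

B = 128  # assumed block size per mode; 8-bit element indices require B <= 256

i = np.array([5, 130, 131, 400], dtype=np.uint32)  # mode-i coordinates

bi = (i // B).astype(np.uint32)  # 32-bit block indices
ei = (i % B).astype(np.uint8)    # 8-bit element indices within each block

# Each coordinate is recovered exactly as i = bi * B + ei.
assert np.array_equal(bi * B + ei, i)
print(bi.tolist(), ei.tolist())  # -> [0, 1, 1, 3] [5, 2, 3, 16]
```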
Index storage cost in bits (nnz: #nonzeros; nnb: #nonzero blocks):
COO indices: nnz * 3 * 32
HiCOO indices: nnz * 3 * 8 + nnb * (3 * 32 + 32)
When nnb ≪ nnz, the 8-bit element indices dominate, so HiCOO's index storage approaches a 4× reduction over COO.
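Plugging the storage formulas above into code (the nnz and nnb values below are hypothetical, chosen only to show the typical regime nnb ≪ nnz):

```python
def coo_index_bits(nnz, nmodes=3, idx_bits=32):
    """COO index storage: one full-width index per mode per nonzero."""
    return nnz * nmodes * idx_bits

def hicoo_index_bits(nnz, nnb, nmodes=3, eidx_bits=8, bidx_bits=32, bptr_bits=32):
    """HiCOO: short element indices per nonzero, plus block indices and bptr per block."""
    return nnz * nmodes * eidx_bits + nnb * (nmodes * bidx_bits + bptr_bits)

# Hypothetical tensor: 100M nonzeros grouped into 2M nonzero blocks.
nnz, nnb = 100_000_000, 2_000_000
ratio = coo_index_bits(nnz) / hicoo_index_bits(nnz, nnb)
print(round(ratio, 2))  # -> 3.61
```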
[Figure: compression ratio and speedup of COO, CSF-1, and HiCOO, each relative to CSF (higher is better), on 3D tensors (choa, darpa, deli, fb-m, fb-s, nell1, nell2) and 4D tensors (crime, deli4d, enron, flickr, nips).]
1 Pacific Northwest National Laboratory 2 Hangzhou Dianzi University 3 Virginia Tech
Data Structures / Algorithms — the benchmark suite covers sparse tensor kernels of increasing complexity:
- Element-wise
- Tensor-scalar
- Tensor-Times-Vector
- Tensor-Times-Matrix
- Matricized Tensor-Times-Khatri-Rao Product (MTTKRP)
Sparse tensors have arbitrary shape and a nonuniform nonzero pattern, so each kernel admits several parallelization strategies: parallelize nonzeros, parallelize nonzero fibers, parallelize nonzeros with atomics, or parallelize nonzero partitions.
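As an illustration of the simplest strategy, here is a sequential mode-1 MTTKRP over a COO tensor (a sketch with made-up data; parallelizing the loop over nonzeros would require atomic updates on rows of A, since several nonzeros can share the same i):

```python
import numpy as np

def mttkrp_mode1(i, j, k, val, B, C):
    """Mode-1 MTTKRP on a COO tensor: A[i, :] += val * B[j, :] * C[k, :]."""
    rank = B.shape[1]
    A = np.zeros((i.max() + 1, rank))
    for n in range(len(val)):  # one scatter-update per nonzero
        A[i[n]] += val[n] * B[j[n]] * C[k[n]]
    return A

i = np.array([0, 0, 1, 2])
j = np.array([0, 1, 1, 2])
k = np.array([0, 1, 0, 1])
val = np.array([1.0, 2.0, 3.0, 4.0])
rng = np.random.default_rng(0)
Bm, Cm = rng.standard_normal((3, 2)), rng.standard_normal((2, 2))

A = mttkrp_mode1(i, j, k, val, Bm, Cm)
print(A.shape)  # -> (3, 2)
```

The other strategies in the list reorganize this same loop: over fibers (entries sharing an output row), with atomics on A, or over precomputed nonzero partitions.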