Parallel Eigensolver for Graph Spectral Analysis on GPU - Yimin Liu (PowerPoint PPT Presentation)



SLIDE 1

15-618 Final Project

Parallel Eigensolver for Graph Spectral Analysis on GPU

Yimin Liu  yiminliu@andrew.cmu.edu
Heran Lin  lin1@andrew.cmu.edu

Carnegie Mellon University

May 11, 2015

SLIDE 2

Overview

◮ Undirected graph G = (V, E)
◮ Symmetric square matrix M associated with graph G (adjacency matrix A, graph Laplacian L, etc.)
◮ Eigenvalues of M encode interesting properties of the graph:

Mx = λx
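For a concrete instance of Mx = λx (a minimal pure-Python sketch, not part of the original slides): take the triangle graph K3 and its graph Laplacian L = D - A, which has eigenvalue 3 with eigenvector (1, -1, 0).

```python
# Laplacian of the triangle graph K3: L = D - A (illustrative sketch).
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
n = len(A)
deg = [sum(row) for row in A]
L = [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

def matvec(M, x):
    """Dense matrix-vector product M x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

# x = (1, -1, 0) is an eigenvector of L with eigenvalue 3: L x = 3 x.
x = [1.0, -1.0, 0.0]
print(matvec(L, x))  # [3.0, -3.0, 0.0]
```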

SLIDE 3

Eigendecomposition Overview

◮ Transform M to a symmetric tridiagonal matrix Tm ⇒ Lanczos
◮ Calculate eigenvalues of Tm ⇒ (easy)
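The slides leave the second step as "(easy)". One standard way to get eigenvalues of a symmetric tridiagonal matrix (an assumption here; the deck does not say which method the project used) is bisection driven by Sturm sequence counts:

```python
def count_below(a, b, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix
    (diagonal a, off-diagonal b) strictly less than x, counted via
    the signs of the pivots of T - x*I (Sylvester's law of inertia)."""
    count, d = 0, 1.0
    for i in range(len(a)):
        off = b[i - 1] ** 2 if i > 0 else 0.0
        d = a[i] - x - off / d
        if d == 0.0:
            d = 1e-300  # tiny perturbation to avoid division by zero
        if d < 0:
            count += 1
    return count

def kth_eigenvalue(a, b, k, lo=-1e6, hi=1e6, tol=1e-10):
    """k-th smallest eigenvalue (0-based) by bisection on count_below."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(a, b, mid) <= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# T = [[2, 1], [1, 2]] has eigenvalues 1 and 3.
print(kth_eigenvalue([2.0, 2.0], [1.0], 0))  # ~1.0
print(kth_eigenvalue([2.0, 2.0], [1.0], 1))  # ~3.0
```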

SLIDE 4

The Lanczos Algorithm for Tridiagonalization

Tm =
    ⎡ α1  β2            ⎤
    ⎢ β2  α2   ⋱        ⎥
    ⎢      ⋱   ⋱    βm  ⎥
    ⎣          βm   αm  ⎦

  • 1. v0 ← 0, v1 ← random vector with ‖v1‖2 = 1, β1 ← 0
  • 2. for j = 1, . . . , m

◮ wj ← M vj
◮ αj ← wj⊤ vj
◮ wj ← wj − αj vj − βj vj−1
◮ βj+1 ← ‖wj‖2
◮ vj+1 ← wj / βj+1

Potential parallelism for CUDA: matrix-vector product, dot-product, SAXPY
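The loop above can be sketched serially in pure Python (dense M for clarity; on the GPU, the matrix-vector product, dot products, and SAXPY updates are exactly the parallel kernels named above):

```python
import math
import random

def lanczos(M, m, seed=0):
    """Lanczos tridiagonalization of a symmetric matrix M (dense,
    list-of-lists). Returns the diagonal alpha[1..m] and off-diagonal
    beta[2..m] of T_m. Serial sketch of the slide's pseudocode."""
    n = len(M)
    rng = random.Random(seed)
    v_prev = [0.0] * n                        # v_0
    v = [rng.gauss(0, 1) for _ in range(n)]   # v_1, normalized below
    nrm = math.sqrt(sum(x * x for x in v))
    v = [x / nrm for x in v]
    beta_j = 0.0                              # beta_1
    alphas, betas = [], []
    for j in range(m):
        w = [sum(M[i][k] * v[k] for k in range(n)) for i in range(n)]  # w = M v_j
        alpha = sum(wi * vi for wi, vi in zip(w, v))                   # alpha_j = w^T v_j
        w = [wi - alpha * vi - beta_j * pi
             for wi, vi, pi in zip(w, v, v_prev)]                      # orthogonalize
        beta_next = math.sqrt(sum(x * x for x in w))                   # beta_{j+1} = ||w||_2
        alphas.append(alpha)
        if j + 1 < m:
            betas.append(beta_next)
            v_prev, v = v, [x / beta_next for x in w]                  # v_{j+1}
            beta_j = beta_next
    return alphas, betas

# With m = n, T_m is orthogonally similar to M, so the traces agree.
M = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
a, b = lanczos(M, 3)
print(sum(a))  # ~trace(M) = 7.0
```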

SLIDE 5

Challenges

Characteristics of M:

◮ Very sparse
◮ Skewed distribution of non-zero elements
  ◮ Example: power-law node degree distribution in social networks

SLIDE 6

Compressed Sparse Row (CSR) Matrix-Vector Multiplication (SPMV)

[Figure: CSR layout of rows 0, 1, 2, ... with value and column-index arrays; result = M × x]
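In CSR, row i's nonzeros sit in vals[row_ptr[i]:row_ptr[i+1]], with the matching entries of col_idx giving their columns. A serial reference SPMV along those lines (a sketch, not the project's kernel):

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """y = M x for M in CSR format: row i's nonzeros are
    vals[row_ptr[i]:row_ptr[i+1]] at columns col_idx[...]."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y

# 3x3 example:  [[4, 0, 1],
#                [0, 2, 0],
#                [1, 0, 3]]  times x = (1, 1, 1)
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 1.0, 3.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 4.0]
```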

SLIDE 7

Naive Work Assignment

[Figure: threads 0, 1, 2, ... each computing one row of the result]

◮ Each thread is responsible for one row
◮ Work imbalance issues

SLIDE 8

Warp-based Work Assignment

[Figure: warps 0, 1, 2, ... each computing one row of the result via partial sums]

◮ Each warp (32 threads) is responsible for one row
◮ Reduce partial sums in shared memory
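A pure-Python simulation of this access pattern (illustrative only; the lane stride and reduction shape are assumptions, not taken from the project's kernel): each of the 32 lanes sums a strided slice of the row's nonzeros, then the partial sums are tree-reduced as a shared-memory reduction would be.

```python
WARP_SIZE = 32  # threads per warp

def warp_row_dot(row_vals, row_cols, x):
    """Simulate one warp computing one CSR row: lane t handles
    nonzeros t, t+32, t+64, ..., then the 32 partial sums are
    tree-reduced (done in shared memory on the GPU)."""
    partial = [0.0] * WARP_SIZE
    for t in range(WARP_SIZE):                        # each lane
        for k in range(t, len(row_vals), WARP_SIZE):  # strided slice
            partial[t] += row_vals[k] * x[row_cols[k]]
    step = WARP_SIZE // 2
    while step > 0:                                   # tree reduction
        for t in range(step):
            partial[t] += partial[t + step]
        step //= 2
    return partial[0]

# Row with 40 nonzeros, all 1.0, against an all-ones x: dot product is 40.
print(warp_row_dot([1.0] * 40, list(range(40)), [1.0] * 40))  # 40.0
```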

SLIDE 9

Warp-based Work Assignment for Row Groups

[Figure: warps 0 and 1 each computing a group of rows (row 0 result, row 1 result, ...)]

◮ Each warp is responsible for a group of rows
◮ Group size depends on the average row sparsity of the matrix
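The slide does not give the exact group-size rule, so the heuristic below is an assumption: pack roughly warp-size / (average nonzeros per row) rows into each warp, so all 32 lanes stay busy even when rows are very sparse.

```python
WARP_SIZE = 32  # threads per warp

def rows_per_warp(row_ptr):
    """Pick how many rows each warp handles from the average row
    sparsity of the CSR matrix (illustrative heuristic; the exact
    rule used in the project is not given on the slide)."""
    n = len(row_ptr) - 1
    avg_nnz = row_ptr[-1] / max(n, 1)
    # Dense rows: one row per warp. Sparse rows: group several rows
    # per warp so the 32 lanes are not mostly idle.
    return max(1, int(WARP_SIZE // max(avg_nnz, 1.0)))

# ~4 nonzeros per row on average -> 8 rows per warp.
row_ptr = [0, 4, 8, 12, 16]
print(rows_per_warp(row_ptr))  # 8
```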

SLIDE 10

Evaluation Environment

Amazon Web Services EC2 g2.2xlarge

◮ NVIDIA GK104 GPU, 1,536 CUDA cores, with the CUDA 7.0 Toolkit installed
◮ Intel Xeon E5-2670 CPU, 8 cores, with gcc/g++ 4.8.2 installed, -O3 optimization switched on

Competitive reference: SPMV implementation in cuSparse (http://docs.nvidia.com/cuda/cusparse/)

Dataset: generated scale-free networks based on the Barabási-Albert model, using Python NetworkX

SLIDE 11

float SPMV Performance Similar to cuSparse

[Chart: speedup of GPU SPMV over CPU vs. graph node count (400 to 3,200 ×10³), roughly 3-9×; series: Group SPMV, cuSparse SPMV, Naive SPMV]

SLIDE 12

double SPMV Performance Better than cuSparse

[Chart: speedup of GPU SPMV over CPU vs. graph node count (400 to 3,200 ×10³), roughly 4-11×; series: Group SPMV, cuSparse SPMV, Naive SPMV]

SLIDE 13

Real-world Graphs

◮ as-Skitter: ∼1,700,000 nodes, ∼11,000,000 edges
◮ cit-Patents: ∼3,800,000 nodes, ∼17,000,000 edges

Converted to symmetric double adjacency matrices.
Data source: SNAP (http://snap.stanford.edu/data/index.html)

SLIDE 14

SPMV Better than cuSparse on Large Real-world Graphs

Speedup of GPU SPMV over CPU:

                Group SPMV   cuSparse SPMV   Naive SPMV
  as-Skitter        7.4           7.5            2.5
  cit-Patents      11.6          10.8            7.5

SLIDE 15

Faster Eigenvalue Solver on GPU

Running time of eigensolvers (sec):

                GPU Eigensolver   CPU Eigensolver
  as-Skitter          1.6                9
  cit-Patents         3.1               31.8

SLIDE 16

Discussion

SLEPc (http://slepc.upv.es)

◮ A state-of-the-art parallel CPU framework using MPI for solving sparse-matrix eigenvalue problems
◮ Took 84.9 sec to compute the 10 largest eigenvalues of the cit-Patents graph, while our CPU eigensolver took only 31.8 sec

◮ Unfair to compare?
◮ Many variants of the Lanczos algorithm
◮ Accuracy vs. performance tradeoff