Fast and Scalable Subgraph Isomorphism using Dynamic Graph Techniques
James Fox

Collaborators
• Oded Green, Research Scientist (GT)
• Euna Kim, PhD student (GT)
• Federico Busato, PhD student (Universita di Verona)
• Dr. Nicola Bombieri (Universita di Verona)
• Kartik Lakhotia, PhD student (USC)
• Shijie Zhou, PhD student (USC)
• Shreyas Singapura, PhD student (USC)
• Hanqing Zeng, PhD student (USC)
• Dr. Rajgopal Kannan (USC)
• Prof. Viktor Prasanna (USC)
• Prof. David Bader (GT)
Quickly Finding a Truss in Haystack 2

Outline
• K-Truss
  – Introduction
  – Sequential Approaches
• Our new algorithm
  – Dynamic Triangle Counting
  – Hornet: data structure for dynamic graphs
• Performance Analysis

K-Truss
• Definition: for a given $k$, the $k$-truss is a subgraph in which each edge closes at least $k - 2$ triangles, i.e. has a “support” of at least $k - 2$
• A well-connected subgraph
  – “Relaxation of k-clique, stricter than k-core” [Cohen; 2008]
  – Computationally efficient to find
• Maximal k-truss: focus of our work

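The definition can be made concrete with a small sketch. It assumes an undirected simple graph given as a list of vertex pairs; the helper names `intersect_count`, `edge_supports`, and `is_k_truss` are illustrative and do not come from the talk. Support is counted by merging two sorted adjacency lists.

```python
def intersect_count(a, b):
    """Count common elements of two sorted lists with a linear merge."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def edge_supports(edges):
    """Map each edge (u, v), u < v, to the number of triangles it closes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    for neighbors in adj.values():
        neighbors.sort()  # sorted lists make the merge intersection valid
    return {tuple(sorted((u, v))): intersect_count(adj[u], adj[v])
            for u, v in edges}


def is_k_truss(edges, k):
    """True if every edge has support of at least k - 2."""
    return all(s >= k - 2 for s in edge_supports(edges).values())
```

For the complete graph on four vertices every edge closes two triangles, so it satisfies the definition for $k = 4$ but not for $k = 5$.
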
Example
[Figure: example graph with its K=3 truss and K=4 truss]

Over 1000x faster
Graph Challenge Innovation Award (HPEC’17)
Three main factors
• Algorithmic Optimization
  1. Uses dynamic graph data structure
  2. Novel algorithm for dynamically updating triangle counts
• Parallelization
• Programming model
  – Vertex-centric is more efficient than linear algebra

Simple Vertex Centric
$k \leftarrow 3$
while $E \neq \emptyset$
    repeat until no more changes
        for $e = (u, v) \in E$
            if $|\mathrm{adj}(u) \cap \mathrm{adj}(v)| < k - 2$
                delete $e$ from $E$
    $k \leftarrow k + 1$
$k \leftarrow k - 1$

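A minimal Python sketch of this simple scheme follows. It assumes an undirected simple graph given as vertex pairs, and it recomputes every intersection in every pass, which is exactly the cost the new algorithm avoids. The name `maximal_truss_simple` is hypothetical, and the choice to record the largest $k$ that still leaves a non-empty truss (2 when the graph has no triangles) is an assumption of this sketch.

```python
from collections import defaultdict


def maximal_truss_simple(edges):
    """Peel edges level by level; return (k_max, edges of the k_max-truss)."""
    E = {tuple(sorted(e)) for e in edges}
    k, k_max, truss = 3, 2, set(E)
    while E:
        while True:  # repeat until no more changes
            adj = defaultdict(set)
            for u, v in E:
                adj[u].add(v)
                adj[v].add(u)
            # Edges whose support fell below k - 2 at this level.
            weak = {(u, v) for u, v in E if len(adj[u] & adj[v]) < k - 2}
            if not weak:
                break
            E -= weak
        if E:
            k_max, truss = k, set(E)
        k += 1
    return k_max, truss
```

On a four-vertex clique with one extra triangle attached, the extra triangle is peeled at $k = 4$ and the clique survives, so the sketch reports a maximal truss of 4.
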
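Linear Algebra Formulation
• Given $k$
• Bold letters refer to vectors and matrices
$\mathbf{R} = \mathbf{E}\mathbf{A}$
$\mathbf{x} = \mathrm{find}((\mathbf{R} == 2) \cdot \mathbf{1} < k - 2)$
while $\mathbf{x}$
    $\mathbf{E}_x = \mathbf{E}(\mathbf{x}, :)$
    $\mathbf{E} = \mathbf{E}(\mathbf{x}_c, :)$
    $\mathbf{R} = \mathbf{E}(\mathbf{x}_c, :)\,\mathbf{A}$
    $\mathbf{R} = \mathbf{R} - \mathbf{E}[\mathbf{E}_x^T \mathbf{E}_x - \mathrm{diag}(\mathbf{E}_x^T \mathbf{E}_x)]$
    $\mathbf{x} = \mathrm{find}((\mathbf{R} == 2) \cdot \mathbf{1} < k - 2)$
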
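The first two lines of this formulation can be checked by hand with a small sketch. It assumes $\mathbf{E}$ is the unoriented edge-by-vertex incidence matrix and $\mathbf{A}$ the adjacency matrix, and it uses plain Python lists instead of a sparse matrix library; `support_via_incidence` is an illustrative name. Row $e = (u, v)$ of $\mathbf{E}\mathbf{A}$ adds the adjacency rows of $u$ and $v$, so an entry equals 2 exactly at a common neighbor.

```python
def support_via_incidence(n, edges):
    """Per-edge support computed as the row sums of (E A == 2)."""
    # Adjacency matrix A (n x n) of the undirected graph.
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    # Incidence matrix E (m x n): one row per edge, ones at both endpoints.
    E = [[0] * n for _ in edges]
    for i, (u, v) in enumerate(edges):
        E[i][u] = E[i][v] = 1
    # R = E A, written as an explicit triple loop.
    R = [[sum(E[i][t] * A[t][w] for t in range(n)) for w in range(n)]
         for i in range(len(edges))]
    # s = (R == 2) . 1 : count the entries equal to 2 in each row.
    return [sum(1 for x in row if x == 2) for row in R]
```

An edge then belongs in $\mathbf{x}$ whenever its entry in the returned list is below $k - 2$.
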
New Algorithm for finding Maximal Truss
(✓ marks the steps that run in parallel)
for $e = (u, v) \in E$    ✓ parallel
    $w_e \leftarrow |\mathrm{adj}(u) \cap \mathrm{adj}(v)|$
$k \leftarrow 3$
while $E \neq \emptyset$
    repeat until no more changes
        list $\leftarrow \emptyset$
        for $e = (u, v) \in E$    ✓ parallel
            if $|\mathrm{adj}(u) \cap \mathrm{adj}(v)| < k - 2$
                append(list, $e$)
        $G_{RST} \leftarrow$ CreateGraph(list)    ✓ parallel
        removeEdges($G$, $G_{RST}$)    ✓ parallel
        UpdateTriangleCount($G$, $G_{RST}$)    ✓ parallel
    $k \leftarrow k + 1$
$k \leftarrow k - 1$

$G_{RST} \leftarrow$ CreateGraph(list)
• We will create a graph from all the deleted edges
• Adjacencies will be sorted
[Figure: $G$ and $G_{RST}$]

UpdateTriangleCount($G$, $G_{RST}$)
• Must update counts of non-removed edges
• Don’t want to re-compute globally
[Figure: the graph after deletion (incorrect triangle counts) next to the same graph with updated triangle counts]

Three “types” of triangles affected (each triangle has vertices $u$, $v$, $w$)
1. One edge removed
2. Two edges removed
3. All three edges removed
[Makkar; HiPC’17]

One edge removed
• $(u, v)$ deleted
• By intersecting the list of $u$ with the list of $v$ we can find all common neighbors
  – Decrement support by 1
• For all $e = (u, v) \in G_{RST}$
  – Intersect($u$, $G$, $v$, $G$)

Two edges removed
• $(u, v)$ and $(u, w)$ deleted
• Intersecting the adjacencies like before won’t work.
• Instead we will intersect adjacencies from the two graphs: $G$ and $G_{RST}$
• For all $e = (u, v) \in G_{RST}$
  – Intersect($u$, $G$, $v$, $G_{RST}$)
• Can handle double-counting

Three edges removed
• $(u, v)$, $(u, w)$, $(w, v)$ deleted
• No need to update supports!

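The three cases can be put together in a short sequential sketch. Several things here are assumptions and not taken from the talk: Python sets stand in for sorted adjacency lists, the adjacency is rebuilt each round where a dynamic structure would update it in place, double counting in the two-edge case is avoided by an ordering test on the surviving edge, and all function names are hypothetical.

```python
from collections import defaultdict


def key(u, v):
    return (u, v) if u < v else (v, u)


def build_adj(edges):
    """Undirected adjacency sets (a stand-in for sorted adjacency lists)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_supports(edges):
    adj = build_adj(edges)
    return {key(u, v): len(adj[u] & adj[v]) for u, v in edges}


def update_supports(adj, adj_del, support):
    """Repair supports of surviving edges.

    adj is the graph after removal, adj_del the graph of removed edges.
    """
    empty = set()
    for u, removed in adj_del.items():
        for v in removed:
            if u < v:  # visit each removed undirected edge once
                # One edge removed: (u, w) and (v, w) both survive.
                for w in adj.get(u, empty) & adj.get(v, empty):
                    support[key(u, w)] -= 1
                    support[key(v, w)] -= 1
            # Two edges removed: (u, v), (u, w) gone and (v, w) survives.
            # The triangle is seen from both removed edges; count it once.
            for w in adj.get(v, empty) & removed:
                if v < w:
                    support[key(v, w)] -= 1
            # Three edges removed: nothing survives, nothing to update.


def maximal_truss_delta(edges):
    """Largest k with a non-empty k-truss (2 if there are no triangles)."""
    E = {key(u, v) for u, v in edges}
    support = edge_supports(E)  # triangle counts are computed only once
    k, k_max = 3, 2
    while E:
        while True:
            weak = [e for e in E if support[e] < k - 2]
            if not weak:
                break
            E.difference_update(weak)
            for e in weak:
                del support[e]
            update_supports(build_adj(E), build_adj(weak), support)
        if E:
            k_max = k
        k += 1
    return k_max
```

A quick sanity check is to remove a few edges, call `update_supports`, and compare the result with `edge_supports` recomputed from scratch on the surviving edges; the two should agree.
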
So what else do we need?
• We need a dynamic graph data structure
• These data structures don’t cut it
[Table: Dense Adjacency Matrix, Linked lists, COO (Edge list), and CSR/CSC, each rated ✓ or ❌ on Good Locality and Flexible Updates]

Hornet…
USER-INTERFACE
Vertex Id:   0 1 2 3 4 5 6 7
Used:        2 2 3 2 2 2 1 0
BSize:       2 2 4 2 2 2 1 0
Pointer:     [one per vertex, into its edge block; the figure marks the over-allocated space]
Dest./Col.:  3 1 2 0 5 2 6 2 5 1 4 0 3 4
Value:       2 2 5 2 7 1 2 4 1 7 1 4 1 4
• Supports updates
  – Supports edge insertion/deletion
  – Supports vertex insertion/deletion
• Good locality
  – Edge list contiguous
• Efficient memory manager
  – Memory reclamation and deletion
  – Hidden from user
• Framework

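The Used, BSize, and Pointer arrays on this slide can be illustrated with a toy CPU sketch. It is not Hornet's API: the class `BlockAdjacency` and its methods are hypothetical, a Python list stands in for each pointer and its block, growth by doubling is an assumption suggested by the block sizes shown, and shrinking and memory reclamation are left out.

```python
class BlockAdjacency:
    """Per-vertex edge blocks with spare capacity for cheap insertions."""

    def __init__(self, num_vertices):
        self.used = [0] * num_vertices    # edges currently stored
        self.bsize = [0] * num_vertices   # capacity of the block
        self.block = [[] for _ in range(num_vertices)]  # stands in for Pointer

    def insert_edge(self, u, v):
        if self.used[u] == self.bsize[u]:
            # Block is full: move the edges into a block twice as large.
            new_size = max(1, 2 * self.bsize[u])
            new_block = [None] * new_size
            new_block[:self.used[u]] = self.block[u][:self.used[u]]
            self.block[u] = new_block
            self.bsize[u] = new_size
        self.block[u][self.used[u]] = v
        self.used[u] += 1

    def delete_edge(self, u, v):
        blk = self.block[u]
        for i in range(self.used[u]):
            if blk[i] == v:
                # Keep the block contiguous by moving the last edge here.
                self.used[u] -= 1
                blk[i] = blk[self.used[u]]
                blk[self.used[u]] = None
                return True
        return False

    def neighbors(self, u):
        return self.block[u][:self.used[u]]
```

Inserting three edges at one vertex leaves it with Used 3 and BSize 4, the same pair shown for vertex 2 on the slide. Note that the swap-based deletion in this sketch does not keep adjacencies sorted.
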
Experimental Setup - CPU
Intel Dual Processor
• Intel Xeon E5-2695
• 16 cores per processor (32 in total)
  – 64 threads with Hyperthreading
• 45MB LLC
• 1TB of DDR4

Experimental Setup - GPU
Single Pascal P100
• 56 processors (SMs)
• 64 threads per processor (SPs)
• 3584 hardware threads
• 16GB of HBM2
  – 720 GB/s bandwidth

Input Graphs
• HPEC Graph Challenge
• SNAP – Stanford Network Analysis Project
The following is only a subset of these graphs:
Name               Network Type    |V|      |E|*
cit-HepPh          Citation        35k      421k
amazon0601         Co-purchasing   400k     2.4M
roadNet-PA         Road            1M       1.5M
as-skitter         Trace route     1.69M    11.1M
graph500-scale21   Random          2.1M     34M
*largest: |E| = 134M

Benchmarks
1. Graph Challenge
   1. Julia
   2. Python
   3. Matlab/Octave
2. Our algorithms
   1. Iterative - uses static triangle counting
   2. Delta - uses new algorithm

Finding the Maximal Truss
• Time out: 8 hours
• Usually 200X-500X faster
• Many times over 2000X faster
• Sometimes 10,000X faster

Execution time per iteration

Future Work
• We still think that we can improve by another 10X…
• New triangle counting kernel
  – Balanced and imbalanced intersections
  – Improved warp utilization

Summary
• New algorithm for finding the maximal K-Truss
• Given a static input we use techniques from dynamic graph algorithms
• Hundreds to thousands of times faster than the benchmarks
• We still think that we can improve by another 10X…

Thank you
• Email: jfox43@gatech.edu

Backup Slides

Wang & Chang; 2012
• Modified version of Cohen’s algorithm
• Sorts the edges based on their support
  – In each iteration, edges with a support smaller than $k - 2$ are removed
• Inherently sequential (due to update process)
• Yet, significantly faster than Cohen’s algorithm
• Uses hash maps for intersections

Hornet Data Layout
• A scalable and dynamic data structure for graph algorithms and linear algebra based problems
• Can support up to 90 million updates per second
• Low overhead in comparison with CSR
  – Initializing is also relatively inexpensive (20%-200%)
  – Equal performance
• Simple to use
• Implemented for CUDA, yet portable to other architectures
cuSTINGER paper: [Green & Bader; HPEC, 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs

Hornet – Property Graph Support
[Figure: the Hornet layout (Vertex Id, Used, BSize, Pointer, Dest./Col.) with additional per-edge fields Weight, Type, Time 1, User 1, User 2, ….]
• These are optional fields