Partial Re-streaming Approach For Massive Graph Partitioning



SLIDE 1

Laboratoire d'InfoRmatique en Image et Systèmes d'information

LIRIS UMR 5205 CNRS/INSA de Lyon/Université Claude Bernard Lyon 1/Université Lumière Lyon 2/Ecole Centrale de Lyon

http://liris.cnrs.fr

Partial Re-streaming Approach For Massive Graph Partitioning

Ghizlane ECHBARTHI Hamamache KHEDDOUCI

SLIDE 2

Introduction

Today's graph datasets are huge (World Wide Web, Facebook, Twitter, biological networks, ...). Usual computations over these graphs become challenging.

SLIDE 3

Application

Graph partitioning is an essential preprocessing step for distributed graph computations. Random graph partitioning is widely applied in distributed graph computation systems (Pregel, GraphLab, Horton, …) in order to run parallel algorithms.

Instead of random partitioning, can we think of a more sophisticated partitioning strategy?

SLIDE 4

Outline

  • 1. Related work
  • 2. Proposed approach
  • 3. Evaluation results
  • 4. Conclusion

SLIDE 5

Related work: Streaming graph partitioning

Streaming graph partitioning (GP) was first introduced in [1]. Each vertex arrives with its adjacency list. The partitioner is a heuristic that decides on which machine the current vertex will be placed. The total capacity K*C must be large enough to hold the whole graph, where K is the number of machines and C is the capacity of each machine.
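As a rough illustration of this streaming model, the sketch below places each arriving vertex exactly once and enforces the per-machine capacity C. The names `stream_partition` and `least_loaded` are hypothetical, and the least-loaded rule is only a stand-in for a real partitioning heuristic.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One stream item: a vertex together with its adjacency list.
StreamItem = Tuple[int, List[int]]
# A heuristic sees the vertex, its neighbours, the current state, and returns a machine index.
Heuristic = Callable[[int, List[int], Dict[int, int], List[int], int], int]


def stream_partition(stream: Sequence[StreamItem], k: int, capacity: int,
                     heuristic: Heuristic) -> Dict[int, int]:
    """Single pass over the stream; every vertex is placed once, on arrival."""
    if k * capacity < len(stream):
        raise ValueError("K*C is too small to hold the whole graph")
    assignment: Dict[int, int] = {}
    loads = [0] * k  # number of vertices currently on each machine
    for vertex, neighbors in stream:
        part = heuristic(vertex, neighbors, assignment, loads, capacity)
        if loads[part] >= capacity:
            raise ValueError("heuristic chose a machine that is already full")
        assignment[vertex] = part
        loads[part] += 1
    return assignment


def least_loaded(vertex: int, neighbors: List[int], assignment: Dict[int, int],
                 loads: List[int], capacity: int) -> int:
    """Placeholder heuristic (an assumption): ignore edges, balance the load."""
    return min(range(len(loads)), key=lambda i: loads[i])
```

Any function with the same signature as `least_loaded` can be plugged in, which is where a heuristic such as LDG or Fennel would go.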

[1] Stanton and Kliot, Streaming graph partitioning for large distributed graphs, 2012.

SLIDE 6

Related work: partitioning strategies

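  • LDG [1]: the Linear Deterministic Greedy heuristic.
  • Fennel [2]: a second streaming partitioning heuristic.
  • Restreaming [3]: a strategy that streams the graph dataset several times in order to improve the partition quality, giving ReLDG and ReFENNEL.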
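The slide does not preserve the scoring rules themselves. The sketch below uses the forms commonly given for these heuristics in the cited papers: LDG picks the part maximising |P_i ∩ N(v)| · (1 − |P_i|/C), and Fennel picks the part maximising |P_i ∩ N(v)| − α·γ·|P_i|^(γ−1). The function names, the tie-breaking by lowest load, and the default γ = 1.5 are assumptions made for illustration.

```python
from typing import Dict, List


def neighbor_counts(neighbors: List[int], assignment: Dict[int, int], k: int) -> List[int]:
    """Count how many already-placed neighbours sit in each part."""
    counts = [0] * k
    for u in neighbors:
        part = assignment.get(u)
        if part is not None:
            counts[part] += 1
    return counts


def ldg_choice(neighbors: List[int], assignment: Dict[int, int],
               loads: List[int], capacity: int) -> int:
    """LDG: neighbour count weighted by the remaining free fraction of the part."""
    k = len(loads)
    counts = neighbor_counts(neighbors, assignment, k)
    scores = [counts[i] * (1.0 - loads[i] / capacity) for i in range(k)]
    # Tie-break towards the least loaded part (assumption).
    return max(range(k), key=lambda i: (scores[i], -loads[i]))


def fennel_choice(neighbors: List[int], assignment: Dict[int, int],
                  loads: List[int], alpha: float, gamma: float = 1.5) -> int:
    """Fennel: neighbour count minus a load penalty alpha * gamma * load**(gamma - 1)."""
    k = len(loads)
    counts = neighbor_counts(neighbors, assignment, k)
    scores = [counts[i] - alpha * gamma * loads[i] ** (gamma - 1) for i in range(k)]
    return max(range(k), key=lambda i: (scores[i], -loads[i]))
```

In both functions a larger `alpha` or a fuller part pushes the choice away from the part holding most neighbours, which is how the two heuristics trade edge locality against balance.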

[1] Stanton et al., 2012. [2] Tsourakakis et al., 2012. [3] Nishimura et al., 2013.

SLIDE 7

Related work summary

[Figure: timeline of related work, labelled "Since the 90’s", "In 2012", "In 2013"]

SLIDE 8

Proposed approach: Partial Restreaming (PR)

The PR method consists of re-streaming only a portion of the graph dataset. Advantages:

  • Less information to store
  • Less time to run

There exist two versions of the PR method:

  • Simple Partial Restreaming partitioning
  • Selective Partial Restreaming partitioning

N.B.: The partitioning heuristics used are Linear Deterministic Greedy (LDG) [1] and Fennel [2].

[1] Stanton et al., 2012. [2] Tsourakakis et al., 2012.

SLIDE 9

Proposed approach: Simple Partial Restreaming

The simple partial re-streaming method consists of two major phases:

  • Phase 1: The first loaded portion of size S of the graph dataset is re-streamed several times.
  • Phase 2: The rest of the graph dataset is streamed once.
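The two phases can be sketched as follows, using an LDG-style placement rule. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `ldg_place` and `simple_partial_restream` are hypothetical, and the choice to reset the loads at each pass and to look up not-yet-replaced neighbours in the previous pass's assignment is an assumption borrowed from the general restreaming idea.

```python
from typing import Dict, List, Tuple

StreamItem = Tuple[int, List[int]]  # (vertex, adjacency list), in arrival order


def ldg_place(vertex: int, neighbors: List[int], current: Dict[int, int],
              previous: Dict[int, int], loads: List[int], capacity: int) -> None:
    """Place one vertex with an LDG-style score, preferring this pass's placements."""
    k = len(loads)
    counts = [0] * k
    for u in neighbors:
        part = current.get(u, previous.get(u))  # fall back to the previous pass
        if part is not None:
            counts[part] += 1
    best = max(range(k),
               key=lambda i: (counts[i] * (1.0 - loads[i] / capacity), -loads[i]))
    current[vertex] = best
    loads[best] += 1


def simple_partial_restream(stream: List[StreamItem], k: int, capacity: int,
                            portion_size: int, passes: int) -> Dict[int, int]:
    head, tail = stream[:portion_size], stream[portion_size:]
    assignment: Dict[int, int] = {}
    loads = [0] * k
    # Phase 1: re-stream the first loaded portion several times.
    for _ in range(passes):
        previous, assignment, loads = assignment, {}, [0] * k
        for vertex, neighbors in head:
            ldg_place(vertex, neighbors, assignment, previous, loads, capacity)
    # Phase 2: stream the rest of the graph once, on top of the phase 1 result.
    for vertex, neighbors in tail:
        ldg_place(vertex, neighbors, assignment, {}, loads, capacity)
    return assignment
```

Only the assignments of the first `portion_size` vertices have to be kept between passes, which is where the storage saving over full restreaming would come from.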

SLIDE 10

Proposed approach: Selective Partial Restreaming

The selective partial re-streaming method consists of two major phases:

  • Phase 1: Select a portion with a high average degree and density and re-stream it several times.
  • Phase 2: The rest of the graph dataset is streamed once.
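The slides do not say how the portion is selected. One possible selection step, shown here purely as an assumption, is to rank vertices by degree, keep the top ones, and then check the average degree and density of the chosen portion. The function names are hypothetical and the graph is assumed to be undirected with symmetric adjacency lists.

```python
from typing import Dict, List, Tuple


def select_high_degree_portion(adjacency: Dict[int, List[int]],
                               size: int) -> Tuple[List[int], List[int]]:
    """Split the vertices into the `size` highest-degree ones and the rest."""
    ranked = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    selected = ranked[:size]
    chosen = set(selected)
    rest = [v for v in adjacency if v not in chosen]
    return selected, rest


def portion_stats(adjacency: Dict[int, List[int]],
                  portion: List[int]) -> Tuple[float, float]:
    """Return (average degree, density of the induced subgraph) for a non-empty portion."""
    inside = set(portion)
    n = len(inside)
    avg_degree = sum(len(adjacency[v]) for v in inside) / n
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal_edges = sum(1 for v in inside for u in adjacency[v] if u in inside) / 2
    density = internal_edges / (n * (n - 1) / 2) if n > 1 else 0.0
    return avg_degree, density
```

The selected vertices would then be re-streamed several times (Phase 1) and the remaining ones streamed once (Phase 2). The extra selection pass is one plausible source of the additional runtime of the selective variant.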

SLIDE 11

Evaluation setup

Datasets used

All datasets are obtained from the SNAP repository [1].

Parameters

  • k = 40 parts
  • s = 10 streams
  • Half of the graph dataset is re-streamed

[1] http://snap.stanford.edu/data/

SLIDE 12

Evaluation results

Comparing Partial Restreaming and Full restreaming methods

[Figure: comparison charts for partial and full restreaming, annotated "Difference = 5.9%" and "Difference = 2.3%"]

SLIDE 13

Evaluation results

Comparing Simple partial restreaming and Selective partial restreaming

SLIDE 14

Evaluation results

Computing the run time gain

The run time gain is approximately 50% when re-streaming half of the graph.
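A back-of-the-envelope check of this figure, assuming the run time is proportional to the number of vertex placements (a simplifying cost model that is not stated in the slides): full restreaming places every vertex s times, while partial restreaming places the re-streamed fraction s times and the rest once. The function names are hypothetical.

```python
def vertex_placements(n: float, passes: int, fraction: float) -> float:
    """Placements when `fraction` of n vertices is streamed `passes` times and the rest once."""
    return passes * fraction * n + (1 - fraction) * n


def runtime_gain(passes: int, fraction: float) -> float:
    """Relative saving of partial restreaming over full restreaming under this model."""
    full = vertex_placements(1.0, passes, 1.0)
    partial = vertex_placements(1.0, passes, fraction)
    return 1 - partial / full


# Evaluation parameters from the slides: s = 10 streams, half of the graph re-streamed.
gain = runtime_gain(passes=10, fraction=0.5)
```

With s = 10 and half of the graph, the model gives 5.5n placements against 10n, a gain of about 45%, which is in line with the reported figure of approximately 50%.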

SLIDE 15

Conclusion

The simple PR method reduces the runtime while delivering partitions of a quality comparable to full restreaming. The selective PR method improves the partition quality compared to the simple PR method. However, selective PR is costlier than simple PR in terms of runtime.

SLIDE 16

Thanks for your attention!