SLIDE 1

BD3 2013

Scalable and Robust Management of Dynamic Graph Data

Alan G. Labouseur, Paul W. Olsen Jr., Jeong-Hyon Hwang
{alan, polsen, jhh}@cs.albany.edu

Sunday, September 22, 2013

SLIDE 8

Large, Dynamic Networks

  • Social Networks
  • Consumer Commerce Networks
  • Financial Networks
  • Road Networks
  • Internet / WWW
  • DNA Interactions

SLIDE 15

Analysis of Large, Dynamic Networks

  • Transportation

[Figure: route at 5:00 AM (9.1 mi, 20 mins), 6:00 AM (15 mi, 25 mins), and 7:00 AM (20 mi, 30 mins)]

  • Social and Political Studies / Marketing / National Security
    • How do communities or the centrality of an entity change over time?
    • Who are rising stars?

SLIDE 24

The G* System (1/2)

  • distributed, deduplicated storage of graph snapshots

[Figure: snapshots G1, G2, G3 stored across workers α, β, γ; shared parts are stored once, split into G1∩G2∩G3, (G1∩G2)-G3, (G2∩G3)-G1, G1-G2-G3, and G3-G1-G2]
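The deduplication idea can be illustrated with a minimal Python sketch. It assumes deduplication at vertex granularity: each distinct (vertex, out-neighbors) record is stored once and tagged with the set of snapshots that contain it, much like the snapshot-set labels in the G* figures. The snapshots G1 and G2, their edges, and the helpers `deduplicate` and `materialize` are hypothetical, not G*'s actual storage format or API.

```python
from collections import defaultdict

def deduplicate(snapshots):
    """Store each distinct (vertex, neighbor set) once, tagged with the snapshots containing it.

    snapshots: {snapshot name: {vertex: set of out-neighbors}}
    returns:   {(vertex, frozenset of out-neighbors): set of snapshot names}
    """
    store = defaultdict(set)
    for name, graph in snapshots.items():
        for vertex, neighbors in graph.items():
            store[(vertex, frozenset(neighbors))].add(name)
    return dict(store)

def materialize(store, name):
    """Rebuild one snapshot from the deduplicated store."""
    return {v: set(nbrs) for (v, nbrs), names in store.items() if name in names}

# Hypothetical snapshots: G2 adds vertex e and the edge c -> e to G1.
G1 = {"a": {"b", "c"}, "b": {"d"}, "c": set(), "d": set()}
G2 = {"a": {"b", "c"}, "b": {"d"}, "c": {"e"}, "d": set(), "e": set()}

store = deduplicate({"G1": G1, "G2": G2})
print(len(store), "records instead of", len(G1) + len(G2))  # 6 records instead of 9
```

Records for a, b, and d carry the tag {G1,G2} and are stored once; c appears twice because its edges differ between the snapshots. The sketch keeps everything in one in-memory dictionary and does not model how records are spread over workers.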

SLIDE 31

The G* System (2/2)

  • sophisticated queries / sharing across graph snapshots

[Figure: segments on workers α, β, γ, each tagged with the snapshots it belongs to ({G1,G2,G3}, {G1,G2}, {G2,G3}, {G1}, {G3}), and a query plan that computes the average vertex degree of G1 and G2]

Operator outputs, in execution order (♢: degree not yet computed):

  • vertex: (a,♢,{G1,G2}), (b,♢,{G1,G2}), (c,♢,{G1}), (d,♢,{G1,G2}), (c,♢,{G2}), (e,♢,{G2})
  • degree: (a,2,{G1,G2}), (b,1,{G1,G2}), (c,0,{G1}), (d,0,{G1,G2}), (c,1,{G2}), (e,0,{G2})
  • count, sum (one operator per worker): (1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
  • union: (1,2,{G1,G2}), (1,1,{G1,G2}), (2,0,{G1}), (3,1,{G2})
  • average: (3/4, G1), (4/5, G2); e.g., for G1: (2 + 1 + 0) / (1 + 1 + 2) = 3/4
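The operator outputs above can be re-derived with a short Python sketch. It takes the (vertex, degree, snapshots) tuples from the slide as input and assumes three workers holding {a}, {b}, and {c, d, e}; that assignment is an assumption chosen because it reproduces the slide's count, sum tuples. Snapshots that yield identical partial results share one tuple, e.g. (1,2,{G1,G2}). `count_sum` and `average` are hypothetical names, not G*'s operator code.

```python
from collections import defaultdict
from fractions import Fraction

# (vertex, degree, snapshots) tuples from the degree operators, grouped by an assumed worker.
DEGREES = {
    "worker1": [("a", 2, frozenset({"G1", "G2"}))],
    "worker2": [("b", 1, frozenset({"G1", "G2"}))],
    "worker3": [
        ("c", 0, frozenset({"G1"})),
        ("d", 0, frozenset({"G1", "G2"})),
        ("c", 1, frozenset({"G2"})),
        ("e", 0, frozenset({"G2"})),
    ],
}

def count_sum(tuples):
    """Per-snapshot (count, sum); snapshots with equal results share one output tuple."""
    per_snapshot = defaultdict(lambda: [0, 0])
    for _vertex, degree, snapshots in tuples:
        for g in snapshots:
            per_snapshot[g][0] += 1
            per_snapshot[g][1] += degree
    shared = defaultdict(set)
    for g, (count, total) in per_snapshot.items():
        shared[(count, total)].add(g)
    return [(count, total, frozenset(gs)) for (count, total), gs in shared.items()]

def average(partials):
    """Combine (count, sum, snapshots) partials into the average degree of each snapshot."""
    counts, sums = defaultdict(int), defaultdict(int)
    for count, total, snapshots in partials:
        for g in snapshots:
            counts[g] += count
            sums[g] += total
    return {g: Fraction(sums[g], counts[g]) for g in sorted(counts)}

# union: concatenate the partial aggregates produced on every worker
partials = [p for tuples in DEGREES.values() for p in count_sum(tuples)]
print(average(partials))  # {'G1': Fraction(3, 4), 'G2': Fraction(4, 5)}
```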

SLIDE 32

Problem Statements

  • How to distribute graph snapshots on G* workers?
    • new graph snapshots generated continuously
    • must be efficient, scalable, and optimized for queries
  • How to replicate graph snapshots?
    • aim to maximize both availability and performance

SLIDE 38

Impact of Snapshot Distribution (Example)

  • 100 similarly-sized graph snapshots
  • 100 G* workers
  • PageRank on one snapshot or all snapshots

query           1 worker/snapshot   100 workers/snapshot
one snapshot    300 seconds         20 seconds
all snapshots   300 seconds         2,000 seconds

  • 1 worker/snapshot: loading 200 seconds + computation 100 seconds = 300 seconds per snapshot; the 100 snapshots run in parallel, one per worker, so all snapshots also take 300 seconds
  • 100 workers/snapshot: loading + computation 3 seconds + transmission 17 seconds = 20 seconds per snapshot; every snapshot occupies all workers, so all snapshots take 100 × 20 = 2,000 seconds

  • Lessons
    • balance correlated snapshots on many workers
    • distribute each snapshot on a few workers
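The table's numbers follow from a simple model, sketched below in Python under stated assumptions: snapshots are processed in rounds, each round runs one snapshot per disjoint group of workers, and the per-snapshot times are the ones given on the slide. `query_seconds` is a hypothetical helper, not part of G*.

```python
import math

# Per-snapshot PageRank times (seconds) taken from the slide's example.
PER_SNAPSHOT_SECONDS = {
    1: 200 + 100,  # 1 worker/snapshot: loading + computation
    100: 3 + 17,   # 100 workers/snapshot: loading + computation, then transmission
}

def query_seconds(snapshots_queried, workers_per_snapshot, total_workers=100):
    """Assumes snapshots run in rounds; each round runs one snapshot per group of workers."""
    groups = total_workers // workers_per_snapshot  # snapshots processed in parallel
    rounds = math.ceil(snapshots_queried / groups)
    return rounds * PER_SNAPSHOT_SECONDS[workers_per_snapshot]

for queried in (1, 100):
    print(queried, [query_seconds(queried, w) for w in (1, 100)])
# 1 [300, 20]
# 100 [300, 2000]
```

The model makes the trade-off explicit: spreading a snapshot over more workers shortens each snapshot's time but reduces how many snapshots can run at once.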

SLIDE 42

Snapshot Distribution Overview

  • partitions groups of snapshots (e.g., {G1, ..., G10}, {G11, ..., G20}) into segments with a maximum size (e.g., 10 GB)
  • workers exchange segments for higher query speed
  • whenever a segment becomes full, splits it into two (e.g., METIS [SC 95])
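Segment splitting can be sketched as follows. The sketch is hypothetical on two counts: it measures capacity in vertices rather than bytes, and it replaces METIS with a simple breadth-first bisection that grows one half from a seed vertex, so neighboring vertices tend to stay together. Unlike METIS, it makes no attempt to minimize the edge cut.

```python
from collections import deque

MAX_VERTICES = 4  # hypothetical capacity; the slide's example limit is a size such as 10 GB

def bfs_bisect(adj):
    """Split a segment into two halves by growing one half breadth-first from a seed vertex.

    A simple stand-in for METIS: neighbors tend to land in the same half,
    but the edge cut is not minimized.
    """
    target = len(adj) // 2
    seen, first_half = set(), []
    for seed in adj:  # restart from an unseen vertex if the segment is disconnected
        if len(first_half) >= target:
            break
        if seed in seen:
            continue
        seen.add(seed)
        queue = deque([seed])
        while queue and len(first_half) < target:
            v = queue.popleft()
            first_half.append(v)
            for u in sorted(adj[v]):
                if u in adj and u not in seen:  # ignore edges leading to other segments
                    seen.add(u)
                    queue.append(u)
    chosen = set(first_half)
    left = {v: nbrs for v, nbrs in adj.items() if v in chosen}
    right = {v: nbrs for v, nbrs in adj.items() if v not in chosen}
    return left, right

def add_vertex(segments, vertex, neighbors):
    """Append to the newest segment; split it in two once it exceeds MAX_VERTICES."""
    segments[-1][vertex] = set(neighbors)
    if len(segments[-1]) > MAX_VERTICES:
        segments[-1:] = list(bfs_bisect(segments[-1]))

# Hypothetical path graph a - b - c - d - e, added one vertex at a time.
segments = [{}]
for v, nbrs in [("a", ["b"]), ("b", ["a", "c"]), ("c", ["b", "d"]), ("d", ["c", "e"]), ("e", ["d"])]:
    add_vertex(segments, v, nbrs)
print([sorted(s) for s in segments])  # [['a', 'b'], ['c', 'd', 'e']]
```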

SLIDE 48

Segment Exchange (Example)

[Figure: Gi,j denotes the j-th segment of snapshot Gi. Before the exchange, worker α holds G1,1, G2,1, G2,2 and worker β holds G1,2, G3,1, G3,2 (poor balancing, low locality). After α and β swap G1,1 and G3,1, α holds G3,1, G2,1, G2,2 and β holds G1,2, G1,1, G3,2 (good balancing, high locality).]

SLIDE 49

Estimating Segment Migration Benefit

  • notation
    • s: segment to move from worker α to worker β
    • Sα: segments on worker α = {G1,1, G2,1, G2,2}
    • Sβ: segments on worker β = {G1,2, G3,1, G3,2}
    • Qk: k representative query patterns
    • p(q): probability that query pattern q is executed
    • time(q, Sα, Sβ): estimated duration of query pattern q given segment placements Sα and Sβ
  • estimated benefit of moving s from α to β:

$$\sum_{q \in Q_k} p(q)\,\bigl(\mathrm{time}(q, S_\alpha, S_\beta) - \mathrm{time}(q, S_\alpha \setminus \{s\}, S_\beta \cup \{s\})\bigr)$$
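The benefit formula can be evaluated directly once an estimate of time(q, Sα, Sβ) is available. The Python sketch below uses the slide's placements Sα and Sβ, but the cost model (the busiest worker's segment count plus a penalty for each queried snapshot split across both workers), the two query patterns, and their probabilities are all hypothetical.

```python
# Segments are written (snapshot, part): ("G1", 1) stands for G1,1 on the slide.
S_ALPHA = frozenset({("G1", 1), ("G2", 1), ("G2", 2)})
S_BETA = frozenset({("G1", 2), ("G3", 1), ("G3", 2)})

# Hypothetical Q_k: (snapshots a query pattern reads, probability p(q)).
QUERY_PATTERNS = [(frozenset({"G1", "G2"}), 0.7), (frozenset({"G3"}), 0.3)]
SPLIT_PENALTY = 1.0  # hypothetical cost per queried snapshot spread over both workers

def est_time(q, s_alpha, s_beta):
    """Toy stand-in for time(q, S_alpha, S_beta): busiest worker's load plus a locality penalty."""
    load = max(sum(1 for g, _ in s if g in q) for s in (s_alpha, s_beta))
    on_alpha = {g for g, _ in s_alpha}
    on_beta = {g for g, _ in s_beta}
    split = len(q & on_alpha & on_beta)
    return load + SPLIT_PENALTY * split

def migration_benefit(s, s_alpha, s_beta, patterns=QUERY_PATTERNS):
    """Sum over q of p(q) * (time before moving s from alpha to beta - time after)."""
    return sum(
        p * (est_time(q, s_alpha, s_beta) - est_time(q, s_alpha - {s}, s_beta | {s}))
        for q, p in patterns
    )

print(migration_benefit(("G1", 1), S_ALPHA, S_BETA))  # 1.4
```

Under this toy model, moving G1,1 to β speeds up the {G1, G2} pattern (G1 is no longer split and the load evens out) and leaves the {G3} pattern unchanged. Calling the function with the roles of α and β swapped evaluates moves in the other direction.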

SLIDE 50

Updates of Vertices and Edges

  • A vertex v and its edges are initially assigned to the worker w(v) determined by the hash value of the vertex ID.
  • Worker w(v) stores vertex v and its edges in a segment s and registers (v, s) in an index.
  • If segment s migrates to another worker, the worker that created s keeps track of the current location of s.
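The three rules above can be sketched in Python as a hash-based home worker, a per-worker vertex index, and a location table kept by each segment's creator. Everything here is a hypothetical illustration: `zlib.crc32` stands in for whatever hash G* uses (Python's built-in `hash` of a string is randomized per process), and each worker is assumed to fill a single open segment named (creator id, 0).

```python
import zlib

class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.segments = {}  # segment id -> {vertex: set of neighbors}, held on this worker
        self.index = {}     # vertex -> segment id, for vertices hashed to this worker
        self.location = {}  # segment id -> worker id, for segments this worker created

def home(vertex, workers):
    """w(v): a deterministic hash of the vertex ID picks the worker."""
    return workers[zlib.crc32(vertex.encode()) % len(workers)]

def add_vertex(vertex, neighbors, workers):
    """Store v and its edges in w(v)'s open segment and register (v, s) in w(v)'s index."""
    w = home(vertex, workers)
    seg = (w.wid, 0)  # hypothetical naming: (creator id, sequence number)
    if seg not in w.location:  # first use: create the segment on its creator
        w.segments[seg] = {}
        w.location[seg] = w.wid
    holder = workers[w.location[seg]]  # the segment may have migrated since
    holder.segments[seg][vertex] = set(neighbors)
    w.index[vertex] = seg

def migrate(seg, src, dst, workers):
    """Move a segment and tell its creator where it now lives."""
    dst.segments[seg] = src.segments.pop(seg)
    workers[seg[0]].location[seg] = dst.wid

def neighbors_of(vertex, workers):
    """Look up v's edges via w(v)'s index and the creator's location record."""
    w = home(vertex, workers)
    seg = w.index[vertex]
    return workers[w.location[seg]].segments[seg][vertex]

workers = [Worker(i) for i in range(3)]
add_vertex("v1", ["v2", "v3"], workers)
src = home("v1", workers)
seg = src.index["v1"]
dst = workers[(src.wid + 1) % len(workers)]
migrate(seg, src, dst, workers)
print(sorted(neighbors_of("v1", workers)))  # ['v2', 'v3']
```

A lookup costs at most one extra hop after a migration, because the creator's location table, not the vertex index, is the only state that changes.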

SLIDE 51

Graph Snapshot Replication

  • r copies of each snapshot to mask up to r-1 simultaneous worker failures
  • queries classified into r categories
  • j-th replica optimized for the j-th query category (e.g., one replica distributed over many workers, another replica distributed over a few workers)
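One possible way to use the r replicas at query time is sketched below. It assumes r = 2, that replicas live on disjoint worker sets (so any r-1 = 1 failure leaves one copy intact), and two hypothetical categories: single-snapshot queries served by the replica spread over many workers, and multi-snapshot queries served by the replica on a few workers, in line with the earlier PageRank example. `Replica` and `route` are illustrative names, not G*'s API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    category: str       # query category this copy is laid out for
    workers: frozenset  # worker ids holding this copy (assumed disjoint across replicas)

def route(category, replicas, failed):
    """Use the replica optimized for `category` if intact, else any other intact replica."""
    ordered = sorted(replicas, key=lambda r: r.category != category)  # preferred first
    for replica in ordered:
        if not replica.workers & failed:
            return replica
    raise RuntimeError("no intact replica left")

# r = 2 copies on disjoint workers, so any single worker failure is masked.
REPLICAS = [
    Replica("single-snapshot", frozenset({0, 1, 2, 3})),  # spread over many workers
    Replica("multi-snapshot", frozenset({4, 5})),         # placed on a few workers
]

print(route("single-snapshot", REPLICAS, failed=set()).category)  # single-snapshot
print(route("single-snapshot", REPLICAS, failed={2}).category)    # multi-snapshot
```

When the preferred replica is hit by a failure, the query still runs on the other copy, only with a layout tuned for a different category.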

SLIDE 52

Experimental Settings

  • 6 nodes
  • each node has 8 cores (2.67 GHz), 16 GB RAM, and a 2 TB hard drive
  • 500 cumulative graph snapshots, each adding 20,000 edges to the previous one
  • queries: SSSP (single-source shortest paths), PageRank

SLIDE 53

Experimental Results (SSSP)

  • Speedup

cores     1     2     4     8     16    24    48
speedup   1.0   1.9   3.7   5.9   9.7   12.5  14.7

  • Impact of Graph Distribution

query           all workers     subset of workers
one snapshot    8.2 seconds     19.2 seconds
all snapshots   80.5 seconds    53.2 seconds
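Parallel efficiency (speedup divided by the number of cores) makes the scaling trend easier to read. The Python snippet below computes it from the speedup table; at 48 cores the efficiency is 14.7 / 48 ≈ 0.31.

```python
# Speedup figures from the SSSP experiment above.
CORES = [1, 2, 4, 8, 16, 24, 48]
SPEEDUP = [1.0, 1.9, 3.7, 5.9, 9.7, 12.5, 14.7]

# Parallel efficiency = speedup / number of cores (1.0 would be perfectly linear scaling).
efficiency = [s / c for c, s in zip(CORES, SPEEDUP)]

for c, s, e in zip(CORES, SPEEDUP, efficiency):
    print(f"{c:>2} cores: speedup {s:4.1f}, efficiency {e:.2f}")
```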

SLIDE 54

Related Work

  • Graph processing systems
    • Pregel [SIGMOD 10], GraphLab/GraphChi [OSDI 12], DeltaGraph [ICDE 13]
  • Graph Partitioning
    • METIS [SC 95], GPS [SSDBM 13], CatchW [ICDE 13]
  • Data Replication
    • C-Store [VLDB 05]

SLIDE 55

Future Work

  • full implementation of snapshot distribution/replication techniques
  • experiments using various data/queries/environments
  • fine-grained splitting and migration of data
  • scheduling multiple queries

SLIDE 56

Summary

  • Distribution of Graph Snapshots
    • balance correlated snapshots on many machines
    • store each snapshot on a few machines
  • Replication of Graph Snapshots
    • optimize each replica for a different type of query
  • Supported by NSF CAREER award IIS-1149372
  • G* demonstrated at ICDE 2013
  • G* available as open source at: http://www.cs.albany.edu/~gstar/

SLIDE 57

Thank You

http://www.cs.albany.edu/~gstar/