Graphs On Databases
Alekh Jindal
Sam Madden, Mike Stonebraker, Amol Deshpande
MIT, University of Maryland
NEDB 2014
[Diagram: the speaker and the people above, with the labels "supervisors", "collaborate", "work", and "sabbatical"]
And they are growing bigger
Physical Data Independence
Logical Data Independence
[Figure: example graph with five nodes, numbered 1 to 5]

Nodes
id  value
1   1
2   1
3   1
4   1
5   1

Edges
fromId  toId  weight
1       2     1
2       4     1
2       5     1
3       2     1
5       1     1
SELECT * FROM Nodes WHERE id = :ID;
SELECT * FROM Edges WHERE fromId = :ID;
SELECT fromId, count(*) FROM Edges GROUP BY fromId;
SELECT * FROM Edges e1, Edges e2 WHERE e1.toId = e2.fromId;
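As a rough illustration of the four query shapes above, the sketch below loads the sample Nodes and Edges tables into an in-memory SQLite database and runs each query. SQLite, the helper name `build_sample_graph`, the choice of node 2 as the lookup key, and the `count(*)` aggregate in the grouping query are assumptions made for the example; the slides do not name a particular engine.

```python
import sqlite3

def build_sample_graph():
    """Create an in-memory database with the sample Nodes and Edges tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Nodes (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("CREATE TABLE Edges (fromId INTEGER, toId INTEGER, weight INTEGER)")
    conn.executemany("INSERT INTO Nodes VALUES (?, ?)", [(i, 1) for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO Edges VALUES (?, ?, ?)",
        [(1, 2, 1), (2, 4, 1), (2, 5, 1), (3, 2, 1), (5, 1, 1)],
    )
    return conn

conn = build_sample_graph()

# Look up a single node by id.
node = conn.execute("SELECT * FROM Nodes WHERE id = ?", (2,)).fetchall()

# Fetch the outgoing edges of that node.
out_edges = conn.execute(
    "SELECT * FROM Edges WHERE fromId = ? ORDER BY toId", (2,)
).fetchall()

# Group edges by source node; count(*) (an assumed aggregate) gives the out-degree.
out_degree = dict(
    conn.execute("SELECT fromId, count(*) FROM Edges GROUP BY fromId").fetchall()
)

# Self-join the edge table to enumerate two-hop paths.
two_hop = conn.execute(
    "SELECT e1.fromId, e2.toId FROM Edges e1, Edges e2 "
    "WHERE e1.toId = e2.fromId ORDER BY e1.fromId, e2.toId"
).fetchall()
```

Each graph operation here is an ordinary selection, aggregation, or join, so the database's usual planning and indexing apply to it unchanged.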
UPDATE Nodes AS node SET value = new_node.value
FROM (
    SELECT e.toId AS Id, min(n1.value+1) AS value
    FROM Nodes AS n1, Edges AS e, Nodes AS n2
    WHERE n1.Id = e.fromId AND n2.Id = e.toId
    GROUP BY e.toId, n2.value
    HAVING min(n1.value+1) < n2.value
) AS new_node
WHERE node.Id = new_node.Id;

Parallel Graph Exploration
Nested Query
Sorting/Indexing
Iterative Graph Queries
UDF / Stored Procedure
Initialization:

Loop: the shortest paths SQL

UPDATE Nodes AS node SET value = new_node.value
FROM (
    SELECT e.toId AS Id, min(n1.value+1) AS value
    FROM Nodes AS n1, Edges AS e, Nodes AS n2
    WHERE n1.Id = e.fromId AND n2.Id = e.toId
    GROUP BY e.toId, n2.value
    HAVING min(n1.value+1) < n2.value
) AS new_node
WHERE node.Id = new_node.Id;

Termination Condition: no more nodes to update
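The loop can be sketched end to end as follows, again with SQLite as an assumed stand-in engine. The initialization step is not spelled out in the slides, so the sketch assumes the source node starts at 0 and every other node at a large sentinel `INF`. It also assumes an engine-neutral way of applying the result: the inner SELECT of the shortest paths query is run as is, and its rows are written back with plain parameterized UPDATEs rather than `UPDATE ... FROM`. The names `build_sample_graph`, `shortest_paths`, and `RELAX_SQL` are hypothetical.

```python
import sqlite3

INF = 10**9  # assumed sentinel meaning "not reached yet"

# Inner SELECT of the shortest paths query: nodes whose distance can improve.
RELAX_SQL = """
SELECT e.toId AS Id, min(n1.value + 1) AS value
FROM Nodes AS n1, Edges AS e, Nodes AS n2
WHERE n1.Id = e.fromId AND n2.Id = e.toId
GROUP BY e.toId, n2.value
HAVING min(n1.value + 1) < n2.value
"""

def build_sample_graph():
    """Create an in-memory database with the sample Nodes and Edges tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Nodes (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("CREATE TABLE Edges (fromId INTEGER, toId INTEGER, weight INTEGER)")
    conn.executemany("INSERT INTO Nodes VALUES (?, ?)", [(i, 1) for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO Edges VALUES (?, ?, ?)",
        [(1, 2, 1), (2, 4, 1), (2, 5, 1), (3, 2, 1), (5, 1, 1)],
    )
    return conn

def shortest_paths(conn, source):
    """Iterate the relaxation query until it returns no rows; return the update rounds."""
    # Initialization (assumed): distance 0 at the source, INF everywhere else.
    conn.execute(
        "UPDATE Nodes SET value = CASE WHEN id = ? THEN 0 ELSE ? END", (source, INF)
    )
    rounds = 0
    while True:
        improved = conn.execute(RELAX_SQL).fetchall()
        if not improved:  # termination condition: no more nodes to update
            break
        conn.executemany(
            "UPDATE Nodes SET value = ? WHERE id = ?",
            [(value, node_id) for node_id, value in improved],
        )
        rounds += 1
    return rounds

conn = build_sample_graph()
rounds = shortest_paths(conn, source=1)
dist = dict(conn.execute("SELECT id, value FROM Nodes"))
```

On the sample graph with node 1 as the source, node 3 has no incoming path and keeps the sentinel value. In a database with stored procedures, the same loop would typically live in a UDF or stored procedure rather than in client code.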
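[Figure: bar chart of running time in seconds (log scale, 1 to 10000) on the Twitter, GPlus, and LiveJournal graphs. Systems compared: Graph Database, Main-memory Database, Row Store Database, Apache Giraph, Column Store Database. Bar labels as extracted: 29.4, 4.2, 3.3, 218.1, 53.5, 47.0, 4,172.4, 101.5, 17.4, 28.0, 589.0]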
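[Figure: bar chart of running time in seconds (log scale, 1 to 100000) on the Twitter, GPlus, and LiveJournal graphs. Systems compared: Graph Database, Main-memory Database, Row Store Database, Apache Giraph, Column Store Database. Bar labels as extracted: 135.1, 9.2, 6.7, 115.5, 50.8, 43.7, 18,702.2, 492.9, 74.5, 29.1, 395.6]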
SQL

UPDATE Nodes AS node SET value = new_node.value
FROM (
    SELECT e.toId AS Id, min(n1.value+1) AS value
    FROM Nodes AS n1, Edges AS e, Nodes AS n2
    WHERE n1.Id = e.fromId AND n2.Id = e.toId
    GROUP BY e.toId, n2.value
    HAVING min(n1.value+1) < n2.value
) AS new_node
WHERE node.Id = new_node.Id;

Pregel

void compute(vector<float> messages) {
    // get the minimum distance (FLT_MAX rather than DBL_MAX, since mindist is a float)
    float mindist = id == START_NODE ? 0 : FLT_MAX;
    for (vector<float>::iterator it = messages.begin(); it != messages.end(); ++it)
        mindist = min(mindist, *it);
    // send messages to all edges if new minimum is found
    float vvalue = getVertexValue();
    if (mindist < vvalue) {
        modifyVertexValue(mindist);
        vector<int> edges = getOutEdges();
        for (vector<int>::iterator it = edges.begin(); it != edges.end(); ++it)
            sendMessage(*it, mindist + 1);
    }
    // halt
    voteToHalt();
}
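To make the vertex-centric side of this comparison concrete, here is a small in-memory sketch of synchronous supersteps driving a vertex program that mirrors the `compute` function above. It is not the Pregel or Giraph API: `compute` and `run_supersteps` are hypothetical helpers, and the sketch assumes that every vertex starts at infinity, is active in the first superstep, votes to halt after each call, and is woken only by incoming messages.

```python
import math
from collections import defaultdict

def compute(vertex_id, value, messages, out_edges, start_node):
    """One vertex, one superstep: return (new value, list of (target, message))."""
    # Get the minimum distance seen so far.
    mindist = 0 if vertex_id == start_node else math.inf
    for message in messages:
        mindist = min(mindist, message)
    # Send messages along all out-edges only if a new minimum was found.
    if mindist < value:
        return mindist, [(dst, mindist + 1) for dst in out_edges]
    return value, []

def run_supersteps(nodes, edges, start_node):
    """Run supersteps until no vertex is active; return (values, superstep count)."""
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)
    values = {v: math.inf for v in nodes}
    inbox = {v: [] for v in nodes}
    active = set(nodes)  # assumed: all vertices are active in the first superstep
    supersteps = 0
    while active:
        next_inbox = defaultdict(list)
        for v in sorted(active):
            values[v], outgoing = compute(
                v, values[v], inbox.get(v, []), out_edges[v], start_node
            )
            for dst, message in outgoing:
                next_inbox[dst].append(message)
        # Every vertex has voted to halt; only message recipients run next.
        inbox = next_inbox
        active = set(next_inbox)
        supersteps += 1
    return values, supersteps

nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 4), (2, 5), (3, 2), (5, 1)]
values, supersteps = run_supersteps(nodes, edges, start_node=1)
```

The per-vertex logic is only a few lines, while the scheduling, message delivery, and halting all sit in the surrounding loop.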
DATA
DATABASE
Physical Data Independence
Logical Data Independence
APPLICATION
Vertex-centric Query Interface

Vertex Programs
  Pregel-style API:
Vertex UDF
  Invokes the vertex program if:
Coordinator
  Synchronizes supersteps
  Redistributes messages
Vertex (V), Edge (E), Message (M)
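One way the pieces listed on this slide could fit together is sketched below, with SQLite as an assumed engine and three tables named after the slide's V, E, and M. The table schemas, the seed message to the start node, and the invocation rule (run the vertex program only for vertices with pending messages) are all assumptions; the slide's own invocation condition is not recoverable from the text. `vertex_program` and `coordinator` are hypothetical names, not an actual system API.

```python
import sqlite3
from collections import defaultdict

def vertex_program(value, messages):
    """Hypothetical vertex program (hop-count shortest paths).
    Returns (new vertex value, message for out-neighbors or None)."""
    best = min(messages)
    if value is None or best < value:
        return best, best + 1
    return value, None

def coordinator(conn, start_node):
    """Run supersteps until the message table M is empty; return the superstep count."""
    conn.execute("INSERT INTO M VALUES (?, 0)", (start_node,))  # assumed seed message
    supersteps = 0
    while True:
        # Collect pending messages per target vertex.
        inbox = defaultdict(list)
        for to_id, message in conn.execute("SELECT toId, value FROM M"):
            inbox[to_id].append(message)
        if not inbox:
            break
        conn.execute("DELETE FROM M")  # messages are consumed once per superstep
        for vid, messages in sorted(inbox.items()):
            (value,) = conn.execute("SELECT value FROM V WHERE id = ?", (vid,)).fetchone()
            new_value, outgoing = vertex_program(value, messages)
            conn.execute("UPDATE V SET value = ? WHERE id = ?", (new_value, vid))
            if outgoing is not None:
                # Redistribute: one new message per out-edge of this vertex.
                conn.execute(
                    "INSERT INTO M SELECT toId, ? FROM E WHERE fromId = ?",
                    (outgoing, vid),
                )
        supersteps += 1  # synchronization point between supersteps
    return supersteps

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE V (id INTEGER PRIMARY KEY, value INTEGER)")  # NULL = unreached
conn.execute("CREATE TABLE E (fromId INTEGER, toId INTEGER)")
conn.execute("CREATE TABLE M (toId INTEGER, value INTEGER)")
conn.executemany("INSERT INTO V VALUES (?, NULL)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO E VALUES (?, ?)", [(1, 2), (2, 4), (2, 5), (3, 2), (5, 1)])

supersteps = coordinator(conn, start_node=1)
result = dict(conn.execute("SELECT id, value FROM V"))
```

This sketch updates V in place and processes one vertex at a time for readability; a real system would likely batch this work inside the database.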
Union
Batching
No in-place Updates
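[Figure: bar chart of running time in seconds (log scale, 1 to 100000) on the Twitter, GPlus, and LiveJournal graphs. Systems compared: Main-memory Database, Apache Giraph, Column Store Database. Bar labels as extracted: 335.5, 47.7, 10.9, 218.1, 53.5, 47.0, 2,071.0, 421.5]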
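[Figure: bar chart of running time in seconds (log scale, 1 to 10000) on the Twitter, GPlus, and LiveJournal graphs. Systems compared: Main-memory Database, Apache Giraph, Column Store Database. Bar labels as extracted: 146.3, 23.8, 10.6, 115.5, 50.8, 43.7, 7,950.1, 712.2, 121.0]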
Vertex-centric interface allows…
… right within the database system!
Advantages of “Graphs on Databases”
queries (plus UDFs)
graph-natural query interfaces