Beyond Macrobenchmarks
Microbenchmark-based Graph Database Evaluation
Matteo Lissandrini, Martin Brugnara, Yannis Velegrakis
Universiteit Utrecht
Graph Databases Evaluation – Matteo Lissandrini
Graphs are Everywhere: Protein Interaction Network, Road Network, Social Network, Knowledge Graph
[Figure: example property graph with nodes node01, node02, node03 and edges edge01, edge02, edge03 (edge labels: Presents, in, references); node property sets: {Name: Matteo, Role: Post-doc, Interests: Graphs}, {Title: Beyond, Topic: GraphDB, On: 2019-08-26}, {name: VLDB'19, year: 2019}]
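The example above is a property graph: nodes and edges carry key/value properties, and every edge has a label. A minimal in-memory sketch of that model follows; the class and method names are hypothetical (loosely mirroring the Gremlin calls addVertex, addEdge and setProperty in the query table), and which node each edge connects is assumed for illustration, since the slide does not state it.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Vertex:
    id: int
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    src: int      # id of the source vertex
    dst: int      # id of the target vertex
    label: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph with labeled, directed edges."""

    def __init__(self):
        self._ids = count(1)  # one id sequence shared by vertices and edges
        self.vertices = {}    # vertex id -> Vertex
        self.edges = {}       # edge id -> Edge

    def add_vertex(self, **props):
        v = Vertex(next(self._ids), dict(props))
        self.vertices[v.id] = v
        return v

    def add_edge(self, src, dst, label, **props):
        e = Edge(next(self._ids), src.id, dst.id, label, dict(props))
        self.edges[e.id] = e
        return e

    def set_property(self, element, name, value):
        # Works for both Vertex and Edge objects.
        element.props[name] = value


g = PropertyGraph()
person = g.add_vertex(Name="Matteo", Role="Post-doc", Interests="Graphs")
talk = g.add_vertex(Title="Beyond", Topic="GraphDB", On="2019-08-26")
venue = g.add_vertex(name="VLDB'19", year=2019)
# Assumed wiring, for illustration only: the slide does not say which nodes each edge joins.
g.add_edge(person, talk, "Presents")
g.add_edge(talk, venue, "in")
```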
CosmosDB Oracle Graph
OLTP: updates, transactions, selectivity, indices, user interaction, concurrency, availability
OLAP*: business intelligence, batch processing, algorithms, statistics, mining
OLTP and OLAP: complex queries, pathfinding, connectivity, export/import
OLAP systems: GraphLab, Giraph/Pregel, GraphX
Our focus: ArangoDB, Blazegraph, Neo4j, OrientDB, Sparksee, Titan/Janus
[Ammar and Özsu, VLDB'18]
OLTP: updates, transactions, selectivity, indices, user interaction, concurrency, availability, complex queries, pathfinding, connectivity, export/import
Systems: ArangoDB, Blazegraph, Neo4j, OrientDB, Sparksee, Titan/Janus
What solution works best?
How to implement a Graph Database
Native or non-native: specialized data structures & indexes
Native or non-native: specialized query processing & algorithms
FACTORS
System architecture, query workload, data characteristics
OUTCOME
1. Evaluate the pros/cons of each design decision
2. Identify the cause of underperforming operations
Goals
Techniques (… and output of previous queries)
Limitations (… performance)
Our Proposal
Goals
Techniques
Advantages
Example:
Insertions, updates, retrievals both for values stored on nodes and edges, and structural elements (add/remove/retrieve nodes/edges)
Access local structure around the node, verify reachability, as well as search for nodes with specific structural characteristics
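The second class of operations works on graph structure rather than on stored values. The sketch below assumes a plain adjacency-list representation (the edge list and function names are hypothetical) and shows three such operations: adjacency via incoming or outgoing edges, a structural search by minimum in-degree, and a reachability check.

```python
from collections import defaultdict, deque

# Hypothetical directed, labeled edges: (source, label, target).
EDGES = [
    ("a", "knows", "b"),
    ("a", "knows", "c"),
    ("b", "likes", "c"),
    ("c", "knows", "d"),
    ("e", "knows", "a"),
]

out_adj = defaultdict(list)  # node -> [(label, neighbor)] via outgoing edges
in_adj = defaultdict(list)   # node -> [(label, neighbor)] via incoming edges
for src, label, dst in EDGES:
    out_adj[src].append((label, dst))
    in_adj[dst].append((label, src))

nodes = {n for s, _, d in EDGES for n in (s, d)}


def out_neighbors(v):
    """Nodes adjacent to v via outgoing edges."""
    return {dst for _, dst in out_adj[v]}


def in_neighbors(v):
    """Nodes adjacent to v via incoming edges."""
    return {src for _, src in in_adj[v]}


def min_in_degree(k):
    """Structural search: nodes with at least k incoming edges."""
    return {n for n in nodes if len(in_adj[n]) >= k}


def reachable(src, dst):
    """Verify reachability from src to dst with a breadth-first search."""
    seen, frontier = {src}, deque([src])
    while frontier:
        v = frontier.popleft()
        if v == dst:
            return True
        for _, w in out_adj[v]:
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return False
```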
Access the local structure around a node (… incoming/outgoing edges), verify reachability, and search for nodes with specific structural characteristics (… inbound connections)
# | Query | Description | Cat
1. | g.loadGraphSON("/path") | Load dataset into the graph 'g' | L
2. | g.addVertex(p[]) | Create new node with properties p | C
3. | g.addEdge(v1, v2, l) | Add edge from v1 to v2 | C
4. | g.addEdge(v1, v2, l, p[]) | Same as Q.3, but with properties p | C
5. | v.setProperty(Name, Value) | Add property Name=Value to node v | C
6. | e.setProperty(Name, Value) | Add property Name=Value to edge e | C
7. | g.addVertex(...); g.addEdge(...) | Add a new node, and then edges to it | C
8. | g.V.count() | Total number of nodes | R
9. | g.E.count() | Total number of edges | R
10. | … | Existing edge labels (no duplicates) | R
11. | … | Nodes with property Name=Value | R
12. | … | Edges with property Name=Value | R
13. | … | Edges with label l | R
14. | … | The node with identifier d | R
15. | … | The edge with identifier d | R
16. | … | Update property Name for vertex v | U
17. | … | Update property Name for edge e | U
18. | … | Delete node identified by d | D
19. | … | Delete edge identified by d | D
20. | … | Remove node property Name from v | D
21. | … | Remove edge property Name from e | D
22. | … | Nodes adjacent to v via incoming edges | T
23. | … | Nodes adjacent to v via outgoing edges | T
24. | … | Nodes adjacent to v via edges labeled l | T
25. | … | Labels of incoming edges of v (no dupl.) | T
26. | … | Labels of outgoing edges of v (no dupl.) | T
27. | … | Labels of edges of v (no dupl.) | T
28. | … | Nodes of at least k-incoming-degree | T
29. | … | Nodes of at least k-outgoing-degree | T
30. | … | Nodes of at least k-degree | T
31. | … | Nodes having an incoming edge | T
32. | … .store(j).loop('i') | Nodes reached via breadth-first traversal from v | T
33. | … .store(vs).loop('i') | Nodes reached via breadth-first traversal from v on labels s | T
34. | … .loop('i'){!it.object.equals(v2)}.retain([v2]).path() | … | T
35. | … | Same as Q.34, but only following label l | T
∗ [ ] denotes a HashMap; g is the graph; v and e are a node and an edge, respectively.
35 distinct concrete operations
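To make the R (read) category concrete, here is a sketch of what Q8-Q13 compute, written in Python over hypothetical in-memory vertex and edge lists rather than in Gremlin. It mirrors the semantics in the table, not any system's implementation; note that without an index, Q11-Q13 are full scans over all vertices or edges.

```python
# Hypothetical data: vertices and edges as dicts with an id, properties, and (for edges) a label.
vertices = [
    {"id": 1, "props": {"name": "Ann", "city": "Rome"}},
    {"id": 2, "props": {"name": "Ben", "city": "Oslo"}},
    {"id": 3, "props": {"name": "Cat", "city": "Rome"}},
]
edges = [
    {"id": 10, "src": 1, "dst": 2, "label": "knows", "props": {"since": 2015}},
    {"id": 11, "src": 2, "dst": 3, "label": "knows", "props": {"since": 2018}},
    {"id": 12, "src": 1, "dst": 3, "label": "worksWith", "props": {"since": 2015}},
]


def q8_count_nodes():
    return len(vertices)


def q9_count_edges():
    return len(edges)


def q10_distinct_edge_labels():
    # Existing edge labels, without duplicates.
    return {e["label"] for e in edges}


def q11_nodes_with(name, value):
    # Full scan over all vertices.
    return [v["id"] for v in vertices if v["props"].get(name) == value]


def q12_edges_with(name, value):
    # Full scan over all edges.
    return [e["id"] for e in edges if e["props"].get(name) == value]


def q13_edges_with_label(label):
    return [e["id"] for e in edges if e["label"] == label]
```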
Batteries Included
Dataset  |V|     |E|     |L|    Conn. comp. #  Conn. comp. max  Density      Modularity   Avg degree  Max degree  …
…        2.3K    7.1K    167    101            2.2K             1.34∗10^−3   3.66∗10^−2   6.1         66          11
MiCo     100K    1.1M    106    1.3K           93K              1.10∗10^−6   5.45∗10^−3   21.6        1.3K        23
Frb-O    1.9M    4.3M    424    133K           1.6M             1.19∗10^−6   9.82∗10^−1   4.3         92K         48
Frb-S    0.5M    0.3M    1814   0.16M          20K              1.20∗10^−6   9.91∗10^−1   1.3         13K         4
Frb-M    4M      3.1M    2912   1.1M           1.4M             1.94∗10^−7   7.97∗10^−1   1.5         139K        37
Frb-L    28.4M   31.2M   3821   2M             23M              3.87∗10^−8   2.12∗10^−1   2.2         1.4M        33
ldbc     184K    1.5M    15     1              184K             4.43∗10^−5   …            16.6        48K         10
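The Density column is consistent with the directed-graph definition |E| / (|V|(|V| − 1)). That definition is an assumption here, but applied to the rounded |V| and |E| values it reproduces, for example, the densities listed for the first row, Frb-O, Frb-L and ldbc.

```python
def density(num_vertices: float, num_edges: float) -> float:
    """Density of a directed graph: |E| / (|V| * (|V| - 1)). Assumed definition."""
    return num_edges / (num_vertices * (num_vertices - 1))


# Rounded |V| and |E| taken from the dataset table.
rows = {
    "first row": (2.3e3, 7.1e3),
    "Frb-O": (1.9e6, 4.3e6),
    "Frb-L": (28.4e6, 31.2e6),
    "ldbc": (184e3, 1.5e6),
}
for name, (v, e) in rows.items():
    print(f"{name}: {density(v, e):.2e}")
```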
Previous tests: only 1M nodes
Various sizes & domains: real and synthetic datasets
Ready-to-go systems & configurations: most popular systems already integrated and ready to use
Common query language
Plug-and-play setup & controlled environment
Native systems with JOIN-free adjacency provide the best scalability for generic traversals (> 2 hops).
[Figure: BFS (Q32) execution time in ms, log scale, at depths 2, 3, 4 and 5 on Frb-S, Frb-O, Frb-M, Frb-L; systems: Blaze, Neo 3.0, Arango, Neo 1.9, Orient, Sparksee, Pg]
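Q32 and Q33 expand a frontier hop by hop from a start node. The sketch below is a depth-bounded, optionally label-restricted BFS over a hypothetical adjacency list (data and function names are illustrative). It hints at why JOIN-free adjacency helps at depth: each hop reads a node's neighbor list directly, instead of probing an index through a join.

```python
from collections import deque

# Hypothetical adjacency list: node -> list of (edge label, neighbor).
ADJ = {
    "v": [("knows", "a"), ("likes", "b")],
    "a": [("knows", "c")],
    "b": [("knows", "d")],
    "c": [("knows", "e")],
    "d": [],
    "e": [],
}


def bfs(start, max_depth, labels=None):
    """Nodes reached from start within max_depth hops, in BFS order.

    If labels is given, only edges whose label is in labels are followed
    (the Q33 variant); otherwise every edge is followed (Q32).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for label, nxt in ADJ.get(node, []):
            if labels is not None and label not in labels:
                continue
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                frontier.append((nxt, depth + 1))
    return reached
```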
Depending on the nature of the query, some systems perform better than others: e.g., relational systems perform best on highly selective attribute queries.
[Figure: execution time in ms, log scale, on Frb-S, Frb-O, Frb-M, Frb-L for Q8-Q13 (count nodes, edges, and distinct labels; search by property and label) and Q14-Q15 (search by ID); systems: Blaze, Neo 3.0, Arango, Neo 1.9, Orient, Sparksee, Pg]
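One plausible reason an index-backed engine does well on highly selective attribute searches: with an index from property value to object ids, the lookup touches only the matching objects instead of scanning every node. A minimal sketch with hypothetical data, using a Python dict as a stand-in for a secondary index (relational systems typically use B-tree indexes instead).

```python
from collections import defaultdict

# Hypothetical nodes: id -> properties. Exactly one node has country == "IS".
nodes = {i: {"country": "IT" if i % 2 else "NL"} for i in range(1, 10_001)}
nodes[42]["country"] = "IS"


def scan(name, value):
    """Full scan: inspects every node, however few of them match."""
    return [nid for nid, props in nodes.items() if props.get(name) == value]


def build_index(name):
    """Secondary index: property value -> list of node ids."""
    index = defaultdict(list)
    for nid, props in nodes.items():
        if name in props:
            index[props[name]].append(nid)
    return index


country_index = build_index("country")


def indexed(value):
    """Index lookup: touches only the ids that match."""
    return country_index.get(value, [])
```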
With large graphs and large intermediate results,
[Figure (c): number of timeouts per DB engine and execution method (I/B) on Frb-S, Frb-O, Frb-M, Frb-L; engines include Orient, Pg, Arango, Blaze]
A Micro-Benchmark for an in-depth understanding of Graph Database performance: http://graphbenchmark.com/results.html
& Reproducible
[Figure: execution time in ms, log scale, of the macro-benchmark queries max-iid, max-oid, create, city, company, university, friend1, friend2, friend-tags, add-tags, friend-of-friend, triangle, places, grouped as global search, local search 1-hop + edge insertion, and local search 2+ hops; systems: Neo 1.9, Neo 3.0, Orient, Sparksee, Arango, Sqlg]
Current macro-benchmarks do not provide sufficient insight to understand the real capabilities and limitations of a graph database.
Specialized Query Languages & API:
Standard Query Language
Majority of vendors: an SQL dialect is simpler for customers.
A common standard: Gremlin is still the most supported, but implementations are still “young”.
One relation for each node type and edge type
The node-edge structure is stored in intermediate tables and accessed via indexes: cost O(log(n))
Find friends = 2 joins
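The "find friends = 2 joins" point can be reproduced with Python's built-in sqlite3. With one relation for a node type and one for an edge type (the schema and names below are hypothetical), a person's friends require joining the start node to the edge table and the edge table to the friend node; each join is resolved through an index (SQLite indexes are B-trees, hence a logarithmic cost per lookup). Each additional hop adds at least one more join over the edge table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One relation per node type and per edge type (hypothetical schema).
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (src INTEGER REFERENCES person(id),
                        dst INTEGER REFERENCES person(id));
    -- Index used to find the outgoing 'knows' edges of a person.
    CREATE INDEX knows_src ON knows(src);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cat'), (4, 'Dan');
    INSERT INTO knows VALUES (1, 2), (1, 3), (2, 4);
    """
)

# Friends of a person: join 1 (person -> knows), join 2 (knows -> person).
FRIENDS_SQL = """
    SELECT f.name
    FROM person AS p
    JOIN knows  AS k ON k.src = p.id
    JOIN person AS f ON f.id = k.dst
    WHERE p.name = ?
    ORDER BY f.name
"""
friends = [row[0] for row in conn.execute(FRIENDS_SQL, ("Ann",))]
```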
Data split into files: node, edge, label, and property stores
Records of fixed size
Node/edge records contain only direct pointers
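A sketch of why fixed-size records with direct pointers give constant-time access, under a simplified layout that is purely hypothetical (field layout, sizes and names are not any system's actual on-disk format): a record's position in its store is its id times the record size, so following a pointer is an offset computation rather than an index lookup. Here each node record points to its first edge, and each edge record points to the next edge of the same source node.

```python
import struct

# Hypothetical fixed-size layouts ("<" = little-endian, no padding).
NODE_FMT = struct.Struct("<q")    # node record: id of its first edge (-1 = none)
EDGE_FMT = struct.Struct("<qqq")  # edge record: src node, dst node, next edge of src (-1 = none)


def write_record(store, fmt, rid, *fields):
    """Place record rid at offset rid * record size, growing the store if needed."""
    offset = rid * fmt.size
    if len(store) < offset + fmt.size:
        store.extend(b"\x00" * (offset + fmt.size - len(store)))
    fmt.pack_into(store, offset, *fields)


def read_record(store, fmt, rid):
    """O(1) access: the offset is computed from the id, no index needed."""
    return fmt.unpack_from(store, rid * fmt.size)


node_store, edge_store = bytearray(), bytearray()
# Node 0 has edges 0 -> 1 and 0 -> 2, chained as edge 1 -> edge 0.
write_record(edge_store, EDGE_FMT, 0, 0, 1, -1)
write_record(edge_store, EDGE_FMT, 1, 0, 2, 0)
write_record(node_store, NODE_FMT, 0, 1)
write_record(node_store, NODE_FMT, 1, -1)
write_record(node_store, NODE_FMT, 2, -1)


def neighbors(node_id):
    """Follow direct pointers: node -> first edge -> next edge -> ..."""
    result = []
    (edge_id,) = read_record(node_store, NODE_FMT, node_id)
    while edge_id != -1:
        _src, dst, edge_id = read_record(edge_store, EDGE_FMT, edge_id)
        result.append(dst)
    return result
```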
A value is everything that is shared by multiple objects: node types, edge types, attribute values
Bitmaps: from object IDs to values, and from values to object IDs
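The bitmap idea can be sketched with Python integers used as bitsets. This is a hypothetical, simplified layout, not any particular system's format: for each (attribute, value) pair, bit i is set when object i has that value, and a separate map goes from object ids back to their values. Combining conditions then becomes a bitwise AND.

```python
from collections import defaultdict

# Hypothetical objects (nodes or edges): id -> attribute values.
objects = {
    1: {"type": "Person", "city": "Rome"},
    2: {"type": "Person", "city": "Oslo"},
    3: {"type": "Company", "city": "Rome"},
    4: {"type": "Person", "city": "Rome"},
}

value_to_bitmap = defaultdict(int)  # (attribute, value) -> bitmap of object ids
object_to_values = {}               # object id -> its attribute values

for oid, attrs in objects.items():
    object_to_values[oid] = dict(attrs)
    for attr, value in attrs.items():
        value_to_bitmap[(attr, value)] |= 1 << oid  # set bit number `oid`


def ids(bitmap):
    """Decode a bitmap into the sorted ids of the objects whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# "Persons in Rome" becomes a single bitwise AND of two bitmaps.
persons_in_rome = ids(value_to_bitmap[("type", "Person")] & value_to_bitmap[("city", "Rome")])
```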
https://docs.janusgraph.org/advanced-topics/data-model/