Stefan Armbruster t: @darthvader42 e:stefan.armbruster@neotechnology.com
Introduction in Graph Databases and Neo4j
most slides from: Michael Hunger
The Path Forward
1. No .. NO .. NOSQL
2. Why graphs?
3. What's a graph database?
4. Some
http://www.flickr.com/photos/crazyneighborlady/355232758/
http://gallery.nen.gov.uk/image82582-.html
http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale
Figure: data stores positioned on two axes, Volume (~= Size) and Density (~= Complexity): Key-Value Store, Column Family, Document Databases, Graph Databases, RDBMS
Patrik Runald @patrikrunald 3 Nov “@mgonto: The best explanation about what BigData is. Hilarious: pic.twitter.com/d8ZVP7xJFu”
Leonhard Euler 1707-1783
http://www.bbc.co.uk/london/travel/downloads/tube_map.html
๏ Relationships in
๏ Biology, Chemistry, Physics, Sociology
๏ Internet
๏ Social Networks
๏ the world is rich, messy and related data
๏ relationships are at least as important as the things they connect
๏ Graphs = Whole > Σ parts
๏ complex interactions
๏ always changing, change of structures as well
๏ Graph: Relationships are part of the data
๏ RDBMS: Relationships are part of the fixed schema
NO: not for charts & diagrams, or vector artwork
YES: for storing data that is structured as a graph
remember linked lists, trees?
graphs are the general-purpose data structure
“A relational database may tell you the average age of everyone in this session, but a graph database will tell you who is most likely to buy you a beer.”
foo bar foo_bar
Properties (each a key+value) + Indexes (for easy look-ups) + Labels (Neo4j 2.0)
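The building blocks listed above can be sketched in plain Java. This is a minimal in-memory illustration, not the Neo4j API: the class names PropertyGraphSketch, Node, Relationship and NameIndex, and the choice of "name" as the indexed property, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative in-memory property graph (hypothetical classes, not the Neo4j API). */
final class PropertyGraphSketch {

    /** A node carries labels and key/value properties. */
    static final class Node {
        final Set<String> labels = new LinkedHashSet<>();
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        Node(String... labels) {
            for (String label : labels) {
                this.labels.add(label);
            }
        }

        /** Connects this node to another node with a typed relationship. */
        Relationship relateTo(Node end, String type) {
            Relationship relationship = new Relationship(this, end, type);
            outgoing.add(relationship);
            return relationship;
        }
    }

    /** A relationship has a start node, an end node, a type and its own properties. */
    static final class Relationship {
        final Node start;
        final Node end;
        final String type;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** A tiny index on the "name" property, used to find a starting node. */
    static final class NameIndex {
        private final Map<Object, Node> byName = new HashMap<>();

        void add(Node node) {
            byName.put(node.properties.get("name"), node);
        }

        Node lookup(Object name) {
            return byName.get(name);
        }
    }
}
```

In this sketch an index is only used to find a starting node; everything after that is reached by following relationships from node to node.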
๏ a sample social graph
๏ average 50 friends per person
๏ pathExists(a,b) limited to depth 4
๏ caches warmed up to eliminate disk I/O
                     # persons    query time
Relational database  1,000        2000 ms
Neo4j                1,000        2 ms
Neo4j                1,000,000    2 ms
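A depth-limited pathExists(a,b) check like the one benchmarked here can be sketched as a breadth-first search over an adjacency map. This is an illustration under stated assumptions, not the code used for the measurements: the class name PathExistsSketch, the String person identifiers and the Map-based friendship store are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch of a depth-limited reachability check on a social graph. */
final class PathExistsSketch {

    /** Returns true if 'to' can be reached from 'from' in at most maxDepth hops. */
    static boolean pathExists(Map<String, Set<String>> friends,
                              String from, String to, int maxDepth) {
        if (from.equals(to)) {
            return true;
        }
        Set<String> visited = new HashSet<>();
        visited.add(from);
        Queue<String> frontier = new ArrayDeque<>();
        frontier.add(from);

        // Expand the frontier one hop at a time, up to maxDepth hops.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String person : frontier) {
                Set<String> neighbours =
                        friends.getOrDefault(person, Collections.<String>emptySet());
                for (String friend : neighbours) {
                    if (friend.equals(to)) {
                        return true;
                    }
                    // Set.add returns false for people already seen, so each is expanded once.
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }
}
```

The work done depends on how many people lie within maxDepth hops of the start, not on the total number of persons stored, which fits the Neo4j rows above showing the same query time for 1,000 and 1,000,000 persons.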
๏ Strengths
๏ Weaknesses:
// lookup starting point in an index
start n=node:People(name = 'Andreas')
// then traverse to find results
match (n)--()--(foaf)
return foaf
Figure: Person and Photo nodes connected by FRIEND_OF, SHARED, COMMENTED ON and LIKES relationships
๏ A Graph Database:
Figure: Andreas, Peter, Emil and Allison connected by "knows" relationships
// Cypher query - friend of a friend
start n=node(0)
match (n)--()--(foaf)
return foaf
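The friend-of-a-friend pattern walks exactly two relationships away from the start node. A rough plain-Java equivalent over an in-memory adjacency map is sketched below. The class FriendOfFriendSketch and its methods are hypothetical, it assumes at most one "knows" relationship between any two people, and it returns a de-duplicated set, whereas the Cypher query returns one row per matching path.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a two-hop "friend of a friend" traversal. */
final class FriendOfFriendSketch {

    // Undirected "knows" relationships stored as an adjacency map.
    private final Map<String, Set<String>> knows = new HashMap<>();

    /** Records that a and b know each other (direction is ignored). */
    void addKnows(String a, String b) {
        knows.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        knows.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    /** Returns everyone reachable in exactly two hops, excluding the start person. */
    Set<String> friendsOfFriends(String start) {
        Set<String> result = new TreeSet<>();
        for (String friend : knows.getOrDefault(start, Collections.<String>emptySet())) {
            for (String foaf : knows.getOrDefault(friend, Collections.<String>emptySet())) {
                // Stepping back over the same relationship would return to the start; skip it.
                if (!foaf.equals(start)) {
                    result.add(foaf);
                }
            }
        }
        return result;
    }
}
```

A direct friend still appears in the result when that person is also reachable through another friend, which matches the pattern: it constrains the path length, not the distance.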
๏ 1. Embeddable on JVM
Show me some code, please
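// Neo4j 1.x embedded Java API
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

GraphDatabaseService graphDb = new EmbeddedGraphDatabase("var/neo4j");
Transaction tx = graphDb.beginTx();
try {
    Node steve = graphDb.createNode();
    Node michael = graphDb.createNode();
    steve.setProperty("name", "Steve Vinoski");
    michael.setProperty("name", "Michael Hunger");
    // PresentationTypes is an application-defined RelationshipType enum
    Relationship presentedWith = steve.createRelationshipTo(
        michael, PresentationTypes.PRESENTED_WITH);
    // today is a date value defined elsewhere
    presentedWith.setProperty("date", today);
    tx.success();   // mark the transaction as successful
} finally {
    tx.finish();    // commits if success() was called, otherwise rolls back
}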
Spring Data Neo4j
@NodeEntity
public class Movie {
    @Indexed
    private String title;
    @RelatedToVia
    private Set<Role> cast;
    private Director director;
}

@NodeEntity
public class Actor {
    @RelatedTo
    private Set<Movie> movies;
}

@RelationshipEntity
public class Role {
    @StartNode
    private Actor actor;
    @EndNode
    private Movie movie;
    private String roleName;
}
๏ Declarative query language
๏ Examples:
START david=node:people(name="David")   // index lookup
MATCH david-[:knows]-friends-[:knows]-new_friends
WHERE new_friends.age > 18
RETURN new_friends

START user=node(5, 15, 26, 28)   // node IDs
MATCH user--friend
RETURN user, COUNT(friend), SUM(friend.money)
Create Graph with Cypher
CREATE (steve {name: "Steve Vinoski"})
(michael {name: "Michael Hunger"})
๏ 2. Server with REST API
REST://
๏ Server capability == Embedded capability
San Jose, CA
Cisco.com Industry: Communications Use case: Recommendations
the efficacy of online self service
cases, solutions, articles, forums, etc.
to be lowered
Support Services
master data management solution.
cross-reference links, and represented in Neo4j
reliance on customer support
Figure: graph of Support Case, Knowledge Base Article, Solution and Message nodes
San Jose, CA
Cisco HMP Industry: Communications Use case: Master Data Management
Cisco’s needs
available, with zero downtime
manufacturers
#91 Global 2000. $44B in annual sales.
data hierarchies in a performant way
heart is Neo4j. Data access services available 24x7 to applications companywide
Platform (HMP)
data access and business rules
Oracle RAC, serving master data in real time
needed to support Cisco’s business rules
include product hierarchy
Industry: Logistics Use case: Parcel Routing
to any point
better than relational
Sausalito, CA
GlassDoor Industry: Online Job Search Use case: Social / Recommendations
through personal & professional connections
competitive market
inside information to job seekers
their network of Facebook friends
graph
brought online as graph size and load have increased
Figure: Person nodes connected to each other by KNOWS relationships and to Company nodes by WORKS_AT relationships
Paris, France
SFR Industry: Communications Use case: Network Management
because of the need to model network impacts
resilience during unplanned network outages
Identify weaknesses in the network to uncover the need for additional redundancy
daily changes to network infrastructure
Business needs sometimes changed very rapidly
modeling, aggregation & troubleshooting
applications to access network data
mapping between the real world and the graph
requirements
Figure: Service, Router, Switch, Fiber Link and Oceanfloor Cable nodes connected by DEPENDS_ON and LINKED relationships
Global (U.S., France)
Hewlett Packard Industry: Web/ISV, Communications Use case: Network Management
problems causes on the network
System alarms by event correlation
services
key application inside HP’s OSS Assurance portfolio
determination, root cause & service impact analysis
and fast changing networks
Oslo, Norway
Telenor Industry: Communications Use case: Resource Authorization & Access Control
minutes while system retrieved access rights
Highly interconnected data set w/massive joins
problem, but meant data was no longer current
workaround projected to reach 9 hours by 2014: longer than the nightly batch window
Nordics
manage employee subscriptions and plans
responsiveness is critical to customer satisfaction
the domain is inherently a graph
responses: measured in milliseconds
retention risks fully mitigated
Figure: Subscription, Account, Customer and User nodes connected by SUBSCRIBED_BY, CONTROLLED_BY, PART_OF and USER_ACCESS relationships
Zürich, Switzerland
Junisphere Industry: Web/ISV, Communications Use case: Data Center Management
complex graph, covering: business processes --> business services --> IT infrastructure
into OEM application
relational database, with Java
Starts with the end user, mapped to business processes and services, and dependent infrastructure
as a graph
lost in translation”
language, and can model the domain with the database on a 1:1 ratio.”
San Francisco, CA
Teachscape Industry: Education Use case: Resource Authorization & Access Control
around Neo4j, will be used to support single sign-on, user management, contract management, and end-user access to their subscription entitlements.
teachers, school principals, and other instructional leaders.
MySQL and Oracle.
provides a more natural fit for managing organizational hierarchy and access to assets.
relationships are relatively complex.
support, and well-implemented Lucene indexing integration
complex walks quickly and efficiently. This was a major factor in the decision to use Neo4j.
commercial off-the-shelf applications, as well as ad-hoc use.
than relational
Making Sense of all that data: Recommendations, MDM, Systems Management, Geospatial, Social computing, Business intelligence, Biotechnology, your brain, access control, linguistics, catalogs, genealogy, routing, compensation, market vectors
grab your free pdf version at http://www.graphdatabases.com
visit http://www.neo4j.org and http://www.neotechnology.com
Mar 06: Training in Zurich
http://www.eventbrite.com/e/graph-data-modeling-with-neo4j-zurich-registration-9741554251