SLIDE 1

Consistency of NoSQL Models

Au Tran, Thy Nguyen, Chaz Chang, Vijaypal Singh, Timothy To, Akash Budholia

SLIDE 2

Introduction: From RDBMS to NoSQL

  • In the past, ACID (Atomicity, Consistency, Isolation, and Durability) was a must-have requirement for all traditional monolithic database systems. Strong consistency was a must-have for any database system, which only offers vertical scalability and prevents horizontal scalability.
  • As demand grows, the need to scale for high availability becomes necessary. For this reason, strong consistency can no longer be enforced and databases must relax their consistency levels. Therefore, NoSQL database systems have emerged.

Strong consistency vs. High Availability

SLIDE 3

Introduction: Sub-categories of NoSQL Databases

  • Redis: Key-value store
  • Cassandra: Column store
  • MongoDB: Document store
  • Neo4j: Graph database
  • OrientDB: Multi-model

In this study, we compare the consistency models of five of the most popular non-cloud database systems: Redis, Cassandra, MongoDB, OrientDB, and Neo4j.
SLIDE 4

Introduction: Data-centric and Client-centric

SLIDE 5

Consistency Models

We will review 8 main consistency models:

  • Strong consistency
  • Weak consistency
  • Eventual consistency
  • Causal consistency
  • Read-your-writes consistency
  • Session consistency
  • Monotonic Reads consistency
  • Monotonic Writes consistency
SLIDE 6

Strong consistency vs Weak consistency

Strong consistency (a.k.a. linearizability)

  • Operations must be committed immediately => events are seen in order and all clients observe the same data state
  • Read operation: after all write commits are done => reads return the new version of the data

Weak consistency

  • Does not guarantee a specific order of events
  • Read operation: not guaranteed to return the most updated value
  • Inconsistency window: the time period between the write operation and when every read operation returns the updated value

SLIDE 7

Eventual consistency

  • Eventual consistency strengthens Weak Consistency.
  • In this model, read operations may retrieve an older version instead of the latest one, as in Weak Consistency, while the replicas converge to the same data state.
  • However, after the inconsistency window, the latest data will be retrieved.

[Diagram: Strong consistency, Weak Consistency, and Eventual Consistency compared]

SLIDE 8

Causal Consistency

  • If some process updates a given object:
    ○ Processes that acknowledge the update get the updated value
    ○ Processes that do not acknowledge the update follow the Eventual Consistency model

[Diagram: Weak Consistency, Eventual Consistency, Causal Consistency, and Sequential Consistency compared]

SLIDE 9

Read-your-writes Consistency

  • Read-your-writes consistency ensures that a replica is at least current enough to contain the changes made by a specific transaction.

  • If some process updates a

given object, this same process will always consider the updated value.

  • Other processes will

eventually read the updated value after the inconsistency window

SLIDE 10

Session Consistency

  • In the context of the existence of a session, read-your-writes

consistency model will be applied.

  • All reads are current with the writes from that session, but writes from other sessions may lag.

  • Data from other sessions come in the correct order, just isn’t guaranteed

to be current.

  • Good performance and good availability at half the cost of strong

consistency

SLIDE 11

Monotonic Reads Consistency

  • After a process reads some value, all successive reads will return that same value or a more recent one.
  • Monotonic reads ensure that if a process performs read x1, then x2, then x2 cannot observe a state prior to the writes which were reflected in x1; intuitively, reads cannot go backward.
  • Monotonic reads do not apply to operations performed by different processes, only to reads by the same process.

SLIDE 12

Monotonic Writes Consistency

  • A write operation invoked by a process on a given object needs to be completed before any subsequent write operation on the same object by the same process.
  • Monotonic writes ensure that if a process performs write w1, then w2, then all processes observe w1 before w2.
  • Monotonic writes do not apply to operations performed by different processes, only to writes by the same process.

SLIDE 13

Redis

Description from the official website (https://redis.io/): "Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries"

SLIDE 14

Redis - Background

  • Key-value store
  • Optimizes data in memory by:
    ○ prioritizing high performance
    ○ low computation complexity
    ○ high memory space efficiency
    ○ low application network traffic

  • Guarantees high availability by extending its architecture and introducing

the Redis Cluster

  • Strongly consistent in a single-instance configuration
  • Eventually consistent in a cluster when the client reads from replica nodes
SLIDE 15

Redis - Cluster specification

  • High performance and linear scalability up to 1000 nodes.
  • Relaxed write guarantees: Redis Cluster tries its best to retain all write operations issued by the application, but some of these operations can be lost.

  • Availability: Redis Cluster survives network partitions as long as the

majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable.

SLIDE 16

Redis - Keys and master-slave model

  • Redis Cluster distributes keys into 16384 hash slots.
  • Each master stores a subset of the 16384 slots.
  • To compute the hash slot of a given key, the formula below is used

(CRC16 used as a hash algorithm): HASH_SLOT = CRC16(key) mod 16384

  • Architecture implements a master-slave model without proxies which

means that the application is redirected to the node that has the requested data. Redis nodes do not intermediate responses.

  • Each master node holds a hash slot. This slot has 1 to N replicas (including

the master and its replica nodes).
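To make the slot computation concrete, here is a minimal Python sketch of the formula above. It is not a real client library; the function names are illustrative, and it assumes the CRC16 variant Redis Cluster uses is the CCITT/XMODEM one (polynomial 0x1021, initial value 0).

```python
# Sketch of HASH_SLOT = CRC16(key) mod 16384 for Redis Cluster key placement.

def crc16_xmodem(data: bytes) -> int:
    # CCITT/XMODEM CRC16: poly 0x1021, initial value 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    return crc16_xmodem(key.encode()) % 16384

print(hash_slot("user:1000"))  # an integer in 0..16383 identifying the owning master
```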

SLIDE 17

Redis - Hash tags

  • Hash tags ensure that two keys are allocated in the same slot
  • Allows for multi-key operations
  • Part of the key has to be a common substring between the two keys and

inside brackets

  • These two keys end up in the same slot because only the substring inside the brackets will be hashed:

{user:1000}following
{user:1000}followers
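A hedged sketch of how a client could pick the effective hash key when a hash tag is present. The helper name is made up for the example, it mirrors the cluster specification's rule of hashing only the substring inside the first non-empty pair of braces, and it reuses the hash_slot sketch from the previous slide conceptually.

```python
def effective_hash_key(key: str) -> str:
    # If the key contains a {...} hash tag, only the substring inside the first
    # non-empty pair of braces is hashed, so related keys share a slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag
            return key[start + 1:end]
    return key

# Both keys hash to the slot of "user:1000", enabling multi-key operations on them.
assert effective_hash_key("{user:1000}following") == effective_hash_key("{user:1000}followers") == "user:1000"
```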

SLIDE 18

Redis - Redis Cluster

  • Redis Cluster is formed by N nodes

connected by TCP connections.

  • Each node has N-1 outgoing

connections and N-1 incoming connections.

  • A connection is kept alive as long as the two connected nodes live.

SLIDE 19

Redis - Asynchronous replication and writes

  • When a master node receives

an application issued request, it handles it and asynchronously propagates any changes to its replicas.

  • By default, the master node acknowledges the application without assured replication.


SLIDE 20

Redis - Asynchronous replication and writes

  • On the asynchronous replication configuration (default), if the master

node dies before replicating and after acknowledging the client, the data is permanently lost. Therefore, the Redis Cluster is not able to guarantee write persistence at all times.

  • This behavior can be overridden by explicitly issuing the WAIT command (see the sketch below), but this profoundly compromises performance and scalability, the two main strong points of using Redis.
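As a sketch of that trade-off with the redis-py client (the WAIT command itself is real; the host, port, key, and replica count are illustrative), a client can block after a write until at least one replica has acknowledged it, paying extra latency for durability.

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed master of the target slot
r.set("user:1000:name", "Alice")
# Block for up to 100 ms until at least 1 replica has acknowledged the write;
# the reply is the number of replicas that actually acknowledged it.
acked = r.execute_command("WAIT", 1, 100)
print(f"write acknowledged by {acked} replica(s)")
```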

SLIDE 21

Redis - Node failure

  • Suppose we have a master node A

and a single replica A1.

  • If A fails, A1 is promoted to master, and the cluster will continue to operate.
  • However, if A has no replicas, or A and A1 fail at the same time, the Redis Cluster will not be able to continue operating.


SLIDE 22

Redis - Network partition

[Diagram: a network partition separates master A (with the client) from its replicas A1 and A2]

SLIDE 23

Redis - Network partition

  • In the case of a network partition event, suppose the client is on the minority side with master A while its replicas A1 and A2 reside on the majority side. If the partition holds for too long (NODE_TIMEOUT), the majority side starts an election process to elect a new master among them, either A1 or A2.
  • Node A is also aware of the timeout and changes its role from master to slave. Consequently, it will refuse any further write operations from the client.
  • In this case, Redis Cluster is not the best solution for applications that require high availability under large network partition events.

SLIDE 24

Redis - Replica migration

[Diagram: replica migration, one of B's replicas is reassigned to replicate from A after A1 is separated from the cluster]

SLIDE 25

Redis - Replica migration

  • Suppose that the majority side has N nodes, including masters A and B and their replicas A1, B1, and B2, respectively, and a network partition event occurs in such a way that the replica A1 is separated from the rest.
  • If the partition lasts long enough for A1 to be assumed unreachable, Redis Cluster uses a strategy called replica migration to reorganize the cluster: because B has multiple slaves, one of B's replicas will now replicate from A instead of B.

SLIDE 26

Redis - Replica node read

  • There is also the possibility of reading from replica nodes instead of master nodes, in order to achieve a more read-scaled system.
  • By using the READONLY command, the client accepts the possibility of reading stale data, which is reasonable for situations where having the latest data is not critical (see the sketch below).
  • This therefore leads to an eventual consistency model.
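A hedged sketch with redis-py's cluster client, assuming a recent redis-py (4+) where the read_from_replicas option makes the client issue READONLY on replica connections; the seed host and port are placeholders.

```python
from redis.cluster import RedisCluster

# Opt into replica reads: the client sends READONLY on replica connections,
# so GETs may be served by a replica and can return slightly stale data.
rc = RedisCluster(host="localhost", port=7000, read_from_replicas=True)
rc.set("page:views", 42)      # writes still go to the slot's master
print(rc.get("page:views"))   # may briefly lag behind the master's value
```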


SLIDE 27

Cassandra

  • Column-based NoSQL store. Initially developed by Facebook to improve

their Inbox Search performance

  • Built with distributed systems in mind. Cassandra is on the AP (Availability

& Partition Tolerance) side of the CAP Theorem.

  • Can be configured to be a CP (Consistency & Partition Tolerance)

database

  • When configured as CP, it behaves as a strongly consistent database even when subjected to network partitions

SLIDE 28

Cassandra

  • In its default configuration, Cassandra is an AP database (clients may read inconsistent data), but it can be modified to behave like a CP database

SLIDE 29

Cassandra

  • Describes data with columns
  • A keyspace, corresponding to the database, is composed of

column-families

  • A column-family represents a class of objects such as Car or Person. Each column-family has a variable number of entries called rows.
  • A row is identified by the partition key (row key) and holds an arbitrarily large number of columns.
  • Each column contains a key-value pair and a timestamp used to resolve consistency conflicts
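To illustrate this data model (keyspace, column family/table, partition key), here is a hedged sketch using the DataStax Python driver; the keyspace, table, and replication settings are made up for the example.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # assumed local node
session = cluster.connect()

# A keyspace corresponds to the database; the replication factor controls how
# many replicas hold each row.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS shop
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# A table (column family) describes a class of objects; user_id is the partition
# key that decides which nodes own the row, and each column value carries a timestamp.
session.execute("""
    CREATE TABLE IF NOT EXISTS shop.users (
        user_id uuid PRIMARY KEY,
        name text,
        email text
    )
""")
```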

SLIDE 30

Cassandra

  • Scales up by distributing data across a cluster, or set, of nodes
  • Every node can handle client requests. If the requested data is not on the node, the node becomes the "coordinator" responsible for retrieving the data from neighboring nodes and answering back to the application
  • Data is partitioned by hashing the row key, such that the largest hash value wraps around to the smallest hash value to form a ring

SLIDE 31

Cassandra Write Consistency Levels

Cassandra is designed to be eventually consistent, highly available, and low-latency. However, write consistency levels can be modified using configuration constants to satisfy user requirements:

  • ALL - Write succeeds on all replica nodes in the cluster before they respond to the client.

(Strong Consistency, high latency)

  • ONE - Write succeeds on one replica node before responding to client (Eventual consistency, low

latency)

  • LOCAL_ONE - Write succeeds on one replica node in the same data center as the coordinator node before responding to the client (Eventual consistency, low latency)

  • ANY - A single replica may respond, or the coordinator may store a hint. If a hint is stored, the

coordinator will later attempt to replay the hint and deliver the mutation to the replicas. This consistency level is only accepted for write operations.

SLIDE 32

Cassandra Write Consistency Levels

  • QUORUM - Write succeeds on a given number of replica nodes before responding back to the client; that number is called the quorum. (Eventual consistency, low latency)
  • LOCAL_QUORUM - Write succeeds on a given quorum of replica nodes in the same data center as the coordinator node. (Eventual consistency, low latency)

SLIDE 33

Cassandra Read Consistency Levels

Similar to Write Consistency Levels, the following configuration constants describe Cassandra read consistency levels:

  • ALL - Requires all replica nodes to confirm the data before responding to the client. (Strong consistency, less availability)
  • ONE - Retrieves the data from the first node to respond and returns that data to the client. (Eventual consistency, high availability)
  • LOCAL_ONE - Retrieves data from the first node to respond in the same data center and returns that data to the client. (Eventual consistency, high availability)
  • QUORUM - Requires a given number of nodes to respond to the read request before responding to the client; that number is the quorum. (Eventual consistency, high availability)
  • LOCAL_QUORUM - Requires a given number of nodes in the same data center to respond to the read request before responding to the client. (Eventual consistency, high availability)
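A hedged sketch of tuning these levels per statement with the DataStax Python driver, reusing the keyspace and table names assumed in the earlier sketch: a QUORUM write paired with a ONE read, one of the combinations discussed in these slides.

```python
import uuid
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")   # assumed local node and keyspace

# Write at QUORUM: a majority of replicas must acknowledge before the client is answered.
write = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (uuid.uuid4(), "Alice"))

# Read at ONE: the first replica to answer wins, so the value may be stale.
read = SimpleStatement("SELECT name FROM users", consistency_level=ConsistencyLevel.ONE)
for row in session.execute(read):
    print(row.name)
```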

SLIDE 34

Cassandra Read Repair

  • At read consistency levels other than ONE or LOCAL_ONE, Cassandra uses a Read Repair routine to improve consistency.
  • A read request performs a check on all of the queried replica nodes. For any replica nodes with an out-of-date value, Cassandra issues read-repair requests to those nodes and updates them to the latest value. Only after all read-repair requests are done does the coordinator node respond back to the client.

SLIDE 35

Cassandra Read Repair Chance

If the read consistency level is set to ONE or LOCAL_ONE, the coordinator node only waits for one node to respond to its request. Since only one version of the data is checked, Read Repair requests do not normally happen.

  • A read repair chance mechanism is in place for consistency level ONE or

LOCAL_ONE.

  • Given a Read Repair Chance of 10% and a Replication Factor of 3, approximately 10% of the reads will trigger a Read Repair request and make sure the latest data is propagated to all 3 replicas.

SLIDE 36

Cassandra - Probabilistically Bounded Staleness

  • Developed by UC Berkeley
  • A simulator to determine how long it takes to reach 100% consistency under different configurations of write and read consistency levels

  • The graph represents the probability of a client request receiving the

latest version of data over time (ms) for a given combination of available cluster hosts, read quorum, and write quorum.

  • We denote the number of available cluster hosts as N, the read quorum as R, and the write quorum as W

  • All configurations for this simulation assume a Replication Factor above 1
SLIDE 37

Cassandra - Probabilistically Bounded Staleness

ALL Write Consistency Level or ALL Read Consistency Level

  • All nodes must respond before write/read requests respond to client
  • Strong consistency
SLIDE 38

Cassandra - Probabilistically Bounded Staleness

ONE Read Consistency Level and QUORUM Write Consistency Level

  • Needs 3 nodes to respond to a write and 1 node to respond to a read
  • Eventually consistent, with a low chance of inconsistent data being returned
SLIDE 39

Cassandra - Probabilistically Bounded Staleness

QUORUM Read Consistency Level and ONE Write Consistency Level

  • A read request requires 3 nodes to respond while a write request requires one
  • The time to reach 100% consistency is only 4 ms, much shorter than in other configurations
  • Eventually consistent
SLIDE 40

Cassandra - Probabilistically Bounded Staleness

ONE Read Consistency Level and ONE Write Consistency Level

  • One node responds to both read and write requests
  • Significantly higher chance of returning out-of-date data compared to other configurations
  • The time it takes to reach 100% consistency can be shortened by increasing the read repair chance and lowering the Replication Factor
  • The 'strongest' form of eventual consistency in Cassandra
SLIDE 41

MongoDB

  • Inspired by the limitations of RDBMS
  • Expressive Query Language, secondary indexes, strong consistency

are taken from RDBMS

  • Schema-less and easier horizontal scalability are the NoSQL

concepts

  • Document based data model
  • Documents are stored in BSON format
  • Related documents are organized as collections
SLIDE 42

MongoDB Sharding

  • Sharding allows horizontal scalability
  • Allows data to be distributed among many

nodes

  • Query router is responsible for redirecting

queries to the correct shard depending on the sharding strategy and shard value

  • Sharding uses config servers and mongos

to carry out its operations

SLIDE 43

MongoDB Sharding Strategy

  • Range-based Sharding:
    ○ Documents are distributed based on the shard-key values.
    ○ Shard-key values close to each other will most likely end up on the same shard.

SLIDE 44

MongoDB Sharding Strategy

  • Hash-based Sharding:
    ○ An MD5 hash is used to hash the shard keys.
    ○ The idea behind hash-based sharding is to distribute the data evenly among the shards.
    ○ Could be slower for range-based queries.
    ○ Good for monotonically increasing ids (see the sketch below).
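A hedged sketch of enabling hash-based sharding through the admin commands with PyMongo; it assumes the client is connected to a mongos router, and the database, collection, and key names are illustrative.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed to point at a mongos router

# Enable sharding for the database, then shard the collection on a hashed key so
# documents with monotonically increasing ids still spread evenly across shards.
client.admin.command("enableSharding", "shop")
client.admin.command("shardCollection", "shop.orders", key={"order_id": "hashed"})
```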

SLIDE 45

MongoDB Sharding Strategy

  • Location-aware Sharding:
    ○ Users can specify a custom configuration to accomplish application requirements.
    ○ For example, high-demand data can be stored in memory and less-demanded data on disk.

SLIDE 46

MongoDB ACID

  • Follows ACID properties similar to RDBMS:
  • Atomicity: supports single operation inserts and updates
  • Consistency: can be used on a strong consistency approach
  • Isolation: While a document is updated, it is entirely isolated. Any error

would result in a rollback operation, and no user will be reading stale data

  • Durability: MongoDB implements a feature called write concern. Write concerns are user-defined policies that need to be fulfilled in order to commit.

SLIDE 47

MongoDB Replication

  • Allows configuring a replica set
  • Replica set has one primary and multiple secondary replica set members
  • A heartbeat or ping is used to check the health of connections in a cluster
  • A primary member of a cluster is elected through an election
  • An election occurs:
    ○ When a new replica set is initiated
    ○ When the primary steps down
    ○ On node failover, where the primary can't reach a majority of the secondaries

SLIDE 48

MongoDB Strong Consistency

  • Writes and reads are done from the primary replica set member.
  • The primary member writes all the operations of a transaction to the oplog.
  • After the primary member acknowledges the application of the committed data and logs the operations, secondary replica set members can read from this log and replay all operations so that they reach the same state as the primary member.
  • Since applications can only read from the primary, all reads are consistent because they are served by the same node.

SLIDE 49

MongoDB Write Concern

Write concern allows designers to configure how many nodes the data must be committed to before the write is acknowledged as complete (see the sketch below).

  • Write Concern (WC) 0 = no acknowledgement
  • WC 1 = only the primary needs to acknowledge
  • WC N = N-1 secondary members must also replicate the write to acknowledge
  • WC majority = a majority of the members need to replicate the data before acknowledging the commit
  • WC majority ensures no rollbacks

Figure: The majority write concern in practice.
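A hedged PyMongo sketch of the majority write concern in practice (connection string, database, and collection names are placeholders): the insert is only acknowledged once a majority of replica set members hold the data, which also rules out rollbacks.

```python
from pymongo import MongoClient, WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # assumed replica set
users = client.shop.get_collection(
    "users",
    write_concern=WriteConcern(w="majority", wtimeout=5000),  # WC majority, 5 s timeout
)
# Acknowledged only after a majority of members have replicated the write.
users.insert_one({"name": "Alice"})
```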

SLIDE 50

MongoDB Eventual Consistency

  • Applications are allowed to read from

secondary replica set members if they do not prioritize reading the latest data.

  • This can be achieved by specifying a secondary read preference on the query (see the sketch below).

  • Reads from secondaries may return

data that does not reflect the state of the data on the primary
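A hedged PyMongo sketch of opting into these eventually consistent reads by routing queries to secondary members (the URI and names are placeholders); the returned document may lag behind the primary's state.

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # assumed replica set
users = client.shop.get_collection("users", read_preference=ReadPreference.SECONDARY)

# Served by a secondary member; the document may not yet reflect the primary's state.
print(users.find_one({"name": "Alice"}))
```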

SLIDE 51

MongoDB Node Failover

  • If the primary fails, an election occurs and the secondary replicas elect a new primary.
  • The Raft consensus algorithm is used to elect the new primary.
  • The replica with the most up-to-date data that can reach a majority of the nodes gets elected as primary and is then responsible for updating the oplog read by the secondary members.
  • If the old primary recovers from the failover, it becomes a secondary member of the replica set.

SLIDE 52

MongoDB Oplog

  • The oplog has a configurable back-limit history.
  • It dedicates 5% of the available disk space to storing transaction logs.
  • If a secondary member fails to keep up with the primary and the oplog entries the secondary needs in order to recover have already been replaced by newer transactions from the primary, then all the databases, collections, and index directives are copied again from the primary member or another secondary member.
  • The same process is done when a new member joins a replica set.

SLIDE 54

Neo4j : Introduction

  • Neo4j is a reliable, scalable and

high-performing native graph database

  • Its proper ACID characteristics are a foundation of data reliability

  • Neo4j ensures that operations

involving the modification of data happen within a transaction to guarantee consistent data.

SLIDE 55

Neo4j: The Graph NoSQL Database

  • In Neo4j, a graph is defined by nodes and relationships. A node represents an entity (e.g., the entity Person). It can have several node attributes (e.g., the Person with the name "Alice"). Two entities can be linked by a relationship (e.g., the Person with name "Alice" likes the Person with name "Bob"). Relationships can also have properties.
  • Neo4j uses linked lists of fixed-size records on disk. Each property record holds a key/value pair, and each node or relationship references its first property record. Relationships are stored in a doubly linked list, and a node references its first relationship.

SLIDE 56

Neo4j: Schema Design

  • Neo4j is schema-optional. It is not necessary to create indexes and constraints; nodes, relationships, and properties can be created without defining a schema.
  • Labels define domains by grouping nodes into sets. Nodes that have the same label belong to the same set. For example, all nodes representing cars could be labeled with the same label: Car. This allows Neo4j to perform operations only within a specific label, such as finding all cars with a given brand.

SLIDE 57

Neo4j : Causal Consistency

Neo4j’s Causal Clustering provides three main features:

  1. Safety: Core Servers provide a fault-tolerant platform for transaction processing which will remain available while a simple majority of those Core Servers are functioning.
  2. Scale: Read Replicas provide a massively scalable platform for graph queries that enables very large graph workloads to be executed in a widely distributed topology.
  3. Causal consistency: when invoked, a client application is guaranteed to read at least its own writes.

SLIDE 58

Neo4j : Operational Overview

  • From an operational point of view, it is useful to view the cluster as being composed of servers with two different roles: Cores and Read Replicas.
  • The two roles are foundational in any production deployment, but are managed at different scales from one another and undertake different roles in managing the fault tolerance and scalability of the overall cluster.
SLIDE 59

Neo4j: Core Servers

  • The main responsibility of Core Servers is to safeguard data.
  • Raft ensures that the data is safely durable before confirming transaction commit to the end-user application. Once a majority of the Core Servers in a cluster (N/2 + 1) have accepted the transaction, it is safe to acknowledge the commit to the end-user application.
  • A Core Server cluster contains enough nodes to provide sufficient fault tolerance for the specific deployment. This is calculated with the formula M = 2F + 1, where M is the number of Core Servers required to tolerate F faults.
  • If the cluster suffers enough Core failures, it can no longer process writes and will become read-only to preserve safety.

SLIDE 60

Neo4j: Read Replicas

  • The main responsibility of Read Replicas is to scale out graph workloads.
  • Read Replicas act like caches for the graph data that the Core Servers

safeguard and are fully capable of executing arbitrary (read-only) queries and procedures.

  • Read Replicas are asynchronously replicated from Core Servers via

transaction log shipping. They will periodically poll an upstream server for new transactions and have these shipped over.

  • Losing a Read Replica does not impact the cluster’s availability, aside from

the loss of its fraction of graph query throughput.

SLIDE 61

Neo4j: Causal Consistency

  • While the operational mechanics of the cluster are interesting, from an application point of view we typically just want to read from the graph and write to the graph.
  • Causal consistency makes it possible to write to Core Servers (where data is safe) and read those writes from a Read Replica (where graph operations are scaled out).
  • On executing a transaction, the client can ask for a bookmark, which it then presents as a parameter to subsequent transactions (see the sketch below).
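A hedged sketch of the bookmark mechanism with the official Neo4j Python driver (5.x API assumed; URI, credentials, and queries are illustrative): the bookmark returned by the writing session is handed to a later session so that its reads are causally after the write, even if served by a Read Replica.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "secret"))

# Write to the Core Servers and capture the session's bookmark.
with driver.session(database="neo4j") as session:
    session.execute_write(lambda tx: tx.run("CREATE (:Person {name: $name})", name="Alice"))
    bookmarks = session.last_bookmarks()

# A later session seeded with that bookmark will not observe a state older than the
# write, even if the query is routed to an asynchronously replicated Read Replica.
with driver.session(database="neo4j", bookmarks=bookmarks) as session:
    names = session.execute_read(
        lambda tx: [record["p.name"] for record in tx.run("MATCH (p:Person) RETURN p.name")]
    )
    print(names)

driver.close()
```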

SLIDE 62

OrientDB

  • Multi-model NoSQL database
  • The main data models are graphs and documents, but it also supports the key/value model
  • Supports strong, session, and eventual consistency through the choice of client load-balancing configuration
  • Primarily favors the CA (Consistency and Availability) side of the CAP Theorem
SLIDE 63

OrientDB - Replication

  • Originally employed a master-slave strategy, resulting in a not-so-scalable

architecture

  • Swapped to multi-master replication
  • Multi-master replication: data is stored by a group and can be updated by any member of the group
  • The system then propagates the data to the rest of the members and deals with conflicts caused by concurrent changes

SLIDE 64

OrientDB - Sharding

  • Sharding is done at the class level, using multiple clusters per class, where each cluster has its own list of servers to which data is replicated
  • All records stored in a cluster are part of the same class
  • A cluster can also have multiple servers, where the first server is its master
  • The records of each cluster are copied across all of its servers

SLIDE 65

OrientDB - Strong Consistency

  • Default Configuration: Sticky

configuration

  • Client hits the same master over

and over

  • Stays connected to same server

until the DB closes

  • In exchange for high consistency,

it sacrifices performance and has high latency

SLIDE 66

OrientDB - Session Consistency

  • Round Robin Connect

Configuration

  • Client connects to a different

server at each connection following a round robin schedule

  • Obviously strong performance

and availability provided it’s in the same session, but doesn’t have strong consistency with other sessions.

SLIDE 67

OrientDB - Eventual Consistency

  • Round Robin Request

Configuration

  • Client connects to a different

server at each request following a round robin schedule

  • Consistency takes a while, but

it’s low latency

  • Has the same scaling limitations as MongoDB in this configuration

SLIDE 68

OrientDB - Concurrency

  • Uses Optimistic Concurrency Control
  • Used for both Atomic Operations and Transactions
  • Atomic operations use Multi-Version Concurrency Control (MVCC) in OrientDB
  • Transactions don't use locks, but check the record version to see if there have been updates from other clients

SLIDE 69

OrientDB - Multi Version Concurrency Control

  • A conflict occurs when two threads attempt to update the same record
  • Every update increments the version number on the record
  • If the thread updating the record doesn't have the newest version number, the update fails and an exception is returned (see the sketch below)
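To illustrate the version check, here is a generic Python sketch of optimistic concurrency control; it is not the OrientDB client API, just the same idea: every record carries a version, an update must present the version it read, and a stale version raises a conflict the client can retry.

```python
class ConcurrentModification(Exception):
    """Raised when a record was changed by another client since it was read."""

class RecordStore:
    def __init__(self):
        self._records = {}  # rid -> (version, value)

    def create(self, rid, value):
        self._records[rid] = (1, value)

    def read(self, rid):
        return self._records[rid]  # (version, value)

    def update(self, rid, expected_version, new_value):
        version, _ = self._records[rid]
        if version != expected_version:
            # Another client bumped the version first: reject instead of locking.
            raise ConcurrentModification(
                f"record {rid}: expected v{expected_version}, found v{version}"
            )
        self._records[rid] = (version + 1, new_value)

store = RecordStore()
store.create("#10:3", {"balance": 100})
version, doc = store.read("#10:3")
store.update("#10:3", version, {"balance": 90})       # succeeds, version becomes 2
try:
    store.update("#10:3", version, {"balance": 80})   # stale version 1 -> conflict
except ConcurrentModification as err:
    print(err)
```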

SLIDE 70

Summary

We discussed different consistency implementations of several NoSQL databases.

SLIDE 71

Summary

Below is where the default configuration of each database system stands on the CAP theorem:

  • Neo4j and OrientDB favor strong consistency and availability
  • Cassandra favors availability, low latency, and network partition tolerance
  • MongoDB and Redis favor strong consistency and network partition tolerance

SLIDE 72

Conclusion

  • For applications that want high consistency and partition tolerance at the cost of availability (for writes), MongoDB is the best option
  • Applications that favor high availability and low latency over consistency will want Cassandra
  • If you have the option of partition intolerance, both Neo4j and OrientDB can provide high consistency

SLIDE 73

References

Miguel Diogo, Bruno Cabral, and Jorge Bernardino, "Consistency Models of NoSQL Databases," Future Internet 2019, 11, 43.