Making Non-Distributed Databases, Distributed
Ioannis Papapanagiotou, PhD
Shailesh Birari
Dynomite Ecosystem
- Dynomite - Proxy layer
- Dyno - Client
- Dynomite-manager - Ecosystem orchestrator
- Dynomite-explorer - UI
Problems & Observations
- Needed a data store that is:
  - Scalable & highly available
  - High throughput, low latency
  - Active-active (the Netflix use case)
- Master-slave storage engines:
  - Do not support bi-directional replication
  - Cannot withstand a Chaos Monkey attack
  - Cannot easily perform maintenance
What is Dynomite?
- A framework that makes non-distributed data stores distributed
- Can be used with many key-value storage engines
- Features: highly available, automatic failover, node warmup, tunable consistency, backups/restores
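Since Dynomite exposes the Redis protocol (RESP, the label that appears at the client interface in this deck's diagrams), any standard Redis client can talk to a Dynomite node. A minimal sketch using Jedis; the hostname is hypothetical and 8102 is the client-facing port in Dynomite's sample configurations, so adjust to your deployment:

```java
import redis.clients.jedis.Jedis;

public class DynomiteRespExample {
    public static void main(String[] args) {
        // Hypothetical node address; 8102 is the client-facing port in Dynomite's
        // sample configurations -- use whatever your deployment listens on.
        try (Jedis client = new Jedis("dynomite-node.example.com", 8102)) {
            client.set("member:42", "premium");          // written on the owning node, then replicated
            System.out.println(client.get("member:42")); // read back through any node
        }
    }
}
```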
Dynomite @ Netflix
- Running around 2.5 years in PROD
- 70 clusters
- ~1000 nodes used by internal microservices
- Microservices based on Java, Python, NodeJS
Pluggable Storage Engines
[Diagram: pluggable storage engines behind Dynomite's RESP (Redis protocol) interface]
- Layer on top of a non-distributed key-value data store
  - Peer-to-peer, shared nothing
  - Auto-sharding
  - Multi-datacenter
  - Linear scale
  - Replication
  - Gossiping
Topology
[Diagram: Dynomite topology - token-partitioned nodes across multiple racks]
- Each rack contains one copy of the data, partitioned across multiple nodes in that rack
- Multiple racks == higher availability (HA)
Replication
- A client can connect to any node in the Dynomite cluster when sending requests.
- If the node owns the data:
  - data are written to the local data store and asynchronously replicated.
- If the node does not own the data:
  - the node acts as a coordinator, sends the data to the owning node in the same rack, and replicates it to the other nodes in other racks and DCs (see the sketch below).
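The routing rule above can be sketched as follows. This is illustrative Java only: Dynomite's proxy layer is written in C, and every name here (ring, owner lookup, async send) is a hypothetical stand-in for the behavior described in the bullets.

```java
// Illustrative sketch of the coordinator rule; all types and method names are hypothetical.
public final class CoordinatorSketch {

    void handleWrite(String key, byte[] value, Node receivingNode) {
        Node owner = receivingNode.ring().ownerInLocalRack(tokenFor(key));

        if (owner.equals(receivingNode)) {
            // The node owns the token: write to the local data store.
            receivingNode.localStore().put(key, value);
        } else {
            // Otherwise act as a coordinator and forward to the owner in the same rack.
            owner.send(key, value);
        }
        // In both cases, replicate asynchronously to the owning nodes
        // in every other rack and data center.
        for (Node replica : receivingNode.ring().ownersInOtherRacksAndDcs(tokenFor(key))) {
            replica.sendAsync(key, value);
        }
    }

    long tokenFor(String key) { return key.hashCode(); } // stand-in for the real token hash

    interface Node {
        Ring ring();
        Store localStore();
        void send(String key, byte[] value);
        void sendAsync(String key, byte[] value);
    }
    interface Ring {
        Node ownerInLocalRack(long token);
        Iterable<Node> ownersInOtherRacksAndDcs(long token);
    }
    interface Store { void put(String key, byte[] value); }
}
```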
Dyno Client - Java API
- Connection Pooling
- Load Balancing
- Effective failover
- Pipelining
- Scatter/Gather
- Metrics, e.g. Netflix Insights
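A minimal sketch of bootstrapping the Dyno Java client, following the shape of the examples in the Dyno README; the exact builder methods may differ between Dyno releases, and the application name, cluster name, and host supplier below are placeholders.

```java
import com.netflix.dyno.connectionpool.HostSupplier;
import com.netflix.dyno.jedis.DynoJedisClient;

public class DynoClientSketch {
    public static void main(String[] args) {
        // A HostSupplier tells Dyno which Dynomite nodes exist; at Netflix it is
        // typically backed by discovery (Eureka). Its construction is elided here.
        HostSupplier hostSupplier = null; // placeholder: supply your Dynomite hosts

        DynoJedisClient client = new DynoJedisClient.Builder()
                .withApplicationName("demoApp")          // caller identification
                .withDynomiteClusterName("demo_cluster") // Dynomite cluster to talk to
                .withHostSupplier(hostSupplier)
                .build();

        // Familiar Jedis-style commands; connection pooling, token-aware load
        // balancing, and failover are handled by the client underneath.
        client.set("member:42", "premium");
        System.out.println(client.get("member:42"));
        client.stopClient();
    }
}
```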
Dyno Load Balancing
- The Dyno client employs token-aware load balancing.
- The Dyno client is aware of the Dynomite cluster topology within the region and can write to a specific node using consistent hashing (illustrated below).
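Token-aware routing boils down to consistent hashing over a sorted token ring: hash the key to a token and pick the node whose token range covers it. The self-contained sketch below only illustrates the idea; Dyno's actual implementation (hash function, token metadata, topology refresh) differs, and the tokens and node names are made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative token ring: each node owns the keys whose token falls at or below
// its position on the ring (wrapping around past the largest token).
public class TokenAwareRouting {

    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(long token, String nodeName) {
        ring.put(token, nodeName);
    }

    /** Pick the node whose token range covers the key's token. */
    public String nodeFor(String key) {
        long token = Integer.toUnsignedLong(key.hashCode()); // stand-in hash
        Map.Entry<Long, String> owner = ring.ceilingEntry(token);
        return owner != null ? owner.getValue() : ring.firstEntry().getValue(); // wrap around
    }

    public static void main(String[] args) {
        TokenAwareRouting router = new TokenAwareRouting();
        router.addNode(1_431_655_765L, "node-a");  // example tokens splitting the ring
        router.addNode(2_863_311_530L, "node-b");
        router.addNode(4_294_967_295L, "node-c");
        System.out.println("member:42 -> " + router.nodeFor("member:42"));
    }
}
```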
Dyno Failover
- Dyno will route requests to different racks in failure scenarios.
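A tiny illustration of that fallback behavior: try the token owner in the local rack first, then the owners of the same token in the remaining racks. Dyno's real failover policy (pool health tracking, backoff, error classification) is richer, and the node names here are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

public class RackFailoverSketch {

    /** Try each candidate node in order; return the first successful result. */
    public static <T> T executeWithFailover(List<String> rackOrderedNodes,
                                            Function<String, T> operation) {
        RuntimeException last = null;
        for (String node : rackOrderedNodes) {   // local rack first, remote racks next
            try {
                return operation.apply(node);
            } catch (RuntimeException e) {
                last = e;                         // node/rack unavailable: try the next one
            }
        }
        throw last != null ? last : new IllegalStateException("no nodes to try");
    }

    public static void main(String[] args) {
        String result = executeWithFailover(
                List.of("rack-a-node-3", "rack-b-node-3", "rack-c-node-3"),
                node -> {
                    if (node.startsWith("rack-a")) throw new RuntimeException("rack-a down");
                    return "GET served by " + node;
                });
        System.out.println(result);
    }
}
```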
Dynomite on the Cloud
Moving across engines
[Diagram: moving data across storage engines - Rack A / Rack B]
Dynomite-manager: Warm up
1. Dynomite-manager identifies which node has the same token in the same DC
2. Leverages master/slave replication
3. Checks for peer syncing: the difference between the master and slave replication offsets (see the sketch below)
4. Once master and slave are in sync, Dynomite is set to allow writes only
5. Dynomite is set back to the normal state
6. Checks the health of the node - Done!
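Step 3's peer-sync check boils down to comparing replication offsets from Redis INFO on the peer (master) and on the warming node (slave). A minimal sketch, assuming Jedis and placeholder hostnames/ports; dynomite-manager's actual check is more involved.

```java
import redis.clients.jedis.Jedis;

public class WarmupOffsetCheck {

    /** Parse a numeric field from the "replication" section of Redis INFO. */
    static long replicationField(Jedis redis, String field) {
        for (String line : redis.info("replication").split("\r\n")) {
            if (line.startsWith(field + ":")) {
                return Long.parseLong(line.substring(field.length() + 1).trim());
            }
        }
        throw new IllegalStateException("field not found: " + field);
    }

    public static void main(String[] args) {
        // Hypothetical hosts; 6379 is the backend Redis port, adjust to your deployment.
        try (Jedis master = new Jedis("peer-node.example.com", 6379);
             Jedis warmingNode = new Jedis("localhost", 6379)) {
            long masterOffset = replicationField(master, "master_repl_offset");
            long slaveOffset = replicationField(warmingNode, "slave_repl_offset");
            long diff = masterOffset - slaveOffset;
            System.out.println("offset diff=" + diff + " inSync=" + (diff == 0));
        }
    }
}
```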
Dynomite-Explorer (UI)
- Node.js web app with a Polymer-based user-interface
- Support Redis’ rich data types
- Avoid operations that can negatively impact Redis server performance
- Extended for Dynomite awareness
- Allow extension of the server to integrate with the Netflix ecosystem
Dynomite-Explorer
Roadmap
- Data reconciliation & repair v2
- Optimizations of RocksDB configuration
- Optimizing backups through SST
- Others….
More information
- Netflix OSS:
- https://github.com/Netflix/dynomite
- https://github.com/Netflix/dyno
- https://github.com/Netflix/dynomite-manager
- Chat: https://gitter.im/Netflix/dynomite
Dynomite: S3 backups/restores
- Why?
  - Disaster recovery
  - Data corruption
- How?
  - The storage engine dumps its data to the instance drive
  - Dynomite-manager sends the data to S3 buckets (sketched below)
  - Data per node are not large, so there is no need for incremental backups
- Use case:
  - Clusters that use Dynomite as a storage layer
  - Not enabled on clusters that have short TTLs or use Dynomite as a cache
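A minimal sketch of that flow using the AWS SDK for Java v1: force the engine to dump to the instance drive, then push the whole dump to S3 (no incrementals needed). The bucket, dump path, and key layout are hypothetical; dynomite-manager's real backup and restore paths are more involved.

```java
import java.io.File;
import java.time.LocalDate;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

import redis.clients.jedis.Jedis;

public class S3BackupSketch {

    public static void main(String[] args) {
        // 1. Force the storage engine to persist its data set to the instance drive.
        try (Jedis redis = new Jedis("localhost", 6379)) {
            redis.save();   // synchronous RDB dump to the local disk
        }

        // 2. Ship the dump to S3. Data per node is small enough that a full
        //    (non-incremental) upload per backup is acceptable.
        File dump = new File("/mnt/data/nfredis/dump.rdb");          // hypothetical dump location
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String key = "dynomite/demo_cluster/node-1/" + LocalDate.now() + "/dump.rdb";
        s3.putObject("my-dynomite-backups", key, dump);               // hypothetical bucket
        System.out.println("uploaded " + key);
    }
}
```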
Dynomite-manager
- Token management for multi-region deployments
- Support for the AWS environment
- Automated security group update in multi-region environment
- Monitoring of Dynomite and the underlying storage engine
- Node cold bootstrap (warm up)
- S3 backups and restores
- REST API
Performance Setup
- Instance types:
  - Dynomite: i3.2xlarge with NVMe
  - NDBench: m2.2xl (typical of an app @ Netflix)
- Replication factor: 3
  - Deployed Dynomite in 3 zones in us-east-1
  - Every zone had the same number of servers
- Demo app used simple key/value workloads
  - Redis: GET and SET
- Payload:
  - Size: 1024 bytes
  - 80%/20% reads over writes
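To make the workload shape concrete, here is a standalone loop that mirrors it: 1024-byte values, simple GET/SET, 80% reads / 20% writes. The real runs were driven by NDBench; the host, port, key space, and iteration count below are arbitrary.

```java
import java.util.concurrent.ThreadLocalRandom;
import redis.clients.jedis.Jedis;

public class WorkloadSketch {

    public static void main(String[] args) {
        String value = "x".repeat(1024);   // fixed 1024-byte payload

        // Hypothetical endpoint; 8102 follows Dynomite's sample configurations.
        try (Jedis client = new Jedis("dynomite-node.example.com", 8102)) {
            for (int i = 0; i < 100_000; i++) {
                String key = "key-" + ThreadLocalRandom.current().nextInt(10_000);
                if (ThreadLocalRandom.current().nextInt(100) < 80) {
                    client.get(key);            // 80% reads
                } else {
                    client.set(key, value);     // 20% writes
                }
            }
        }
    }
}
```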
Throughput
Latencies
Consistency
- DC_ONE
  - Reads and writes are propagated synchronously only to the node in the local rack and asynchronously replicated to other racks and data centers.
- DC_QUORUM
  - Reads and writes are propagated synchronously to a quorum number of nodes in the local data center and asynchronously to the rest. The DC_QUORUM configuration writes to the number of nodes that make up a quorum. A quorum is calculated and then rounded down to a whole number. If all responses are different, the first response that the coordinator received is returned.
- DC_SAFE_QUORUM
  - Similar to DC_QUORUM, but the operation succeeds only if the read/write succeeded on a quorum number of nodes and the data checksums match. If the quorum has not been achieved, Dynomite generates an error response.
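A small illustration of the quorum arithmetic and success rules above (not Dynomite's code): with a replication factor of 3 in the local DC, a quorum is 3/2 + 1 = 2, and DC_SAFE_QUORUM additionally requires the returned checksums to agree.

```java
import java.util.List;

public class QuorumSketch {

    static int quorum(int replicasInLocalDc) {
        return replicasInLocalDc / 2 + 1;   // integer division = "rounded down"
    }

    /** DC_QUORUM: succeed once a quorum of replicas has acknowledged. */
    static boolean dcQuorumSucceeds(List<String> responses, int replicas) {
        return responses.size() >= quorum(replicas);
    }

    /** DC_SAFE_QUORUM: additionally require the response checksums to agree. */
    static boolean dcSafeQuorumSucceeds(List<String> responseChecksums, int replicas) {
        return responseChecksums.size() >= quorum(replicas)
                && responseChecksums.stream().distinct().count() == 1;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3));                                       // 2
        System.out.println(dcQuorumSucceeds(List.of("v1", "v2"), 3));        // true
        System.out.println(dcSafeQuorumSucceeds(List.of("abc", "abd"), 3));  // false: checksums differ
    }
}
```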
Deploying Dynomite in PROD
- Unit testing in GitHub
- Building EC2 AMI in “experimental”
- Pipelines for performance analysis
- Promotion to “candidate”
- Beta Testing
- Promotion to “release”
Reconciliation
- Reconciliation is based on timestamps (newest wins) and is performed by a Spark cluster
- A Jenkins job is used to avoid clock skew
Reconciliation: Design Principles
We prefer to take the processing load of reconciliation off each node in the cluster and offload it to a high-performance, in-memory compute cluster based on Spark (a minimal Spark sketch follows).
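A minimal sketch of the "newest wins" merge in Spark's Java API, assuming hypothetical record and field names; loading the per-replica data and writing repairs back to Dynomite are omitted.

```java
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ReconciliationSketch {

    /** Hypothetical record: a value plus the timestamp of the write that produced it. */
    public static class Versioned implements Serializable {
        public final String value;
        public final long timestampMs;   // the Jenkins job keeps node clocks in check
        public Versioned(String value, long timestampMs) {
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("dynomite-reconciliation").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // One (key, versioned value) pair per replica; the same key may appear with
        // different values when replicas have diverged.
        JavaPairRDD<String, Versioned> records = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>("member:42", new Versioned("basic", 1_000L)),
                new Tuple2<>("member:42", new Versioned("premium", 2_000L))));

        // Newest wins: keep the value with the latest timestamp for each key.
        JavaPairRDD<String, Versioned> reconciled =
                records.reduceByKey((a, b) -> a.timestampMs >= b.timestampMs ? a : b);

        reconciled.collect().forEach(t ->
                System.out.println(t._1() + " -> " + t._2().value));   // member:42 -> premium
        sc.stop();
    }
}
```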
Reconciliation: Architecture
- Forcing Redis (or any other storage engine) to dump its data to disk
- Encrypted communication between Dynomite and the Spark cluster
- Chunking the data, with a retry in case of a failure
- Bandwidth throttler (sketched below)
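A short sketch of the last two bullets: send data in chunks, retry a failed chunk, and cap the bandwidth used. Guava's RateLimiter is used here as a stand-in throttler; the chunk size, retry count, and transport interface are all hypothetical.

```java
import java.io.IOException;
import com.google.common.util.concurrent.RateLimiter;

public class ThrottledChunkSender {

    private static final int CHUNK_SIZE = 1 << 20;                                   // 1 MiB chunks
    private static final int MAX_RETRIES = 3;
    private final RateLimiter bytesPerSecond = RateLimiter.create(20 * (1 << 20));   // ~20 MiB/s cap

    public void send(byte[] data, ChunkTransport transport) throws IOException {
        for (int offset = 0; offset < data.length; offset += CHUNK_SIZE) {
            int length = Math.min(CHUNK_SIZE, data.length - offset);
            bytesPerSecond.acquire(length);              // throttle before sending each chunk
            sendWithRetry(data, offset, length, transport);
        }
    }

    private void sendWithRetry(byte[] data, int offset, int length,
                               ChunkTransport transport) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                transport.sendChunk(data, offset, length);   // e.g. over the encrypted channel to Spark
                return;
            } catch (IOException e) {
                last = e;                                    // failed chunk: retry it
            }
        }
        throw last;
    }

    /** Hypothetical transport abstraction for the Dynomite-to-Spark channel. */
    public interface ChunkTransport {
        void sendChunk(byte[] data, int offset, int length) throws IOException;
    }
}
```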