Cassandra on RocksDB Dikang Gu Software Engineer @ Facebook
Agenda
1. Motivation
2. Approaches
3. Design
4. Performance metrics
Stories, Direct, Live, Explore
Apache Cassandra
• Highly scalable partitioned data store
• High performance
• High availability
• Tunable consistency
Cassandra at Instagram
• Thousands of Cassandra servers
• 5 data centers (DCs)
• 100+ product use cases
• Millions of requests per second
Top Line Metrics
• Reliability
  • Five nines (99.999%): request failure rate < 0.001%
• Performance
  • Write throughput
  • Read latency
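The reliability target is simple arithmetic: five nines of success means at most 1 failed request per 100,000, i.e. a failure rate below 0.001%. A minimal Python sketch of that check follows; the function names are hypothetical and not part of any Instagram tooling.

```python
def failure_rate_percent(failed: int, total: int) -> float:
    """Return failed requests as a percentage of all requests."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


def meets_five_nines(failed: int, total: int) -> bool:
    """True if the failure rate is strictly below 0.001% (1 in 100,000).

    Integer arithmetic avoids floating-point rounding at the boundary:
    failed / total < 1 / 100_000  <=>  failed * 100_000 < total.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return failed * 100_000 < total
```

For example, 5 failures in one million requests is 0.0005% and passes, while 10 failures in one million is exactly 0.001% and does not meet the strict `<` target.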
Read Latency
[Chart: read latency; labeled values 60 ms, 25 ms, 5 ms]
GC Stalls
[Chart: GC stalls; labeled values 2.5%, 1%]
Where do we play?
Approach 1: GC Tuning
Pros:
• No code changes
Cons:
• Hard to tune for both latency and throughput
• Highly dependent on the workload
• At most a 20% drop in P99 latency
Approach 2: Off-heap Data Structures
• Memtable
• Caches
• Indexes
• Read/write path
• Compaction
• …
Approach 2: Off-heap Data Structures
Pros:
• Incremental improvements
• Easier for the community to accept
Cons:
• Relies on Java Unsafe
• Highly dependent on the workload
• At most a 20% drop in P99 latency
Approach 3: C++ Storage Engines
• Most memory is consumed by the storage engine
  • Memtable, compaction, read/write path, etc.
• Replace the existing Java storage engine with a C++ implementation
• Pluggable storage engine
Approach 3: C++ Storage Engines
Pros:
• Greatly reduces JVM overhead
• CPU efficiency
• Long-term benefit from a pluggable storage engine
Cons:
• Non-trivial effort to make the storage engine pluggable
• JNI efficiency (overhead of calls crossing between Java and C++)
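Making the storage engine pluggable means Cassandra's read/write path talks to an abstract interface rather than to one concrete engine. A minimal Python sketch of that idea, assuming a hypothetical `StorageEngine` interface with byte-string keys and values; the in-memory implementation only stands in for an embedded store such as RocksDB and is not Cassandra's or Rocksandra's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical minimal interface a pluggable storage engine could implement."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Insert or overwrite the value for key."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value for key, or None if absent."""

    @abstractmethod
    def delete(self, key: bytes) -> None:
        """Remove key if present."""

    @abstractmethod
    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""


class InMemoryEngine(StorageEngine):
    """Toy engine standing in for an embedded store such as RocksDB."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Sorting on every scan is O(n log n); a real engine keeps keys ordered.
        for key in sorted(self._data):
            if start <= key < end:
                yield key, self._data[key]
```

Because callers depend only on `StorageEngine`, swapping the toy engine for a JNI-backed RocksDB engine would leave the calling code unchanged; the price is that every call crossing into C++ pays JNI overhead.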
C++ Storage Engine
RocksDB
• Embedded C++ key-value database
• Optimized for flash, with extremely low latencies
• Popular storage engine for MySQL, MongoDB, etc.
• Open source, Apache 2.0 license
Prototype
• Supports the single key-value case
• Bypasses Cassandra's (C*) own storage engine
• No streaming support
• Shadows one production use case
Prototype Latency
[Chart: prototype latency; labeled values 35 ms, 15 ms, 2-5 ms]
Prototype GC Stalls
[Chart: prototype GC stalls; labeled values 1%, 0.5%, 0.1%]
RocksDB + Cassandra = Rocksandra
Challenges
1. Cassandra data model
2. Streaming
Design: Data Model
Key Encoding
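The key layout itself did not survive extraction, so the following is only one plausible order-preserving scheme, assumed for illustration and not Rocksandra's actual format: the partition's token as a sign-flipped big-endian 64-bit integer (so byte order matches numeric order), then the length-prefixed partition key, then each clustering column. Clustering columns are limited to 64-bit integers to keep the sketch short, and all names are hypothetical.

```python
import struct
from typing import Sequence

_INT64 = struct.Struct(">Q")  # unsigned 64-bit, big-endian
_LEN32 = struct.Struct(">I")  # unsigned 32-bit length prefix


def encode_int64(value: int) -> bytes:
    """Order-preserving encoding of a signed 64-bit integer.

    Adding 2**63 flips the sign bit, so unsigned big-endian byte order
    matches signed numeric order. struct raises for out-of-range values.
    """
    return _INT64.pack(value + (1 << 63))


def encode_row_key(token: int, partition_key: bytes, clustering: Sequence[int]) -> bytes:
    """Hypothetical row key: token | len(partition key) | partition key | clustering columns."""
    out = bytearray(encode_int64(token))
    out += _LEN32.pack(len(partition_key))
    out += partition_key
    for column in clustering:
        out += encode_int64(column)
    return bytes(out)


def partition_prefix(token: int, partition_key: bytes) -> bytes:
    """Prefix shared by every row of one partition, usable for a prefix scan."""
    return encode_row_key(token, partition_key, [])
```

Putting the token first keeps partitions in token order, as Cassandra orders them on the ring, and because all rows of a partition share one prefix, a partition or clustering-range read becomes a single ordered iterator scan. Clustering columns of other types (text, UUID, descending order) would each need their own order-preserving encoding.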
Value Encoding
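Likewise, the value layout below is an assumed sketch, not Rocksandra's actual format: a fixed header carrying a flags byte (tombstone bit), the write timestamp in microseconds, and a local deletion time in seconds (expiration time for TTL cells, deletion time for tombstones, 0 if neither), followed by the raw cell payload. The names `HEADER`, `Cell`, `encode_cell` and `decode_cell` are hypothetical.

```python
import struct
from typing import NamedTuple

# Hypothetical header: flags (1 byte), write timestamp in microseconds (int64),
# local deletion time in seconds since epoch (uint32, 0 = none). Big-endian, no padding.
HEADER = struct.Struct(">BqI")
TOMBSTONE_FLAG = 0x01


class Cell(NamedTuple):
    timestamp: int            # write timestamp, microseconds
    local_deletion_time: int  # TTL expiry or deletion time in seconds; 0 = none
    tombstone: bool           # True if the cell records a deletion
    payload: bytes            # serialized column value (empty for tombstones)


def encode_cell(cell: Cell) -> bytes:
    """Serialize a cell as a fixed-size header followed by the payload."""
    flags = TOMBSTONE_FLAG if cell.tombstone else 0
    return HEADER.pack(flags, cell.timestamp, cell.local_deletion_time) + cell.payload


def decode_cell(raw: bytes) -> Cell:
    """Parse bytes produced by encode_cell back into a Cell."""
    flags, timestamp, ldt = HEADER.unpack_from(raw, 0)
    return Cell(timestamp, ldt, bool(flags & TOMBSTONE_FLAG), raw[HEADER.size:])
```

Keeping the timestamp and deletion metadata in a fixed-size header lets merge and compaction logic inspect them without deserializing the column value.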
Merge Operator / Compaction Filter
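RocksDB exposes both hooks as C++ classes (`MergeOperator` and `CompactionFilter`). The Python sketch below only models the decisions such hooks might make for Cassandra semantics, under stated assumptions: the merge keeps the cell that wins last-write-wins reconciliation (higher timestamp; on a tie a tombstone wins, then the larger value, approximating Cassandra's tie-break), and the compaction filter purges a tombstone or TTL cell once `gc_grace_s` seconds have passed since its local deletion time. It does not model converting expired TTL cells into tombstones, and all names are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Cell(NamedTuple):
    timestamp: int            # write timestamp, microseconds
    local_deletion_time: int  # TTL expiry or deletion time in seconds; 0 = none
    tombstone: bool           # True if the cell records a deletion
    payload: bytes            # column value (empty for tombstones)


def supersedes(a: Cell, b: Cell) -> bool:
    """True if cell a wins over cell b under last-write-wins reconciliation."""
    if a.timestamp != b.timestamp:
        return a.timestamp > b.timestamp
    if a.tombstone != b.tombstone:
        return a.tombstone          # on a timestamp tie, the deletion wins
    return a.payload > b.payload    # deterministic tie-break on value bytes


def full_merge(existing: Optional[Cell], operands: Sequence[Cell]) -> Cell:
    """Fold the existing value and all merge operands into the single winning cell."""
    candidates = ([existing] if existing is not None else []) + list(operands)
    if not candidates:
        raise ValueError("nothing to merge")
    winner = candidates[0]
    for cell in candidates[1:]:
        if supersedes(cell, winner):
            winner = cell
    return winner


def should_drop(cell: Cell, now_s: int, gc_grace_s: int) -> bool:
    """Compaction-filter decision: True if the cell can be purged from disk."""
    if cell.local_deletion_time == 0:
        return False  # live cell without TTL: always keep
    return cell.local_deletion_time + gc_grace_s <= now_s
```

Resolving writes with a merge operator lets a write be appended without first reading the old value, while the compaction filter lets expired data disappear as part of RocksDB's normal background compaction.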
Streaming
Feature Milestone
Current features:
• Most non-nested data types
• Table data model
• Point query
• Range query
• Mutations
• Timestamp
• TTL
• Deletions/cell tombstones
Future features:
• Multi-partition query
• Nested data types
• Counters
• Range tombstone
• Materialized views
• Secondary indexes
• Repair
Performance metrics
Cluster A
• Similar P99 read/write latency
• Footprint reduced to 1/3
Cluster B
• P99 read latency reduced 3X (60 ms to 20 ms)
• Footprint reduced to 60%
Cluster C
• High write volume and large fan-out reads
• P99 read latency reduced from 1 s to 10 ms
• Same footprint
Benchmark on AWS
• C* cluster in a single availability zone (us-west-2a), replication factor 1
• 3 i3.8xlarge EC2 instances, each with 256 GB memory, a 32-core CPU, and RAID 0 across 4 NVMe flash disks
• NDBench cluster (https://github.com/Netflix/ndbench), run from the same AZ
Benchmark Metrics
Recap
Switching to Rocksandra helped us:
• Cut tail latency
• Improve throughput
Try it!
Don’t just take our word for it; download from github.com/instagram:
• Rocksandra code
• Benchmark CloudFormation template and scripts
Future work
• Support more Cassandra features
• Cassandra pluggable storage engine
Acknowledgement
Thanks to the Cassandra and RocksDB communities for all their support.
Thank You!