HyperLoop: NIC Offloaded Primitives to Accelerate Replicated - - PowerPoint PPT Presentation
HyperLoop: NIC Offloaded Primitives to Accelerate Replicated Transactions in Multi-tenant Storage Systems
Daehyeok Kim, Amirsaman Memaripour, Anirudh Badam, Yibo Zhu, Hongqiang Harry Liu, Jitendra Padhye, Shachar Raindel, Steven Swanson, Vyas
Multi-tenant Storage Systems
[Figure: a storage frontend and replica servers]
- Replicated transactions
  - Data availability and integrity
  - Consistent and atomic updates
  - e.g., chain replication
- Multiple replica instances are co-located on the same server
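As a rough illustration of the chain replication mentioned above, the sketch below models a write that enters at the head of a chain, is applied and forwarded replica by replica, and is acknowledged by the tail. The `Replica` class and `chain_write` function are hypothetical names chosen for this example; they are not taken from HyperLoop or any real storage system.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica in the chain; `store` stands in for its durable storage."""
    name: str
    store: dict = field(default_factory=dict)


def chain_write(chain, key, value):
    """Apply a write at every replica in chain order and return the hop trace.

    Each replica applies the update and forwards it to its successor;
    the last replica (the tail) is the one that sends the ACK.
    """
    trace = []
    for i, replica in enumerate(chain):
        replica.store[key] = value                 # execute the operation locally
        is_tail = i == len(chain) - 1
        trace.append((replica.name, "ACK" if is_tail else "FORWARD"))
    return trace


chain = [Replica("head"), Replica("middle"), Replica("tail")]
trace = chain_write(chain, "k1", "v1")
print(trace)
```

In this toy version every step runs as ordinary code, which corresponds to each replica's CPU both executing and forwarding the operation.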
Problem: Large and Unpredictable Latency
- Both average and tail latencies increase as more tenants share a server
- The gap between average and tail latencies also widens
[Figure: latency (ms) versus number of tenants on a server (9 to 27), showing the average and the 99th percentile; workload: YCSB]
CPU Involvement on Replicas
- CPU involvement for executing and forwarding transactional operations
- Arbitrary CPU scheduling → unpredictable latency
- Replicas’ CPU utilization hits 100%
[Figure: a frontend (software, storage/net library, NIC) sends a logging operation (DATA) to a group of two replicas (each with software, storage/net library, NIC, and storage). Steps run by CPUs on the critical path of operations: one replica (1) executes logging and (2) forwards logging; the other (1) executes logging and (2) forwards the ACK.]
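To make the link between CPU scheduling and unpredictable latency concrete, here is a toy model, not a measurement. It assumes a round-robin scheduler in which each tenant gets a fixed time slice, so a request waits between zero and `tenants - 1` slices at every replica before its tenant runs. The function names, the 1 ms slice, and the 3-replica chain are all hypothetical.

```python
def hop_waits(tenants, slice_ms):
    """Possible waits at one replica: 0, 1, ..., tenants-1 time slices."""
    return [i * slice_ms for i in range(tenants)]


def chain_latency(tenants, replicas, slice_ms=1.0):
    """Return (mean, worst-case) scheduling delay summed over the chain."""
    waits = hop_waits(tenants, slice_ms)
    mean_hop = sum(waits) / len(waits)     # assumes each wait is equally likely
    worst_hop = max(waits)
    return replicas * mean_hop, replicas * worst_hop


for tenants in (9, 18, 27):
    mean, worst = chain_latency(tenants, replicas=3)
    print(f"tenants={tenants:2d} mean={mean:5.1f} ms worst={worst:5.1f} ms "
          f"gap={worst - mean:5.1f} ms")
```

Under these assumptions both the mean and the worst case grow with the number of tenants, and so does the gap between them, which matches the qualitative trend described on the previous slide.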
Our Goal
Removing replica CPUs from the critical path!
[Figure: today's storage system, where the storage/net library run by CPUs sits on the critical path of operations (DATA, DATA, ACK) at the frontend and both replicas, compared with the goal design in which replica CPUs are off that path]
Our Work: HyperLoop
- Framework for building fast chain-replicated transactional storage systems, enabled by three key ideas:
  1. RDMA NIC + NVM
  2. Leveraging the programmability of RDMA NICs
  3. APIs covering key transactional operations
- Minimal modifications to existing applications
  - e.g., 866 lines changed in MongoDB, out of ~500K lines of code
- Up to 24x tail latency reduction in storage applications
Outline
- Motivation
- HyperLoop Design
- Implementation and Evaluation
Idea 1: RDMA NIC + NVM
- RDMA (Remote Direct Memory Access) NICs
  - Enable direct memory access from the network without involving CPUs
- NVM (Non-Volatile Memory)
  - Provides a durable storage medium for RDMA NICs
[Figure: a replica built from software, a storage/net library, a NIC, and storage, compared with a replica built from software, an NVM/RDMA library, an RDMA NIC, and NVM]
Roadblock 1: Operation Forwarding
- RDMA NICs (RNICs) can execute the logging operation
- CPUs are still involved: they must request the RNICs to forward operations
[Figure: a frontend and a group of two replicas, each node with software, an NVM/RDMA library, and an RNIC; the replicas also have NVM holding the log. For a logging operation (DATA, WRITE), the per-replica steps shown are (1) execute logging and (2) forward logging on one replica, and (1) execute logging and (2) forward ACK on the other; the legend marks the steps run by CPUs.]
Roadblock 2: Transactional Operations
- CPUs are involved to execute and forward operations
- RDMA NIC primitives do not support some key transactional operations (e.g., locking, commit)
[Figure: a frontend and a group of two replicas, each node with software, an NVM/RDMA library, and an RNIC; the replicas' NVM holds a log and data. For a commit-log operation, the per-replica steps shown are (1) execute commit and (2) forward commit on one replica, and (1) execute commit and (2) forward ACK on the other; the legend marks the steps run by CPUs.]
Can We Avoid the Roadblocks?
Pushing replication primitives to RNICs!
[Figure: today's storage system (frontend and replicas with software, storage/net library, NIC, and storage) compared with a design that uses RNICs and NVM on the replicas; the question posed is whether the critical path of operations can be offloaded]
Idea 2: Leveraging the Programmability of RNICs
- Commodity RDMA NICs are not fully programmable
- Opportunity: RDMA WAIT operation
  - Supported by commodity RDMA NICs
  - Allows a NIC to wait for the completion of an event (e.g., a receive)
  - Triggers the NIC to perform an operation upon that completion
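The WAIT behavior described above can be pictured with a small model: an operation is armed in advance and fires only when the awaited event completes. This is a conceptual sketch only; `ToyNic`, `post_wait`, and `complete` are hypothetical names and do not correspond to the RDMA verbs API or to any vendor interface.

```python
class ToyNic:
    """Toy model of a NIC whose pre-posted operation is gated by a WAIT."""

    def __init__(self):
        self.pending = []   # (event_name, action) pairs armed in advance
        self.log = []       # results of actions the NIC has performed

    def post_wait(self, event_name, action):
        """Arm the NIC: when `event_name` completes, run `action`."""
        self.pending.append((event_name, action))

    def complete(self, event_name):
        """Signal that an event (e.g., a receive) has completed."""
        still_waiting = []
        for waited_event, action in self.pending:
            if waited_event == event_name:
                self.log.append(action())   # fired by the NIC model itself
            else:
                still_waiting.append((waited_event, action))
        self.pending = still_waiting


nic = ToyNic()
nic.post_wait("recv", lambda: "forward WRITE to next replica")
nic.complete("send")   # unrelated event: nothing fires
nic.complete("recv")   # the awaited completion triggers the armed operation
print(nic.log)
```

The point of the model is that the code which arms the operation runs once, ahead of time; nothing has to be invoked per request when the event arrives.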
Bootstrapping: Program the NICs
- Step 1: Frontend library collects the base addresses of memory regions registered to replica RNICs
- Step 2: HyperLoop library programs replica RNICs with RDMA WAIT and the template of the target operation
[Figure: a frontend and two replicas, each with software, the HyperLoop library, and an RNIC; the replicas also have NVM. One replica RNIC is programmed to (1) WAIT for receiving and (2) forward a WRITE with Param (Src, Dst, Len); the other is programmed to (1) WAIT for receiving and (2) forward an ACK. Labels R1 and R2 appear on the replicas.]
Forwarding Operations
- Idea: Manipulating parameter regions of programmed operations
- Replica NICs can forward operations with proper parameters
[Figure: update log (WRITE). The frontend WRITEs DATA to address R1-0xA and SENDs Param (R1-0xA, R2-0xB, 64). The programmed template Param (Src, Dst, Len) is filled in with these values; once its WAIT for receiving completes, the replica RNIC forwards a WRITE of DATA to R2-0xB, and the other replica RNIC, after its own WAIT, forwards the ACK. Replica CPUs stay idle.]
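The forwarding idea on this slide, filling in the parameter region of a pre-programmed WRITE, can be sketched as follows. This is a simplified two-replica model under stated assumptions: `ReplicaNic`, `receive`, and `frontend_write` are hypothetical names, memory regions are plain byte buffers, and the offsets 0xA and 0xB and the length 64 simply mirror the values shown in the figure.

```python
class ReplicaNic:
    """Toy replica: a memory region plus a pre-programmed forwarding template."""

    def __init__(self, name, size=256):
        self.name = name
        self.memory = bytearray(size)
        self.next = None      # successor in the chain (None for the last replica)
        self.params = None    # parameter region of the template: (src, dst, length)

    def receive(self, params, events):
        """Completion of a receive: fill in the template, then fire it."""
        self.params = params  # metadata from the sender patches the template
        src, dst, length = params
        if self.next is None:
            events.append((self.name, "ACK"))      # last replica only acknowledges
            return
        # Forward WRITE: copy our bytes at src into the successor's memory at dst.
        self.next.memory[dst:dst + length] = self.memory[src:src + length]
        events.append((self.name, "WRITE"))
        self.next.receive(params, events)          # successor's WAIT now completes


def frontend_write(head, data, src, dst):
    """Frontend side: WRITE the data to the head, then SEND the parameters."""
    events = []
    head.memory[src:src + len(data)] = data
    head.receive((src, dst, len(data)), events)
    return events


r1, r2 = ReplicaNic("replica1"), ReplicaNic("replica2")
r1.next = r2
data = bytes(range(64))
events = frontend_write(r1, data, src=0xA, dst=0xB)
print(events)
```

Only `frontend_write` is called from application code here; everything inside `receive` stands in for work the slide attributes to the replica NICs rather than to replica CPUs.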
Idea 3: APIs for Transactional Operations
| Transactional Operations | Memory Operations | HyperLoop Primitives |
| Logging | Memory write to log region | Group Log |
| Commit | Memory copy from log to data region | Group Commit |
| Locking/Unlocking | Compare and swap on lock region | Group Lock/Unlock |
See our SIGCOMM paper for details!
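The mapping from transactional operations to memory operations can be tried out on a single byte buffer. The sketch below is an illustration under assumptions, not the HyperLoop API: the function names, the region offsets `LOG`, `DATA`, and `LOCK`, and the 8-byte lock word are all hypothetical.

```python
LOG, DATA, LOCK = 0, 64, 128    # hypothetical region offsets within one buffer


def mem_write(mem, offset, payload):
    """Logging: a memory write into the log region."""
    mem[offset:offset + len(payload)] = payload


def mem_copy(mem, src, dst, length):
    """Commit: a memory copy from the log region to the data region."""
    mem[dst:dst + length] = mem[src:src + length]


def compare_and_swap(mem, offset, expected, new):
    """Locking/unlocking: replace an 8-byte word only if it equals `expected`."""
    current = int.from_bytes(mem[offset:offset + 8], "little")
    if current != expected:
        return False
    mem[offset:offset + 8] = new.to_bytes(8, "little")
    return True


mem = bytearray(136)
mem_write(mem, LOG, b"update")                  # log the update
locked = compare_and_swap(mem, LOCK, 0, 1)      # acquire: 0 -> 1
relock = compare_and_swap(mem, LOCK, 0, 1)      # second acquire fails
mem_copy(mem, LOG, DATA, 6)                     # commit log to data
unlocked = compare_and_swap(mem, LOCK, 1, 0)    # release: 1 -> 0
print(locked, relock, unlocked, bytes(mem[DATA:DATA + 6]))
```

Each group primitive in the table would apply the corresponding memory operation on every replica in the group; this example shows only the single-replica building blocks.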
Transactions with HyperLoop Primitives
[Figure: frontend software with the HyperLoop library and an RNIC, and two replicas, each with software, the HyperLoop library, an RNIC, and NVM holding a log region and a data region]
- 1. Update log (Group Log)
- 2. Grab a lock (Group Lock)
- 3. Commit the log (Group Commit)
- 4. Release the lock (Group Unlock)
Replica NICs can execute and forward operations!
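The four steps above can be strung together in a short sketch. It is a toy model under assumptions: `Replica`, `Group`, and `run_transaction` are hypothetical names, the method names only echo the primitive names on the slide, and a real system would also need to handle the case where the lock cannot be acquired.

```python
class Replica:
    """Toy replica state: a log, committed data, and a lock flag."""

    def __init__(self):
        self.log = []         # pending (key, value) records
        self.data = {}        # committed state
        self.locked = False


class Group:
    """Toy replica group: each method applies one step to every replica in order."""

    def __init__(self, replicas):
        self.replicas = replicas

    def group_log(self, key, value):
        for r in self.replicas:
            r.log.append((key, value))

    def group_lock(self):
        if any(r.locked for r in self.replicas):
            return False      # someone already holds the lock
        for r in self.replicas:
            r.locked = True
        return True

    def group_commit(self):
        for r in self.replicas:
            r.data.update(r.log)   # copy logged records into the data region
            r.log.clear()

    def group_unlock(self):
        for r in self.replicas:
            r.locked = False


def run_transaction(group, key, value):
    group.group_log(key, value)    # 1. update log
    if not group.group_lock():     # 2. grab a lock
        return False
    group.group_commit()           # 3. commit the log
    group.group_unlock()           # 4. release the lock
    return True


group = Group([Replica(), Replica()])
ok = run_transaction(group, "x", 1)
print(ok, [r.data for r in group.replicas])
```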
Implementation and Case Studies
- HyperLoop library
  - C APIs for HyperLoop group primitives
  - Modifies the user-space RDMA verbs library and NIC driver
- Case studies
  - RocksDB: add replication logic using the HyperLoop library (120 lines of code modified)
  - MongoDB: replace the existing replication logic with the HyperLoop library (866 lines of code modified)
Result Highlights
- Latency reduction for memory operations on a group:
  - Write: ~801.8x
  - Memory copy: ~848x
- Tail latency reduction for RocksDB: 5.7x to 24.2x
- Latency reduction regardless of group size
- Zero CPU utilization in the data path
Summary
- Predictable low latency is lacking in replicated storage systems
- Root cause: CPU involvement on the critical path
- Our solution: HyperLoop
  - Offloads transactional operations to commodity RNICs + NVM
  - Minimal modifications to existing applications
- Result: up to 24x tail latency reduction in in-memory storage apps
- More opportunities
  - Other data center workloads
  - Efficient remote memory utilization