HyperLoop: NIC Offloaded Primitives to Accelerate Replicated Transactions in Multi-tenant Storage Systems



slide-1
SLIDE 1

HyperLoop: NIC Offloaded Primitives to Accelerate Replicated Transactions in Multi-tenant Storage Systems

Daehyeok Kim

Amirsaman Memaripour, Anirudh Badam, Yibo Zhu, Hongqiang Harry Liu, Jitendra Padhye, Shachar Raindel, Steven Swanson, Vyas Sekar, Srinivasan Seshan

Presented at ACM SIGCOMM 2018

slide-2
SLIDE 2

Multi-tenant Storage Systems

  • Replicated transactions
    – Data availability and integrity
    – Consistent and atomic updates
    – e.g., Chain replication
  • Multiple replica instances are co-located on the same server

[Figure: a storage frontend connected to replica servers]

slide-3
SLIDE 3

Problem: Large and Unpredictable Latency

  • Both average and tail latencies increase
  • Gap between average and tail latencies increases

[Figure: latency (ms) versus number of tenants on a server; series: average and 99th percentile; workload: YCSB]

slide-4
SLIDE 4

CPU Involvement on Replicas

  • CPU involvement for executing and forwarding transactional operations
  • Arbitrary CPU scheduling → Unpredictable latency
  • Replicas’ CPU utilization hits 100%

[Figure: a storage frontend and a group of replicas. Each node runs its software over a storage/net library and a NIC; the replicas also have storage holding a log. For a logging operation, DATA flows from the frontend along the chain and an ACK returns. Steps run by CPUs on the critical path of operations: a replica with a successor (1) executes logging and (2) forwards logging; the last replica (1) executes logging and (2) forwards the ACK.]

slide-5
SLIDE 5

Our Goal

[Figure: two versions of the same chain of a storage frontend and two replicas. Today’s storage system: the storage/net library on each node is run by CPUs and sits on the critical path of operations (DATA, ACK). Goal: the critical path runs through the NICs and storage, with the storage/net libraries off it.]

Removing replica CPUs from the critical path!

slide-6
SLIDE 6

Our Work: HyperLoop

  • Framework for building fast chain-replicated transactional storage systems, enabled by three key ideas:
  • 1. RDMA NIC + NVM
  • 2. Leveraging the programmability of RDMA NICs
  • 3. APIs covering key transactional operations
  • Minimal modifications for existing applications
    – e.g., 866 lines for MongoDB out of ~500K lines of code
  • Up to 24x tail latency reduction in storage applications

slide-7
SLIDE 7

Outline

  • Motivation
  • HyperLoop Design
  • Implementation and Evaluation


slide-8
SLIDE 8

Idea 1: RDMA NIC + NVM

  • RDMA (Remote Direct Memory Access) NICs
    – Enable direct memory access from the network without CPUs
  • NVM (Non-Volatile Memory)
    – Provides a durable storage medium for RDMA NICs

[Figure: a replica built on a storage/net library, NIC, and storage, next to a replica built on an NVM/RDMA library, RDMA NIC, and NVM]

slide-9
SLIDE 9

Roadblock 1: Operation Forwarding

  • RDMA NICs can execute the logging operation
  • CPUs are still involved to request the RNICs to forward operations

[Figure: a frontend and a group of replicas, each with an NVM/RDMA library and an RNIC; the replicas also have NVM holding a log. Logging is issued as DATA (WRITE). Steps per replica: (1) execute logging, (2) forward logging (forward ACK on the last replica). A legend marks the parts run by CPUs.]
slide-10
SLIDE 10

Roadblock 2: Transactional Operations

  • CPUs are involved to execute and forward operations
  • RDMA NIC primitives do not support some key transactional operations (e.g., locking, commit)

[Figure: a frontend and a group of replicas, each with an NVM/RDMA library and an RNIC; the replicas also have NVM holding a log region and a data region. A "commit log" operation is issued. Steps per replica: (1) execute commit, (2) forward commit (forward ACK on the last replica). A legend marks the parts run by CPUs.]
slide-11
SLIDE 11

Can We Avoid the Roadblocks?

[Figure: today’s storage system (frontend and replicas, each with a storage/net library, NIC, and storage or SSD) next to a design with RNICs and NVM on the replicas, where the storage/net libraries are off the critical path of operations. Question posed: offload?]

Pushing replication primitives to RNICs!

slide-12
SLIDE 12

Idea 2: Leveraging the Programmability of RNICs

  • Commodity RDMA NICs are not fully programmable
  • Opportunity: RDMA WAIT operation
    – Supported by commodity RDMA NICs
    – Allows a NIC to wait for the completion of an event (e.g., receiving)
    – Triggers the NIC to perform an operation upon the completion
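
The WAIT behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: `SimulatedNIC` and its methods are hypothetical names invented for this example, not the RDMA verbs API or the HyperLoop library, and a real NIC does this in hardware. The model shows a pre-posted operation sitting behind a WAIT entry and running only once a receive completion arrives, with no host code between the two.

```python
from collections import deque


class SimulatedNIC:
    """Toy model of a NIC work queue that supports a WAIT entry (hypothetical)."""

    def __init__(self):
        self.queue = deque()   # pre-posted work requests, in posting order
        self.completions = 0   # receive completions not yet consumed by a WAIT
        self.executed = []     # names of operations the NIC has run

    def post_wait(self, count=1):
        # Everything queued behind this entry is held until `count` completions arrive.
        self.queue.append(("WAIT", count))
        self._drain()

    def post_op(self, name):
        # Queue an operation; it runs immediately only if no WAIT is ahead of it.
        self.queue.append(("OP", name))
        self._drain()

    def on_receive_completion(self):
        # Models an incoming message completing on the NIC.
        self.completions += 1
        self._drain()

    def _drain(self):
        # Run queued entries in order, stopping at an unsatisfied WAIT.
        while self.queue:
            kind, arg = self.queue[0]
            if kind == "WAIT":
                if self.completions < arg:
                    return
                self.completions -= arg
            else:
                self.executed.append(arg)
            self.queue.popleft()


nic = SimulatedNIC()
nic.post_wait()                 # 1. WAIT for receiving
nic.post_op("forward WRITE")    # 2. operation to trigger afterwards
before = list(nic.executed)     # nothing has run yet
nic.on_receive_completion()     # the awaited event
after = list(nic.executed)      # the pre-posted operation has now run
```

In this model the host posts both entries once, ahead of time; afterwards only `on_receive_completion` drives progress, which is the property the slide relies on.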


slide-13
SLIDE 13

Bootstrapping – Program the NICs

  • Step 1: Frontend library collects the base addresses of memory regions registered to replica RNICs
  • Step 2: HyperLoop library programs replica RNICs with RDMA WAIT and the template of the target operation

[Figure: a frontend and two replicas (R1, R2), each with a HyperLoop library and an RNIC; the replicas also have NVM. One RNIC is programmed to (1) WAIT for receiving, then (2) forward a WRITE with Param (Src, Dst, Len). The other is programmed to (1) WAIT for receiving, then (2) forward an ACK.]

slide-14
SLIDE 14

Forwarding Operations

  • Idea: Manipulating parameter regions of programmed operations
  • Replica NICs can forward operations with proper parameters

[Figure: update log (WRITE) across a frontend and two replicas (R1, R2), each with a HyperLoop library and an RNIC; the replicas also have NVM. The frontend WRITEs DATA to R1 at address 0xA and SENDs Param (R1-0xA, R2-0xB, 64), which fills the Param (Src, Dst, Len) slot of R1's programmed WRITE. R1's RNIC, after (1) WAIT for receiving, (2) forwards WRITE DATA to R2 at address 0xB. R2's RNIC, after (1) WAIT for receiving, (2) forwards the ACK. Replica CPUs stay idle.]
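
The forwarding flow on this slide can be sketched as a software simulation. Everything named here (`ReplicaNIC`, `rdma_write`, `on_params_received`, `update_log`) is hypothetical and chosen for illustration; the recursion stands in for what the RNICs do in hardware. The sketch also assumes that a replica past the first reuses the destination offset as its own source, which the slides do not state.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReplicaNIC:
    """Toy replica: registered memory plus one pre-programmed operation."""
    name: str
    next_hop: Optional["ReplicaNIC"] = None
    memory: bytearray = field(default_factory=lambda: bytearray(256))
    # Parameter region of the templated WRITE: (src, dst, length).
    params: Optional[Tuple[int, int, int]] = None

    def rdma_write(self, offset: int, data: bytes) -> None:
        # One-sided WRITE landing in this replica's memory.
        self.memory[offset:offset + len(data)] = data

    def on_params_received(self, src: int, dst: int, length: int,
                           trace: List[str]) -> None:
        # The incoming SEND fills the parameter region; its completion
        # satisfies the WAIT and releases the programmed operation.
        self.params = (src, dst, length)
        if self.next_hop is None:
            trace.append(f"{self.name}: ACK")        # tail replica forwards ACK
            return
        payload = bytes(self.memory[src:src + length])
        self.next_hop.rdma_write(dst, payload)       # forwarded WRITE
        trace.append(f"{self.name}: WRITE {length} bytes to {self.next_hop.name}")
        # Assumption: downstream replicas keep the data at the same offset.
        self.next_hop.on_params_received(dst, dst, length, trace)


def update_log(head: ReplicaNIC, src: int, dst: int, data: bytes) -> List[str]:
    """Frontend side: WRITE the data to the head, then SEND the parameters."""
    trace: List[str] = []
    head.rdma_write(src, data)
    head.on_params_received(src, dst, len(data), trace)
    return trace


r2 = ReplicaNIC("R2")
r1 = ReplicaNIC("R1", next_hop=r2)
trace = update_log(r1, 0xA, 0xB, b"D" * 64)   # Param (R1-0xA, R2-0xB, 64)
print(trace)
```

The point of the sketch is that the replicas run no per-request logic of their own: the frontend supplies the parameters, and the pre-programmed operation on each NIC does the rest.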

slide-15
SLIDE 15

Idea 3: APIs for Transactional Operations

Transactional Operations   Memory Operations                      HyperLoop Primitives
Logging                    Memory write to log region             Group Log
Commit                     Memory copy from log to data region    Group Commit
Locking/Unlocking          Compare and swap on lock region        Group Lock/Unlock

See our SIGCOMM paper for details!

slide-16
SLIDE 16

Transactions with HyperLoop Primitives

  • 1. Update log (Group Log)
  • 2. Grab a lock (Group Lock)
  • 3. Commit the log (Group Commit)
  • 4. Release the lock (Group Unlock)

[Figure: a frontend (software, HyperLoop library, RNIC) and two replicas, each with a HyperLoop library, an RNIC, and NVM holding a log region and a data region]

Replica NICs can execute and forward operations!
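
The four steps above, combined with the memory operations each primitive maps to (write to the log region, compare and swap on the lock region, copy from log to data region), can be sketched as follows. This is a minimal single-process model, not the HyperLoop C API: the function names are hypothetical, the loops stand in for NIC-driven forwarding along the chain, and the rollback when a lock is already held is an assumption added to keep the example well behaved.

```python
class Replica:
    """Toy replica with a log region, a data region, and a lock word."""

    def __init__(self, size=64):
        self.log = bytearray(size)
        self.data = bytearray(size)
        self.lock = 0  # 0 means free; any other value is the holder's id


def group_log(chain, offset, payload):
    # Logging: memory write to the log region on every replica.
    for r in chain:
        r.log[offset:offset + len(payload)] = payload


def group_lock(chain, owner):
    # Locking: compare and swap (0 -> owner) on every replica's lock word.
    acquired = []
    for r in chain:
        if r.lock != 0:
            for held in acquired:   # assumption: undo partial acquisition
                held.lock = 0
            return False
        r.lock = owner
        acquired.append(r)
    return True


def group_commit(chain, offset, length):
    # Commit: memory copy from the log region to the data region.
    for r in chain:
        r.data[offset:offset + length] = r.log[offset:offset + length]


def group_unlock(chain, owner):
    # Unlocking: compare and swap (owner -> 0) on every replica's lock word.
    for r in chain:
        if r.lock == owner:
            r.lock = 0


def run_transaction(chain, owner, offset, payload):
    group_log(chain, offset, payload)                 # 1. update log
    if not group_lock(chain, owner):                  # 2. grab a lock
        return False
    try:
        group_commit(chain, offset, len(payload))     # 3. commit the log
    finally:
        group_unlock(chain, owner)                    # 4. release the lock
    return True


chain = [Replica(), Replica(), Replica()]
ok = run_transaction(chain, owner=1, offset=4, payload=b"abc")
```

After the call, every replica in `chain` holds the payload in its data region and has its lock word back at zero; a transaction that finds any lock held returns `False` without committing.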

slide-17
SLIDE 17

Outline

  • Motivation
  • HyperLoop Design
  • Implementation and Evaluation


slide-18
SLIDE 18

Implementation and Case Studies

  • HyperLoop library
    – C APIs for HyperLoop group primitives
    – Modify user-space RDMA verbs library and NIC driver
  • Case Studies
    – RocksDB: Add the replication logic using HyperLoop library (modify 120 LOCs)
    – MongoDB: Replace the existing replication logic with HyperLoop library (modify 866 LOCs)

slide-19
SLIDE 19

Result Highlights

  • Latency reduction for memory operations on a group:
    – Write: ~801.8x
    – Memory copy: ~848x
  • Tail latency reduction for RocksDB: 5.7 – 24.2x
  • Latency reduction regardless of group sizes
  • Zero CPU utilization in the data path

slide-20
SLIDE 20

Summary

  • Predictable low latency is lacking in replicated storage systems
  • Root cause: CPU involvement on the critical path
  • Our solution: HyperLoop
    – Offloads transactional operations to commodity RNICs + NVM
    – Minimal modifications for existing applications
  • Result: up to 24x tail latency reduction in in-memory storage apps
  • More opportunities
    – Other data center workloads
    – Efficient remote memory utilization