Providing Multi-tenant Services with FPGAs: Case Study on a Key-Value Store - PowerPoint PPT Presentation



SLIDE 1

Providing Multi-tenant Services with FPGAs: Case Study on a Key-Value Store

Zsolt István*, Gustavo Alonso, Ankit Singla

Systems Group, Computer Science Dept., ETH Zürich

* Now at IMDEA Software Institute, Madrid

SLIDE 2

FPGAs in the Cloud

  • Wider adoption of FPGAs (e.g., Amazon F1, Microsoft Catapult, …)
  • Many promising use-cases, but often single-tenant designs
  • Clouds built on sharing and multi-tenancy

❑ High utilization
❑ Flexible provisioning
❑ Load isolation and QoS guarantees


SLIDE 3

Providing multi-tenancy with FPGAs

Virtualization

  • General purpose (PR)
  • Few tenants
  • Trades off functionality
  • Coarse-grained resource alloc.
  • Tenants “bring” applications

Multi-tenant applications

  • Domain-specific
  • Many tenants
  • Trades off performance (?)
  • Fine-grained resource alloc.
  • Provider “brings” application


slide-4
SLIDE 4

Multi-tenant application as a service

Key-value store

  • Widely deployed in the cloud and datacenters
  • Different tradeoffs but similar interfaces, e.g.:
  • Memcached – caching, no replication, latency-optimized, main-memory
  • Amazon S3 – BLOB store, replicated, BW-optimized, needs large capacity


SLIDE 5

Building a multi-tenant KVS (Multes)

  • Area well studied in related work
  • Several pipelined designs, all saturate network link
  • Caribou: Interfaces and functionality similar to SW [VLDB17]
  • FPGA can provide replication for fault-tolerance [NSDI16]
  • Requirements for multi-tenancy:
  • Performance isolation
  • Data isolation
  • Flexibility in resource allocation (focus on network bandwidth)
  • Efficient use of resources regardless of number of tenants


[VLDB17] Z. István, D. Sidler, G. Alonso: Caribou – Intelligent Distributed Storage. VLDB 2017.
[NSDI16] Z. István, D. Sidler, G. Alonso, M. Vukolić: Consensus in a Box – Inexpensive Coordination in Hardware. NSDI 2016.

SLIDE 6

Designing for multi-tenancy

  • Caribou is composed of four modules
  • Requests can take various routes
  • Some traffic is inter-node
  • Hard to reason about load interactions!
  • Multes: reorganized pipeline to ensure all requests take the same path (1)
  • Hash table implements parts of the replication log features (multi-version)
  • More coupling between modules (op-codes)


[Figure: Caribou pipeline (Network Stack (TCP) → Replication + Log Manager → Hash Table + Allocator → Value Access + Processing → Memory) compared with the Multes single pipeline, which adds Traffic Shapers at both ends and a multi-version hash table; replication messages and client messages take the same path.]
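The single-pipeline idea can be illustrated with a minimal software sketch (the stage names and op-codes below are hypothetical simplifications, not the actual hardware modules): every request, whether a client operation or a replication message, traverses the same ordered stages, and each stage acts only on the op-codes it understands and passes everything else through.

```python
# Simplified model of a single-pipeline design: all requests take the
# same path; op-codes decide what each stage does to a request.

def replication(req):
    # acts only on replication op-codes, forwards everything else
    if req["op"] in ("REPL_PROPOSE", "REPL_COMMIT"):
        req["replicated"] = True
    return req

def hash_table(req):
    # acts only on client op-codes (hypothetical 1024-slot table)
    if req["op"] in ("GET", "SET"):
        req["slot"] = hash(req["key"]) % 1024
    return req

PIPELINE = [replication, hash_table]

def process(req):
    # every request flows through the identical stage order, so load
    # interactions between tenants/roles are easy to reason about
    for stage in PIPELINE:
        req = stage(req)
    return req
```

Because no request ever bypasses a stage, the worst-case load on any module is bounded by the single shared input rate, which is what makes the traffic shapers at the pipeline entrance sufficient for isolation.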

SLIDE 7

Token buckets

  • Commonly used in networking scenarios
  • Max. number of tokens (D), adding C tokens every T cycles
  • Limits data rate, burst size
  • Buffer space on the FPGA?
  • Queue commands before data movement
  • Token buckets can be configured with no overhead at runtime (2)
  • Per-tenant allocations controlled by software


[Figure: Traffic Shaper – the tenant ID is extracted from each incoming packet/command, which then passes through its tenant's token bucket; a round-robin arbiter merges the outputs. A configuration interface sets per-tenant limits (D, C, T), and the request metadata encodes the "real cost" of the request.]
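The (D, C, T) scheme above can be sketched in software (a simplified cycle-driven model, not the actual hardware; the method names are hypothetical):

```python
class TokenBucket:
    """Per-tenant token bucket: holds at most D tokens, gains C tokens
    every T cycles. A request passes only if enough tokens remain,
    which bounds both the average data rate (C/T per cycle) and the
    burst size (D)."""

    def __init__(self, D, C, T):
        self.D, self.C, self.T = D, C, T
        self.tokens = D  # start full

    def tick(self, cycle):
        # refill C tokens every T cycles, saturating at D
        if cycle % self.T == 0:
            self.tokens = min(self.D, self.tokens + self.C)

    def try_consume(self, cost):
        # admit the request only if its "real cost" fits;
        # otherwise the command waits in the queue
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Since (D, C, T) are plain runtime parameters rather than synthesized constants, software can re-provision a tenant's share without touching the bitstream, matching point (2) above.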

SLIDE 8

Replicated KVS

  • Caribou implements inter-FPGA replication (leader-based algorithm)

[Figure: Four FPGA nodes. Tenants 1 and 2 share one replicated group (a leader plus replicas); Tenant 3 has its own replicated group with its own leader.]

SLIDE 9

Having multiple roles

  • Control state machine at heart of replication protocol
  • Data and control handled separately
  • Multiple copies not an option
  • Complex logic + plumbing
  • SM extended to store state for each tenant – can context switch on each packet (3)
  • Not all states need tenant context
  • Latency inside SM not on critical path
  • Now in registers, but could use BRAMs to store state


[Figure: Replication controller (atomic broadcast protocol) holding per-tenant state (Tenant 1/2/3): list of peers, role in protocol, outstanding proposals, etc. The input message encodes key, data, op., socket numbers, etc.; the controller emits an output command.]
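The "contexts" idea can be sketched as follows (a toy model with hypothetical names and a deliberately simplified protocol, not the actual atomic broadcast implementation): one controller instance exists, but its mutable protocol state is stored per tenant, so the state machine context-switches on every packet with nothing more than an indexed lookup.

```python
from dataclasses import dataclass, field

@dataclass
class TenantState:
    """Per-tenant replication context: in hardware this would live in
    registers (or BRAM) indexed by tenant ID."""
    role: str = "replica"                      # role in the protocol
    peers: list = field(default_factory=list)  # list of peers
    outstanding: int = 0                       # outstanding proposals

class ReplicationController:
    def __init__(self, num_tenants):
        self.ctx = [TenantState() for _ in range(num_tenants)]

    def on_packet(self, tenant_id, msg):
        # "context switch": select this tenant's state, then run the
        # shared control logic against it
        st = self.ctx[tenant_id]
        if msg == "PROPOSE" and st.role == "leader":
            st.outstanding += 1
            return "BROADCAST"
        if msg == "ACK" and st.outstanding > 0:
            st.outstanding -= 1
            return "COMMIT" if st.outstanding == 0 else None
        return None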

SLIDE 10

Evaluation of Multes
  • Multiple Xilinx VC709s connected to a 10Gbps switch
  • 9 load-generating machines, Go-based benchmarking tool
  • Tenants connect to different TCP port numbers (e.g. 2880, 2881, …)

✓ Multes offers flexible multi-tenancy while efficiently using the network link


[Figure: Evaluation setup – clients of Tenant 1 and Tenant 2 connect over the network to Multes nodes (memory/storage), which run the replication protocol among themselves.]
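The port-based tenant identification can be illustrated in a few lines (`BASE_PORT` and the helper below are assumptions built only on the example ports 2880, 2881 mentioned above, not the actual implementation):

```python
# Hypothetical sketch: each tenant connects to its own TCP port, so the
# tenant ID is simply an offset from a base port.
BASE_PORT = 2880

def tenant_from_port(port, num_tenants):
    tid = port - BASE_PORT
    if 0 <= tid < num_tenants:
        return tid
    raise ValueError(f"port {port} not assigned to any tenant")
```

This keeps tenant demultiplexing trivial for the hardware: no per-request parsing beyond the TCP header is needed.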

SLIDE 11

No performance loss due to multi-tenancy

  • Read-only throughput on a single node


SLIDE 12

Load isolation

  • Replicated write latency of Tenant0 (group = 3)
  • Additional tenants using their full read bandwidth (1/8 of 10Gbps)


[Figure: Replicated write latency [µs], measured without client overhead.]

SLIDE 13

Resource Usage: Small cost for sharing

[Figure: % of VC709 resources (Logic and BRAM) vs. max. number of supported tenants in Multes (1–16), with Caribou, 2× Caribou, and Multes T=2 shown for comparison.]

The FPGA part on the VC709 is XC7VX690T-2FFG1761C

SLIDE 14

Thoughts on the future

Platform-as-a-service

  • Customize KVS with tenant-defined processing for different “flavors”
  • Combining multi-tenant application with small PR regions
  • Simple streaming interfaces – can use HLS, OpenCL, etc.
  • Misbehaving PR region does not impact others


[Figure: Multes pipeline – Traffic Shapers around Network Stack (TCP), Replication, Multivers. Hash Table + Allocator, Value Access + Processing, and Memory.]

SLIDE 15

Conclusion

Multes: multi-tenant KVS service that doesn’t sacrifice performance

Project on Github: https://github.com/fpgasystems/caribou

Relied on three techniques:
1) Single-pipeline architecture and traffic shapers → no load interaction
2) Runtime parameterization of control modules → flexible allocations
3) “Contexts” in controlling state machines → no overhead when switching between tenants

→ Applicable to many network-facing applications on FPGAs
