Draft: Dynamic Storage Provisioning of Manila/CephFS Shares on Kubernetes



SLIDE 2

Dynamic Storage Provisioning of Manila/CephFS Shares on Kubernetes

Róbert Vašek <robert.vasek@codefreax.org> Ricardo Rocha <ricardo.rocha@cern.ch> @ahcorporto

home.cern

SLIDE 3

Table of Contents

◮ Introduction
◮ Container Storage Interface
◮ CSI CephFS
◮ Manila shares with Kubernetes
◮ Results, numbers, plots...

SLIDE 4

We are here: Introduction

SLIDE 5

Founded in 1954

Fundamental Science

◮ What is 96% of the universe made of?
◮ What was the state of matter just after the Big Bang?
◮ Why isn’t there anti-matter in the universe?

SLIDE 9

Dynamic Storage Provisioning of Manila/CephFS Shares on Kubernetes

SLIDE 10

...working title "From a train wreck to a train ride"

SLIDE 11

10’000 CephFS clients [SPOILER ALERT]

SLIDE 12

We are here: Container Storage Interface

SLIDE 13

Container Storage Interface - motivation

[Diagram: Storage]

SLIDE 14

Container Storage Interface - motivation

[Diagram: CO 1, CO 2, CO 3; Storage. CO = Container Orchestrator]

SLIDE 15

Container Storage Interface - motivation

[Diagram: CO 1, CO 2, CO 3; driver 1, driver 2, driver 3; Storage. CO = Container Orchestrator]

SLIDE 16

Container Storage Interface - motivation

From driver’s POV:

◮ Lack of standardization
◮ Higher development and maintenance costs

From CO’s POV:

◮ Volume plugin development is tightly coupled with release cycles of the CO
◮ Bugs in volume plugins can crash critical components
◮ Volume plugins get full privileges
◮ Difficult dependency management

SLIDE 17

Container Storage Interface - motivation

[Diagram: CO 1, CO 2, CO 3; Storage. CO = Container Orchestrator]

SLIDE 18

Container Storage Interface

Overview

◮ Industry standard for cluster-wide storage plugins
◮ Collaboration of communities incl. Kubernetes, Mesos, Docker and Cloud Foundry
◮ Defines the protocol between a CO and a plugin
◮ Plugins are CO-agnostic
◮ Write once – use everywhere, just works™

SLIDE 19

Container Storage Interface

◮ First alpha released in Dec 2017
◮ Working implementation in Kubernetes 1.9 already; a lot of changes since then, some of them breaking
◮ Other COs soon to follow

Release timeline:

December 2017 • v0.1.0
March 2018 • v0.2.0
June 2018 • v0.3.0
...
just today • v1.0.0-rc2
end of Nov 2018 • v1.0.0

SLIDE 20

CSI Services

CSI RPC services (endpoints):

◮ Identity service: allows a CO to query a plugin’s capabilities, health probes and other metadata. Must be implemented by both controller and node plugins, you’ll see why in a bit.
◮ Controller service: creates, deletes and lists volumes and their snapshots.
◮ Node service: (un)stages and (un)publishes volumes on a node.

[Diagram: a CSI plugin is made up of a Controller plugin and a Node plugin]
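
A minimal Python sketch of how the three services split across the two plugin halves. The class and method names are illustrative stand-ins that mirror the RPC names; a real plugin implements gRPC services generated from the CSI protobuf definitions, which this sketch does not use. The plugin name and version are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class IdentityService(ABC):
    """Lets the CO query plugin metadata, capabilities and health."""

    @abstractmethod
    def get_plugin_info(self) -> Dict[str, str]:
        ...

    def probe(self) -> bool:
        # Toy health probe: always reports healthy.
        return True


class ControllerService(ABC):
    """Cluster-wide volume management."""

    @abstractmethod
    def create_volume(self, name: str) -> str:
        ...

    @abstractmethod
    def delete_volume(self, volume_id: str) -> None:
        ...


class NodeService(ABC):
    """Per-node operations that make a volume visible to a workload."""

    @abstractmethod
    def node_publish_volume(self, volume_id: str, target_path: str) -> None:
        ...

    @abstractmethod
    def node_unpublish_volume(self, volume_id: str, target_path: str) -> None:
        ...


class ToyIdentity(IdentityService):
    def get_plugin_info(self) -> Dict[str, str]:
        # Hypothetical name and version, for illustration only.
        return {"name": "toy.csi.example", "vendor_version": "0.0.1"}


class ToyControllerPlugin(ToyIdentity, ControllerService):
    def __init__(self) -> None:
        self.volumes: Dict[str, str] = {}

    def create_volume(self, name: str) -> str:
        volume_id = f"vol-{name}"
        self.volumes[volume_id] = name
        return volume_id

    def delete_volume(self, volume_id: str) -> None:
        self.volumes.pop(volume_id, None)


class ToyNodePlugin(ToyIdentity, NodeService):
    def __init__(self) -> None:
        self.mounts: Dict[str, str] = {}

    def node_publish_volume(self, volume_id: str, target_path: str) -> None:
        self.mounts[target_path] = volume_id

    def node_unpublish_volume(self, volume_id: str, target_path: str) -> None:
        self.mounts.pop(target_path, None)
```

Both halves answer identity queries, so a CO can talk to either one without knowing about the other.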

SLIDE 21

CSI Architecture

[Diagram: the CO talks to the CSI plugin over gRPC]

SLIDE 22

CSI RPCs quick overview

Controller Service*

◮ CreateVolume
◮ DeleteVolume
◮ ControllerPublishVolume*
◮ ControllerGetCapabilities
◮ ...

Node Service

◮ NodeStageVolume*
◮ NodePublishVolume
◮ NodeGetCapabilities
◮ ...

Identity Service

◮ GetPluginInfo
◮ GetPluginCapabilities
◮ ...

* optional
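
To show how the optional RPCs slot into a volume's life, here is a small Python sketch that lists the calls in order. It assumes the plugin has a controller service that supports CreateVolume and DeleteVolume; the function name and its two flags are made up for this example.

```python
from typing import List


def volume_rpc_sequence(controller_publish: bool, node_stage: bool) -> List[str]:
    """Return the RPCs issued, in order, to bring up and then tear down one
    dynamically provisioned volume.

    The flags model whether the plugin advertises the optional
    ControllerPublishVolume and NodeStageVolume RPCs.
    """
    up = ["CreateVolume"]
    if controller_publish:
        up.append("ControllerPublishVolume")
    if node_stage:
        up.append("NodeStageVolume")
    up.append("NodePublishVolume")

    # Teardown mirrors bring-up in reverse order.
    down = ["NodeUnpublishVolume"]
    if node_stage:
        down.append("NodeUnstageVolume")
    if controller_publish:
        down.append("ControllerUnpublishVolume")
    down.append("DeleteVolume")
    return up + down
```

With both flags off, only the four mandatory calls remain: CreateVolume, NodePublishVolume, NodeUnpublishVolume, DeleteVolume.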

SLIDE 23

CSI in Kubernetes

◮ In-tree CSI volume plugin in kubelet: Node(Un)StageVolume, Node(Un)PublishVolume
◮ Side-car containers
  ◮ driver-registrar: plugin discovery, registers the driver with kubelet
  ◮ external-provisioner: CreateVolume, DeleteVolume
  ◮ external-attacher: ControllerPublishVolume, ControllerUnpublishVolume
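
The same split as a lookup table, in Python. This only restates the slide; driver-registrar is left out because no volume RPC is listed for it, and the helper name `caller_of` is made up.

```python
from typing import Dict, List

# Which Kubernetes component issues which CSI RPC, as listed on the slide.
RPC_CALLERS: Dict[str, List[str]] = {
    "kubelet (in-tree CSI volume plugin)": [
        "NodeStageVolume",
        "NodeUnstageVolume",
        "NodePublishVolume",
        "NodeUnpublishVolume",
    ],
    "external-provisioner": ["CreateVolume", "DeleteVolume"],
    "external-attacher": [
        "ControllerPublishVolume",
        "ControllerUnpublishVolume",
    ],
}


def caller_of(rpc: str) -> str:
    """Return the component that issues the given RPC."""
    for component, rpcs in RPC_CALLERS.items():
        if rpc in rpcs:
            return component
    raise KeyError(rpc)
```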

SLIDE 24

We are here: CSI CephFS

SLIDE 25

CSI CephFS overview

github.com/ceph/ceph-csi

◮ Provides an interface between a CSI-enabled Container Orchestrator and the Ceph cluster
◮ Provisions and mounts CephFS volumes
◮ Supports both the kernel CephFS client and the CephFS FUSE driver

SLIDE 26

CSI CephFS overview

Compared to Kubernetes in-tree CephFS volume plugin

◮ In-tree volume plugins to be eventually migrated to CSI
◮ Decoupled from Kubernetes
◮ Ability to choose between mounting tools
◮ Planned support for volume expansion, snapshots

SLIDE 27

We are here: Manila shares with Kubernetes

SLIDE 28

Manila external provisioner for Kubernetes overview

github.com/kubernetes/cloud-provider-openstack

◮ Provisions new Manila shares, fetches existing ones
◮ Maps them to Kubernetes PersistentVolume objects
◮ Currently supports CephFS shares only (both in-tree CephFS plugin and csi-cephfs)
◮ Supports authentication using both user credentials as well as trustees
◮ Magnum → Kubernetes + manila-provisioner StorageClass + trustee secrets = Manila support out-of-the-box
◮ The future is in CSI
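
From the user's side, dynamic provisioning starts with an ordinary PersistentVolumeClaim that names a StorageClass. A Python sketch that builds such a claim as JSON follows. The StorageClass name `manila-cephfs`, the claim name, the size and the ReadWriteMany access mode are assumptions for illustration; the provisioner's own StorageClass parameters are not shown here.

```python
import json
from typing import Dict


def share_claim(name: str, storage_class: str, size: str = "1Gi") -> Dict:
    """Build a PersistentVolumeClaim manifest for one share."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Assumed: a shared filesystem mounted by many pods at once.
            "accessModes": ["ReadWriteMany"],
            # Hypothetical StorageClass backed by the provisioner.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


claim = share_claim("share-0", "manila-cephfs")
manifest = json.dumps(claim, indent=2)
print(manifest)
```

The printed JSON can be fed to `kubectl apply -f -`; the provisioner then creates or fetches the share and binds a PersistentVolume to the claim.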

SLIDE 29

We are here: Results, numbers, plots...

SLIDE 30

Benchmarks

Goals

1. Verify the CSI CephFS implementation for common use cases
2. Verify the Manila Provisioner implementation
3. Test CSI CephFS driver behavior on a heavily loaded cluster

Client

1. Kubernetes v1.12.1, csi-cephfs 0.3.1

Clusters

1. Dwight: 3x24 HDD OSDs, 3 MDS, Ceph Luminous Bluestore
2. Jim: 300 SSD OSDs, 2 MDS, Ceph Luminous Bluestore, hyper-converged

SLIDE 31

Benchmarks

Methodology

1. Provision s CephFS shares using manila-provisioner
2. Create a Deployment with r replicas, sized so we get one pod per node
3. Mount s provisioned shares into each pod using csi-cephfs (fuse)
4. Measure time taken for all pods to become Running, MDS sessions, hcr/s

Tests

1. idle: do nothing
2. busy: unpack a large archive (linux kernel)

Parameters

1. s = 100, r = 100; 10’000 idle clients
2. s = 10, r = 100; 1’000 busy clients
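
Steps 2 and 3 can be sketched in Python as a generated Deployment manifest, together with the client arithmetic (every pod mounts every share, so clients = s × r). The image, names and mount paths are hypothetical, and the pod anti-affinity rule is only one possible way to get one pod per node; the deck does not say how the Deployment was sized.

```python
from typing import Dict, List


def client_count(shares: int, replicas: int) -> int:
    """Each (pod, share) pair is one CephFS client."""
    return shares * replicas


def build_deployment(name: str, claim_names: List[str], replicas: int) -> Dict:
    """Deployment whose pods each mount every claim in claim_names."""
    labels = {"app": name}
    volumes = [
        {"name": f"share-{i}", "persistentVolumeClaim": {"claimName": claim}}
        for i, claim in enumerate(claim_names)
    ]
    mounts = [
        {"name": f"share-{i}", "mountPath": f"/mnt/share-{i}"}
        for i in range(len(claim_names))
    ]
    # Assumption: keep pods of this Deployment off each other's nodes.
    anti_affinity = {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": labels},
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": anti_affinity,
                    "containers": [
                        {
                            "name": "client",
                            "image": "busybox",  # hypothetical idle client
                            "command": ["sh", "-c", "while true; do sleep 3600; done"],
                            "volumeMounts": mounts,
                        }
                    ],
                    "volumes": volumes,
                },
            },
        },
    }


claims = [f"share-{i}" for i in range(100)]
deployment = build_deployment("idle-clients", claims, replicas=100)
print(client_count(len(claims), 100))  # 10000
```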

SLIDE 32

Idle benchmark - attempt #1

Our very first test of csi-cephfs with concurrent workloads

Preparation

◮ 10 CephFS shares
◮ 100 replicas
◮ The goal is to have 1’000 idle clients running

SLIDE 33

Idle benchmark - attempt #1

Outcome

◮ :(

SLIDE 34

Idle benchmark - attempt #1

Aug 30 09:51:45 cci-cephfs-scale-003-n3tf4nqlzisk-minion-56.cern.ch runc[3255]: E0830 09:51:45.410380 3270 csi_attacher.go:137] kubernetes.io/csi: attacher.WaitForAttach failed for volume [pvc-c5848f32-ac39-11e8-bbfb-02163e01b7c5] (will continue to try): volumeattachments.storage.k8s.io "csi-4f2dbe5cb257e7d7b172c4a1e6a1d26bfff82dabeb91e441c527d46f368f1615" is forbidden: User "system:node:cci-cephfs-scale-003-n3tf4nqlzisk-minion-56" cannot get volumeattachments.storage.k8s.io at the cluster scope: no path found to object

SLIDE 35

Idle benchmark - attempt #1

User "system:node:NODE_NAME" cannot get volumeattachments at the cluster scope: no path found to object

◮ Provisioning of shares worked just fine
◮ Some pods survived
◮ Others that reported this error would never recover

SLIDE 36

Idle benchmark - attempt #2

A script that:

◮ Scales the deployment in small increments
◮ Kills pods that take too long to create (got stuck in the VolumeAttachment error)
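
The original script is not shown in the deck. A rough Python sketch of the idea follows, with the cluster operations passed in as callables so the loop can be dry-run without a cluster. All names, the polling scheme and the thresholds are assumptions.

```python
import time
from typing import Callable, Dict


def gentle_scale_up(
    target: int,
    step: int,
    scale_to: Callable[[int], None],
    pending_pods: Callable[[], Dict[str, float]],
    delete_pod: Callable[[str], None],
    stuck_after_s: float,
    poll_s: float = 5.0,
    max_polls: int = 1000,
) -> int:
    """Scale to `target` replicas in increments of `step`.

    After each increment, poll until no pod is pending. Pods pending for
    longer than `stuck_after_s` seconds are deleted so that the Deployment
    recreates them. Returns the number of pods killed.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    killed = 0
    replicas = 0
    while replicas < target:
        replicas = min(target, replicas + step)
        scale_to(replicas)
        for _ in range(max_polls):
            pending = pending_pods()  # {pod name: seconds spent pending}
            if not pending:
                break
            for name, age in pending.items():
                if age > stuck_after_s:
                    delete_pod(name)
                    killed += 1
            time.sleep(poll_s)
    return killed
```

In a real run the three callables would wrap `kubectl scale`, a pod listing filtered on non-Running pods, and `kubectl delete pod`.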

SLIDE 38

Idle benchmark - attempt #2

Outcome

◮ We’ve managed to get up to 655 concurrent clients (could be even more)
◮ Slow and ugly but somehow working

SLIDE 39

Idle benchmark - attempt #3

"Third time’s the charm"?

◮ Kubernetes 1.12, driver-registrar 0.4 released
◮ Kubelet plugin registration of CSI drivers
◮ CSISkipAttach
  ◮ Skips the creation of VolumeAttachment objects
  ◮ Volumes are marked as attached immediately
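
A toy Python model of the two CSISkipAttach bullets. It is not Kubernetes code; the class, field and return values are invented to contrast the two paths.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyAttachFlow:
    """Contrasts the attach step with and without skip-attach."""

    skip_attach: bool
    volume_attachments: List[str] = field(default_factory=list)

    def attach(self, pv_name: str) -> str:
        if self.skip_attach:
            # No VolumeAttachment object is created; the volume counts
            # as attached right away.
            return "attached"
        # Otherwise a VolumeAttachment is created and the caller has to
        # wait until external-attacher marks it Attached=true.
        self.volume_attachments.append(f"attachment-for-{pv_name}")
        return "waiting"


old_flow = ToyAttachFlow(skip_attach=False)
new_flow = ToyAttachFlow(skip_attach=True)
print(old_flow.attach("pv-csi-1"), new_flow.attach("pv-csi-1"))
```

With skip-attach there is no VolumeAttachment object for a node to read, so the "cannot get volumeattachments" failure from attempt #1 has nothing to trip over.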

SLIDE 40

[Diagram: kubelet on Node 1 with CSI Controller plugin, driver-registrar and external-attacher; kubelet on Node 2 with CSI Node plugin, driver-registrar and Pod 1; PersistentVolume pv-csi-1. Status: Waiting for VolumeAttachments]

For CSI, I’m going to:

1. Create VolAttachment
2. Wait for Attached=true

* external-provisioner omitted from image

SLIDE 41

[Same diagram as on slide 40, with the reply:]

No, you’re not!

* external-provisioner omitted from image

SLIDE 42

Idle benchmark - attempt #3

◮ We resumed our tests using the new versions of Kubernetes and driver-registrar
◮ Parameters: 100 CephFS shares × 100 replicas = 10’000 idle clients
◮ Gradual, gentle scale up

SLIDE 43

Idle benchmark - attempt #3

15:29:28 : scaling to 5 replicas
15:32:33 : scaling to 10 replicas
...
15:57:18 : scaling to 80 replicas
15:58:51 : scaling to 85 replicas
16:00:29 : scaling to 90 replicas
16:02:26 : scaling to 95 replicas
16:04:16 : scaling to 100 replicas
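
The span covered by this log can be worked out directly from the timestamps. A short Python sketch follows; it assumes all lines are from the same day, and it measures only the time between the first and last scale commands, not the time until the last pod was Running.

```python
from datetime import datetime

LOG = """\
15:29:28 : scaling to 5 replicas
15:32:33 : scaling to 10 replicas
15:57:18 : scaling to 80 replicas
15:58:51 : scaling to 85 replicas
16:00:29 : scaling to 90 replicas
16:02:26 : scaling to 95 replicas
16:04:16 : scaling to 100 replicas
"""


def parse(line):
    """Split one log line into (timestamp, replica count)."""
    stamp, _, message = line.partition(" : ")
    replicas = int(message.split()[2])
    return datetime.strptime(stamp, "%H:%M:%S"), replicas


events = [parse(line) for line in LOG.splitlines()]
elapsed = events[-1][0] - events[0][0]
print(elapsed)  # 0:34:48
```

Each step of 5 replicas adds 500 clients, since every replica mounts all 100 shares.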

SLIDE 44

10’000 idle clients in 30min

[Plots, slides 44-47]

SLIDE 48

Idle benchmark - attempt #3

Some bits still to be understood

mds.cephdwightmds2 mds.1 redacted:6800/2246956767 2238 : cluster [WRN] evicting unresponsive client cci-cephfs-scale-001-3vimalfm74dd-minion-76.cern.ch (674953366), after 304.623890 seconds

mds.cephdwightmds0 mds.2 redacted:6800/2942371108 2167 : cluster [WRN] evicting unresponsive client cci-cephfs-scale-001-3vimalfm74dd-minion-76.cern.ch: (674953366), after 304.277996 seconds

SLIDE 49

Idle benchmark - attempt #3

Some bits still to be understood

cci-cephfs-scale-001-3vimalfm74dd-minion-13   Ready      <none>   8d   v1.12.1
cci-cephfs-scale-001-3vimalfm74dd-minion-14   NotReady   <none>   8d   v1.12.1

SLIDE 50

Busy benchmark - attempt #1

◮ Each client extracting the linux kernel
◮ Slowly ramp up until it breaks

SLIDE 53

Bonus benchmark - client deletion

◮ Stop 10’000 clients and delete their shares, simultaneously
◮ Kubernetes and driver OK, CephFS OK, Manila share daemon needed a kick

| ... | pvc-08d... | 1 | CEPHFS | deleting | False | Geneva CephFS Testing | nova |
| ... | pvc-7a9... | 1 | CEPHFS | deleting | False | Geneva CephFS Testing | nova |

SLIDE 55

Conclusion & Next Steps

To recap:

◮ Standardized storage interface for Container Orchestrators with CSI
◮ Works nicely in Kubernetes, others soon to follow
◮ manila-provisioner + csi-cephfs handle large concurrency and scaling well
◮ Already in production at CERN

Next Steps:

◮ Add support for volume expansion and snapshots
◮ Make the Manila Provisioner a CSI plugin

SLIDE 56

Questions?

◮ Róbert Vašek <robert.vasek@codefreax.org>
◮ Ricardo Rocha <ricardo.rocha@cern.ch> @ahcorporto
◮ CSI CephFS: https://github.com/ceph/ceph-csi
◮ Manila Provisioner: https://github.com/kubernetes/cloud-provider-openstack