SLIDE 1

Developing Kubernetes Services

@MELANIECEBULA / MARCH 2019 / QCON LONDON

at Airbnb Scale

slide-2
SLIDE 2

What is kubernetes?

@MELANIECEBULA

slide-3
SLIDE 3

Who am I?

@MELANIECEBULA

slide-4
SLIDE 4

A BRIEF HISTORY

slide-5
SLIDE 5

[Chart: monolith LOC, growing from 2009 to 2018 toward 4,000,000 lines]

😴

@MELANIECEBULA

Why Microservices?

slide-6
SLIDE 6

[Chart: engineering team size, growing from 2009 to 2018 toward 1,000 engineers]

😳

@MELANIECEBULA

Why Microservices?

slide-7
SLIDE 7

SCALING CONTINUOUS DELIVERY

@MELANIECEBULA

Why Microservices?

slide-8
SLIDE 8

Deploys per week (all apps, all environments)

@MELANIECEBULA

Why Microservices?

slide-9
SLIDE 9

Monolith production deploys per week

@MELANIECEBULA

Why Microservices?

slide-10
SLIDE 10

125,000

production deploys per year

@MELANIECEBULA

Why Microservices?

slide-11
SLIDE 11

  • Manually configuring boxes
  • Automate configuration of applications with Chef
  • Automate configuration and orchestration of containerized applications with Kubernetes

@MELANIECEBULA

Why kubernetes?

EVOLUTION OF CONFIGURATION MANAGEMENT

slide-12
SLIDE 12

@MELANIECEBULA

Why kubernetes?

  • portable
  • immutable
  • reproducible
  • declarative
  • efficient scheduling
  • extensible API
  • human-friendly data
  • standard format
slide-13
SLIDE 13

TODAY

slide-14
SLIDE 14

50% of services

in kubernetes

@MELANIECEBULA

migration progress

slide-15
SLIDE 15

250+ critical services

in kubernetes

@MELANIECEBULA

migration progress

slide-16
SLIDE 16
  • complex configuration
  • complex tooling
  • integrating with your current infrastructure
  • open issues
  • scaling
  • … and more!

@MELANIECEBULA

Challenges with kubernetes?

slide-17
SLIDE 17
  • complex configuration
  • complex tooling
  • integrating with your current infrastructure
  • open issues
  • scaling
  • … and more!

@MELANIECEBULA

Challenges with kubernetes?

solvable problems!

slide-18
SLIDE 18

@MELANIECEBULA

you are not alone.

slide-19
SLIDE 19
  • 1. abstract away k8s configuration
  • 2. generate service boilerplate
  • 3. version + refactor configuration
  • 4. opinionated kubectl
  • 5. custom ci/cd + validation

@MELANIECEBULA

Solutions?

slide-20
SLIDE 20

ABSTRACT AWAY CONFIGURATION

slide-21
SLIDE 21


@MELANIECEBULA

kubectl apply Production Deployment Canary Deployment Production ConfigMap Canary ConfigMap Production Service Canary Service

kubernetes cluster

Dev Deployment Dev ConfigMap Dev Service

kubernetes config files

slide-22
SLIDE 22


@MELANIECEBULA

kubectl apply Production Deployment Canary Deployment Production ConfigMap Canary ConfigMap Production Service Canary Service

kubernetes cluster

Dev Deployment Dev ConfigMap Dev Service

kubernetes config files: lots of boilerplate, repetitive by environment (resources × environments)

slide-23
SLIDE 23

Reducing k8s boilerplate

OUR REQUIREMENTS

  • Prefer templating over file inheritance
  • Input should be templated YAML files
  • Make it easier to migrate 100s of legacy services
  • Make it easier to retrain 1000 engineers

@MELANIECEBULA

slide-24
SLIDE 24

Project Apps Containers Files Volumes Dockerfile kube-gen generate kubectl apply

kubernetes cluster

@MELANIECEBULA

kubectl apply Production Deployment Canary Deployment Production ConfigMap Canary ConfigMap Production Service Canary Service Dev Deployment Dev ConfigMap Dev Service

kubernetes config files

generating k8s configs

slide-25
SLIDE 25

Project Apps Containers Files Volumes Dockerfile kube-gen generate kubectl apply

kubernetes cluster

@MELANIECEBULA

kubectl apply Production Deployment Canary Deployment Production ConfigMap Canary ConfigMap Production Service Canary Service Dev Deployment Dev ConfigMap Dev Service

kubernetes config files kube-gen!

generating k8s configs

slide-26
SLIDE 26

Reducing k8s boilerplate

WHAT WE WENT WITH

@MELANIECEBULA

Project Apps Containers Files Volumes Dockerfile

sets params per environment

  • other files

access params

slide-27
SLIDE 27

@MELANIECEBULA

Project Apps Containers Files Volumes Dockerfile kube-gen generate

generating k8s configs

slide-28
SLIDE 28

@MELANIECEBULA

Project Apps Containers Files Volumes Dockerfile kube-gen generate

standardized namespaces based on environments!

generating k8s configs
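The standardization can be sketched as a tiny helper (hypothetical code; the real kube-gen logic is Airbnb-internal): the namespace is simply the project name suffixed with the environment, which is how the `bonk` example service in `staging` becomes `bonk-staging` later in the talk.

```go
package main

import "fmt"

// namespaceFor derives a standardized namespace from a project name
// and an environment, mirroring the convention described in the talk
// (e.g. "bonk" + "staging" -> "bonk-staging").
func namespaceFor(project, env string) string {
	return fmt.Sprintf("%s-%s", project, env)
}

func main() {
	fmt.Println(namespaceFor("bonk", "staging"))    // bonk-staging
	fmt.Println(namespaceFor("bonk", "production")) // bonk-production
}
```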

slide-29
SLIDE 29

kube-gen

COMPONENTS

@MELANIECEBULA

Project Apps Containers Files Volumes Dockerfile

Which shared components to use? nginx logging statsd example components

slide-30
SLIDE 30

kube-gen

COMPONENTS

@MELANIECEBULA

Main Container Containers Main App Volumes Files Dockerfile

nginx component

  • common pattern abstracted into component
  • component yaml is merged into project on generate
  • components can require or set default params
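The merge behavior above can be sketched roughly as follows (hypothetical types and names; the real kube-gen merge logic is Airbnb-internal): a component carries default params and a list of required params, and generation overlays the project's values on top of the defaults.

```go
package main

import "fmt"

// Component models a component's parameter contract: defaults are
// merged in when the project doesn't set a value, and required
// params must end up set one way or another.
type Component struct {
	Defaults map[string]string
	Required []string
}

// Merge overlays project params on top of the component defaults
// and verifies that all required params are present.
func (c Component) Merge(project map[string]string) (map[string]string, error) {
	out := map[string]string{}
	for k, v := range c.Defaults {
		out[k] = v
	}
	for k, v := range project {
		out[k] = v
	}
	for _, r := range c.Required {
		if _, ok := out[r]; !ok {
			return nil, fmt.Errorf("missing required param %q", r)
		}
	}
	return out, nil
}

func main() {
	nginx := Component{
		Defaults: map[string]string{"worker_processes": "auto"},
		Required: []string{"listen_port"},
	}
	params, err := nginx.Merge(map[string]string{"listen_port": "8080"})
	fmt.Println(params, err)
}
```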
slide-31
SLIDE 31

Reducing k8s boilerplate

OPEN SOURCE OPTIONS

@MELANIECEBULA

  • 1. Combine with package management (ex: helm)
  • 2. Override configuration via file inheritance (ex: kustomize)
  • 3. Override configuration via templating (ex: kapitan)
slide-32
SLIDE 32
  • Reduce kubernetes boilerplate
  • Standardize on environments and namespaces

Takeaways

slide-33
SLIDE 33

GENERATE SERVICE BOILERPLATE

slide-34
SLIDE 34

Everything about a service is in one place in git, and managed with one process.

slide-35
SLIDE 35

Everything about a service is in one place in git

  • All configuration lives in _infra alongside project code
  • Edit code and configuration with one pull request
  • Easy to add new configuration
  • Statically validated in CI/CD

Configuration

LIVES IN ONE PLACE

@MELANIECEBULA

slide-36
SLIDE 36

What we support:

  • kube-gen files
  • framework boilerplate
  • API boilerplate
  • CI/CD
  • docs
  • AWS IAM roles
  • project ownership
  • storage
  • .. and more!

Configuration

LIVES IN ONE PLACE

@MELANIECEBULA

slide-37
SLIDE 37

Configuration

LIVES IN ONE PLACE

@MELANIECEBULA

this “hello world” service was created in one command

slide-38
SLIDE 38

Configuration

LIVES IN ONE PLACE

@MELANIECEBULA

collection of config generators (ex: docs, ci)

slide-39
SLIDE 39

Configuration

LIVES IN ONE PLACE

@MELANIECEBULA

collection of framework-specific generators (ex: Rails, Dropwizard)

slide-40
SLIDE 40
  • make best practices the default (ex: deploy pipeline, autoscaling, docs)
  • run generators individually or as a group
  • support for review, update, commit

Configuration

CAN BE GENERATED

@MELANIECEBULA

slide-41
SLIDE 41
  • Everything about a service should be in one place in git
  • Make best practices the default by generating configuration

Takeaways

slide-42
SLIDE 42

VERSION CONFIGURATION

slide-43
SLIDE 43

Why do we version our kube configuration?

@MELANIECEBULA

  • add support for something new (ex: k8s version)
  • want to change something (ex: deployment strategy)
  • want to drop support for something (breaking change)
  • know which versions are bad when we make a regression 😆
  • support release cycle and cadence

kube.yml v1

kubev1 deploy kubev2 deploy

kubernetes cluster

kube.yml v2

slide-44
SLIDE 44

How do we version our kube configuration?

@MELANIECEBULA

  • 1. version field
  • 2. publish binaries for each version
  • 3. channels point to binaries (ex: stable)
  • 4. generate and apply using the appropriate binary

bonk kube-gen.yml

slide-45
SLIDE 45

Why do we version our generated configuration?

@MELANIECEBULA

  • what our project generators generate changes over time
  • best practices change
  • and bugs in the generators are found! 😆

kube.yml generated by sha1 kubev1 deploy

kubernetes cluster

kube.yml generated by sha2

generator at sha2 has a bug

slide-46
SLIDE 46

How do we version our generated configuration?

@MELANIECEBULA

generator tags generated files with version, sha, and timestamp

slide-47
SLIDE 47

REFACTOR CONFIGURATION

slide-48
SLIDE 48
  • services should be up-to-date with latest best practices
  • update configuration to the latest supported versions
  • apply security patches to images
  • configuration migrations should be automated

Why do we refactor configuration?

FOR HUNDREDS OF SERVICES

slide-49
SLIDE 49

250+ critical services

in kubernetes

@MELANIECEBULA

(we don’t want to manually refactor)

slide-50
SLIDE 50

How do we refactor configuration?

@MELANIECEBULA

  • collection of general purpose scripts
  • scripts are modular
  • scripts cover the lifecycle of a refactor

list-pr-urls.py get-repos.py update-prs.py refactor.py close.py status.py

refactorator

slide-51
SLIDE 51

The lifecycle of a refactor

Run Refactor: checks out repo, finds project, runs refactor job, tags owners, creates PR
Update: comments on the PR, reminding owners to verify, edit, and merge the PR
Merge: merges the PR with different levels of force

@MELANIECEBULA

slide-52
SLIDE 52

How do we refactor configuration?

@MELANIECEBULA

  • refactorator will run a refactor for all services given a refactor job
  • refactor job updates _infra file(s)
  • ex: upgrade kube version to stable

refactorator

upgrade-kube.py

refactor job

slide-53
SLIDE 53

Bumping stable version

@MELANIECEBULA

  • bump stable version
  • cron job uses refactorator with the upgrade-kube.py refactor job to create PRs
  • another cron job handles updating and merging PRs; runs daily on weekdays

k8s cron job

refactorator upgrade-kube.py

refactor job

slide-54
SLIDE 54
  • Configuration should be versioned and refactored automatically.

Takeaways

slide-55
SLIDE 55

OPINIONATED KUBECTL

slide-56
SLIDE 56


@MELANIECEBULA

kubectl apply Production Deployment Canary Deployment Production ConfigMap Canary ConfigMap Production Service Canary Service

kubernetes cluster

Dev Deployment Dev ConfigMap Dev Service

kubernetes config files: lots of boilerplate, repetitive by environment (resources × environments)

slide-57
SLIDE 57


@MELANIECEBULA

kubectl apply Production Deployment Canary Deployment Production ConfigMap Canary ConfigMap Production Service Canary Service

kubernetes cluster

Dev Deployment Dev ConfigMap Dev Service

kubernetes config files: verbose, repetitive by namespace

slide-58
SLIDE 58

k tool

KUBECTL WRAPPER

kubernetes cluster

kubectl apply Production Deployment Canary Deployment Production ConfigMap Canary ConfigMap Production Service Canary Service Dev Deployment Dev ConfigMap Dev Service

@MELANIECEBULA

calls kubectl commands (incl. plugins)

slide-59
SLIDE 59

k tool

OPINIONATED KUBECTL

@MELANIECEBULA

slide-60
SLIDE 60

k tool

USES ENV VARS

  • Runs in the project home directory:


$ cd /path/to/bonk $ k status

  • Environment variables for arguments:

$ k status ENV=staging

  • Prints the command that it will execute:


$ k status ENV=staging kubectl get pods --namespace=bonk-staging

@MELANIECEBULA

standardized namespaces!
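The argument handling above can be sketched as follows (a hypothetical reimplementation; the real k tool is Airbnb-internal, and the default environment here is an assumption): KEY=VALUE arguments are parsed, the namespace is derived from project plus ENV, and the resulting kubectl command is printed before it runs.

```go
package main

import (
	"fmt"
	"strings"
)

// buildKubectlCmd turns a wrapper invocation like "k status ENV=staging"
// run in the bonk project directory into an explicit kubectl command,
// deriving the namespace from project name + environment.
func buildKubectlCmd(project string, args []string) string {
	env := "production" // assumed default when ENV is not given
	for _, a := range args {
		if strings.HasPrefix(a, "ENV=") {
			env = strings.TrimPrefix(a, "ENV=")
		}
	}
	return fmt.Sprintf("kubectl get pods --namespace=%s-%s", project, env)
}

func main() {
	// printed before execution, as k status does
	fmt.Println(buildKubectlCmd("bonk", []string{"ENV=staging"}))
	// kubectl get pods --namespace=bonk-staging
}
```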

slide-61
SLIDE 61

k tool

SIMPLIFIES BUILDS AND DEPLOYS

  • k generate generates kubernetes files
  • k build performs project build, docker build, and docker push with tags
  • k deploy creates namespace, applies/replaces kubernetes files, sleeps, and checks deployment status
  • can chain commands; ex: k all

@MELANIECEBULA

slide-62
SLIDE 62

k tool

A DEBUGGING TOOL

  • defaults to random pod, main container:

$ k ssh ENV=staging

  • specify particular pod, specific container:

$ k logs ENV=staging POD=… CONTAINER=bonk

  • automates debugging with k diagnose ENV=staging

@MELANIECEBULA

slide-63
SLIDE 63

k tool

A DEBUGGING TOOL

  • defaults to random pod, main container:

$ k ssh ENV=staging

  • specify particular pod, specific container:

$ k logs ENV=staging POD=… CONTAINER=bonk

  • automates debugging with k diagnose ENV=staging

@MELANIECEBULA

call kubectl diagnose

slide-64
SLIDE 64

What are kubectl plugins?

@MELANIECEBULA

slide-65
SLIDE 65

What are kubectl plugins?

@MELANIECEBULA

slide-66
SLIDE 66

k diagnose

SETUP

@MELANIECEBULA

deploy bonk service with failing command new pod in CrashLoopBackoff

slide-67
SLIDE 67

k diagnose

MANUALLY

@MELANIECEBULA

  • 1. use “get pods -o=yaml” and look for problems
  • 2. grab logs for unready container

slide-68
SLIDE 68

k diagnose

MANUALLY

@MELANIECEBULA

  • 3. get k8s events related to this pod

slide-69
SLIDE 69

kubectl podevents

KUBECTL PLUGIN

@MELANIECEBULA

kubectl podevents plugin

slide-70
SLIDE 70

kubectl diagnose

USES COBRA GO CLI

@MELANIECEBULA

// defines CLI command and flags
var Namespace string

var rootCmd = &cobra.Command{
	Use:   "kubectl diagnose --namespace=<namespace>",
	Short: "diagnoses a namespace with pods in CrashLoopBackOff",
	Run: func(cmd *cobra.Command, args []string) {
		// Fill in with program logic
	},
}

func Execute() {
	rootCmd.Flags().StringVarP(&Namespace, "namespace", "n", "", "namespace to diagnose")
	rootCmd.MarkFlagRequired("namespace")
	if err := rootCmd.Execute(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

slide-71
SLIDE 71

kubectl diagnose

USES K8S CLIENT-GO

@MELANIECEBULA

// get pods (assume Namespace is defined)
kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
if err != nil { … }
clientset, err := kubernetes.NewForConfig(config)
if err != nil { … }
pods, err := clientset.CoreV1().Pods(Namespace).List(metav1.ListOptions{})
fmt.Printf("There are %d pods in the namespace %s\n", len(pods.Items), Namespace)
for _, pod := range pods.Items {
	podName := pod.Name
	for _, c := range pod.Status.ContainerStatuses {
		if c.Ready != true {
			// print c.LastTerminatedState and c.State
		}
	}
}

uses k8s client-go and Namespace param to get pods

slide-72
SLIDE 72

kubectl diagnose

USES K8S CLIENT-GO

@MELANIECEBULA

// get pods (assume Namespace is defined)
kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
if err != nil { … }
clientset, err := kubernetes.NewForConfig(config)
if err != nil { … }
pods, err := clientset.CoreV1().Pods(Namespace).List(metav1.ListOptions{})
fmt.Printf("There are %d pods in the namespace %s\n", len(pods.Items), Namespace)
for _, pod := range pods.Items {
	podName := pod.Name
	for _, c := range pod.Status.ContainerStatuses {
		if c.Ready != true {
			// print c.LastTerminatedState and c.State
		}
	}
}

prints info for all unready containers

slide-73
SLIDE 73

kubectl diagnose

USES OS/EXEC (WHEN LAZY)

@MELANIECEBULA

// get pod events for namespace and pod
cmd := exec.Command("kubectl", "podevents", Namespace, podName)
var out bytes.Buffer
var stderr bytes.Buffer
cmd.Stdout = &out
cmd.Stderr = &stderr
err := cmd.Run()
if err != nil {
	fmt.Println(fmt.Sprint(err) + ": " + stderr.String())
	log.Fatal(err)
} else {
	fmt.Println("Events: \n" + out.String())
}

// also grab logs
cmd = exec.Command("kubectl", "logs", podName,
	fmt.Sprintf("--namespace=%s", Namespace), "-c", "bonk")

podevents kubectl plugin

slide-74
SLIDE 74

kubectl diagnose

GO KUBECTL PLUGIN

@MELANIECEBULA

slide-75
SLIDE 75

kubectl diagnose

GO KUBECTL PLUGIN

@MELANIECEBULA

  • 1. unready container info

slide-76
SLIDE 76

kubectl diagnose

GO KUBECTL PLUGIN

@MELANIECEBULA

  • 1. unready container info
  • 2. kubectl podevents

slide-77
SLIDE 77

kubectl diagnose

GO KUBECTL PLUGIN

@MELANIECEBULA

  • 1. unready container info
  • 2. kubectl podevents
  • 3. pod logs for unready containers

slide-78
SLIDE 78
  • Create an opinionated kubectl wrapper
  • Automate common k8s workflows with kubectl plugins

Takeaways

slide-79
SLIDE 79

CI/CD

slide-80
SLIDE 80
slide-81
SLIDE 81

Each step in our CI/CD jobs is a RUN step in a build Dockerfile

slide-82
SLIDE 82

runs k commands

slide-83
SLIDE 83

DEPLOY PROCESS

slide-84
SLIDE 84

A single deploy process for every change

Develop: write code and config under your project
Merge: open a PR and merge your code to master
Deploy: deploy all code and config changes

@MELANIECEBULA

slide-85
SLIDE 85

A single deploy process for every change

Deployment ConfigMap Service AWS Alerts Dashboards Project Ownership Docs Secrets kubectl apply

kubernetes cluster

kubectl apply Storage Service Discovery API Gateway Routes

@MELANIECEBULA

slide-86
SLIDE 86

How do we apply k8s configuration?

Deployment ConfigMap Service “kubectl apply”

kubernetes cluster

  • kubectl apply all files
  • in some cases where apply fails, replace files without force
  • always restart pods on deploy to pick up changes
  • return atomic success or failure state by sleeping and checking status

@MELANIECEBULA

slide-87
SLIDE 87

How do you always restart pods on deploy?

Deployment ConfigMap Service kubectl apply

kubernetes cluster

kubectl apply

We add a date label to the pod spec, which convinces k8s to relaunch all pods

@MELANIECEBULA

slide-88
SLIDE 88

How do we apply custom configuration?

@MELANIECEBULA

slide-89
SLIDE 89

How do we apply custom configuration?

aws.yml kubectl apply

kubernetes cluster

kubectl apply

AWS CRD AWS Controller AWS webhook

@MELANIECEBULA

slide-90
SLIDE 90

How do we apply custom configuration?

aws.yml kubectl apply

kubernetes cluster

kubectl apply

AWS CRD AWS Controller AWS webhook

  • 1. Create a custom resource definition for aws.yml

@MELANIECEBULA

slide-91
SLIDE 91

How do we apply custom configuration?

aws.yml kubectl apply

kubernetes cluster

kubectl apply

AWS CRD AWS Controller AWS webhook

  • 2. Create a controller that calls a web hook when aws.yml is applied

@MELANIECEBULA

slide-92
SLIDE 92

How do we apply custom configuration?

aws.yml kubectl apply

kubernetes cluster

kubectl apply

AWS CRD AWS Controller AWS webhook

  • 3. Create a web hook that updates a custom resource

@MELANIECEBULA

slide-93
SLIDE 93

How do we apply custom configuration?

AWS CRD AWS Controller AWS webhook

@MELANIECEBULA

AWS lambda

  • 4. AWS lambda exposes web hook to be called

slide-94
SLIDE 94
  • Code and configuration should be deployed with the same process
  • Use custom resources and custom controllers to integrate k8s with your infra

Takeaways

slide-95
SLIDE 95

VALIDATION

slide-96
SLIDE 96
  • enforce best practices
  • at build time with validation scripts
  • at deploy time with admission controller

Configuration

SHOULD BE VALIDATED

@MELANIECEBULA

slide-97
SLIDE 97

How do we validate configuration at build time?

@MELANIECEBULA

slide-98
SLIDE 98

How do we validate configuration at build time?

@MELANIECEBULA

kube validation script

job dispatcher

project.yml validation script aws .yml validation script

global jobs repo project build global validation jobs docs build bonk CI jobs

slide-99
SLIDE 99

How do we validate configuration at build time?

@MELANIECEBULA

kube validation script

job dispatcher

project.yml validation script aws .yml validation script

global jobs repo project build global validation jobs docs build bonk CI jobs

  • 1. Define global job in global jobs repo

slide-100
SLIDE 100

How do we validate configuration at build time?

@MELANIECEBULA

kube validation script

job dispatcher

project.yml validation script aws .yml validation script

global jobs repo project build global validation jobs docs build bonk CI jobs

  • 2. job dispatcher always dispatches global jobs to projects

slide-101
SLIDE 101

How do we validate configuration at build time?

@MELANIECEBULA

kube validation script

job dispatcher

project.yml validation script aws .yml validation script

global jobs repo project build global validation jobs docs build bonk CI jobs

  • 3. global job runs alongside project jobs

slide-102
SLIDE 102

What do we validate at build time?

@MELANIECEBULA

  • invalid yaml
  • invalid k8s configuration
  • bad configuration versions
  • max namespace length (63 chars)
  • valid project name
  • valid team owner in project.yml
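Two of the checks above (the 63-character namespace limit and valid project names) can be sketched as a validation function; the regex is an assumption based on the DNS-label rules kubernetes itself enforces for namespace names.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRE matches a DNS label: lowercase alphanumerics and hyphens,
// not starting or ending with a hyphen.
var nameRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validateNamespace enforces the kubernetes namespace length limit
// (63 chars) and DNS-label naming rules at build time.
func validateNamespace(ns string) error {
	if len(ns) > 63 {
		return fmt.Errorf("namespace %q exceeds 63 characters", ns)
	}
	if !nameRE.MatchString(ns) {
		return fmt.Errorf("namespace %q is not a valid DNS label", ns)
	}
	return nil
}

func main() {
	fmt.Println(validateNamespace("bonk-staging")) // <nil>
	fmt.Println(validateNamespace("Bonk_Staging"))
}
```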

kube validation script

job dispatcher

project.yml validation script aws .yml validation script

global jobs repo

slide-103
SLIDE 103

How do we validate configuration at deploy time?

project.yml kubectl apply

kubernetes cluster

kubectl apply

@MELANIECEBULA

admission controller

admission controller intercepts requests to the k8s api server prior to persistence of the object
slide-104
SLIDE 104

How do we validate configuration at deploy time?

project.yml

@MELANIECEBULA

admission controller

  • metadata is encoded as annotations at generate time
  • admission controller checks for required annotations
  • reject any update to resources that are missing required annotations
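The annotation check can be sketched as follows (a simplified stand-in for the admission webhook's decision logic; the annotation keys are hypothetical, not Airbnb's actual ones):

```go
package main

import "fmt"

// missingAnnotations returns the required annotation keys that are
// absent from a resource's annotations; a non-empty result means the
// admission controller should reject the update.
func missingAnnotations(annotations map[string]string, required []string) []string {
	var missing []string
	for _, key := range required {
		if _, ok := annotations[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	required := []string{"airbnb.com/project", "airbnb.com/config-sha"}
	got := map[string]string{"airbnb.com/project": "bonk"}
	if m := missingAnnotations(got, required); len(m) > 0 {
		fmt.Println("reject:", m)
	}
}
```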

slide-105
SLIDE 105

What do we validate with admission controller?

project.yml

@MELANIECEBULA

admission controller

  • project ownership annotations
  • configuration stored in git
  • configuration uses minimally supported version

slide-106
SLIDE 106

What do we validate with admission controller?

project.yml

@MELANIECEBULA

admission controller

  • production images must be uploaded to production ECR
  • prevent deployment of unsafe workloads
  • prevent deployment of development namespaces to production clusters

slide-107
SLIDE 107

What do we validate with admission controller?

project.yml

@MELANIECEBULA

admission controller

  • production images must be uploaded to production ECR
  • prevent deployment of unsafe workloads
  • prevent deployment of development namespaces to production clusters

standardized namespaces!

slide-108
SLIDE 108
  • CI/CD should run the same commands that engineers run locally
  • CI/CD should run in a container
  • Validate configuration as part of CI/CD

Takeaways

slide-109
SLIDE 109
  • 1. Abstract away complex kubernetes configuration
  • 2. Standardize on environments and namespaces
  • 3. Everything about a service should be in one place in git
  • 4. Make best practices the default by generating configuration
  • 5. Configuration should be versioned and refactored automatically
  • 6. Create an opinionated kubectl wrapper that automates common workflows
  • 7. CI/CD should run the same commands that engineers run locally, in a container
  • 8. Code and configuration should be deployed with the same process
  • 9. Use custom resources and custom controllers to integrate with your infrastructure
  • 10. Validate configuration as part of CI/CD

10 Takeaways

@MELANIECEBULA

slide-110
SLIDE 110


  • thousands of services running in k8s
  • moving all configuration to gitops workflow w/ custom controllers
  • scaling the cluster / scaling etcd / multi cluster support
  • stateful services / high memory requirements
  • tighter integration with kubectl plugins
  • … and more!

2019 Challenges

@MELANIECEBULA

slide-111
SLIDE 111