@coreoslinux About Me @brandonphilips CTO/CO-FOUNDER

Sep 28, 2022

SLIDE 1

@coreoslinux

SLIDE 2

About Me

@brandonphilips
CTO/CO-FOUNDER
systems engineer
github.com/philips

SLIDE 3

etcd

SLIDE 4

/etc distributed

SLIDE 5

open source software

failure tolerant
durable
watchable
exposed via HTTP
runtime reconfigurable

SLIDE 6

Data Store API

-X GET      Get, Wait
-X PUT      Put, Create, CAS
-X DELETE   Delete, CAD
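As a rough illustration of how the seven operations on this slide could map onto three HTTP verbs, here is a small request builder. The `/v2/keys` prefix and the `prevExist` / `prevIndex` query parameters are assumptions drawn from the etcd v2 HTTP API rather than from the slide; `wait` and `waitIndex` do appear later in the deck. The function name `build_request` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: keys live under the etcd v2 prefix; the slide does not show it.
KEYS_PREFIX = "/v2/keys"

def build_request(op, key, value=None, index=None):
    """Return (method, path, form_body) for one data store operation."""
    table = {
        "get":    ("GET", {}),
        "wait":   ("GET", {"wait": "true", "waitIndex": index}),
        "put":    ("PUT", {}),
        "create": ("PUT", {"prevExist": "false"}),   # fail if the key exists
        "cas":    ("PUT", {"prevIndex": index}),     # compare-and-swap
        "delete": ("DELETE", {}),
        "cad":    ("DELETE", {"prevIndex": index}),  # compare-and-delete
    }
    method, query = table[op]
    query = {k: v for k, v in query.items() if v is not None}
    path = KEYS_PREFIX + "/" + key.lstrip("/")
    if query:
        path += "?" + urlencode(query)
    form = None if value is None else urlencode({"value": value})
    return method, path, form

cas_request = build_request("cas", "sched", value="m3", index=18)
wait_request = build_request("wait", "asdf", index=4)
```

The conditional variants (Create, CAS, CAD) differ from the plain ones only by a precondition in the query string, which is what makes them usable as atomic building blocks.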

SLIDE 7

Leader Follower

etcd Cluster

SLIDE 8

Applications

locksmith

SLIDE 9

SLIDE 10

SLIDE 11

Cluster Wide Reboot Lock

1. Need to reboot? Decrement the semaphore key atomically with etcd.
2. manager.Reboot() and wait...
3. After rebooting, increment the semaphore key in etcd atomically.
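The three steps above can be sketched with a compare-and-swap retry loop. This is a toy in-memory model, not locksmith's actual code: the `Store` class, the key name `reboot-semaphore`, and the helper `add_to_semaphore` are all hypothetical.

```python
class Store:
    """In-memory stand-in for etcd: each key holds (value, modified_index)."""

    def __init__(self):
        self.index = 0
        self.data = {}

    def set(self, key, value):
        self.index += 1
        self.data[key] = (value, self.index)

    def get(self, key):
        return self.data[key]

    def cas(self, key, prev_index, value):
        """Write only if the key has not changed since prev_index."""
        if self.data[key][1] != prev_index:
            return False
        self.set(key, value)
        return True

def add_to_semaphore(store, key, delta):
    """Atomically add delta, retrying on conflict; never go below zero."""
    while True:
        value, index = store.get(key)
        if value + delta < 0:
            return False              # no reboot slot is free
        if store.cas(key, index, value + delta):
            return True

store = Store()
store.set("reboot-semaphore", 1)      # at most one machine reboots at a time
first = add_to_semaphore(store, "reboot-semaphore", -1)   # machine A takes the slot
second = add_to_semaphore(store, "reboot-semaphore", -1)  # machine B has to wait
add_to_semaphore(store, "reboot-semaphore", +1)           # A is back up
third = add_to_semaphore(store, "reboot-semaphore", -1)   # now B may reboot
```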

SLIDE 12

Applications

kubernetes and fleet

SLIDE 13

You -> Scheduler API -> Scheduler -> Machine(s)

SLIDE 14

Cluster Work Scheduling

1. Cluster API writes desired work into etcd keyspace.
2. Agents running on individual machines pick up work assigned to them.
3. Agents report where work is running and current status.
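A minimal sketch of this flow, using a plain dictionary in place of the etcd keyspace. The key layout (`/jobs/<name>/target`, `/jobs/<name>/state`) and the function names are assumptions made for illustration; they are not the layout used by fleet or kubernetes.

```python
keyspace = {}  # stand-in for the etcd keyspace

def schedule(job, machine):
    """Step 1: the cluster API writes the desired work."""
    keyspace[f"/jobs/{job}/target"] = machine

def agent_tick(machine):
    """Steps 2 and 3: pick up work assigned to this machine, then report."""
    started = []
    for key, target in sorted(keyspace.items()):
        if key.endswith("/target") and target == machine:
            job = key.split("/")[2]
            keyspace[f"/jobs/{job}/state"] = f"running on {machine}"
            started.append(job)
    return started

schedule("web", "m1")
schedule("db", "m2")
picked_up = agent_tick("m1")
```

In a real deployment the agent would watch the keyspace for changes instead of scanning it on each tick.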

SLIDE 15

Applications

vulcan, confd, dns and distributed git

SLIDE 16

Example Leader Election

using TTL and atomic operations

SLIDE 17

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’

SLIDE 18

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’ Entry

/6eadeac2d/f1d2df

http://10.1.2.3:7001

SLIDE 19

/6eadeac2d/f1d2df

http://10.1.2.3:7001

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’ Index

SLIDE 20

/6eadeac2d/f1d2df

http://10.1.2.3:7001

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’ Key

SLIDE 21

/6eadeac2d/f1d2df

http://10.1.2.3:7001

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’ Value

SLIDE 22

Idx  Key    Value  Expiration Time
18   sched  m3     Sept 18 2:11:30

SLIDE 23

Idx  Key    Value  Expiration Time
18   sched  m3     Sept 18 2:11:30

schedlr m3

SLIDE 24

cas(sched, 18, m3)

schedlr m3

Idx  Key    Value  Expiration Time
18   sched  m3     Sept 18 2:11:30

SLIDE 25

cas(sched, 30, m3)

schedlr m3

Idx  Key    Value  Expiration Time
30   sched  m3     Sept 18 2:12:50

SLIDE 26

cas(sched, 45, m3)

schedlr m3

Idx  Key    Value  Expiration Time
45   sched  m3     Sept 18 2:13:30

SLIDE 27

sync(2:13:00)

Idx  Key    Value  Expiration Time
45   sched  m3     Sept 18 2:13:30

SLIDE 28

sync(2:13:15)

Idx  Key    Value  Expiration Time
45   sched  m3     Sept 18 2:13:30

SLIDE 29

sync(2:13:30)

Idx  Key    Value  Expiration Time
45   sched  m3     Sept 18 2:13:30

SLIDE 30

sync(2:13:30)

Idx  Key    Value  Expiration Time

SLIDE 31

create(sched, m5)

Idx  Key    Value  Expiration Time
50   sched  m5     Sept 18 2:13:35

schedlr m5
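The walk-through in slides 22 to 31 can be replayed against a toy store that supports `create`, `cas`, and `sync`. This is a sketch under assumptions: the class `TTLStore`, its method signatures, the use of plain seconds for time, and the 40 second TTL are all hypothetical and do not mirror etcd's internals.

```python
class TTLStore:
    """Toy key-value store with indexes and expiration times."""

    def __init__(self):
        self.index = 0
        self.entries = {}  # key -> (index, value, expires_at)

    def sync(self, now):
        """Drop every entry whose expiration time has been reached."""
        self.entries = {k: e for k, e in self.entries.items() if e[2] > now}

    def _write(self, key, value, now, ttl):
        self.index += 1
        self.entries[key] = (self.index, value, now + ttl)
        return self.index

    def create(self, key, value, now, ttl):
        """Succeed only if the key does not exist; return the new index."""
        if key in self.entries:
            return None
        return self._write(key, value, now, ttl)

    def cas(self, key, prev_index, value, now, ttl):
        """Succeed only if the key is still at prev_index."""
        entry = self.entries.get(key)
        if entry is None or entry[0] != prev_index:
            return None
        return self._write(key, value, now, ttl)

store = TTLStore()
won = store.create("sched", "m3", now=0, ttl=40)        # m3 becomes leader
lost = store.create("sched", "m5", now=10, ttl=40)      # m5 loses the race
renewed = store.cas("sched", won, "m3", now=30, ttl=40) # m3 refreshes its lease
store.sync(60)                                          # lease still valid
alive_at_60 = "sched" in store.entries
store.sync(70)                                          # m3 stopped renewing
takeover = store.create("sched", "m5", now=75, ttl=40)  # m5 takes over
```

The leader keeps its role only while it keeps renewing with `cas` against the index it last wrote; once the TTL lapses, any machine's `create` can succeed.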

SLIDE 32

etcd basics

clusters and bootstrapping

SLIDE 33

Leader Follower

etcd Cluster

SLIDE 34

bootstrapping

Candidate

SLIDE 35

GET discovery.etcd.io/new

SLIDE 36

discovery.etcd.io/6eadeac2

6eadeac2d

SLIDE 37

6eadeac2d/state CREATE

SLIDE 38

6eadeac2d/state

Key    Value     Index
state  started   5890
n0     10.0.2.1  5891
n1     10.0.2.4  5898
...

SLIDE 39

bootstrapped

Leader Follower

SLIDE 40

SLIDE 41

6eadeac2d/state CREATE

SLIDE 42

SLIDE 43

1 2 3 4

Log

SLIDE 44

1 2 3 4

Entries

SLIDE 45

1 2 3 4

Indexes

SLIDE 46

Sequential Consistency

Operations* are atomically executed in the same sequential order on all machines.

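SLIDE 47

PUT Pet = cat (entry 1)
PUT Pet = dog (entry 2)

[Figure: three machine logs. One machine has entries 1 and 2 (Pet=dog); the other two have only entry 1 (Pet=cat).]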

SLIDE 48

PUT Pet = cat (entry 1)
PUT Pet = dog (entry 2)

[Figure: three machine logs. Two machines have entries 1 and 2 (Pet=dog); one has only entry 1 (Pet=cat).]

SLIDE 49

PUT Pet = cat (entry 1)
PUT Pet = dog (entry 2)

[Figure: three machine logs. All three machines have entries 1 and 2 (Pet=dog).]
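The progression in slides 47 to 49 can be modeled as three machines applying one shared log at different speeds. This is only a sketch: the `Machine` class is hypothetical and the shared `log` list stands in for Raft replication, which the slides do not detail.

```python
# The agreed order of operations: entry 1, then entry 2.
log = [("Pet", "cat"), ("Pet", "dog")]

class Machine:
    def __init__(self):
        self.applied = 0   # index of the last applied entry
        self.state = {}

    def apply_up_to(self, index):
        """Apply entries strictly in log order, never skipping ahead."""
        while self.applied < index:
            key, value = log[self.applied]
            self.state[key] = value
            self.applied += 1

machines = [Machine(), Machine(), Machine()]
for m in machines:
    m.apply_up_to(1)
machines[0].apply_up_to(2)                        # one machine is ahead
mid_flight = [m.state["Pet"] for m in machines]   # states differ for now
for m in machines:
    m.apply_up_to(2)
converged = [m.state["Pet"] for m in machines]
```

Machines may lag, but because each applies the same entries in the same order, no machine can ever observe dog before cat.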

SLIDE 50

Sequential Consistency

Real-time

SLIDE 51

[Figure: three machine logs. One machine has entries 1 and 2; the other two have only entry 1.]

GET Pet @ 10:00.0 -> 1[cat]!?
GET Pet @ 10:00.0 -> 2[dog]

SLIDE 52

[Figure: three machine logs. All three machines have entries 1 and 2.]

GET Pet @ 10:00.1 -> 1[dog]

SLIDE 53

Sequential Consistency

Index Time

SLIDE 54

[Figure: three machine logs. One machine has entries 1 and 2; the other two have only entry 1.]

GET Pet @ 2 -> blocking
GET Pet @ 2 -> 2[dog]

SLIDE 55

[Figure: three machine logs. All three machines have entries 1 and 2.]

GET Pet @ 2 -> 2[dog]

SLIDE 56

etcd guarantees that a get at index X will always return the same result.

Avoid thinking in terms of real time because with network latency the result is always out-of-date.
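A toy model of reading by index rather than by wall-clock time. The `Replica` class and its `get_at` method are hypothetical, and keeping a full state snapshot per index is a simplification chosen for clarity, not a description of how etcd stores data.

```python
class Replica:
    """Keeps the state as of every applied index (toy model only)."""

    def __init__(self):
        self.history = []  # history[i] is the state after entry i + 1

    def apply(self, key, value):
        state = dict(self.history[-1]) if self.history else {}
        state[key] = value
        self.history.append(state)

    def get_at(self, key, index):
        """Return (index, value), or None if this replica must block."""
        if index > len(self.history):
            return None    # a real server would wait until it catches up
        return index, self.history[index - 1].get(key)

leader, follower = Replica(), Replica()
leader.apply("Pet", "cat")
leader.apply("Pet", "dog")
follower.apply("Pet", "cat")                # follower is one entry behind

blocked = follower.get_at("Pet", 2)         # not applied yet: would block
follower.apply("Pet", "dog")                # follower catches up
answer = follower.get_at("Pet", 2)
```

A lagging replica delays its answer instead of returning a different one, so a read at a given index gives the same result on every machine.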

SLIDE 57

Quorum GETs

GET via Raft

SLIDE 58

[Figure: three machine logs. Two machines have entries 1 and 2; one has only entry 1.]

SLIDE 59

[Figure: three machine logs. One machine has entries 1 and 2; the other two have only entry 1.]

QGET A

SLIDE 60

[Figure: three machine logs. All three machines have entries 1 and 2.]

QGET A -> 2[dog]

SLIDE 61

[Figure: three machine logs. All three machines have entries 1 and 2; two of them also have entry 3.]

QGET A -> 2[dog]

SLIDE 62

Watchable Changes

HTTP Long-poll

SLIDE 63

[Figure: log with entries 1, 2, 3]

> GET asdf?waitIndex=4&wait=true HTTP/1.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Etcd-Index: 3
< X-Raft-Index: 97
< X-Raft-Term: 0
<
BLOCK

SLIDE 64

[Figure: log with entries 1, 2, 3, 4]

> GET asdf?waitIndex=4&wait=true HTTP/1.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Etcd-Index: 3
< X-Raft-Index: 97
< X-Raft-Term: 0
<
{"action":"set","node":{"key":"/asdf","value":"foobar","modifiedIndex":4,"createdIndex":4}}

SLIDE 65

[Figure: log with entries 1, 2, 3, 4]

> GET asdf?waitIndex=4&wait=true HTTP/1.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Etcd-Index: 4
< X-Raft-Index: 516
< X-Raft-Term: 0
<
{"action":"set","node":{"key":"/asdf","value":"foobar","modifiedIndex":4,"createdIndex":4}}
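A client that wants every change needs to turn each long-poll response into the next request. The sketch below parses the event body shown on the slide and derives the follow-up query; the helper names are hypothetical, and resuming at `modifiedIndex + 1` is an assumption about how a watcher would avoid seeing the same event twice.

```python
import json
from urllib.parse import urlencode

def next_wait_index(event_body):
    """The next watch should start just after the event we received."""
    event = json.loads(event_body)
    return event["node"]["modifiedIndex"] + 1

def watch_query(key, wait_index):
    """Build the relative URL for the next long-poll request."""
    return key + "?" + urlencode({"waitIndex": wait_index, "wait": "true"})

body = ('{"action":"set","node":{"key":"/asdf","value":"foobar",'
        '"modifiedIndex":4,"createdIndex":4}}')
following = watch_query("asdf", next_wait_index(body))
```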

SLIDE 66

Event History

History isn’t forever, prepare!
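One way to picture why watchers must prepare: if the server keeps only a bounded window of past events, a watcher that asks for an index older than the window cannot be served from history. The `EventHistory` class and its capacity of 3 are hypothetical; the slide does not say how much history etcd keeps.

```python
from collections import deque

class EventHistory:
    """Bounded window of recent events; the oldest ones are discarded."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)

    def record(self, index, key, value):
        self.events.append((index, key, value))

    def since(self, wait_index):
        """Events at or after wait_index, or None if some are already gone."""
        if self.events and wait_index < self.events[0][0]:
            return None
        return [e for e in self.events if e[0] >= wait_index]

history = EventHistory(capacity=3)
for i in range(1, 6):
    history.record(i, "/asdf", f"v{i}")

too_old = history.since(2)   # events 1 and 2 have been dropped
recent = history.since(4)
```

When a watch comes back as too old, a reasonable recovery is to re-read the current value with a plain GET and resume watching from the index that read reports.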

SLIDE 67

Availability

A cluster of 2F+1 machines tolerates F machine failures
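The arithmetic behind the 2F+1 rule, as a short sketch. The function names are hypothetical; the only assumption is that the cluster stays available while a strict majority of machines is up.

```python
def quorum(cluster_size):
    """Smallest number of machines that forms a strict majority."""
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size):
    """F in the 2F+1 rule."""
    return (cluster_size - 1) // 2

def available(cluster_size, failed):
    return cluster_size - failed >= quorum(cluster_size)

# A five-machine cluster losing machines one at a time.
timeline = [available(5, failed) for failed in range(4)]
```

Note that an even-sized cluster gains nothing over the next smaller odd size: four machines tolerate one failure, the same as three.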

SLIDE 68

Available

SLIDE 69

Available

SLIDE 70

Available

SLIDE 71

Unavailable

SLIDE 72

Master Election

Fast recovery (5-10 × typical RTT) from temporary unavailability

SLIDE 73

Available

Leader Follower

SLIDE 74

Leader Follower

Available

SLIDE 75

Leader Follower

Temporarily Unavailable

SLIDE 76

Leader Follower

Available

SLIDE 77

Durable

log files, snapshots and backups

SLIDE 78

Mistakes so far...

SLIDE 79

Log files

Filesystems truncate and corrupt data. Solutions:

Must use checksumming in the file to ensure sanity
Throwing out broken log files must be handled by the server
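A minimal sketch of per-record checksumming, so that a reader can detect truncation or corruption and keep only the valid prefix. The record layout (length, CRC32, payload) is an assumption made for illustration and is not etcd's actual log format.

```python
import struct
import zlib

HEADER = struct.Struct("<II")   # payload length, CRC32 of payload

def encode(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def read_valid_prefix(data: bytes):
    """Return records up to the first truncated or corrupt one."""
    records, offset = [], 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break                       # stop at the first bad record
        records.append(payload)
        offset = start + length
    return records

log_bytes = encode(b"a") + encode(b"bb") + encode(b"ccc")
truncated = log_bytes[:-1]              # the last byte never reached disk
corrupted = bytearray(log_bytes)
corrupted[HEADER.size + 1 + HEADER.size] ^= 0xFF   # flip a byte in record 2
```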

SLIDE 80

etcd machine naming

We trusted users to manage unique names across the cluster. This went poorly.

Misconfiguration from bugs
Misconfiguration by users
Machine cloning on the cloud

Solution: etcd data-dir owns a unique uuid.
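One way to let the data directory own the identity: generate a UUID the first time the directory is used and read it back on every later start. The file name `machine-id` and the function `machine_id` are hypothetical; the slide only says that the data-dir owns a unique uuid.

```python
import os
import tempfile
import uuid

def machine_id(data_dir):
    """Return this data directory's UUID, creating it on first use."""
    path = os.path.join(data_dir, "machine-id")   # hypothetical file name
    try:
        with open(path, "x") as f:                # "x" fails if the file exists
            new_id = str(uuid.uuid4())
            f.write(new_id)
            return new_id
    except FileExistsError:
        with open(path) as f:
            return f.read().strip()

with tempfile.TemporaryDirectory() as data_dir:
    first_start = machine_id(data_dir)
    restart = machine_id(data_dir)                # same directory, same identity

with tempfile.TemporaryDirectory() as other_dir:
    other_machine = machine_id(other_dir)         # fresh directory, new identity
```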

SLIDE 81

sync() in the cloud

Slow, slow, slow:

User #1 OpenStack on spinning disk: 6s
User #2 AWS EBS backed: 1.5s

Solution:

Tune etcd to expect this long latency.
Write batching and handling of machines that have fallen behind.
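A sketch of the write-batching idea: when each sync costs seconds, buffer entries and pay for one `fsync` per batch instead of one per entry. The `BatchingLog` class is hypothetical and leaves out everything a real log needs beyond the batching itself.

```python
import os
import tempfile

class BatchingLog:
    def __init__(self, path):
        self.file = open(path, "ab")
        self.pending = []
        self.syncs = 0                  # how many fsync calls we paid for

    def append(self, entry: bytes):
        self.pending.append(entry)      # buffered, not yet durable

    def flush(self):
        """Write every pending entry, then sync to disk once."""
        if not self.pending:
            return
        self.file.write(b"".join(e + b"\n" for e in self.pending))
        self.file.flush()
        os.fsync(self.file.fileno())
        self.syncs += 1
        self.pending.clear()

path = os.path.join(tempfile.mkdtemp(), "log")
batched = BatchingLog(path)
for i in range(100):
    batched.append(b"entry %d" % i)
batched.flush()                         # 100 entries, one sync
```

The trade-off is latency for the first entry in a batch, since nothing may be acknowledged as durable until `flush` returns.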

SLIDE 82

Wednesday 10:40am LCA: CoreOS: An Introduction
Wednesday 6:00pm AKL Continuous Delivery Meetup: CoreOS: An Introduction
Thursday 6:00pm Go AKL Meetup: Something about Go
Friday 10:40am LCA: CoreOS Tutorial

SLIDE 83

Thanks

we like pull requests github.com/coreos/etcd