@coreoslinux About Me @brandonphilips CTO/CO-FOUNDER - - PowerPoint PPT Presentation

slide-1
SLIDE 1

@coreoslinux

slide-2
SLIDE 2

About Me

@brandonphilips
CTO/CO-FOUNDER, systems engineer
github.com/philips

slide-3
SLIDE 3

etcd

slide-4
SLIDE 4

/etc distributed

slide-5
SLIDE 5
  • open source software
  • failure tolerant
  • durable
  • watchable
  • exposed via HTTP
  • runtime reconfigurable

slide-6
SLIDE 6

Data Store API

  • -X GET: Get, Wait
  • -X PUT: Put, Create, CAS (compare-and-swap)
  • -X DELETE: Delete, CAD (compare-and-delete)
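A minimal Go sketch of how these operations map onto HTTP requests. It assumes the etcd v2 keys API under `/v2/keys` and a member listening on the 0.4-era default client port 4001; both are assumptions, since the slide names only the verbs. Create is expressed with `prevExist=false`, and CAS/CAD with `prevIndex` (etcd also accepts `prevValue`). The helpers only build requests; sending them is left to `http.DefaultClient.Do`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// baseURL is an assumption: a local member serving the v2 keys API on port 4001.
const baseURL = "http://127.0.0.1:4001/v2/keys"

// newRequest builds a request for /v2/keys/<key> with optional query
// parameters and an optional form-encoded body.
func newRequest(method, key string, query, form url.Values) (*http.Request, error) {
	u := baseURL + "/" + strings.TrimPrefix(key, "/")
	if len(query) > 0 {
		u += "?" + query.Encode()
	}
	var body io.Reader // stays a nil interface when there is no form
	if form != nil {
		body = strings.NewReader(form.Encode())
	}
	req, err := http.NewRequest(method, u, body)
	if err != nil {
		return nil, err
	}
	if form != nil {
		req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	}
	return req, nil
}

func idx(i uint64) string { return strconv.FormatUint(i, 10) }

func Get(key string) (*http.Request, error) { return newRequest(http.MethodGet, key, nil, nil) }

// Wait long-polls until the key changes at or after waitIndex.
func Wait(key string, waitIndex uint64) (*http.Request, error) {
	return newRequest(http.MethodGet, key, url.Values{"wait": {"true"}, "waitIndex": {idx(waitIndex)}}, nil)
}

func Put(key, value string) (*http.Request, error) {
	return newRequest(http.MethodPut, key, nil, url.Values{"value": {value}})
}

// Create succeeds only if the key does not exist yet.
func Create(key, value string) (*http.Request, error) {
	return newRequest(http.MethodPut, key, url.Values{"prevExist": {"false"}}, url.Values{"value": {value}})
}

// CAS succeeds only if the key's modifiedIndex still equals prevIndex.
func CAS(key, value string, prevIndex uint64) (*http.Request, error) {
	return newRequest(http.MethodPut, key, url.Values{"prevIndex": {idx(prevIndex)}}, url.Values{"value": {value}})
}

func Delete(key string) (*http.Request, error) { return newRequest(http.MethodDelete, key, nil, nil) }

// CAD deletes only if the key's modifiedIndex still equals prevIndex.
func CAD(key string, prevIndex uint64) (*http.Request, error) {
	return newRequest(http.MethodDelete, key, url.Values{"prevIndex": {idx(prevIndex)}}, nil)
}

func main() {
	req, _ := CAS("/sched", "m3", 18)
	fmt.Println(req.Method, req.URL)
}
```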

slide-7
SLIDE 7

Leader Follower

etcd Cluster

slide-8
SLIDE 8

Applications

locksmith

slide-9
SLIDE 9
slide-10
SLIDE 10
slide-11
SLIDE 11

Cluster Wide Reboot Lock

  • 1. Need to reboot? Decrement the semaphore key atomically with etcd.
  • 2. manager.Reboot() and wait...
  • 3. After rebooting, increment the semaphore key in etcd atomically.

slide-12
SLIDE 12

Applications

kubernetes and fleet

slide-13
SLIDE 13

You -> Scheduler API -> Scheduler -> Machine(s)

slide-14
SLIDE 14

Cluster Work Scheduling

  • 1. Cluster API writes desired work into the etcd keyspace.
  • 2. Agents running on individual machines pick up work assigned to them.
  • 3. Agents report where work is running and its current status.
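A toy model of the three scheduling steps. It assumes a hypothetical key layout (`/work/<unit>` for desired placement, `/status/<unit>` for reports) and uses a plain map instead of etcd; a real agent would watch the prefix rather than poll it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// keyspace is a hypothetical flat map standing in for the etcd keyspace.
type keyspace map[string]string

// schedule is step 1: the cluster API records desired work as
// /work/<unit> = <machine>.
func schedule(ks keyspace, unit, machine string) {
	ks["/work/"+unit] = machine
}

// agentTick is steps 2 and 3 for one machine: collect the units assigned to
// it, then report each one under /status/<unit>.
func agentTick(ks keyspace, machine string) []string {
	var mine []string
	for k, m := range ks {
		if strings.HasPrefix(k, "/work/") && m == machine {
			mine = append(mine, strings.TrimPrefix(k, "/work/"))
		}
	}
	sort.Strings(mine) // map order is random; sort for stable output
	for _, unit := range mine {
		ks["/status/"+unit] = "running on " + machine
	}
	return mine
}

func main() {
	ks := keyspace{}
	schedule(ks, "web.service", "m1")
	schedule(ks, "db.service", "m2")
	schedule(ks, "cache.service", "m1")
	fmt.Println(agentTick(ks, "m1"))
	fmt.Println(ks["/status/web.service"])
}
```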

slide-15
SLIDE 15

Applications

vulcan, confd, dns and distributed git

slide-16
SLIDE 16

Example Leader Election

using TTL and atomic operations

slide-17
SLIDE 17

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’

slide-18
SLIDE 18

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’

1 | /6eadeac2d/f1d2df | http://10.1.2.3:7001
(Entry)

slide-19
SLIDE 19

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’

1 | /6eadeac2d/f1d2df | http://10.1.2.3:7001
(Index)

slide-20
SLIDE 20

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’

1 | /6eadeac2d/f1d2df | http://10.1.2.3:7001
(Key)

slide-21
SLIDE 21

PUT /6eadeac2d/f1d2d2f924e98 ‘http://10.1.2.3:7001’

1 | /6eadeac2d/f1d2df | http://10.1.2.3:7001
(Value)

slide-22
SLIDE 22

Idx | Key   | Value | Expiration Time
18  | sched | m3    | Sept 18 2:11:30

slide-23
SLIDE 23

Idx | Key   | Value | Expiration Time
18  | sched | m3    | Sept 18 2:11:30

schedlr m3

slide-24
SLIDE 24

cas(sched, 18, m3)

schedlr m3

Idx | Key   | Value | Expiration Time
18  | sched | m3    | Sept 18 2:11:30

slide-25
SLIDE 25

cas(sched, 30, m3)

schedlr m3

Idx | Key   | Value | Expiration Time
30  | sched | m3    | Sept 18 2:12:50

slide-26
SLIDE 26

cas(sched, 45, m3)

schedlr m3

Idx | Key   | Value | Expiration Time
45  | sched | m3    | Sept 18 2:13:30

slide-27
SLIDE 27

sync(2:13:00)

Idx | Key   | Value | Expiration Time
45  | sched | m3    | Sept 18 2:13:30

slide-28
SLIDE 28

sync(2:13:15)

Idx | Key   | Value | Expiration Time
45  | sched | m3    | Sept 18 2:13:30

slide-29
SLIDE 29

sync(2:13:30)

Idx | Key   | Value | Expiration Time
45  | sched | m3    | Sept 18 2:13:30

slide-30
SLIDE 30

sync(2:13:30)

Idx | Key   | Value | Expiration Time
(no entries)

slide-31
SLIDE 31

create(sched, m5)

Idx | Key   | Value | Expiration Time
50  | sched | m5    | Sept 18 2:13:35

schedlr m5

slide-32
SLIDE 32

etcd basics

clusters and bootstrapping

slide-33
SLIDE 33

Leader Follower

etcd Cluster

slide-34
SLIDE 34

bootstrapping

Candidate

slide-35
SLIDE 35

GET discovery.etcd.io/new

slide-36
SLIDE 36

discovery.etcd.io/6eadeac2

6eadeac2d

slide-37
SLIDE 37

6eadeac2d/state CREATE

slide-38
SLIDE 38

6eadeac2d/state

Key   | Value    | Index
state | started  | 5890
n0    | 10.0.2.1 | 5891
n1    | 10.0.2.4 | 5898
...

slide-39
SLIDE 39

bootstrapped

Leader Follower

slide-40
SLIDE 40
slide-41
SLIDE 41

6eadeac2d/state CREATE

slide-42
SLIDE 42
slide-43
SLIDE 43

1 2 3 4


Log

slide-44
SLIDE 44

1 2 3 4

Entries

slide-45
SLIDE 45

1 2 3 4

Indexes

slide-46
SLIDE 46

Sequential Consistency

Operations* are atomically executed in the same sequential order on all machines.
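To illustrate the guarantee, here is a hypothetical replica that applies log entries strictly in index order. Two replicas fed the same log may lag, but they never disagree about the state at a given index. This is a simplification of a replicated state machine, not etcd's Raft code.

```go
package main

import "fmt"

// op is one committed log entry: PUT key=value at a given index.
type op struct {
	index      uint64
	key, value string
}

// replica is a hypothetical state machine that applies entries strictly in
// index order, so every replica executes the same operations in the same order.
type replica struct {
	applied uint64
	state   map[string]string
}

func newReplica() *replica { return &replica{state: map[string]string{}} }

// apply executes e only if it is the next index; duplicates and gaps are refused.
func (r *replica) apply(e op) bool {
	if e.index != r.applied+1 {
		return false
	}
	r.state[e.key] = e.value
	r.applied = e.index
	return true
}

func main() {
	entries := []op{{1, "Pet", "cat"}, {2, "Pet", "dog"}}
	a, b := newReplica(), newReplica()
	for _, e := range entries {
		a.apply(e)
	}
	b.apply(entries[0]) // b lags one entry behind
	fmt.Println(a.state["Pet"], b.state["Pet"])
	b.apply(entries[1]) // once caught up, b matches a
	fmt.Println(a.state["Pet"], b.state["Pet"])
}
```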

slide-47
SLIDE 47

PUT Pet = cat
PUT Pet = dog

Pet=dog Pet=cat Pet=cat

slide-48
SLIDE 48

PUT Pet = cat
PUT Pet = dog

Pet=dog Pet=dog Pet=cat

slide-49
SLIDE 49

PUT Pet = cat
PUT Pet = dog

Pet=dog Pet=dog Pet=dog

slide-50
SLIDE 50

Sequential Consistency

Real-time

slide-51
SLIDE 51

GET Pet @ 10:00.0 -> 1[cat]!?
GET Pet @ 10:00.0 -> 2[dog]

slide-52
SLIDE 52

GET Pet @ 10:00.1 -> 1[dog]

slide-53
SLIDE 53

Sequential Consistency

Index Time

slide-54
SLIDE 54

GET Pet @ 2 -> blocking
GET Pet @ 2 -> 2[dog]

slide-55
SLIDE 55

GET Pet @ 2 -> 2[dog]

slide-56
SLIDE 56

etcd guarantees that a get at index X will always return the same result.

Avoid thinking in terms of real time because with network latency the result is always out-of-date.
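A sketch of reading "as of index X", using a hypothetical per-key version history. If the machine has not yet applied index X it reports not-ready (the blocking case on the earlier slide) rather than answering with an older value. Real etcd indexes are cluster-wide; this models a single key.

```go
package main

import (
	"fmt"
	"sort"
)

type version struct {
	index uint64
	value string
}

// history is a hypothetical list of versions of one key, oldest first.
type history struct {
	versions []version
	latest   uint64 // highest index this machine has applied
}

func (h *history) put(index uint64, value string) {
	h.versions = append(h.versions, version{index, value})
	h.latest = index
}

// getAt returns the value as of index x. ready is false when this machine has
// not applied x yet; the caller should wait instead of answering early.
func (h *history) getAt(x uint64) (value string, ready bool) {
	if x > h.latest {
		return "", false
	}
	// i is the first version written after x, so versions[i-1] was current at x.
	i := sort.Search(len(h.versions), func(i int) bool { return h.versions[i].index > x })
	if i == 0 {
		return "", true // the key did not exist yet at index x
	}
	return h.versions[i-1].value, true
}

func main() {
	var lagging history
	lagging.put(1, "cat")
	fmt.Println(lagging.getAt(2)) // not ready: index 2 not applied here yet
	lagging.put(2, "dog")
	fmt.Println(lagging.getAt(2)) // dog, and it will stay dog
	fmt.Println(lagging.getAt(1)) // cat, regardless of later writes
}
```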

slide-57
SLIDE 57

Quorum GETs

GET via Raft

slide-58
SLIDE 58
slide-59
SLIDE 59

QGET A

slide-60
SLIDE 60

QGET A -> 2[dog]

slide-61
SLIDE 61

QGET A -> 2[dog]

slide-62
SLIDE 62

Watchable Changes

HTTP Long-poll

slide-63
SLIDE 63

> GET asdf?waitIndex=4&wait=true HTTP/1.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Etcd-Index: 3
< X-Raft-Index: 97
< X-Raft-Term: 0
<
BLOCK

slide-64
SLIDE 64

> GET asdf?waitIndex=4&wait=true HTTP/1.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Etcd-Index: 3
< X-Raft-Index: 97
< X-Raft-Term: 0
<
{"action":"set","node":{"key":"/asdf","value":"foobar","modifiedIndex":4,"createdIndex":4}}

slide-65
SLIDE 65

> GET asdf?waitIndex=4&wait=true HTTP/1.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Etcd-Index: 4
< X-Raft-Index: 516
< X-Raft-Term: 0
<
{"action":"set","node":{"key":"/asdf","value":"foobar","modifiedIndex":4,"createdIndex":4}}
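The long-poll behaviour in the transcript (wait at `waitIndex=4` while `X-Etcd-Index` is 3, then return the index-4 `set` event) can be modelled in memory. The `hub` type below is hypothetical and tracks a single key; it returns the first event at or after `waitIndex`, immediately if it already happened, otherwise once it does.

```go
package main

import (
	"fmt"
	"sync"
)

// event mirrors the fields of the JSON body in the transcript.
type event struct {
	action, key, value string
	index              uint64
}

type waiter struct {
	waitIndex uint64
	ch        chan event
}

// hub is a hypothetical in-memory model of wait=true for a single key.
type hub struct {
	mu      sync.Mutex
	events  []event // every change so far, in index order
	waiters []waiter
}

// Watch delivers the first event with index >= waitIndex on the returned channel.
func (h *hub) Watch(waitIndex uint64) <-chan event {
	ch := make(chan event, 1) // buffered so Set never blocks on a slow reader
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, e := range h.events {
		if e.index >= waitIndex {
			ch <- e // already happened: answer immediately
			return ch
		}
	}
	h.waiters = append(h.waiters, waiter{waitIndex, ch}) // block until Set
	return ch
}

// Set records a change and wakes every waiter it satisfies.
func (h *hub) Set(key, value string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	e := event{action: "set", key: key, value: value, index: uint64(len(h.events)) + 1}
	h.events = append(h.events, e)
	remaining := h.waiters[:0]
	for _, w := range h.waiters {
		if e.index >= w.waitIndex {
			w.ch <- e
		} else {
			remaining = append(remaining, w)
		}
	}
	h.waiters = remaining
}

func main() {
	h := &hub{}
	for _, v := range []string{"a", "b", "c"} {
		h.Set("/asdf", v) // indexes 1, 2, 3
	}
	ch := h.Watch(4)            // nothing at index 4 yet, so this waits
	go h.Set("/asdf", "foobar") // index 4 arrives later
	e := <-ch
	fmt.Println(e.action, e.key, e.value, e.index)
}
```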

slide-66
SLIDE 66

Event History

History isn’t forever, prepare!
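Because only a bounded window of events is retained, a watcher that falls too far behind must resynchronise. This sketch keeps a hypothetical fixed-capacity history and shows the recovery pattern: on a "cleared" error, re-read current state and resume watching just after it. The capacity and error value are illustrative, not etcd's actual buffer size or error codes.

```go
package main

import (
	"errors"
	"fmt"
)

// errCleared stands in for "the event at that index is no longer retained".
var errCleared = errors.New("event history cleared")

// history is a hypothetical fixed-capacity event window.
type history struct {
	capacity int
	events   []uint64 // retained event indexes, oldest first
	last     uint64   // most recent index
	dropped  uint64   // highest index already discarded (0 = none)
}

func (h *history) record() {
	h.last++
	h.events = append(h.events, h.last)
	if len(h.events) > h.capacity {
		h.dropped = h.events[0]
		h.events = h.events[1:]
	}
}

// since returns the first event at or after waitIndex; ok is false if it has
// not happened yet, and errCleared means it may already have been discarded.
func (h *history) since(waitIndex uint64) (idx uint64, ok bool, err error) {
	if h.dropped > 0 && waitIndex <= h.dropped {
		return 0, false, errCleared
	}
	for _, i := range h.events {
		if i >= waitIndex {
			return i, true, nil
		}
	}
	return 0, false, nil
}

// resume picks the index to watch from next. If the history was cleared, a
// real client would first re-read current state with a plain GET and then
// watch from just after the index that GET returned (h.last here).
func resume(h *history, waitIndex uint64) uint64 {
	if _, _, err := h.since(waitIndex); errors.Is(err, errCleared) {
		return h.last + 1
	}
	return waitIndex
}

func main() {
	h := &history{capacity: 3}
	for i := 0; i < 5; i++ {
		h.record() // indexes 1..5; only 3, 4 and 5 are kept
	}
	_, _, err := h.since(2)
	fmt.Println(err)
	fmt.Println(resume(h, 2))
	fmt.Println(h.since(4))
}
```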

slide-67
SLIDE 67

Availability

A cluster of 2F+1 machines tolerates F machine failures.
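The arithmetic behind 2F+1: writes need a strict majority, so a cluster of n members needs n/2+1 (integer division) to agree and can lose the rest. A short Go check:

```go
package main

import "fmt"

// quorum is a strict majority of n members.
func quorum(n int) int { return n/2 + 1 }

// tolerated is how many members may fail while a quorum survives.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for n := 1; n <= 7; n++ {
		fmt.Printf("members=%d quorum=%d tolerates=%d\n", n, quorum(n), tolerated(n))
	}
}
```

Running it shows that even sizes add no tolerance: 3 and 4 members both tolerate one failure, which is why odd cluster sizes are the usual choice.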

slide-68
SLIDE 68

Available

slide-69
SLIDE 69

Available

slide-70
SLIDE 70

Available

slide-71
SLIDE 71

Unavailable

slide-72
SLIDE 72

Master Election

Fast recovery (5-10× the typical RTT) from temporary unavailability

slide-73
SLIDE 73

Available

Leader Follower

slide-74
SLIDE 74

Leader Follower

Available

slide-75
SLIDE 75

Leader Follower

Temporarily Unavailable

slide-76
SLIDE 76

Leader Follower

Available

slide-77
SLIDE 77

Durable

log files, snapshots and backups

slide-78
SLIDE 78

Mistakes so far...

slide-79
SLIDE 79

Log files

Filesystems can truncate and corrupt data. Solutions:

  • Checksum the contents of log files so corruption can be detected.
  • The server must handle throwing out broken log files.

slide-80
SLIDE 80

etcd machine naming

We trusted users to manage unique names across the cluster. This went poorly:

  • Misconfiguration from bugs
  • Misconfiguration by users
  • Machine cloning on the cloud

Solution: each etcd data-dir owns a unique UUID.

slide-81
SLIDE 81

sync() in the cloud

Slow, slow, slow:

  • User #1 OpenStack on spinning disk: 6s
  • User #2 AWS EBS backed: 1.5s

Solution:

  • Tune etcd to expect this long latency.
  • Batch writes and handle machines that fall behind.

slide-82
SLIDE 82

  • Wednesday 10:40am, LCA: CoreOS: An Introduction
  • Wednesday 6:00pm, AKL Continuous Delivery Meetup: CoreOS: An Introduction
  • Thursday 6:00pm, Go AKL Meetup: Something about Go
  • Friday 10:40am, LCA: CoreOS Tutorial

slide-83
SLIDE 83

Thanks

We like pull requests: github.com/coreos/etcd