Fundamental techniques and building blocks - @lenadroid - PowerPoint PPT Presentation



SLIDE 1

Channel Into Universe Of Eventually Perfect Distributed Systems

SLIDE 2

Fundamental techniques and building blocks

SLIDE 3

Are fundamentals still important?

SLIDE 4

Your System, Your Trade-Offs

SLIDE 5

Theory vs. Practice: hm… a Gap

SLIDE 6

There are challenges

SLIDE 7

There are challenges

Road to Correctness and Understanding

SLIDE 8

Simple problems become hard

SLIDE 9

Ordering is Hard

SLIDE 10

Lamport Clock

[Diagram: events on nodes A and B, labeled with Lamport timestamps 1, 2 and 3]

Events with timestamps 2 and 3 could be concurrent
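
A minimal sketch of the Lamport clock rule, assuming two processes A and B. The class and method names are illustrative and not taken from the slides. It shows why timestamps alone cannot establish order: B's event 2 and A's event 3 are not linked by any message, so they are concurrent even though 2 < 3.

```python
class LamportClock:
    """Logical clock for a single process (hypothetical helper)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter by one."""
        self.time += 1
        return self.time

    def receive(self, sender_time: int) -> int:
        """Message receipt: move past both the local and the sender's time."""
        self.time = max(self.time, sender_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a1 = a.tick()        # A: local event, timestamp 1
b1 = b.tick()        # B: local event, timestamp 1
b2 = b.tick()        # B: local event, timestamp 2
a2 = a.tick()        # A: sends a message stamped 2
a3 = a.tick()        # A: local event, timestamp 3 (no causal link to b2)
b3 = b.receive(a2)   # B: receives A's message, max(2, 2) + 1 = 3
```

If event x happened before event y, then x's timestamp is smaller than y's. The converse does not hold, which is the limitation the slide points at.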

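SLIDE 11

Vector Clock

[Diagram: nodes A, B and C exchange messages. Events carry vector timestamps such as { B:1 }, { A:1, B:1 }, { A:2, B:1 }, { A:2, B:1, C:1 } and { A:2, B:1, C:2 }; several other labels are truncated in the source. Two events are marked X and Y.]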
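
A rough sketch of how vector clocks can be compared, assuming clocks are plain dictionaries from node name to counter with missing entries treated as 0. The function names and the sample values in the usage lines are illustrative, not a reconstruction of the truncated labels in the diagram.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with this node's counter advanced."""
    return {**clock, node: clock.get(node, 0) + 1}


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, applied when a message is received."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a <= b in every entry and a < b in at least one."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if the clocks differ and neither happened before the other."""
    differ = any(a.get(k, 0) != b.get(k, 0) for k in a.keys() | b.keys())
    return differ and not happened_before(a, b) and not happened_before(b, a)


earlier = {"A": 2, "B": 1}
later = increment(earlier, "C")            # {"A": 2, "B": 1, "C": 1}
left, right = {"A": 3, "B": 1}, {"A": 2, "B": 2}
```

Unlike Lamport timestamps, this comparison can report that two events are concurrent, as `left` and `right` are here.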

SLIDE 12

Agreement In Distributed Systems

SLIDE 13

Two Phase Commit


SLIDE 14

Blocking Failure in Two Phase Commit

[Diagram: participants vote OK, OK, OK. One node that had committed crashes (X) and another node crashes (X). The remaining nodes are not committed and can’t decide.]

Nodes are blocked! Can’t decide!
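
A simplified sketch of the Two Phase Commit decision rules, to make the blocking case concrete. The enum and function names are hypothetical, and timeouts, logging and recovery are left out on purpose.

```python
from enum import Enum
from typing import List, Optional


class Vote(Enum):
    OK = "ok"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def coordinator_decide(votes: List[Vote]) -> Decision:
    """Phase 2 rule: commit only if every participant voted OK in phase 1."""
    return Decision.COMMIT if all(v is Vote.OK for v in votes) else Decision.ABORT


def participant_outcome(my_vote: Vote, decision: Optional[Decision]) -> str:
    """What a participant may do; decision=None means it never arrived."""
    if my_vote is Vote.NO:
        return "abort"      # a NO voter knows the transaction cannot commit
    if decision is None:
        return "blocked"    # voted OK: some other node may already have committed
    return decision.value
```

A participant that voted OK and then loses the coordinator cannot safely commit or abort on its own, which corresponds to the "Can’t decide" state on the slide.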

SLIDE 15

Two Phase: Hm… it’s blocking when there’re […]
Three Phase

SLIDE 16

[Diagram: a crash (X) during Three Phase Commit. One node is pre-committed; three nodes are not pre-committed and are unsure (?).]

SLIDE 17

[Diagram: after a crash (X), three nodes abort while one node commits.]

SLIDE 18

Two Phase: Hm… it’s blocking when there are […]
Three Phase: Might be inconsistent in asynchronous environment

SLIDE 19

FLP: Impossibility Result

“Distributed consensus is impossible in an asynchronous system where at least one node can fail.”

SLIDE 20

FLP

Two Phase: Hm… it’s blocking when there’re […]
Three Phase: Might be inconsistent in asynchronous environment

Paxos

Classical Paxos, Multi-Paxos, Fast Paxos, Cheap Paxos, Vertical Paxos, Zab, Raft, Chandra-Toueg

SLIDE 21

Paxos

SLIDE 22

Trade-offs: Weak or strong leader? Quorum size? Number of failures tolerated?

Optimizations: Proposal copying, distinguished proposer, combining roles, strategies for proposal numbers

SLIDE 23

Discovering New Trade-offs and Optimizations

Quorum intersection revised
Quorum based value selection
Proposal numbers uniqueness
And many more…

cl.cam.ac.uk/techreports/UCAM-CL-TR-935.pdf by Dr. Heidi Howard
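
A small sketch of the quorum-size trade-off, assuming simple counting quorums over n acceptors. The function names are illustrative. The relaxed condition shown here, that only phase-1 and phase-2 quorums need to overlap, is one reading of "quorum intersection revised"; the report covers considerably more than this check.

```python
def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums of that size overlap."""
    return n // 2 + 1


def quorums_intersect(n: int, q1: int, q2: int) -> bool:
    """True if every phase-1 quorum (size q1) shares a node with every
    phase-2 quorum (size q2) among n acceptors."""
    return q1 + q2 > n


def crash_failures_tolerated(n: int, quorum: int) -> int:
    """How many acceptors may crash while a quorum of this size can still form."""
    return n - quorum
```

For example, with 6 acceptors a phase-1 quorum of 5 and a phase-2 quorum of 2 still intersect, which trades slower leader changes for cheaper replication in the common case.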

SLIDE 24

Consistent Replication?

SLIDE 25

dl.acm.org/citation.cfm?id=3183713.3196937

SLIDE 26

Conflict-Free Replicated Data Types

SLIDE 27

[Diagram: replicas holding states such as (0, 0, 0), (1, 0, 0), (0, 1, 1) and (1, 1, 1) exchange partial updates (1, _, _) and (_, 1, 1).]
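
One way to read the tuples above is as a state-based grow-only counter (G-Counter) with one slot per replica. That reading is an assumption, and the names below are illustrative. Under it, merging is an element-wise maximum:

```python
from typing import Tuple

State = Tuple[int, ...]


def increment(state: State, replica: int) -> State:
    """A replica only ever bumps its own slot."""
    return tuple(v + 1 if i == replica else v for i, v in enumerate(state))


def merge(a: State, b: State) -> State:
    """Element-wise maximum: commutative, associative and idempotent."""
    return tuple(max(x, y) for x, y in zip(a, b))


def value(state: State) -> int:
    """The counter's value is the sum of all slots."""
    return sum(state)


left = increment((0, 0, 0), 0)                    # (1, 0, 0)
right = increment(increment((0, 0, 0), 1), 2)     # (0, 1, 1)
merged = merge(left, right)                       # (1, 1, 1)
```

Because the merge does not depend on delivery order or duplication, replicas converge without coordination.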

SLIDE 28

add X, then delete X
delete X, then add X
?
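
The slide leaves the outcome open. One common resolution is an observed-remove set (OR-Set) where a concurrent add wins over a delete. The sketch below assumes that policy; the class is hypothetical and skips tombstone garbage collection.

```python
import uuid
from typing import Dict, Hashable, Set


class ORSet:
    """Observed-remove set: a remove only cancels the adds it has seen."""

    def __init__(self) -> None:
        self.adds: Dict[Hashable, Set[str]] = {}   # element -> unique add tags
        self.removed: Set[str] = set()              # tags cancelled by removes

    def add(self, element: Hashable) -> None:
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element: Hashable) -> None:
        # Tombstone only the tags this replica has observed so far.
        self.removed |= self.adds.get(element, set())

    def merge(self, other: "ORSet") -> None:
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed

    def __contains__(self, element: Hashable) -> bool:
        return bool(self.adds.get(element, set()) - self.removed)


r1, r2 = ORSet(), ORSet()
r1.add("X")
r2.merge(r1)       # both replicas now contain X
r1.remove("X")     # replica 1 deletes X ...
r2.add("X")        # ... while replica 2 concurrently adds it again
r1.merge(r2)
r2.merge(r1)       # both replicas converge: X is present (add wins)
```

A different data type, such as a remove-wins set, would answer the slide's question the other way; the choice is one of the trade-offs the talk is about.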

SLIDE 29

Cosmos DB

x: { a } and x: { b, c } merge into x: { a, b, c }

SLIDE 30

Failure Detection

SLIDE 31

Completeness: can all nodes discover all the failures?
Accuracy: how precise can a node be in its failure suspicions?
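
A minimal sketch of a heartbeat-based failure detector with a fixed timeout, to show where completeness and accuracy pull against each other. The class name is hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
from typing import Dict, Set


class HeartbeatFailureDetector:
    """Suspects a node once no heartbeat has arrived within `timeout`."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now: float) -> Set[str]:
        """Nodes whose last heartbeat is older than the timeout."""
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


fd = HeartbeatFailureDetector(timeout=3.0)
fd.heartbeat("a", now=0.0)
fd.heartbeat("b", now=0.0)
fd.heartbeat("a", now=4.0)
suspects_at_5 = fd.suspected(now=5.0)   # b has been silent for 5 time units
fd.heartbeat("b", now=6.0)              # b was only slow, not dead
suspects_at_7 = fd.suspected(now=7.0)
```

A short timeout finds real crashes quickly but wrongly suspects slow nodes, as happens to `b` here; a long timeout is more accurate but slower to detect failures.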

SLIDE 32

Understanding Trade-offs Helps

✓ To make the right choices
✓ To know what correct means for us
✓ To verify and maintain correctness in practice

SLIDE 33

Maintaining Correctness In Real Systems

SLIDE 34

Model Checking

SLIDE 35

Verifying and Maintaining Correctness in Practical Real-World Systems

SLIDE 36

Kafka System Tests

450+ system tests, 6800+ unit tests, 600+ integration tests

testing.confluent.io/confluent-kafka-system-test-results
confluent.io/blog/apache-kafka-tested

SLIDE 37

Cassandra Tests

cassandra.apache.org/blog/2018/10/17/finding_bugs_with_property_based_testing.html
cassandra.apache.org/blog/2018/08/21/testing_apache_cassandra.html

Replay testing
Dynamic test generation
Property-based testing and fuzzing
Distributed tests and fault-injection
Upgrade testing

SLIDE 38

+ Model-checking
+ Property-based testing and fuzzing
+ Performance and upgrade testing
+ Unit and integration testing
+ Fault injection
+ Attention to exception handling logic
+ Code reviews
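
To give a flavor of property-based testing, here is a rough sketch that uses only Python's standard `random` module rather than a real framework. The merge function and the properties checked (commutativity, associativity, idempotence of a replica merge) are illustrative assumptions, not tests taken from the projects above.

```python
import random
from typing import Callable, Tuple

Vec = Tuple[int, ...]


def merge(a: Vec, b: Vec) -> Vec:
    """Element-wise maximum of two replica states."""
    return tuple(max(x, y) for x, y in zip(a, b))


def check_merge_properties(merge_fn: Callable[[Vec, Vec], Vec],
                           trials: int = 1000, seed: int = 0) -> bool:
    """Generate random states and check the algebraic laws a merge must obey."""
    rng = random.Random(seed)   # fixed seed keeps any failure reproducible

    def rand_vec() -> Vec:
        return tuple(rng.randint(0, 5) for _ in range(3))

    for _ in range(trials):
        a, b, c = rand_vec(), rand_vec(), rand_vec()
        if merge_fn(a, b) != merge_fn(b, a):                               # commutative
            return False
        if merge_fn(a, merge_fn(b, c)) != merge_fn(merge_fn(a, b), c):     # associative
            return False
        if merge_fn(a, a) != a:                                            # idempotent
            return False
    return True


def buggy_merge(a: Vec, b: Vec) -> Vec:
    """Adds instead of taking the maximum, so re-delivery double counts."""
    return tuple(x + y for x, y in zip(a, b))
```

Calling `check_merge_properties(merge)` passes, while `check_merge_properties(buggy_merge)` fails on idempotence. A dedicated framework would additionally shrink the failing input to a minimal counterexample.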

SLIDE 39

Take-Aways

SLIDE 40

✓ Know your trade-offs
✓ Create understandable systems
✓ Invest in correctness, it doesn’t come for free
✓ Don’t trust: test and verify
✓ Automate, but be ready when things fail
✓ Remember the real problem you are solving

SLIDE 41

SLIDE 42

#SystemsYouUnderstand