EECS 591 DISTRIBUTED SYSTEMS, Manos Kapritsos, Fall 2020 Slides (PowerPoint presentation)

SLIDE 1

EECS 591 DISTRIBUTED SYSTEMS

Manos Kapritsos, Fall 2020
Slides by: Lorenzo Alvisi

SLIDE 2

BYZANTINE FAULT TOLERANCE

SLIDE 3

A HIERARCHY OF FAILURE MODELS

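From most restrictive to most general (each model includes the behaviors of the ones before it):

Fail-stop ⊂ Crash ⊂ Send omission, Receive omission ⊂ General omission ⊂ Arbitrary (Byzantine) failures

Fail-stop through general omission = benign failures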

SLIDE 4

WHAT ARE BYZANTINE FAILURES?

The short answer: they can be anything!

Examples of commission failures:

A bit flip in memory (manufacturing defect, alpha particles)
Network card malfunction
Intentional behavior:
Rational node: trying to game the system for personal gain
Malicious node: trying to bring the system down

(they can even be crash/omission failures)

SLIDE 5
SLIDE 6

THE BYZANTINE GENERALS

Synchronous communication
One general may be a traitor

SLIDE 7

THE BYZANTINE GENERALS

Synchronous communication
One general may be a traitor
One of the generals is the commander C
The commander decides Attack or Retreat

Goals:
1. If C is trustworthy, every trustworthy general must follow C’s orders
2. Every trustworthy general must follow the same battle plan
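The two goals can be phrased as a check over a single run. The sketch below is hypothetical: it models only the lieutenants' final decisions (the names `goals_hold`, `decisions` and `loyal` are illustrative, not from the slides) and treats the commander's order as undefined when C is a traitor.

```python
from typing import Dict, Optional, Set


def goals_hold(commander_loyal: bool, order: Optional[str],
               decisions: Dict[str, str], loyal: Set[str]) -> bool:
    """Check both goals for one run.

    decisions: lieutenant name -> "attack" or "retreat"
    loyal:     names of the trustworthy lieutenants
    order:     C's order; only meaningful if C is trustworthy
    """
    plans = {decisions[g] for g in loyal}
    # Goal 2: all trustworthy generals follow the same battle plan.
    if len(plans) > 1:
        return False
    # Goal 1: if C is trustworthy, trustworthy generals follow C's order.
    if commander_loyal and plans and plans != {order}:
        return False
    return True


# G2 is a traitor and retreats; loyal G1 follows the loyal commander.
print(goals_hold(True, "attack", {"G1": "attack", "G2": "retreat"}, {"G1"}))  # True
```

When C is a traitor only goal 2 constrains the outcome: any common plan among the trustworthy generals is acceptable.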

SLIDE 8

REMEMBER WHEN THINGS WERE SIMPLER?

[Figure: commander C sends Attack to generals G1 and G2]

SLIDE 9

YOU CAN’T TRUST ANYONE THESE DAYS…

[Figure: commander C sends Attack to generals G1 and G2]

SLIDE 10

YOU CAN’T TRUST ANYONE THESE DAYS…

[Figure: C sends Attack to G1 and Retreat to G2; G1 tells G2 “He said attack”; G2 tells G1 “He said retreat”]

SLIDE 11

YOU CAN’T TRUST ANYONE THESE DAYS…

[Figure: C sends Attack to G1 and Retreat to G2; G1 tells G2 “He said attack”; G2 tells G1 “He said retreat”]

[Figure: C sends Attack to G1 and G2; G2 tells G1 “He said retreat”]

SLIDE 12

“BUT THEY WERE ALL OF THEM DECEIVED…”

[Figure: C sends Attack to G1 and G2; G2 tells G1 “He said retreat”]

[Figure: C sends Retreat to G1 and G2; G1 tells G2 “He said attack”]

[Figure: C sends Attack to G1 and Retreat to G2; G1 tells G2 “He said attack”; G2 tells G1 “He said retreat”]
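One way to read the three scenarios is as the classic indistinguishability argument: in the first, G2 is the traitor; in the second, G1 is; in the third, the commander is. The sketch below encodes that reading under a simplifying assumption: a general's entire view is the order it got from C plus what the other general claims C said. The real proof covers arbitrary message exchanges. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# A general's view: (order received from C, what the other general says C said).
View = Tuple[str, str]

scenarios: Dict[str, Dict[str, View]] = {
    # (a) Loyal C orders Attack; traitor G2 tells G1 "He said retreat".
    "a": {"G1": ("attack", "retreat"), "G2": ("attack", "attack")},
    # (b) Loyal C orders Retreat; traitor G1 tells G2 "He said attack".
    "b": {"G1": ("retreat", "retreat"), "G2": ("retreat", "attack")},
    # (c) Traitor C sends Attack to G1 and Retreat to G2; both relay honestly.
    "c": {"G1": ("attack", "retreat"), "G2": ("retreat", "attack")},
}

# Goal 1 pins down what a loyal general must do when C is loyal:
# G1 must attack in (a), and G2 must retreat in (b).
forced: Dict[Tuple[str, View], str] = {
    ("G1", scenarios["a"]["G1"]): "attack",
    ("G2", scenarios["b"]["G2"]): "retreat",
}


def decide(general: str, view: View) -> Optional[str]:
    """Decision forced by goal 1 for this (general, view) pair; a deterministic
    protocol must decide the same way whenever the view is the same."""
    return forced.get((general, view))


# In (c) both lieutenants are loyal, yet each sees exactly what it saw before.
g1, g2 = decide("G1", scenarios["c"]["G1"]), decide("G2", scenarios["c"]["G2"])
print(g1, g2)  # attack retreat: goal 2 is violated
```

Neither loyal general can tell (c) apart from the scenario in which goal 1 forced its hand. So, under this reading, no protocol with three generals and one traitor can meet both goals.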

SLIDE 13

A LOWER BOUND

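Theorem: There is no algorithm that solves TRB (terminating reliable broadcast) for Byzantine failures if n ≤ 3f, where n is the number of processes and f is the maximum number of Byzantine processes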

Lamport, Shostak and Pease, The Byzantine Generals Problem, 1982
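Restated as a necessary condition, with n processes of which at most f are Byzantine:

```latex
\text{TRB solvable with up to } f \text{ Byzantine processes} \;\Longrightarrow\; n \ge 3f + 1,
\qquad f = 1 \;\Longrightarrow\; n \ge 4
```

The three-general scenarios above (n = 3, f = 1) therefore fall inside the impossible region. The same 3f+1 threshold shows up again in the replica count used by PBFT.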

SLIDE 14

ADMINISTRIVIA

Presentations

Start on Monday 10/19

Midterm

Wednesday 10/21, 3-5pm, in class

Problem set #2

Due Monday 10/12, before class, by email to both Eli and Manos

Project topic declaration

Due tomorrow

SLIDE 15

PBFT: A BYZANTINE RENAISSANCE

Practical Byzantine Fault Tolerance

(Castro, Liskov 1999-2000)

First practical protocol for asynchronous BFT replication
Like Paxos, PBFT is safe all the time, and live during periods of synchrony

SLIDE 16

Barbara Liskov Turing Award 2008

SLIDE 17

THE SETUP

System model

Asynchronous system
Unreliable channels

Crypto

Public/private key pairs
Signatures
Collision-resistant hashes

Service

Byzantine clients
Up to f Byzantine servers
N = 3f+1 total servers

System goals

Always safe
Live during periods of synchrony

SLIDE 18

THE GENERAL IDEA

General idea.

One primary, 3f replicas
Execution proceeds as a sequence of views
A view is a configuration with a well-defined primary
Client sends signed commands to the primary of the current view
Primary assigns a sequence number to the client’s command
Primary is responsible for the command eventually being decided

[Figure: primary and replicas, with sequence-number slots 1 to 8 and client command A]
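The slides only say that each view has a well-defined primary. In the Castro and Liskov paper, the primary of view v is replica v mod N. The sketch below assumes that round-robin rule, with N = 3f+1 replicas numbered 0 to N-1. The function names are hypothetical.

```python
def num_replicas(f: int) -> int:
    """One primary plus 3f other replicas."""
    return 3 * f + 1


def primary(view: int, n: int) -> int:
    """Round-robin primary selection: replica `view mod n` leads `view`."""
    return view % n


n = num_replicas(1)  # 4 replicas tolerate f = 1 Byzantine fault
# A view change (v to v+1) hands leadership to the next replica in turn.
print([primary(v, n) for v in range(6)])  # [0, 1, 2, 3, 0, 1]
```

A deterministic rotation lets every correct replica agree on the next primary without any extra coordination.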

SLIDE 19

WHAT COULD POSSIBLY GO WRONG!?

The primary could be faulty!

It could ignore commands, assign the same sequence number to different requests, skip sequence numbers, etc.
Backups monitor the primary’s behavior and trigger view changes to replace a faulty primary

Replicas could be faulty!

They could incorrectly forward commands received by a correct primary: any single request may be misleading, so replicas need to rely on quorums of requests
They could send incorrect responses to the client: the client waits for matching responses before accepting
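The slide does not say how many matching responses the client needs. In the PBFT paper, the client waits for f+1 replies carrying the same result from distinct replicas. Any f+1 replicas include at least one correct one. Below is a minimal sketch of that rule. The class and method names are hypothetical, and the sketch assumes one collector per request timestamp.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set


class ReplyCollector:
    """Client-side helper for one request: accept a result once f+1
    distinct replicas have reported it."""

    def __init__(self, f: int) -> None:
        self.f = f
        # result -> ids of the replicas that reported it
        self.votes: Dict[Hashable, Set[int]] = defaultdict(set)

    def on_reply(self, replica_id: int, result: Hashable) -> Optional[Hashable]:
        # A set ignores repeated replies from the same replica.
        self.votes[result].add(replica_id)
        if len(self.votes[result]) >= self.f + 1:
            return result  # at least one of these replicas is correct
        return None


collector = ReplyCollector(f=1)
collector.on_reply(3, "bad")         # a faulty replica cannot reach f+1 alone
collector.on_reply(0, "ok")
print(collector.on_reply(1, "ok"))   # ok
```

Replies only need f+1 matches, not a full 2f+1 quorum. The client is not deciding anything itself; it only needs one correct replica to vouch for a result that the replicas have already agreed on.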

SLIDE 20

CERTIFICATES

Protocol steps are justified by certificates

Sets (quorums) of signed messages from distinct replicas proving that a property holds

Certificates are of size at least 2f+1

Any two quorums intersect in at least one correct replica (for safety)
There is always a quorum of correct replicas (for liveness)
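Both properties follow from counting. With N = 3f+1 replicas and quorums of size q = 2f+1, two quorums overlap in at least 2q - N = 2(2f+1) - (3f+1) = f+1 replicas. At least one of those replicas is correct. Also, the N - f = 2f+1 correct replicas form a quorum by themselves. The brute-force check below confirms this for small f and shows that N = 3 with f = 1 fails. The helper `quorums_ok` is hypothetical and only feasible for tiny N.

```python
from itertools import combinations


def quorums_ok(n: int, q: int, f: int) -> bool:
    """Check both certificate properties for n replicas, quorum size q,
    and at most f faulty replicas, by enumerating every pair of quorums."""
    # Liveness: the n - f correct replicas must be able to form a quorum alone.
    if n - f < q:
        return False
    # Safety: any two quorums must share at least f + 1 replicas,
    # so at least one replica in the overlap is correct.
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(len(a & b) >= f + 1 for a in quorums for b in quorums)


for f in (1, 2, 3):
    print(f, quorums_ok(3 * f + 1, 2 * f + 1, f))  # True for each f
print(quorums_ok(3, 2, 1))  # False: two quorums may overlap only in the faulty replica
```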

SLIDE 21

PBFT: NORMAL OPERATION

Three phases:

Pre-prepare: assigns a sequence number to a request
Prepare: ensures consistent ordering of requests within views
Commit: ensures consistent ordering of requests across views

Each replica maintains the following state:

Service state
A message log with all messages sent or received
An integer representing the replica’s current view

SLIDE 22

CLIENT ISSUES REQUEST

Primary Replica 1 Replica 2 Replica 3

<REQUEST, o, t, c> σc

SLIDE 23

CLIENT ISSUES REQUEST

Primary Replica 1 Replica 2 Replica 3

<REQUEST, o, t, c> σc

o: state machine operation

SLIDE 24

CLIENT ISSUES REQUEST

Primary Replica 1 Replica 2 Replica 3

<REQUEST, o, t, c> σc

t: timestamp

SLIDE 25

CLIENT ISSUES REQUEST

Primary Replica 1 Replica 2 Replica 3

<REQUEST, o, t, c> σc

c: client ID

SLIDE 26

CLIENT ISSUES REQUEST

Primary Replica 1 Replica 2 Replica 3

<REQUEST, o, t, c> σc

σc: client signature
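Below is a sketch of building and checking <REQUEST, o, t, c> σc. Python's standard library has no public-key signatures, so HMAC-SHA256 with a shared key stands in for σc here. This is an assumption for illustration only: unlike a signature, a MAC can be produced by anyone holding the key, whereas the setup assumes public/private key pairs. The JSON encoding, key and all names are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    o: str  # state machine operation
    t: int  # client timestamp
    c: str  # client ID

    def encode(self) -> bytes:
        # Canonical bytes, so signer and verifier authenticate the same content.
        return json.dumps(["REQUEST", self.o, self.t, self.c]).encode()


def sign(key: bytes, req: Request) -> bytes:
    # Stand-in for sigma_c: HMAC-SHA256 over the encoded request (assumption).
    return hmac.new(key, req.encode(), hashlib.sha256).digest()


def verify(key: bytes, req: Request, sig: bytes) -> bool:
    # Constant-time comparison of the expected and received tags.
    return hmac.compare_digest(sign(key, req), sig)


key = b"client-7-secret"  # hypothetical key material
req = Request(o="put x 1", t=1, c="client-7")
sig = sign(key, req)
print(verify(key, req, sig))                                # True
print(verify(key, Request("put x 2", 1, "client-7"), sig))  # False: o was altered
```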

SLIDE 27

Primary sends <<PRE-PREPARE, v, n, d>σp, m> to all replicas

PRE-PREPARE

Primary Replica 1 Replica 2 Replica 3

SLIDE 28

PRE-PREPARE

Primary Replica 1 Replica 2 Replica 3

Primary sends <<PRE-PREPARE, v, n, d>σp, m> to all replicas

v: current view

SLIDE 29

PRE-PREPARE

Primary Replica 1 Replica 2 Replica 3

Primary sends <<PRE-PREPARE, v, n, d>σp, m> to all replicas

n: sequence number

[Figure: sequence-number slots 1 to 8]

SLIDE 30

PRE-PREPARE

Primary Replica 1 Replica 2 Replica 3

Primary sends <<PRE-PREPARE, v, n, d>σp, m> to all replicas

m: client request

SLIDE 31

PRE-PREPARE

Primary Replica 1 Replica 2 Replica 3

Primary sends <<PRE-PREPARE, v, n, d>σp, m> to all replicas

d: digest of m

SLIDE 32

PRE-PREPARE

Primary Replica 1 Replica 2 Replica 3

Primary sends <<PRE-PREPARE, v, n, d>σp, m> to all replicas

Correct backup k accepts PRE-PREPARE if:

the message is well formed (signatures are correct and d is the digest of m)
k is in view v
k has not accepted another PRE-PREPARE message for v, n with a different d
n is between two watermarks L and H (so a faulty primary cannot exhaust the sequence-number space)
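The four acceptance conditions can be sketched as a single check on a backup. All of the following are hypothetical assumptions. SHA-256 plays the collision-resistant hash that produces d. A boolean `sig_ok` stands in for verifying the signatures. The watermark test uses L < n ≤ H, because the slide does not say which bounds are inclusive.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def digest(m: bytes) -> str:
    """d = digest of m (SHA-256 chosen for illustration)."""
    return hashlib.sha256(m).hexdigest()


@dataclass
class Backup:
    view: int
    low: int   # watermark L
    high: int  # watermark H
    # (v, n) -> digest of the PRE-PREPARE already accepted for that slot
    accepted: Dict[Tuple[int, int], str] = field(default_factory=dict)
    log: List[tuple] = field(default_factory=list)

    def accept_pre_prepare(self, v: int, n: int, d: str, m: bytes,
                           sig_ok: bool) -> bool:
        # Well formed: signatures verify and d really is the digest of m.
        if not sig_ok or digest(m) != d:
            return False
        # k is in view v.
        if v != self.view:
            return False
        # n lies between the watermarks.
        if not (self.low < n <= self.high):
            return False
        # No other PRE-PREPARE accepted for (v, n) with a different digest.
        prev = self.accepted.get((v, n))
        if prev is not None:
            return prev == d  # a duplicate is harmless; a conflict is rejected
        self.accepted[(v, n)] = d
        self.log.append(("PRE-PREPARE", v, n, d, m))  # store in the message log
        return True


backup = Backup(view=0, low=0, high=10)
m1, m2 = b"put x 1", b"put x 2"
print(backup.accept_pre_prepare(0, 1, digest(m1), m1, sig_ok=True))  # True
print(backup.accept_pre_prepare(0, 1, digest(m2), m2, sig_ok=True))  # False: conflict
```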

SLIDE 33

PRE-PREPARE

Primary Replica 1 Replica 2 Replica 3

Primary sends <<PRE-PREPARE, v, n, d>σp, m> to all replicas

Each accepted PRE-PREPARE message is stored in the accepting replica’s message log (including the primary’s)

SLIDE 34

Replica k sends <PREPARE, v, n, d, k>σk to all replicas

PREPARE

Primary Replica 1 Replica 2 Replica 3

Pre-prepare phase

SLIDE 35

Replica k sends <PREPARE, v, n, d, k>σk to all replicas

PREPARE

Primary Replica 1 Replica 2 Replica 3

Pre-prepare phase

Correct backup k accepts PREPARE if:

the message is well formed
k is in view v
n is between two watermarks L and H

SLIDE 36

Replica k sends <PREPARE, v, n, d, k>σk to all replicas

PREPARE

Primary Replica 1 Replica 2 Replica 3

Pre-prepare phase

Each accepted PREPARE message is stored in the accepting replica’s message log
Replicas that send a PREPARE accept the assignment of m to sequence number n in view v

SLIDE 37

PREPARE CERTIFICATE

P-Certificates ensure consistent order of requests within views
A replica produces a P-Certificate(m,v,n) iff its log holds:

the request m
a PRE-PREPARE for m in view v with sequence number n
2f PREPAREs from distinct backups that match the PRE-PREPARE

A P-Certificate(m,v,n) means that a quorum agrees to assign m to sequence number n in view v

No two non-faulty replicas can hold P-Certificate(m,v,n) and P-Certificate(m’,v,n), respectively, with m ≠ m’
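Below is a sketch of the P-Certificate test, followed by the counting behind the last statement. The log encoding is a hypothetical simplification: tuples tagged "REQUEST", "PRE-PREPARE" and "PREPARE", with requests identified by their digest. A PRE-PREPARE plus 2f matching PREPAREs come from 2f+1 distinct replicas, which is a quorum. Suppose two such quorums existed for different requests in the same (v, n). They would overlap in a correct backup that had accepted two conflicting PRE-PREPAREs, which a correct backup never does.

```python
from typing import Set, Tuple

# Hypothetical log entries:
#   ("REQUEST", d) | ("PRE-PREPARE", v, n, d) | ("PREPARE", v, n, d, k)
LogEntry = Tuple


def has_p_certificate(log: Set[LogEntry], d: str, v: int, n: int,
                      f: int, primary: int) -> bool:
    """True iff the log holds request d, a PRE-PREPARE for (v, n, d),
    and matching PREPAREs from at least 2f distinct backups."""
    if ("REQUEST", d) not in log or ("PRE-PREPARE", v, n, d) not in log:
        return False
    backups = {entry[4] for entry in log
               if entry[0] == "PREPARE" and entry[1:4] == (v, n, d)
               and entry[4] != primary}  # only backups' PREPAREs count
    return len(backups) >= 2 * f  # a set counts each backup once


f, primary = 1, 0
log = {("REQUEST", "d1"), ("PRE-PREPARE", 0, 1, "d1"), ("PREPARE", 0, 1, "d1", 1)}
print(has_p_certificate(log, "d1", 0, 1, f, primary))  # False: 1 of 2f = 2 PREPAREs
log.add(("PREPARE", 0, 1, "d1", 2))
print(has_p_certificate(log, "d1", 0, 1, f, primary))  # True
```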