
Communication in Distributed Systems

Part 2

Università degli Studi di Roma “Tor Vergata”, Dipartimento di Ingegneria Civile e Ingegneria Informatica

Course: Sistemi Distribuiti e Cloud Computing (Distributed Systems and Cloud Computing, SDCC), A.Y. 2018/19, Valeria Cardellini

Message-oriented communication

  • RPC improves distribution transparency
  • But it is not always a suitable mechanism to support communication in a distributed system (DS)
    – E.g., when one cannot be sure that the receiver is running
  • Alternative: message-oriented communication
    – Transient
      • Berkeley sockets: already covered in other courses
      • Message Passing Interface (MPI)
    – Persistent
      • Message Oriented Middleware (MOM)

Valeria Cardellini – SDCC 2018/19 1

Message Passing Interface (MPI)

  • Library for exchanging messages among processes running on different nodes
    – Specifies only the interface (http://www.mpi-forum.org/)
    – Several implementations, including Open MPI (http://www.open-mpi.org/) and MPICH (http://www.mcs.anl.gov/research/projects/mpich2/)
    – De facto standard for communication among the nodes of a system running a parallel program developed for a distributed-memory architecture
  • MPI defines a set of primitives for inter-process communication; in particular:
    – Point-to-point communication primitives: to send and receive a message between two different processes
    – Collective communication primitives

Point-to-point communication in MPI

  • Main primitives for point-to-point communication:
    – MPI_Send and MPI_Recv: blocking communication
      • MPI_Send is synchronous or buffered depending on the implementation
    – MPI_Bsend: buffered blocking send
    – MPI_Ssend: synchronous blocking send
    – MPI_Isend and MPI_Irecv: nonblocking communication

MPI primitive   Meaning
MPI_Bsend       Appends the outgoing message to a local send buffer
MPI_Send        Sends the message and waits until it has been copied to a local or remote buffer
MPI_Ssend       Sends the message and waits until reception starts
MPI_Isend       Passes a reference to the outgoing message and continues
MPI_Recv        Receives a message; blocks if there is none

Example of communication in MPI

    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int myrank;
        char message[20];
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        printf("My rank is: %d\n", myrank);

        if (myrank == 0) {
            /* Send a message to process 1 (tag 99) */
            strcpy(message, "TEST");
            MPI_Send(message, (int)strlen(message) + 1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
            printf("%d) Sent: '%s'\n", myrank, message);
        } else if (myrank == 1) {
            /* Receive the message from process 0 (tag 99) */
            MPI_Recv(message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
            printf("%d) Received: '%s'\n", myrank, message);
        }

        MPI_Finalize();
        return 0;
    }

Signatures of the two primitives used above:

    MPI_Send(buf, count, datatype, dest, tag, comm)
    MPI_Recv(buf, count, datatype, source, tag, comm, status)

Message-oriented middleware

  • Communication middleware that supports sending and receiving messages in a persistent way
  • Loose coupling among system/application components
    – Decoupling in time and space
    – Can also support synchronization decoupling
  • Two patterns:
    – Message queue
    – Publish-subscribe (pub/sub)
  • And two related types of systems:
    – Message queue system (MQS)
    – Pub/sub system

Message queue pattern

  • Messages are put into a queue
  • Multiple consumers can read from the queue
  • Each message is delivered to only one consumer
  • Principles
    – Loose coupling
    – Service statelessness
      • Services minimize resource consumption by deferring the management of state information when necessary
  • Apps:
    – Task scheduling, load balancing, collaboration

Figure: A sends a message to B; B issues a response message back to A.

Message queue API

  • Basic interface to a queue in an MQS:
    – put: nonblocking send
      • Append a message to a specified queue
    – get: blocking receive
      • Block until the specified queue is nonempty and remove the first message
      • Variations: allow searching for a specific message in the queue, e.g., using a matching pattern
    – poll: nonblocking receive
      • Check a specified queue for messages and remove the first
      • Never block
    – notify: nonblocking receive
      • Install a handler (callback function) to be automatically called when a message is put into the specified queue
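
The four calls can be illustrated with a minimal in-memory sketch. The `MessageQueue` class below is hypothetical and is not the API of any real MOM; it assumes a single process, and the notify handler runs synchronously inside put, which a real MQS would not necessarily do.

```python
import queue


class MessageQueue:
    """In-memory stand-in for one queue of an MQS (hypothetical class)."""

    def __init__(self):
        self._messages = queue.Queue()
        self._handler = None

    def put(self, message):
        """Nonblocking send: append the message, or hand it to the installed handler."""
        if self._handler is not None:
            self._handler(message)
        else:
            self._messages.put(message)

    def get(self):
        """Blocking receive: wait until the queue is nonempty, then remove the first message."""
        return self._messages.get()

    def poll(self):
        """Nonblocking receive: return the first message, or None if the queue is empty."""
        try:
            return self._messages.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """Install a callback invoked for every message put from now on."""
        self._handler = handler


q = MessageQueue()
empty_poll = q.poll()              # empty queue: poll returns at once
q.put("m1")
q.put("m2")
first = q.get()                    # returns immediately because the queue is nonempty
second = q.poll()

received = []
q.notify(received.append)          # later messages go straight to the callback
q.put("m3")
```

Calling get on an empty queue would block the caller, which is why the usage lines only call it after a put.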

Publish/subscribe pattern

  • Application components can publish asynchronous messages (e.g., event notifications), and/or declare their interest in message topics by issuing a subscription
  • Multiple consumers can subscribe to a topic with or without filters
  • Subscriptions are collected by an event dispatcher component, responsible for routing events to all matching subscribers
    – For scalability reasons, its implementation can be distributed
  • High degree of decoupling among components
    – Easy to add and remove components
    – Appropriate for dynamic environments
  • A sibling of the message queue pattern that further generalizes it by delivering a message to multiple consumers
    – Message queue: delivers messages to only one receiver, i.e., one-to-one communication
    – Pub/sub channel: delivers messages to multiple receivers, i.e., one-to-many communication

Publish/subscribe API

  • Calls that capture the core of any pub/sub system:
    – publish(event): to publish an event
      • Events can be of any data type supported by the given implementation languages and may also contain meta-data
    – subscribe(filter_expr, notify_cb, expiry) → sub_handle: to subscribe to an event
      • Takes a filter expression, a reference to a notify callback for event delivery, and an expiry time for the subscription registration
      • Returns a subscription handle
    – unsubscribe(sub_handle)
    – notify_cb(sub_handle, event): called by the pub/sub system to deliver a matching event
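
A minimal single-process sketch of these calls follows. The `EventDispatcher` class is hypothetical; it assumes that a filter expression is a Python predicate over the event and that the expiry is given in seconds, neither of which is prescribed by the slides.

```python
import itertools
import time


class EventDispatcher:
    """Toy event dispatcher (hypothetical, for illustration only)."""

    def __init__(self):
        self._subs = {}                      # sub_handle -> (filter_expr, notify_cb, deadline)
        self._handles = itertools.count(1)

    def subscribe(self, filter_expr, notify_cb, expiry):
        """Register a subscription valid for `expiry` seconds; return its handle."""
        handle = next(self._handles)
        self._subs[handle] = (filter_expr, notify_cb, time.monotonic() + expiry)
        return handle

    def unsubscribe(self, sub_handle):
        self._subs.pop(sub_handle, None)

    def publish(self, event):
        """Deliver the event to every live subscription whose filter matches."""
        now = time.monotonic()
        for handle, (filter_expr, notify_cb, deadline) in list(self._subs.items()):
            if now >= deadline:
                del self._subs[handle]       # expired registration
            elif filter_expr(event):
                notify_cb(handle, event)


dispatcher = EventDispatcher()
log = []


def notify_cb(sub_handle, event):
    log.append((sub_handle, event["topic"]))


h1 = dispatcher.subscribe(lambda e: e["topic"] == "alerts", notify_cb, expiry=60)
h2 = dispatcher.subscribe(lambda e: True, notify_cb, expiry=60)
dispatcher.publish({"topic": "alerts", "body": "disk full"})   # reaches h1 and h2
dispatcher.unsubscribe(h2)
dispatcher.publish({"topic": "news", "body": "hello"})         # matches no live filter
```

The publisher never names a receiver: the dispatcher alone decides who is notified, which is the decoupling the pattern is after.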

MOM functionalities

  • MOM handles the complexity of addressing, routing, availability of communicating application components (or applications), and message format transformations

Source: “Cloud Computing Patterns”, http://bit.ly/2hZv6Xs

  • Let us analyze
    – Delivery semantics
    – Message routing
    – Message transformations

Delivery semantics in MOM

  • At-least-once delivery
    – How can MOM ensure that messages are received successfully?
    – By having the receiver send an ack for each retrieved message, and by resending the message if no ack is received
    – Be careful: the app must tolerate message duplication
  • Exactly-once delivery
    – How can MOM ensure that a message is delivered exactly once to a receiver?
    – By filtering possible message duplicates automatically
    – Upon creation, each message is associated with a unique message ID, which is used to filter message duplicates during their traversal from sender to receiver
    – Messages must also survive crashes of MOM components
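
The interplay between resending and duplicate filtering can be sketched as follows. All names are hypothetical and the lost acks are simulated with a counter; the sketch keeps the set of seen IDs in memory, so it does not address the requirement that messages survive crashes.

```python
import uuid


class DedupReceiver:
    """Receiver side: drops messages whose ID was already seen (illustrative only)."""

    def __init__(self):
        self.seen_ids = set()
        self.delivered = []

    def receive(self, msg_id, payload):
        if msg_id not in self.seen_ids:      # first copy: deliver to the application
            self.seen_ids.add(msg_id)
            self.delivered.append(payload)
        return True                          # ack every copy, including duplicates


def send_at_least_once(receiver, payload, ack_lost_first_n=0, max_attempts=5):
    """Resend until an ack arrives; `ack_lost_first_n` simulates acks lost in transit."""
    msg_id = uuid.uuid4().hex                # unique ID assigned when the message is created
    for attempt in range(max_attempts):
        ack = receiver.receive(msg_id, payload)
        if ack and attempt >= ack_lost_first_n:
            return attempt + 1               # number of transmissions used
    raise RuntimeError("no ack received")


receiver = DedupReceiver()
attempts = send_at_least_once(receiver, "order-42", ack_lost_first_n=2)
```

The message is transmitted three times here, yet the application sees it once: at-least-once transport plus ID-based filtering gives exactly-once delivery to the application.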

  • Transaction-based delivery
    – How can MOM ensure that messages are only deleted from a message queue if they have been received successfully?
    – MOM and the receiver participate in a transaction: all operations involved in the reception of a message are performed under one transactional context guaranteeing ACID behavior
  • Timeout-based delivery
    – How can MOM ensure that messages are only deleted from a message queue if they have been received successfully at least once?
    – Messages are not deleted immediately from the queue, but marked as invisible
    – An invisible message cannot be read by another client
    – After the client acks receipt of the message, the message is deleted from the queue
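
A minimal sketch of timeout-based delivery is given below. The `VisibilityQueue` class and its method names are hypothetical; the `now` argument is an assumption added only to make the example deterministic.

```python
import time


class VisibilityQueue:
    """Toy queue with timeout-based delivery (hypothetical class)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}                  # msg_id -> [payload, invisible_until]
        self._next_id = 0

    def send(self, payload):
        self._next_id += 1
        self._messages[self._next_id] = [payload, 0.0]
        return self._next_id

    def receive(self, now=None):
        """Return (msg_id, payload) of the first visible message and hide it, or None."""
        now = time.monotonic() if now is None else now
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Client ack: only now is the message removed from the queue."""
        self._messages.pop(msg_id, None)


q = VisibilityQueue(visibility_timeout=30)
q.send("resize photo 7")
first = q.receive(now=100)       # client A gets the message; it becomes invisible
hidden = q.receive(now=110)      # client B sees nothing while A is processing
retry = q.receive(now=131)       # A never acked: the message is visible again
q.delete(retry[0])
empty = q.receive(now=200)
```

If the first client had crashed for good, the message would still have been processed by whoever issued the later receive, at the price of a possible duplicate.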

Message routing: general model

  • Queues are managed by queue managers (QMs)
    – An application can put messages only into a local queue
    – Getting a message is possible by extracting it from a local queue only
  • QMs need to route messages
    – Function as message-queuing “relays” that interact with distributed applications and with each other
    – Support the idea of an overlay network
    – There are also special queue managers that operate as routers

Message routing: overlay network

  • The overlay network is used to route messages
    – By using routing tables
    – Routing tables are stored and managed by QMs
  • The overlay network needs to be maintained over time
    – Routing tables are usually set up and managed manually
    – Dynamic overlay networks require dynamically managing the mapping between queue names and their locations
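
Routing through queue managers can be sketched as below. The QM names (`QMA`, `R1`, `QMB`) and the table layout (destination QM mapped to next-hop QM) are made up for illustration, and the routing tables are filled in by hand, as the slide says is usual.

```python
class QueueManager:
    """Toy queue manager with a manually configured routing table (names are made up)."""

    def __init__(self, name, routing_table):
        self.name = name
        self.routing_table = routing_table   # destination QM -> next-hop QM
        self.local_queue = []
        self.network = None                  # name -> QueueManager, set after creation

    def put(self, dest_qm, message, path=None):
        """Deliver locally or relay to the next hop; returns the list of QMs traversed."""
        path = (path or []) + [self.name]
        if dest_qm == self.name:
            self.local_queue.append(message)
            return path
        next_hop = self.routing_table[dest_qm]
        return self.network[next_hop].put(dest_qm, message, path)


# Overlay: QMA - R1 - QMB, with R1 acting as a router
qms = {
    "QMA": QueueManager("QMA", {"QMB": "R1"}),
    "R1": QueueManager("R1", {"QMA": "QMA", "QMB": "QMB"}),
    "QMB": QueueManager("QMB", {"QMA": "R1"}),
}
for qm in qms.values():
    qm.network = qms

route = qms["QMA"].put("QMB", "hello")
```

Adding a queue manager to this overlay means editing the table of every QM that must reach it, which is the maintenance burden the last bullet refers to.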

Message transformation: message broker

  • New/existing apps that need to be integrated into a single, coherent system rarely agree on a common data format
  • How to handle data heterogeneity?
    – We have already examined different solutions in the context of RPCs
    – Now let’s focus on the message broker
  • Message broker: component that usually takes care of application heterogeneity in a MOM

Message broker: general architecture

  • Message broker handles application heterogeneity
    – Converts incoming messages to the target format, providing access transparency
    – Very often acts as an application gateway
    – Manages a repository of conversion rules and programs to transform a message of one type to another
    – May provide subject-based routing capabilities
    – To be scalable and reliable, it can be implemented in a distributed way

MOM frameworks

  • Examples of MOM middleware
    – IBM MQ
    – Microsoft Message Queuing (MSMQ)
    – Java Message Service (JMS): MOM API for Java
    – Open MQ
    – RabbitMQ
    – NATS https://nats.io
    – Apache ActiveMQ http://activemq.apache.org
    – Apache Kafka
  • Also Cloud-based products
    – Amazon Simple Queue Service (SQS)
    – Google Cloud Pub/Sub
  • Not always a clear distinction between message queue and pub/sub patterns
    – Some frameworks (e.g., RabbitMQ, Kafka, NATS) support both
    – Others do not (e.g., Redis is only pub/sub)

Some examples of MOM usage

  1. Accept and forward messages which are sent by a producer and received by a consumer
  2. Distribute time-consuming tasks among multiple workers
  3. Deliver messages to many consumers at once (pub/sub pattern)
  4. Receive messages selectively
  5. Run a function on a remote node and wait for the result

Source: RabbitMQ tutorial http://bit.ly/2zPPMJO

IBM MQ

  • The first enterprise messaging technology, from 1993
  • Basic concepts:
    – Application-specific messages are put into and removed from queues
    – Queues reside under the regime of a queue manager (QM)
    – Processes can put messages only in local queues, or through an RPC mechanism
  • Message transfer
    – Messages are transferred between queues
    – Message transfer between process queues requires a channel
    – At each endpoint of a channel is a message channel agent (MCA)
  • MCAs are responsible for:
    – Setting up channels
    – Sending/receiving messages
    – Also message encryption
  • Principles of operation:
    – Channels are inherently unidirectional
    – MCAs are started automatically when messages arrive
    – Any network of queue managers can be created
    – Routes are set up manually (system administration)
    – Routing: by using logical names, in combination with name resolution to local queues, it is possible to route a message to a remote queue

https://www.ibm.com/products/mq

Amazon Simple Queue Service (SQS)

  • Cloud-based message queue service based on a polling model
    – Goal: to decouple the components of cloud applications
    – Message queues are hosted within the AWS infrastructure
    – Messages are stored in queues for a limited period of time
  • Application components using SQS can run independently and asynchronously and be developed with different technologies
  • Provides timeout-based delivery
    – Messages are only deleted from a message queue if they have been received properly
    – A received message is locked during processing (visibility timeout); if processing fails, the lock expires and the message is available again
  • Can be combined with Amazon SNS
    – To push a message to multiple SQS queues in parallel

Amazon SQS: API

  • CreateQueue, ListQueues, DeleteQueue
    – Create, list, delete queues
  • SendMessage, ReceiveMessage
    – Add/receive messages to/from a specified queue (message size up to 256 KB)
  • DeleteMessage
    – Remove a received message from a specified queue (the component must delete the message after receiving and processing it)
  • ChangeMessageVisibility
    – Change the visibility timeout of a specified message in a queue (when received, the message remains in the queue until it is deleted explicitly by the receiver)
  • SetQueueAttributes, GetQueueAttributes
    – Control queue settings, get information about a queue

Amazon SQS: example

  • Example of an application using SQS: online photo processing service (http://bit.ly/2gwJFBw)

Apache Kafka

  • General-purpose, distributed pub/sub system
  • Originally developed in 2010 by LinkedIn
  • Written in Scala
  • Horizontally scalable
  • High throughput
    – Billions of messages
  • Fault-tolerant
  • Delivery guarantees
    – At least once: guarantees no loss, but messages may be duplicated and possibly delivered out of order
    – From 2017, exactly once: guarantees no loss and no duplicates, but requires expensive end-to-end 2PC

Kreps et al., “Kafka: A Distributed Messaging System for Log Processing”, NetDB’11.

Kafka at a glance

  • Kafka maintains feeds of messages in categories called topics
  • Producers: publish messages to a Kafka topic
  • Consumers: subscribe to topics and process the feed of published messages
  • Kafka cluster: distributed log of data over servers known as brokers
    – Brokers rely on Apache ZooKeeper for coordination

Kafka: topics

  • Topic: a category to which messages are published
  • For each topic, the Kafka cluster maintains a partitioned log
    – Log (data structure!): append-only, totally-ordered sequence of records ordered by time
  • Topics are split into a pre-defined number of partitions
    – Partition: unit of parallelism of the topic
  • Each partition is replicated with some replication factor
  • CLI command to create a topic with a single partition and one replica:

    > bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Kafka: partitions

  • Each partition is an ordered, numbered, immutable sequence of records that is continually appended to
    – Like a commit log
  • Each record is associated with a sequence ID number called offset
  • Partitions are distributed across brokers
  • Each partition is replicated for fault tolerance, across a configurable number of brokers
  • Each partition has one leader broker and 0 or more followers
  • The leader handles read and write requests
    – Read from leader
    – Write to leader
  • A follower replicates the leader and acts as a backup
  • Each broker is a leader for some of its partitions and a follower for others, to balance the load
  • ZooKeeper is used to keep the brokers consistent

Kafka: producers

  • Publish data to topics of their choice
  • Also responsible for choosing which record to assign to which partition within the topic
    – Round-robin or partitioned by keys
  • Producers = data sources
  • Run the producer:

    > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
    This is a message
    This is another message
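
The two partition-assignment strategies can be sketched as follows. This is not Kafka client code: `make_partitioner` is a hypothetical helper, and CRC32 is used only as a convenient stand-in for whatever hash function a real client applies to the key.

```python
import itertools
import zlib


def make_partitioner(num_partitions):
    """Return a function mapping an optional record key to a partition index."""
    round_robin = itertools.cycle(range(num_partitions))

    def choose(key=None):
        if key is None:
            return next(round_robin)                      # no key: spread records evenly
        return zlib.crc32(key.encode()) % num_partitions  # same key -> same partition

    return choose


choose = make_partitioner(num_partitions=3)
unkeyed = [choose() for _ in range(6)]
keyed = [choose("user-17") for _ in range(3)]
```

Keyed assignment sends all records with the same key to one partition, which matters because ordering is only guaranteed within a partition.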

Kafka: consumers

  • Consumer group: set of consumers sharing a common group ID
    – A consumer group maps to a logical subscriber
    – Each group consists of multiple consumers for scalability and fault tolerance
  • Consumers use the offset to track which messages have been consumed
    – Messages can be replayed using the offset
  • Run the consumer:

    > bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
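
Offset tracking and replay can be illustrated with a toy model of a single partition. The classes `PartitionLog` and `Consumer` are hypothetical and keep everything in memory; they only mirror the idea that the log is append-only and that the consumer, not the broker, decides where to read from.

```python
class PartitionLog:
    """Append-only log for one partition; offsets are list indexes."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1         # offset assigned to the record

    def read(self, offset, max_records=10):
        return self.records[offset:offset + max_records]


class Consumer:
    """Tracks its own position in the partition."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset                 # rewind to replay older records


log = PartitionLog()
for text in ["This is a message", "This is another message"]:
    log.append(text)

consumer = Consumer(log)
first_pass = consumer.poll()
nothing_new = consumer.poll()
consumer.seek(0)                             # comparable to reading from the beginning
replayed = consumer.poll()
```

Because reading does not remove records, replay is just a matter of moving the offset back.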

Kafka: ZooKeeper

  • ZooKeeper: hierarchical, distributed key-value store
    – Widely used coordination and synchronization service for large distributed systems
    – Often used for leader election (we’ll study Paxos as a consensus algorithm)
    – Used in Kafka, Mesos, Storm, …
  • Kafka uses ZooKeeper to coordinate between the producers, consumers and brokers
  • ZooKeeper stores Kafka metadata
    – List of brokers
    – List of consumers and their offsets
    – List of producers

Kafka: ordering guarantees

  • Messages sent by a producer to a particular topic partition will be appended in the order they are sent
  • A consumer sees records in the order they are stored in the log
  • Strong guarantees about ordering only within a partition
    – Total order over messages within a partition, but Kafka cannot preserve order between different partitions in a topic
  • Per-partition ordering combined with the ability to partition data by key is sufficient for most applications

Kafka: fault tolerance

  • Replicates partitions for fault tolerance
  • Kafka makes a message available for consumption only after all the followers acknowledge a successful write to the leader
    – Implies that a message may not be immediately available for consumption
  • Kafka retains messages for a configured period of time
    – Messages can be “replayed” in case a consumer fails

Kafka: APIs

  • Four core APIs
    – Producer API: allows an app to publish streams of records to one or more Kafka topics
    – Consumer API: allows an app to subscribe to one or more topics and process the stream of records produced to them
    – Connector API: allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems, so as to move large collections of data into and out of Kafka
    – Streams API: allows an app to act as a stream processor, transforming an input stream from one or more topics into an output stream to one or more output topics
      • Can use Kafka Streams to process data in pipelines consisting of multiple stages

Kafka: client library

  • JVM internal client
  • Plus a rich ecosystem of clients, among which:
    – Sarama: Go library for Kafka (https://shopify.github.io/sarama/)
    – Python library for Kafka (https://github.com/confluentinc/confluent-kafka-python/)
    – NodeJS client (https://github.com/Blizzard/node-rdkafka)

Apache Kafka within Monasca

  • Monasca is a monitoring-as-a-service solution integrated with OpenStack
    – OpenStack: a set of software tools for building and managing Cloud platforms for public and private clouds
  • Monasca uses Kafka as its message queue system

Protocols for MOM

  • Not only systems but also open standard protocols for message queues
    – AMQP (Advanced Message Queuing Protocol)
      • https://www.amqp.org
      • Binary protocol
    – MQTT (Message Queuing Telemetry Transport)
      • http://mqtt.org
      • Binary protocol
    – STOMP (Simple (or Streaming) Text Oriented Messaging Protocol)
      • http://stomp.github.io
      • Text-based protocol
  • Goals:
    – Platform- and vendor-agnostic
    – Provide interoperability between different MOMs

Messaging protocols and IoT

  • Often used in Internet of Things (IoT) projects
    – Use a message queueing protocol to send data from sensors to services that will process those data
    – Exploit all the MOM advantages seen so far:
      • Decoupling
      • Resiliency: a MOM provides temporary message storage
      • Handling of traffic spikes: data will be persisted in the MOM and processed eventually

AMQP: characteristics

  • Open-standard protocol for MOM, supported by industry
    – Current version: 1.0 (http://docs.oasis-open.org/amqp/core/v1.0/amqp-core-complete-v1.0.pdf)
    – Approved in 2014 as ISO and IEC International Standard
  • Binary, application-level protocol
    – Based on TCP, with additional reliability mechanisms (at-most-once, at-least-once, exactly-once delivery)
  • Programmable protocol
    – Several entities and routing schemes are primarily defined by apps
  • Implementations
    – Apache ActiveMQ, RabbitMQ, Apache Qpid, …

AMQP: model

  • The AMQP architecture involves three main actors:
    – Publishers, subscribers, and brokers
  • AMQP entities (within the broker): queues, exchanges and bindings
    – Messages are published to exchanges (like post offices or mailboxes)
    – Exchanges then distribute message copies to queues using rules called bindings
    – Then AMQP brokers either deliver messages to consumers subscribed to queues, or consumers fetch/pull messages from queues on demand

https://bit.ly/2oP683F

AMQP: routing

  • Bindings:
    – Direct exchange: delivers messages to queues based on the message routing key
    – Fanout exchange: delivers messages to all of the queues that are bound to it
    – Topic exchange: delivers messages to one or many queues based on topic matching
      • Often used to implement various publish/subscribe pattern variations
      • Commonly used for the multicast routing of messages
      • Example use: distributing data relevant to a specific geographic location (e.g., points of sale)
    – Headers exchange: delivers messages based on multiple attributes expressed as headers
      • To route on multiple attributes that are more easily expressed as message headers than as a routing key
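
The routing decisions of direct, fanout and topic exchanges can be sketched as below. The function names are hypothetical, and the topic pattern syntax is an assumption borrowed from RabbitMQ's topic exchange, where `*` stands for exactly one dot-separated word and `#` for zero or more words; headers exchanges are left out.

```python
def topic_matches(pattern, routing_key):
    """'*' matches exactly one dot-separated word, '#' matches zero or more words."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' absorbs zero words, or one word while staying active
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queues that receive a message; bindings is a list of (queue, binding_key)."""
    if exchange_type == "fanout":
        return [q for q, _ in bindings]
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if topic_matches(key, routing_key)]
    raise ValueError(exchange_type)


# Hypothetical point-of-sale bindings, following the geographic example above
bindings = [("rome", "pos.it.rome"), ("italy", "pos.it.*"), ("all_pos", "pos.#")]
direct = route("direct", bindings, "pos.it.rome")
topic = route("topic", bindings, "pos.it.milan")
fanout = route("fanout", bindings, "ignored")
```

The same bindings give three different delivery sets depending only on the exchange type, which is what makes the routing scheme programmable by applications.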

AMQP: messages

  • The AMQP protocol defines two types of messages:
    – Bare messages, which are supplied by the sender
    – Annotated messages, which are seen at the receiver and carry annotations added by intermediaries during transit
  • The header conveys the delivery parameters
    – Including durability requirements, priority, time to live

Figure: annotated message.

Multicast communication

  • Multicast communication: communication scheme in which data are sent to multiple receivers
    – Broadcast communication: special case of multicast, in which data are sent to all receivers connected to the network
    – Examples of one-to-many multicast applications: audio/video distribution, file distribution
    – Examples of many-to-many multicast applications: conferencing services, multiplayer games, interactive distributed simulations
  • Traditional one-to-one communication does not scale

Figure: unicast of a video to 1000 users vs. multicast of a video to 1000 users.

Types of multicast

  • How to implement multicast?
    – Network-level multicast
    – Application-level multicast

Network-level multicast

  • Packet replication and routing are handled by routers
  • IP-level multicast (IPMC), based on groups
    – Extends UDP with one-to-many transmission
    – Group: set of hosts interested in the same multicast application, identified by the same IP address
      • IP address from 224.0.0.0 to 239.255.255.255 assigned to the group
    – IGMP (Internet Group Management Protocol) is used to join the group
  • Limited use because of:
    – Lack of large-scale support (only ~5% of ASs)
    – Difficulty of keeping track of group membership
    – E.g., disabled in all Cloud platforms because of the broadcast storm problem (exponential growth of network traffic with possible saturation)

Application-level multicast

  • Packet replication and routing are handled by the end hosts
  • Basic idea:
    – Organize the nodes into an overlay network
    – Use the overlay network to disseminate information
  • Application-level multicast:
    – Structured
      • Explicit communication paths are created in the overlay network
    – Unstructured
      • Based on flooding
      • Based on gossiping

Structured application-level multicast

  • How to build the overlay network in a structured way?
    – Tree
      • A single path between each pair of nodes
    – Mesh
      • Many paths between each pair of nodes

Structured application-level multicast: tree

  • Example: building an application-level multicast tree in Scribe
    – Scribe: pub/sub system with a decentralized architecture, based on the Pastry DHT
      1. The node that starts the multicast session generates the multicast group identifier (mid)
      2. It looks up (through Pastry) the node responsible for mid
      3. That node becomes the root of the multicast tree
      4. If node P wants to join the multicast tree identified by mid, it sends a join request
      5. When the join request reaches node Q:
        • Q has never received a join request for mid ⇒ Q becomes a forwarder, P becomes a child of Q, and Q forwards the join request towards the root
        • or Q is already a forwarder for mid ⇒ P becomes a child of Q; there is no need to forward the join request to the root
  • M. Castro et al., “Scribe: A large-scale and decentralised application-level multicast infrastructure”, IEEE JSAC, 2002.

Figure: successive join() requests travelling towards the root and turning intermediate nodes into forwarders.
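
The join rule of step 5 can be sketched as below. This is not Scribe code: the overlay route of each join request is passed in explicitly, whereas in Scribe it would be determined by Pastry routing towards mid, and the node names are made up.

```python
def join(node, route_to_root, children, forwarders):
    """Process a join request from `node` along its overlay route towards the root.

    route_to_root: nodes visited after `node`, ending with the root.
    children: dict forwarder -> set of children; forwarders: set of current forwarders.
    """
    child = node
    for q in route_to_root:
        already_forwarder = q in forwarders
        forwarders.add(q)
        children.setdefault(q, set()).add(child)
        if already_forwarder:
            return                           # no need to forward the join any further
        child = q                            # q now joins on its own behalf


root = "R"
forwarders = {root}                          # the root is trivially part of the tree
children = {}

join("P1", ["A", "B", root], children, forwarders)   # builds P1 -> A -> B -> R
join("P2", ["A", "B", root], children, forwarders)   # stops at A, already a forwarder
```

The second join never reaches the root, which is how the tree grows without concentrating join traffic on a single node.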

Cost metrics of application-level multicast

  • Link stress: how many times does an application-level multicast message cross the same physical link?
    – Example: the message from A to D crosses <Ra,Rb> twice
  • Stretch: ratio between the transfer time in the overlay network and the transfer time in the underlying network
    – Example: messages from B to C follow a path with cost 71 at the application level, but 47 at the network level ⇒ stretch = 71/47

Unstructured application-level multicast

  • How to implement unstructured application-level multicast?
    – Flooding: already examined
      • A node P sends the multicast message to all its neighbors
      • In turn, each neighbor (if it has not already seen the message) forwards it to all its neighbors (except P)
    – Gossiping
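
The flooding rule can be sketched as follows. The overlay topology and the `flood` helper are made up for illustration; messages are processed one at a time from a FIFO list, which is a simplification of what would happen concurrently in a real network.

```python
from collections import deque


def flood(neighbors, source):
    """Flood a message from `source`; returns (nodes reached, number of messages sent)."""
    seen = {source}
    sent = 0
    pending = deque((source, n) for n in neighbors[source])
    while pending:
        sender, node = pending.popleft()
        sent += 1
        if node in seen:
            continue                          # duplicate: drop it, do not forward again
        seen.add(node)
        pending.extend((node, n) for n in neighbors[node] if n != sender)
    return seen, sent


# Overlay with a cycle: A - B - C - A, plus D attached to C
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
reached, messages = flood(neighbors, "A")
```

All four nodes are reached, but more messages are sent than the three a spanning tree would need; this redundancy is the cost of flooding.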

Gossip-based protocols

  • Probabilistic protocols, also called gossiping or epidemic protocols
    – Because they are based on the theory of gossip in social networks or of the spreading of epidemics
  • They allow information to spread rapidly in very large-scale networks by randomly choosing the next receivers among those known to the sender
    – Each node sends the message to a randomly chosen subset of nodes in the network
    – Each node that receives it forwards a copy to another subset, also chosen at random, and so on

The origins

  • Gossiping protocols were defined in 1987 by Demers et al. in a work on guaranteeing consistency in databases replicated across hundreds of servers
  • Basic idea: assuming that there are no write conflicts (i.e., updates are independent)
    – Update operations are initially performed on one or a few replicas
    – A replica passes its updated state to a limited number of neighbors
    – Update propagation is lazy (not immediate)
    – Eventually, each update should reach all replicas
  • A. Demers et al., “Epidemic Algorithms for Replicated Database Maintenance”, Proc. 6th Symp. on Principles of Distributed Computing, 1987.

Why gossiping in large scale DSs?

  • Several attractive properties of gossip-based information dissemination for large-scale distributed systems
    – Simplicity of gossiping algorithms
    – Lack of centralized control and bottlenecks
    – Scalability: each peer sends only a limited number of messages, independently of the overall size of the system
    – Reliability and robustness: thanks to message redundancy

Where is gossiping used today?

  • Some examples:
    – “Amazon uses a gossip protocol to quickly spread information throughout the S3 system. This allows Amazon S3 to quickly route around failed or unreachable servers, among other things.” (http://amzn.to/1MgDVsl)
    – Amazon’s Dynamo uses a gossip-based failure detection service
    – The basic information exchange in BitTorrent is based on gossip

Propagation models

  • We consider two propagation models
    – Pure gossiping and anti-entropy
  • Pure gossiping (rumor spreading): a peer that has just been updated (infected) contacts another randomly chosen peer and sends it its update (infecting it in turn)
  • Anti-entropy: periodically, each peer randomly chooses another peer and the two exchange updates, ending up with a similar state on both

Pure gossiping

  • A peer P that has just been updated contacts a randomly chosen peer Q

  • If Q has already received the update (is already infected), P loses interest in spreading the gossip and, with probability 1/k, stops contacting other peers

  • If s is the fraction of peers not yet updated, it can be shown that $s = e^{-(k+1)(1-s)}$

– As k grows, s shrinks: the update is more likely to spread

  • To guarantee that a large number of peers is updated, pure gossiping must be combined with anti-entropy
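The relation between s and k is implicit, so a numerical solution helps to see its effect. The sketch below is only an illustration: it solves $s = e^{-(k+1)(1-s)}$ by fixed-point iteration starting from s = 0. The function name `uninformed_fraction` and the iteration count are assumptions, not part of the slides.

```python
import math


def uninformed_fraction(k, iterations=200):
    """Approximate the non-trivial solution of s = exp(-(k + 1) * (1 - s)).

    s = 1 is also a solution, but it is an unstable fixed point;
    starting from s = 0 the iteration settles on the smaller root.
    """
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1.0 - s))
    return s


if __name__ == "__main__":
    for k in range(1, 6):
        print(f"k = {k}: s = {uninformed_fraction(k):.4f}")
```

Running it for increasing k shows the fraction of peers that never receive the update shrinking quickly but never reaching zero, which is why pure gossiping alone gives no guarantee.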


Anti-entropy

  • Goal: increase the similarity between peers, thereby increasing “order” (hence the name!)

  • A peer P randomly chooses another peer Q in the system; how does it update it?

  • Three update strategies:

– push: P only sends its updates to Q
– pull: P only retrieves updates from Q
– push-pull: P and Q exchange their updates with each other (after which they hold the same information)
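The three strategies differ only in the direction in which updates flow. A minimal sketch, assuming each peer stores its data as a dictionary mapping a key to a `(value, version)` pair and that a higher version number always wins; this data model and the names `merge_newer` and `anti_entropy` are hypothetical, chosen only for illustration:

```python
def merge_newer(dst, src):
    """Copy into dst every entry of src that dst lacks or holds in an older version."""
    for key, (value, version) in src.items():
        if key not in dst or dst[key][1] < version:
            dst[key] = (value, version)


def anti_entropy(p, q, mode):
    """One exchange initiated by peer P towards peer Q."""
    if mode == "push":         # P only sends its updates to Q
        merge_newer(q, p)
    elif mode == "pull":       # P only retrieves updates from Q
        merge_newer(p, q)
    elif mode == "push-pull":  # updates flow in both directions
        merge_newer(q, p)
        merge_newer(p, q)
    else:
        raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    p = {"x": ("new", 2), "y": ("b", 1)}
    q = {"x": ("old", 1), "z": ("c", 5)}
    anti_entropy(p, q, "push-pull")
    print(p == q)
```

After a push-pull exchange the two stores are identical, while push leaves P unchanged and pull leaves Q unchanged.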


Anti-entropy: performance

  • Push-pull

– It is the fastest strategy
– It takes O(log N) rounds to propagate an update to the N peers of the system

  • Gossip round (or cycle): time interval in which every peer has taken the initiative to exchange updates at least once
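The logarithmic behavior can be observed with a small simulation. This is a sketch under simplifying assumptions that are not in the slides: a single update, peers that can contact any other peer uniformly at random, and exchanges applied one after the other within a round. The function name `push_pull_rounds` and its parameters are hypothetical.

```python
import math
import random


def push_pull_rounds(n, seed=0, max_rounds=1000):
    """Count the rounds push-pull anti-entropy needs to update all n peers (n >= 2)."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True  # peer 0 holds the new update
    rounds = 0
    while not all(updated) and rounds < max_rounds:
        rounds += 1
        for p in range(n):  # in one round every peer takes the initiative once
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1  # uniform choice among the peers other than p
            if updated[p] or updated[q]:
                updated[p] = updated[q] = True  # push-pull: both end up updated
    return rounds


if __name__ == "__main__":
    for n in (64, 1024, 16384):
        print(f"N = {n}: rounds = {push_pull_rounds(n)}, log2(N) = {math.log2(n):.1f}")
```

Comparing the printed round counts with log2(N) for growing N gives an empirical feel for the O(log N) bound.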


General scheme of a gossiping protocol

  • Two peers P and Q, where P has chosen Q for the data exchange; P runs once per round (every Δ time units)

Active thread (peer P):
(1) selectPeer(&Q);
(2) selectToSend(&bufs);
(3) sendTo(Q, bufs);              ----->
(4)
(5) receiveFrom(Q, &bufr);        <-----
(6) selectToKeep(cache, bufr);
(7) processData(cache);

Passive thread (peer Q):
(1)
(2)
(3) receiveFromAny(&P, &bufr);
(4) selectToSend(&bufs);
(5) sendTo(P, bufs);
(6) selectToKeep(cache, bufr);
(7) processData(cache);

  • What are the crucial aspects?

– Peer selection
– Selection of the data exchanged
– Processing of the received data

Reference: A.-M. Kermarrec, M. van Steen, “Gossiping in Distributed Systems”, ACM SIGOPS Operating Systems Review 41(5), Oct. 2007.
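The seven steps of the active and passive threads can be mimicked in a single process. The sketch below is an assumption-laden illustration, not the reference's code: the network send/receive pair is replaced by passing a dictionary, the cache holds `(value, version)` pairs, and `select_to_send` simply sends the whole cache. The `Peer` class and `gossip_exchange` function are hypothetical names.

```python
import random


class Peer:
    def __init__(self, name, cache=None):
        self.name = name
        self.cache = dict(cache or {})  # key -> (value, version)
        self.neighbors = []

    def select_peer(self, rng):
        # Peer selection: here a uniform choice among known neighbors
        return rng.choice(self.neighbors)

    def select_to_send(self):
        # Data selection: here the whole cache is sent
        return dict(self.cache)

    def select_to_keep(self, received):
        # Keep an entry only if it is missing locally or has a newer version
        for key, (value, version) in received.items():
            if key not in self.cache or self.cache[key][1] < version:
                self.cache[key] = (value, version)

    def process_data(self):
        # Application-specific processing hook (nothing to do in this sketch)
        pass


def gossip_exchange(p, rng):
    """Run the active thread of P and the passive thread of the chosen Q."""
    q = p.select_peer(rng)       # (1) active
    bufs_p = p.select_to_send()  # (2) active
    bufr_q = bufs_p              # (3) sendTo(Q) / receiveFromAny
    bufs_q = q.select_to_send()  # (4) passive
    bufr_p = bufs_q              # (5) sendTo(P) / receiveFrom(Q)
    p.select_to_keep(bufr_p)     # (6) active
    q.select_to_keep(bufr_q)     # (6) passive
    p.process_data()             # (7) active
    q.process_data()             # (7) passive
    return q


if __name__ == "__main__":
    a = Peer("P", {"x": ("v1", 1)})
    b = Peer("Q", {"y": ("v2", 3)})
    a.neighbors, b.neighbors = [b], [a]
    gossip_exchange(a, random.Random(0))
    print(a.cache == b.cache)
```

Swapping the bodies of `select_peer`, `select_to_send` and `select_to_keep` is what turns this skeleton into a specific protocol, which matches the three crucial aspects listed above.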


Implementing a gossiping protocol

Which specific problems must be addressed when implementing a gossiping protocol?

  • Membership: how peers get to know each other and how many acquaintances each should have

  • Network awareness: how to make the links between peers reflect the network topology, so as to achieve satisfactory performance

  • Buffer management: which information to discard when the peer's memory is full

  • Message filtering: how to take into account the peers' interest in a message and reduce the probability that they receive information they are not interested in


Gossiping and flooding compared

  • Information dissemination is the classic and most popular application of gossiping in DSs

– A valid alternative to flooding

  • With flooding

– Every peer that receives the message sends it to all its neighbors (it can be seen as a degenerate case of gossiping)
– The message is discarded when its TTL reaches zero

[Figure: flooding over three rounds. Messages sent: 18. Peers reached: 8 out of 9]


Gossiping and flooding compared (2)

  • With simple gossiping

– The message is sent with a gossiping probability p:

for each msg m
    if random(0,1) < p then send m

[Figure: simple gossiping over three rounds, each link labeled with probability p. Messages sent: 11. Peers reached: 7 out of 9]
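Flooding and simple gossiping can be compared on the same topology with one routine, since flooding is the special case p = 1. The sketch below rests on assumptions not stated on the slides: a peer forwards only the first copy it receives, never sends back to the peer it got the message from, and draws the probability p independently for each neighbor. The names `grid` and `disseminate` and the 3x3 grid topology are hypothetical.

```python
import random
from collections import deque


def grid(side):
    """Build a side x side grid of peers, numbered row by row."""
    graph = {}
    for r in range(side):
        for c in range(side):
            nbs = []
            if r > 0:
                nbs.append((r - 1) * side + c)
            if r < side - 1:
                nbs.append((r + 1) * side + c)
            if c > 0:
                nbs.append(r * side + c - 1)
            if c < side - 1:
                nbs.append(r * side + c + 1)
            graph[r * side + c] = nbs
    return graph


def disseminate(graph, source, ttl, p=1.0, seed=0):
    """Return (messages_sent, peers_reached); p = 1.0 corresponds to flooding."""
    rng = random.Random(seed)
    reached = {source}
    queue = deque([(source, None, ttl)])  # (peer, previous sender, remaining TTL)
    sent = 0
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL expired: the message is discarded
        for nb in graph[node]:
            if nb == sender:
                continue
            if rng.random() < p:  # "if random(0,1) < p then send m"
                sent += 1
                if nb not in reached:  # only the first copy is forwarded
                    reached.add(nb)
                    queue.append((nb, node, hops_left - 1))
    return sent, len(reached)


if __name__ == "__main__":
    g = grid(3)
    print("flooding:", disseminate(g, 0, ttl=4, p=1.0))
    print("gossiping:", disseminate(g, 0, ttl=4, p=0.6, seed=1))
```

Under these assumptions gossiping never sends more messages than flooding on the same graph, at the price of possibly leaving some peers unreached.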


Gossiping vs flooding

  • Gossiping features

– Probabilistic
– Takes a localized decision but results in a global state
– Lightweight
– Fault tolerant

  • Flooding has advantages

– Universal coverage and minimal state information

  • … but it floods the network with redundant messages
  • Gossiping goals

– Reduce the number of redundant transmissions that occur with flooding while trying to retain its advantages
– … but due to its probabilistic nature, gossiping cannot guarantee that all the peers are reached, and it requires more time to complete than flooding


Other applications of gossiping in DSs

  • Besides information dissemination…
  • Peer sampling

– To provide each peer with a list of peers to contact

  • Resource monitoring in large-scale distributed systems

  • Distributed computations for data aggregation, in particular in sensor networks

– Computation of aggregate values (e.g., sum, average, maximum, quantiles)
– E.g., when computing the average

  • Let $x_{0,i}$ and $x_{0,j}$ be the values held by nodes i and j at time t=0
  • After gossiping between i and j using the push-pull strategy:

$x_{1,i},\ x_{1,j} \leftarrow (x_{0,i} + x_{0,j})/2$
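Repeating the pairwise averaging step makes every node converge to the global mean, because each exchange preserves the sum of the two values involved. A minimal sketch, assuming every node can contact any other node uniformly at random and that exchanges within a round are applied sequentially; the function name `gossip_average` is hypothetical:

```python
import random


def gossip_average(values, rounds, seed=0):
    """Return the per-node estimates after the given number of gossip rounds."""
    rng = random.Random(seed)
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        for i in range(n):  # every node initiates one exchange per round
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # uniform choice among the nodes other than i
            x[i] = x[j] = (x[i] + x[j]) / 2  # push-pull averaging step
    return x


if __name__ == "__main__":
    readings = [10.0, 20.0, 30.0, 40.0]
    print(gossip_average(readings, rounds=50))
```

No node ever sees the full data set, yet each local estimate approaches the average of all initial values.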


Two gossiping protocols

  • We now examine two examples of gossiping protocols

– Blind counter rumor mongering
– Bimodal multicast


Blind counter rumor mongering

  • Why that name for this gossiping protocol?

– Rumor mongering (def: “the act of spreading rumours”, also known as gossip): a node with a “hot rumor” will periodically infect other nodes
– Blind: loses interest regardless of the recipient (why)
– Counter: loses interest after F contacts (when)

  • A node n initiates a broadcast by sending the message m to B of its neighbors, chosen at random
  • When node p receives a message m from node q:

– If p has received m no more than F times, p sends m to B uniformly randomly chosen neighbors that, as far as p knows, have not yet seen m
– Note that p knows that its neighbor r has already seen the message m only if p has previously sent it to r, or if p received the message from r
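The protocol description translates almost line by line into a simulation. The sketch below is an illustration under assumptions: messages are delivered in FIFO order through a single queue, and the graph is an adjacency dictionary. The function name `rumor_mongering` and the bookkeeping structures are hypothetical; only B and F come from the slide.

```python
import random
from collections import deque


def rumor_mongering(graph, source, B=2, F=2, seed=0):
    """Simulate blind counter rumor mongering; return (messages_sent, nodes_reached)."""
    rng = random.Random(seed)
    received = {node: 0 for node in graph}        # copies of m received so far
    known_seen = {node: set() for node in graph}  # neighbors known to have seen m
    has_message = {source}
    queue = deque()                               # pending deliveries (sender, receiver)
    sent = 0

    def forward(p):
        nonlocal sent
        # Candidates: neighbors that, as far as p knows, have not yet seen m
        candidates = [r for r in graph[p] if r not in known_seen[p]]
        for r in rng.sample(candidates, min(B, len(candidates))):
            known_seen[p].add(r)  # p has now sent m to r
            queue.append((p, r))
            sent += 1

    forward(source)  # the initiator sends m to B random neighbors
    while queue:
        q, p = queue.popleft()
        received[p] += 1
        has_message.add(p)
        known_seen[p].add(q)  # p received m from q, so q has seen it
        if received[p] <= F:  # counter: lose interest after F receptions
            forward(p)
    return sent, len(has_message)


if __name__ == "__main__":
    complete = {i: [j for j in range(20) if j != i] for i in range(20)}
    print(rumor_mongering(complete, source=0, B=2, F=2))
```

Because a node never sends m twice to the same neighbor, the total number of messages is bounded by the sum of the node degrees, regardless of the random choices.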


Analysis of blind counter rumor mongering

  • It is difficult to obtain analytical expressions that describe the behavior of a gossiping protocol, even for relatively simple topologies → simulation analysis

  • Assume a Barabási network topology:

– 1000 nodes with an average node degree of 6
– Rumor mongering vs flooding scalability (F=2, B=2)

Source: “The cost of application-level broadcast in a fully decentralized peer-to-peer network”


Bimodal multicast

  • Also called pbcast (probabilistic broadcast)
  • Composed of two phases:
  • 1. Message distribution phase: a process sends a multicast with no particular reliability guarantees

– IP multicast if available, otherwise some application-level multicast (e.g., Scribe trees)

  • 2. Gossip repair phase: after a process receives a message, it begins to gossip about the message to a set of peers (called fanout)

– Gossip occurs at regular intervals and offers the processes a chance to compare their states and fill any gaps in the message sequence

Source: K.P. Birman, M. Hayden, O. Ozkasap, Z. Xiao, M. Budiu, and Y. Minsky. Bimodal multicast. ACM Trans. Comput. Syst. 17, 2 (May 1999), 41-88.


Bimodal multicast: message distribution

  • Start by using unreliable multicast to rapidly distribute the message

  • But some messages may not get through, and some processes may be faulty

  • So the initial state involves partial distribution of the multicast(s)

[Figure: timeline of processes p1 to p6 sending messages; failed messages are marked]


Bimodal multicast: gossip repair

  • Periodically (e.g., every 100 ms) each process sends a digest describing its state to some randomly selected process

  • The digest identifies messages: it does not include them

[Figure: processes p1 to p6 sending digests]


Bimodal multicast: gossip repair (2)

  • The recipient checks the gossip digest against its own history and solicits a copy of any missing message from the process that sent the gossip

[Figure: processes p1 to p6 soliciting message copies]


Bimodal multicast: gossip repair (3)

  • Processes respond to solicitations received during a round of gossip by retransmitting the requested message

  • Various optimizations (not examined)

[Figure: processes p1 to p6 sending message copies]
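The two phases and the digest / solicit / retransmit cycle can be put together in a toy simulation. It is a sketch under assumptions that are not in the slides: process 0 is the only sender and holds every message, each other process independently misses a message with probability `loss` in the first phase, each process gossips to one random peer per round, and retransmissions are never lost. The function name `pbcast` and all parameters are hypothetical.

```python
import random


def pbcast(n_procs, n_msgs, loss=0.3, rounds=50, seed=0):
    """Return the gossip round at which every process holds every message, or None."""
    rng = random.Random(seed)
    all_ids = set(range(n_msgs))
    # Phase 1: unreliable multicast from process 0 (the sender keeps all messages)
    history = [set(all_ids)] + [
        {m for m in all_ids if rng.random() >= loss} for _ in range(n_procs - 1)
    ]
    # Phase 2: gossip repair
    for r in range(1, rounds + 1):
        for p in range(n_procs):
            q = rng.randrange(n_procs - 1)
            if q >= p:
                q += 1                     # random process other than p
            digest = set(history[p])       # message identifiers only, no payloads
            missing = digest - history[q]  # q compares the digest with its own history
            history[q] |= missing          # q solicits copies and p retransmits them
        if all(h == all_ids for h in history):
            return r
    return None


if __name__ == "__main__":
    print("repaired after round:", pbcast(n_procs=6, n_msgs=10))
```

Only identifiers travel in the digests, so the repair traffic stays small when few messages were lost in the first phase.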


Bimodal multicast: why “bimodal”?

  • Is it because there are two phases?
  • No: the name describes the two “modes” of the delivery outcome

[Figure: pbcast bimodal delivery distribution. x-axis: number of processes to deliver pbcast (ticks from 5 to 50); y-axis: p{#processes=k}, log scale from 1.E-30 to 1.E+00. Annotations: “Either sender fails…” and “… or data gets through with high probability”]

1. pbcast is almost always delivered to most or to few processes, and almost never to some processes. Atomicity = almost all or almost none
2. A second bimodal characteristic is due to delivery latencies, with one distribution of very low latencies (messages that arrive without loss in the first phase) and a second distribution with higher latencies (messages that had to be repaired in the second phase)