
SLIDE 1

Towards Load Balanced Distributed Transactional Memory

Gokarna Sharma and Costas Busch Louisiana State University

Euro-Par’12, August 31, 2012

SLIDE 2

Distributed Transactional Memory (DTM)

Transactions run on network nodes
They ask for shared objects distributed over the network, for either read or write
The reads and writes on shared objects are supported through three operations:

– Publish
– Lookup
– Move

SLIDE 3

Suppose the object ξ is at the predecessor node and another node is requesting it

Data-flow model: transactions are immobile and the objects are mobile

[Figure: predecessor node holding ξ; requesting node]

SLIDE 4

Lookup operation

Replicates the object to the requesting node

[Figure: main copy of ξ at the predecessor node; read-only copy of ξ at the requesting node]

SLIDE 5

Lookup operation

Replicates the object to the requesting nodes

[Figure: main copy of ξ at the predecessor node; read-only copies of ξ at two requesting nodes]

SLIDE 6

Move operation

Relocates the object explicitly to the requesting node

[Figure: main copy of ξ at the requesting node; the copy at the predecessor node is invalidated]

SLIDE 7

Move operation

Relocates the object explicitly to the requesting node
Invalidates also the read-only copies (if available)

[Figure: main copy of ξ at the requesting node; all other copies of ξ invalidated]
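The three operations can be illustrated with a minimal Python sketch. It is only a toy model of the copy semantics described on these slides, not the authors' protocol: the class name `SharedObject`, its fields, and the node labels are hypothetical, and all networking is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SharedObject:
    """Toy model of one shared object in the data-flow model (hypothetical names)."""
    name: str
    owner: Optional[str] = None                      # node holding the main copy
    readers: Set[str] = field(default_factory=set)   # nodes holding read-only copies

    def publish(self, creator: str) -> None:
        # Publish: the creating node announces the object and holds the main copy.
        self.owner = creator
        self.readers.clear()

    def lookup(self, requester: str) -> None:
        # Lookup: replicate a read-only copy; the main copy stays where it is.
        if self.owner is None:
            raise RuntimeError("object has not been published")
        if requester != self.owner:
            self.readers.add(requester)

    def move(self, requester: str) -> str:
        # Move: relocate the main copy and invalidate every read-only copy.
        if self.owner is None:
            raise RuntimeError("object has not been published")
        previous = self.owner
        self.owner = requester
        self.readers.clear()
        return previous


xi = SharedObject("xi")
xi.publish("u")
xi.lookup("v")
xi.lookup("w")
previous_owner = xi.move("v")
print(previous_owner, xi.owner, xi.readers)
```

After the Move, node v holds the main copy and the read-only copies at v and w no longer exist, which mirrors the invalidation shown on the slide.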

SLIDE 8

[Figure: source-destination pairs (u1, v1), (u2, v2), (u3, v3)]

General routing: choose paths from sources to destinations

Routing in DTM: source node of the predecessor request in the total order is the destination of a successor request

SLIDE 9

Edge congestion C_edge: maximum number of paths that use any edge

Node congestion C_node: maximum number of paths that use any node
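Both congestion measures can be computed directly from a set of chosen paths. The following is a small sketch under the assumption that a path is a list of node labels and that edges are undirected; the function names and the example paths are illustrative only.

```python
from collections import Counter


def edge_congestion(paths):
    """Maximum number of paths that use any single undirected edge."""
    load = Counter()
    for path in paths:
        # A set ensures each edge is counted at most once per path.
        edges = {frozenset(edge) for edge in zip(path, path[1:])}
        load.update(edges)
    return max(load.values(), default=0)


def node_congestion(paths):
    """Maximum number of paths that use any single node."""
    load = Counter()
    for path in paths:
        load.update(set(path))
    return max(load.values(), default=0)


paths = [
    ["a", "b", "c"],
    ["d", "b", "c"],
    ["a", "b", "e"],
]
print(edge_congestion(paths))  # 2: edges a-b and b-c each carry two paths
print(node_congestion(paths))  # 3: node b lies on all three paths
```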

SLIDE 10

Stretch = (length of chosen path) / (length of shortest path)

Example for nodes u and v: the shortest path has length 8 and the chosen path has length 12, so stretch = 12/8 = 1.5
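The 12/8 example can be reproduced with a short sketch. It assumes a 2-dimensional mesh whose nodes are integer coordinates, so the shortest path length between two distinct nodes is their Manhattan distance; the helper names and the particular detour are hypothetical.

```python
from fractions import Fraction


def segment(start, end):
    """Nodes on an axis-aligned mesh segment from start to end, inclusive."""
    (x0, y0), (x1, y1) = start, end
    if x0 == x1:
        step = 1 if y1 >= y0 else -1
        return [(x0, y) for y in range(y0, y1 + step, step)]
    if y0 == y1:
        step = 1 if x1 >= x0 else -1
        return [(x, y0) for x in range(x0, x1 + step, step)]
    raise ValueError("segment must be axis-aligned")


def mesh_distance(u, v):
    """Shortest path length between two mesh nodes (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def stretch(path):
    """Length of the chosen path divided by the shortest path length."""
    chosen = len(path) - 1
    shortest = mesh_distance(path[0], path[-1])
    return Fraction(chosen, shortest)


u, v = (0, 0), (4, 4)
# A detour with two bends: up 6, right 4, down 2 (12 edges in total).
chosen_path = segment(u, (0, 6)) + segment((0, 6), (4, 6))[1:] + segment((4, 6), v)[1:]
print(float(stretch(chosen_path)))  # 1.5
```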

SLIDE 11

Oblivious Routing

Each request path choice is independent of other request path choices

SLIDE 12

Problem Statement

Given a d-dimensional mesh and a finite set of operations R = {r_0, r_1, …, r_l} on an object ξ
Design a DTM algorithm that:

– Minimizes congestion $C = \max_e |\{i : p_i \ni e\}|$ on any edge e
– Minimizes total communication cost $A(R) = \sum_{i=1}^{l} |p_i|$ for all the operations

Here p_i is the path used to serve operation r_i.

SLIDE 13

Related Work

Protocol             Stretch            Network kind                 Runs on
Arrow [DISC’98]      O(S_ST) = O(D)     General                      Spanning tree
Relay [OPODIS’09]    O(S_ST) = O(D)     General                      Spanning tree
Combine [SSS’10]     O(S_OT) = O(D)     General                      Overlay tree
Ballistic [DISC’05]  O(log D)           Constant-doubling dimension  Hierarchical directory with independent sets
Spiral [IPDPS’12]    O(log^2 n log D)   General                      Hierarchical directory with sparse covers

➢ D is the diameter of the network kind
➢ S_* is the stretch of the tree used

SLIDE 14

Limitations and Motivations

These protocols only minimize stretch and they cannot control congestion
Congestion can also be a major bottleneck

– may affect the overall performance of the algorithm

A natural question is whether stretch and congestion can be controlled simultaneously
Congestion and stretch cannot be minimized simultaneously in arbitrary networks

SLIDE 15

Our Contributions

MultiBend DTM algorithm for mesh networks
For 2-dimensional mesh, MultiBend has both stretch and (edge) congestion O(log n)
For d-dimensional mesh, MultiBend has stretch O(d log n) and (edge) congestion O(d^2 log n)
For fixed d,

– stretch is within an O(log log n) factor of optimal
– congestion is within an O(1) factor of optimal

SLIDE 16

In the Remaining…

Model
General Approach
Analogy to a Distributed Queue
Hierarchical decomposition for MultiBend
MultiBend Analysis
Stretch
Congestion
Discussion

SLIDE 17

Model

Mesh network G = (V,E) of n reliable nodes
One shared object
Nodes receive-compute-send atomically
Nodes are uniquely identified
Node u can send to node v if it knows v
One node executes one request at a time

SLIDE 18

General Approach

SLIDE 19

Hierarchical clustering Network graph

SLIDE 20

Hierarchical clustering

Alternative representation as a hierarchy tree with leader nodes

SLIDE 21

At the lowest level (level 0) every node is a cluster

Each cluster at every level keeps a directory entry: a downward pointer if the object's location is known

SLIDE 22

A Publish operation

➢ Assume that the owner node is the creator of ξ, which invokes the Publish operation
➢ Nodes know their parent in the hierarchy

[Figure: hierarchy with root; the owner node holds ξ]

SLIDE 23

root

Send request to the leader

SLIDE 24

root

Continue up phase Sets downward pointer while going up

SLIDE 25

root

Continue up phase Sets downward pointer while going up

SLIDE 26

root

Root node found, stop up phase

SLIDE 27

root

A successful Publish operation Predecessor node ξ

SLIDE 28

Supporting a Move operation

➢ Initially, nodes point downward to object owner (predecessor node) due to Publish operation
➢ Nodes know their parent in the hierarchy

[Figure: hierarchy with root; requesting node and predecessor node holding ξ]

SLIDE 29

Send request to leader node of the cluster upward in hierarchy

root

SLIDE 30

Continue up phase until downward pointer found

root

Sets downward path while going up

SLIDE 31

Continue up phase

root

Sets downward path while going up

SLIDE 32

Continue up phase

root

Sets downward path while going up

SLIDE 33

Downward pointer found, start down phase

root

Discards path while going down

SLIDE 34

Continue down phase

root

Discards path while going down

SLIDE 35

Continue down phase

root

Discards path while going down

SLIDE 36

Predecessor reached, object is moved from the predecessor node to the requesting node

root

Lookup is similar without change in the directory structure and only a read-only copy of the object is sent
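The up and down phases walked through on the preceding slides can be sketched in Python. This is a simplified model, assuming a single object, a hierarchy given as a parent map over leader nodes, leaf requesters, and sequential requests; the class `Directory`, the node labels, and the hop counting are hypothetical and ignore the mesh-specific multi-bend paths.

```python
class Directory:
    """Toy hierarchical directory for one object (hypothetical names)."""

    def __init__(self, parent):
        self.parent = parent   # node -> parent in the hierarchy; root maps to None
        self.down = {}         # internal node -> child on the path toward the owner
        self.owner = None

    def publish(self, creator):
        # Up phase to the root, setting downward pointers on the way.
        self.owner = creator
        node = creator
        while self.parent[node] is not None:
            self.down[self.parent[node]] = node
            node = self.parent[node]

    def move(self, requester):
        """Return (previous owner, hierarchy hops travelled by the request)."""
        if self.owner is None:
            raise RuntimeError("object has not been published")
        if requester == self.owner:
            return requester, 0
        hops, node = 0, requester
        # Up phase: set the new downward path until an old pointer is found.
        while True:
            up = self.parent[node]
            hops += 1
            old_child = self.down.get(up)
            self.down[up] = node
            node = up
            if old_child is not None:
                break
        # Down phase: follow the old path and discard it while going down.
        node = old_child
        hops += 1
        while node in self.down:
            node = self.down.pop(node)
            hops += 1
        self.owner = requester
        return node, hops

    def lookup(self, requester):
        """Find the owner without changing the directory structure."""
        if self.owner is None:
            raise RuntimeError("object has not been published")
        node = requester
        while node not in self.down and node != self.owner:
            node = self.parent[node]
        while node in self.down:
            node = self.down[node]
        return node


parent = {"root": None, "A": "root", "B": "root", "u": "A", "v": "A", "w": "B"}
directory = Directory(parent)
directory.publish("u")
print(directory.move("v"))    # ('u', 2): v -> A -> u
print(directory.move("w"))    # ('v', 4): w -> B -> root -> A -> v
print(directory.lookup("u"))  # w
```

A request from a nearby node (v, which shares leader A with u) stops at a low level, while a request from a distant node (w) climbs to the root before descending.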

SLIDE 37

Distributed Queue Analogy

SLIDE 38

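Distributed Queue

[Figure: hierarchy with root and node u; queue contains u (head and tail)]

SLIDE 39

Distributed Queue

[Figure: hierarchy with root and nodes u, v; queue contains u (head), v (tail)]

SLIDE 40

Distributed Queue

[Figure: hierarchy with root and nodes u, v, w; queue contains u (head), v, w (tail)]

SLIDE 41

Distributed Queue

[Figure: hierarchy with root and nodes u, v, w; queue contains v (head), w (tail)]

SLIDE 42

Distributed Queue

[Figure: hierarchy with root and nodes u, v, w; queue contains w (head and tail)]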
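The queue analogy can be made concrete with a small sketch: each Move request finds the current tail as its predecessor and becomes the new tail, and the object travels from the head to its successor. The class and method names are hypothetical, and the sketch assumes every node appears in the queue at most once.

```python
class DistributedQueue:
    """Toy queue of Move requests for one object (hypothetical names)."""

    def __init__(self, first_owner):
        self.head = first_owner   # node currently holding the object
        self.tail = first_owner   # node that issued the latest request
        self.successor = {}       # node -> node that requested right after it

    def enqueue(self, node):
        """Register a Move request; return its predecessor in the total order."""
        predecessor = self.tail
        self.successor[predecessor] = node
        self.tail = node
        return predecessor

    def pass_object(self):
        """The head hands the object to its successor and leaves the queue."""
        nxt = self.successor.pop(self.head, None)
        if nxt is not None:
            self.head = nxt
        return nxt


queue = DistributedQueue("u")   # queue: u
print(queue.enqueue("v"))       # u   (queue: u, v)
print(queue.enqueue("w"))       # v   (queue: u, v, w)
print(queue.pass_object())      # v   (queue: v, w)
print(queue.pass_object())      # w   (queue: w)
```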

SLIDE 43

Results on Mesh Networks

SLIDE 44

Type-1 Mesh Decomposition

2-dimensional mesh

SLIDE 45

Type-1 Mesh Decomposition

SLIDE 46

Type-1 Mesh Decomposition

SLIDE 47

Type-2 Mesh Decomposition

SLIDE 48

Type-2 Mesh Decomposition

SLIDE 49

Decomposition for 2^3 × 2^3 2-dimensional mesh

Hierarchy levels: (i+1,2), (i+1,1), (i,2), (i,1)

SLIDE 50

MultiBend Hierarchy

Find a predecessor node via multi-bend paths for each leaf node u

– by visiting leaders of all the clusters that contain u from level 0 to the root level

[Figure: multi-bend paths p(u) and p(v) from leaf nodes u and v toward the root]

SLIDE 51

MultiBend Hierarchy (2)

The hierarchy guarantees:

(1) For any two nodes u, v, their multi-bend paths p(u) and p(v) meet at level min{h, log(dist(u,v)) + 2}
(2) length(p^i(u)) is at most 2^{i+3}

[Figure: multi-bend paths p(u) and p(v) from leaf nodes u and v toward the root]
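A quick worked calculation shows how the two guarantees combine. The sketch below assumes that the logarithm is base 2 and rounded up, and that two identical nodes meet at level 0; both are assumptions made for illustration, as are the function names.

```python
def meeting_level(dist, h):
    """Level at which p(u) and p(v) meet, per guarantee (1).

    Assumption: log is base 2, rounded up; dist == 0 means the same node.
    """
    if dist == 0:
        return 0
    ceil_log2 = (dist - 1).bit_length()   # exact integer ceil(log2(dist))
    return min(h, ceil_log2 + 2)


def path_length_bound(i):
    """Upper bound on length(p^i(u)), per guarantee (2)."""
    return 2 ** (i + 3)


h = 10
for dist in (1, 5, 64):
    level = meeting_level(dist, h)
    # A request climbs p(u) to the meeting level and descends along p(v).
    print(dist, level, 2 * path_length_bound(level))
```

For dist(u,v) = 5 the paths meet at level 5, so a request travels at most 2 · 2^8 = 512 hops under these bounds, independent of the mesh size.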

SLIDE 52

(Canonical) downward Paths

[Figure: hierarchy with root, nodes u and v, and paths p(u) and p(v)]

p(v) is a (canonical) downward path

SLIDE 53

Load Balancing

Through a leader election procedure

– Every time we access the leader of a sub-mesh, we replace it with another leader chosen uniformly at random among its nodes

The directory is updated appropriately by updating parent and child leaders

– Locking may be needed in concurrent executions

The update cost is low in comparison to the cost of serving requests
This step is necessary to control congestion

– With a fixed leader, edge congestion can be O(l), where l is the number of requests

If the congestion requirement can be relaxed by a factor of ρ, the leader change is needed only after every ρ requests
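The leader rotation can be sketched as follows. This is a minimal illustration, assuming a sub-mesh is just a list of node identifiers; the class `SubmeshLeader` and the parameter `rho` (standing for ρ) are hypothetical, and the directory update and locking mentioned above are left out.

```python
import random
from collections import Counter


class SubmeshLeader:
    """Toy leader rotation for one sub-mesh (hypothetical names)."""

    def __init__(self, nodes, rho=1, rng=None):
        self.nodes = list(nodes)
        self.rho = rho                        # re-elect after every rho accesses
        self.rng = rng or random.Random()
        self.leader = self.rng.choice(self.nodes)
        self.accesses = 0

    def access(self):
        """Serve one request at the current leader, then possibly re-elect."""
        serving = self.leader
        self.accesses += 1
        if self.accesses % self.rho == 0:
            # New leader chosen uniformly at random among the sub-mesh nodes.
            self.leader = self.rng.choice(self.nodes)
        return serving


rotating = SubmeshLeader(range(16), rho=1, rng=random.Random(1))
load = Counter(rotating.access() for _ in range(1600))
print(max(load.values()))   # far below 1600, the load a fixed leader would carry
```

With a fixed leader every one of the 1600 requests would pass through the same node; with rotation the load spreads across the 16 nodes. A larger `rho` trades some of that balance for fewer leader changes.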

SLIDE 54

Analysis of MultiBend

SLIDE 55

Analysis on (move) Stretch

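Assume a sequential execution R of l+1 Move requests, where r_0 is an initial Publish request.

[Figure: requests r_0, r_1, r_2, …, r_{l-1}, r_l from nodes x, u, v, y, w and the hierarchy levels (1, 2, …, k, …, h) that each request reaches]

$A^*(R) \ge \max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}$

$A(R) \le \sum_{k=1}^{h} (S_k - 1)\, 2^{k+3}$

Thus, since $2^{k+3} = 16 \cdot 2^{k-1}$ and a sum of h terms is at most h times its largest term,

$\frac{A(R)}{A^*(R)} \le \frac{\sum_{k=1}^{h} (S_k - 1)\, 2^{k+3}}{\max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}} \le \frac{16\, h \max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}}{\max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}} = 16h = O(\log n)$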

SLIDE 56

Analysis on (Edge) Congestion

Assume M1 is a type-1 submesh

[Figure: submeshes M1 and M2, edge e, side length m_l]

A sub-path uses edge e with probability 2/m_l
P’: set of paths from M1 to M2 or vice-versa
C’(e): congestion caused by P’ on e
E[C’(e)] ≤ 2|P’|/m_l
B ≥ |P’|/out(M1)
out(M1) ≤ 4 m_l
C* ≥ B

==> E[C’(e)] ≤ 2|P’|/m_l ≤ 2 B out(M1)/m_l ≤ 8B ≤ 8C*

SLIDE 57

Analysis on (Edge) Congestion (2)

As M1 at level (i,2) is always completely contained in M2 at level (i+1,2)
log n + 2 levels
E[C(e)] ≤ 8C*(log n + 2)
Considering type-2 submeshes

– exactly one type-2 submesh between every two type-1 submeshes
– the type-2 submeshes may not be proper subsets of type-1 submeshes
– 4 possible type-2 submesh choices (path may bend at most 2 times)
– Increases the load by a factor of 4

Thus, using standard Chernoff bound, C = O(C* log n), w.h.p.

SLIDE 58

d-Dimensional Mesh

SLIDE 59

Extensions to d-dimensional mesh

2-dimensional decomposition can be directly generalized to a d-dimensional mesh

– Problem: the congestion becomes O(2^d log n)

Another decomposition is used to control congestion in O(C* d^2 log n) and stretch O(d log n)
We set appropriate λ and shift the type-1 submeshes by (j-1)λ nodes in each dimension to get type-j submeshes
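The shifting rule can be illustrated with a short sketch. It assumes that type-1 submeshes at a given level are axis-aligned cubes of a fixed side length starting at the origin, and that a type-j submesh is the same grid moved by (j-1)λ nodes along every dimension; the function name, the parameter names, and the sample values are hypothetical and are not taken from the paper.

```python
def submesh_index(point, side, lam, j):
    """Index of the type-j submesh containing `point` at one level.

    Assumption: type-1 submeshes tile the mesh from the origin with the
    given side length; type-j shifts that tiling by (j - 1) * lam nodes
    in each dimension. An index of -1 marks a truncated border submesh.
    """
    shift = (j - 1) * lam
    return tuple((x - shift) // side for x in point)


side, lam = 4, 2
a, b = (3, 3, 3), (4, 4, 4)   # close nodes in a 3-dimensional mesh

print(submesh_index(a, side, lam, 1), submesh_index(b, side, lam, 1))
# (0, 0, 0) (1, 1, 1): a type-1 boundary separates the two nodes
print(submesh_index(a, side, lam, 2), submesh_index(b, side, lam, 2))
# (0, 0, 0) (0, 0, 0): the shifted type-2 submesh contains both
```

Under these assumptions, two nearby nodes that fall on opposite sides of a type-1 boundary still share a shifted submesh at the same level, so their paths can meet without climbing much higher in the hierarchy.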

SLIDE 60

Extensions to d-dimensional mesh (2)

3-dimensional mesh decomposition. Only 2 of the 3 dimensions are shown

O(d)-types of submeshes on each level

O(d) sub-levels in every level with type-1 submesh first and type-O(d) submesh last in the order

SLIDE 61

Summary

First load balanced distributed directory protocol with both stretch and congestion O(log n) in 2-dimensional mesh
In d-dimensional mesh, stretch O(d log n) and (edge) congestion O(d^2 log n)
MultiBend is starvation-free and provides linearizability.
Future work:

– extend the results to dynamic networks
– make it fault-tolerant
