Transactional Memory Gokarna Sharma and Costas Busch Louisiana - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Towards Load Balanced Distributed Transactional Memory

Gokarna Sharma and Costas Busch Louisiana State University

Euro-Par’12, August 31, 2012

slide-2
SLIDE 2

Distributed Transactional Memory (DTM)

  • Transactions run on network nodes
  • They ask for shared objects distributed over the network, for either read or write
  • The reads and writes on shared objects are supported through three operations: Publish, Lookup, and Move

slide-3
SLIDE 3

Data-flow model: transactions are immobile and the objects are mobile

Suppose the object ξ is at one node (the predecessor node) and another node is a requesting node

[Figure: predecessor node holding ξ, and the requesting node]

slide-4
SLIDE 4

Lookup operation

Replicates the object to the requesting node

[Figure: the main copy of ξ and one read-only copy of ξ]

slide-5
SLIDE 5

Lookup operation

Replicates the object to the requesting nodes

[Figure: the main copy of ξ and two read-only copies of ξ]

slide-6
SLIDE 6

Move operation

Relocates the object explicitly to the requesting node

[Figure: the main copy of ξ and one invalidated copy of ξ]

slide-7
SLIDE 7

Move operation

  • Relocates the object explicitly to the requesting node
  • Also invalidates the read-only copies (if available)

[Figure: the main copy of ξ and two invalidated copies of ξ]
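The copy semantics of the three operations can be sketched in a few lines of Python. This is only an illustration of the data-flow model described on these slides, not the MultiBend protocol itself: the class name `SharedObject` and its fields are hypothetical, and all network communication is left out.

```python
class SharedObject:
    """One shared object: a single main copy plus any number of read-only copies.

    Hypothetical illustration of Publish / Lookup / Move semantics only.
    """

    def __init__(self, name):
        self.name = name
        self.owner = None      # node holding the main (writable) copy
        self.readers = set()   # nodes holding read-only copies

    def publish(self, node):
        # The creator announces the object and holds its main copy.
        self.owner = node
        self.readers.clear()

    def lookup(self, node):
        # Replicate: the requester gets a read-only copy; the main copy stays put.
        if self.owner is None:
            raise RuntimeError("object has not been published")
        if node != self.owner:
            self.readers.add(node)

    def move(self, node):
        # Relocate the main copy to the requester and invalidate every other copy.
        if self.owner is None:
            raise RuntimeError("object has not been published")
        invalidated = set(self.readers)
        invalidated.add(self.owner)
        invalidated.discard(node)
        self.readers.clear()
        self.owner = node
        return invalidated


obj = SharedObject("xi")
obj.publish("a")
obj.lookup("b")
obj.lookup("c")
invalidated = obj.move("d")   # a, b and c lose their copies; d holds the main copy
```

Returning the set of invalidated nodes from `move` makes the invalidation step on the slide explicit; a real protocol would send invalidation messages instead.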

slide-8
SLIDE 8

[Figure: source-destination pairs (u1, v1), (u2, v2), (u3, v3)]

General routing: choose paths from sources to destinations

Routing in DTM: source node of the predecessor request in the total order is the destination of a successor request

slide-9
SLIDE 9

Edge congestion C_edge: maximum number of paths that use any edge

Node congestion C_node: maximum number of paths that use any node

slide-10
SLIDE 10

Stretch = (length of chosen path) / (length of shortest path)

[Figure: a chosen path and a shortest path between u and v]

Example: stretch = 12/8 = 1.5
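The stretch ratio can be checked numerically. The sketch below assumes a mesh whose nodes are integer coordinate tuples, so the shortest-path length is the Manhattan distance; the function names and the example detour are illustrative and do not come from the slides.

```python
def manhattan(u, v):
    """Shortest-path length between two mesh nodes given as coordinate tuples."""
    return sum(abs(a - b) for a, b in zip(u, v))


def stretch(path):
    """Ratio of the chosen path's length to the shortest-path length."""
    for a, b in zip(path, path[1:]):
        if manhattan(a, b) != 1:
            raise ValueError(f"{a} and {b} are not mesh neighbours")
    shortest = manhattan(path[0], path[-1])
    if shortest == 0:
        raise ValueError("source and destination must differ")
    return (len(path) - 1) / shortest


# A detour from u = (0, 0) to v = (4, 4): up 6, right 4, down 2 (12 edges).
detour = ([(0, y) for y in range(7)]
          + [(x, 6) for x in range(1, 5)]
          + [(4, 5), (4, 4)])
print(stretch(detour))  # 12 / 8 = 1.5
```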

slide-11
SLIDE 11

Oblivious Routing

Each request path choice is independent of other request path choices
slide-12
SLIDE 12

Problem Statement

  • Given a d-dimensional mesh and a finite set of operations R = {r0, r1, …, rl} on an object ξ
  • Design a DTM algorithm that:

– Minimizes congestion $C = \max_e |\{i : p_i \ni e\}|$ on any edge e
– Minimizes total communication cost $A(R) = \sum_{i=1}^{l} |p_i|$ for all the operations
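Both objectives can be evaluated for a given set of request paths. The following sketch assumes each path p_i is a list of mesh nodes and treats edges as undirected; the helper names and the three sample paths are hypothetical.

```python
from collections import Counter


def edges_of(path):
    """Undirected edges traversed by a path given as a list of nodes."""
    return [frozenset((a, b)) for a, b in zip(path, path[1:])]


def edge_congestion(paths):
    """C = max over edges e of the number of paths p_i that contain e."""
    load = Counter(e for p in paths for e in set(edges_of(p)))
    return max(load.values(), default=0)


def total_cost(paths):
    """A(R) = sum of the path lengths |p_i|, measured in edges."""
    return sum(len(p) - 1 for p in paths)


paths = [
    [(0, 0), (1, 0), (2, 0)],
    [(1, 0), (2, 0), (2, 1)],
    [(0, 1), (1, 1)],
]
print(edge_congestion(paths))  # 2: edge (1,0)-(2,0) lies on the first two paths
print(total_cost(paths))       # 2 + 2 + 1 = 5
```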

slide-13
SLIDE 13

Related Work

Protocol             Stretch              Network kind                 Runs on
Arrow [DISC’98]      O(S_ST) = O(D)       General                      Spanning tree
Relay [OPODIS’09]    O(S_ST) = O(D)       General                      Spanning tree
Combine [SSS’10]     O(S_OT) = O(D)       General                      Overlay tree
Ballistic [DISC’05]  O(log D)             Constant-doubling dimension  Hierarchical directory with independent sets
Spiral [IPDPS’12]    O(log^2 n log D)     General                      Hierarchical directory with sparse covers

➢ D is the diameter of the network kind
➢ S_* is the stretch of the tree used (S_ST for a spanning tree, S_OT for an overlay tree)

slide-14
SLIDE 14

Limitations and Motivations

  • These protocols only minimize stretch and they cannot control congestion
  • Congestion can also be a major bottleneck

– may affect the overall performance of the algorithm

  • A natural question is whether stretch and congestion can be controlled simultaneously
  • Congestion and stretch cannot be minimized simultaneously in arbitrary networks

slide-15
SLIDE 15

Our Contributions

  • MultiBend DTM algorithm for mesh networks
  • For 2-dimensional mesh, MultiBend has both stretch and (edge) congestion O(log n)
  • For d-dimensional mesh, MultiBend has stretch O(d log n) and (edge) congestion O(d^2 log n)
  • For fixed d,

– stretch is within an O(log log n) factor of optimal, and
– congestion is within an O(1) factor of optimal

slide-16
SLIDE 16

In the Remaining…

  • Model
  • General Approach
  • Analogy to a Distributed Queue
  • Hierarchical decomposition for MultiBend
  • MultiBend Analysis
  • Stretch
  • Congestion
  • Discussion
slide-17
SLIDE 17

Model

  • Mesh network G = (V,E) of n reliable nodes
  • One shared object
  • Nodes receive-compute-send atomically
  • Nodes are uniquely identified
  • Node u can send to node v if it knows v
  • One node executes one request at a time
slide-18
SLIDE 18

General Approach

slide-19
SLIDE 19

Hierarchical clustering Network graph

slide-20
SLIDE 20

Hierarchical clustering

Alternative representation as a hierarchy tree with leader nodes

slide-21
SLIDE 21

At the lowest level (level 0) every node is a cluster

Each cluster at each level keeps a directory, which holds a downward pointer if the object's location is known

slide-22
SLIDE 22

A Publish operation

➢ Assume that the owner node is the creator of ξ and invokes the Publish operation
➢ Nodes know their parent in the hierarchy

[Figure: hierarchy with root; the owner node holds ξ]

slide-23
SLIDE 23

root

Send request to the leader

slide-24
SLIDE 24

root

Continue up phase Sets downward pointer while going up

slide-25
SLIDE 25

root

Continue up phase Sets downward pointer while going up

slide-26
SLIDE 26

root

Root node found, stop up phase

slide-27
SLIDE 27

root

A successful Publish operation Predecessor node ξ

slide-28
SLIDE 28

Supporting a Move operation

➢ Initially, nodes point downward to the object owner (predecessor node) due to the Publish operation
➢ Nodes know their parent in the hierarchy

[Figure: hierarchy with root, the requesting node, and the predecessor node holding ξ]

slide-29
SLIDE 29

Send request to leader node of the cluster upward in hierarchy

root

slide-30
SLIDE 30

Continue up phase until downward pointer found

root

Sets downward path while going up

slide-31
SLIDE 31

Continue up phase

root

Sets downward path while going up

slide-32
SLIDE 32

Continue up phase

root

Sets downward path while going up

slide-33
SLIDE 33

Downward pointer found, start down phase

root

Discards path while going down

slide-34
SLIDE 34

Continue down phase

root

Discards path while going down

slide-35
SLIDE 35

Continue down phase

root

Discards path while going down

slide-36
SLIDE 36

Predecessor reached; the object is moved from the predecessor node to the requesting node

root

Lookup is similar, but it does not change the directory structure, and only a read-only copy of the object is sent
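The up and down phases walked through on the preceding slides can be simulated on a plain hierarchy tree. The sketch below is a simplification under stated assumptions: each cluster leader is one tree node, the hierarchy is given as a hypothetical `parent` dictionary, requests run one at a time, and message passing and locking are omitted. The class and method names are illustrative.

```python
class Directory:
    """Hierarchical directory with downward pointers (sequential sketch)."""

    def __init__(self, parent):
        # parent maps every tree node to its parent; the root maps to None.
        self.parent = parent
        self.down = {v: None for v in parent}  # downward pointers
        self.owner = None

    def publish(self, leaf):
        # Up phase to the root, setting a downward pointer at each level.
        self.owner = leaf
        child, node = leaf, self.parent[leaf]
        while node is not None:
            self.down[node] = child
            child, node = node, self.parent[node]

    def _find(self, leaf, update):
        if self.owner is None:
            raise RuntimeError("object has not been published")
        # Up phase: climb until a node with a downward pointer is found.
        child, node = leaf, self.parent[leaf]
        while self.down[node] is None:
            if update:
                self.down[node] = child      # set the new downward path
            child, node = node, self.parent[node]
        old = self.down[node]
        if update:
            self.down[node] = child
        # Down phase: follow the old pointers to the predecessor.
        node = old
        while self.down[node] is not None:
            nxt = self.down[node]
            if update:
                self.down[node] = None       # discard the old path
            node = nxt
        return node

    def move(self, leaf):
        pred = self._find(leaf, update=True)
        self.owner = leaf                    # the object now lives at the requester
        return pred

    def lookup(self, leaf):
        return self._find(leaf, update=False)  # directory left unchanged


parent = {"R": None, "A": "R", "B": "R",
          "a1": "A", "a2": "A", "b1": "B", "b2": "B"}
d = Directory(parent)
d.publish("a1")
pred = d.move("b2")     # finds predecessor a1; pointers now lead to b2
found = d.lookup("a2")  # finds b2 without modifying any pointer
```

After the move, the pointer at A has been discarded and the pointers at R and B lead to b2, which mirrors the "sets downward path while going up, discards path while going down" steps.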

slide-37
SLIDE 37

Distributed Queue Analogy

slide-38
SLIDE 38

Distributed Queue

[Figure: hierarchy with root and requesting node u; the queue between tail and head holds u]

slide-39
SLIDE 39

Distributed Queue

[Figure: hierarchy with root and requesting nodes u, v; the queue between tail and head holds u and v]

slide-40
SLIDE 40

Distributed Queue

[Figure: hierarchy with root and requesting nodes u, v, w; the queue between tail and head holds u, v, and w]

slide-41
SLIDE 41

Distributed Queue

[Figure: hierarchy with root and nodes u, v, w; the queue between tail and head holds v and w]

slide-42
SLIDE 42

Distributed Queue

[Figure: hierarchy with root and nodes u, v, w; the queue between tail and head holds w]
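The queue analogy can be mimicked with an ordinary in-memory queue. This is a centralized sketch of the analogy only, assuming the node at the head holds the object and each new request joins at the tail; the class `DistributedQueue` and its methods are hypothetical names.

```python
from collections import deque


class DistributedQueue:
    """Centralized stand-in for the distributed queue of Move requests."""

    def __init__(self, publisher):
        self.queue = deque([publisher])  # the head currently holds the object

    def enqueue(self, node):
        # A new request joins at the tail and learns its predecessor.
        predecessor = self.queue[-1]
        self.queue.append(node)
        return predecessor

    def pass_object(self):
        # The head hands the object to its successor and leaves the queue.
        if len(self.queue) < 2:
            raise RuntimeError("no successor is waiting")
        self.queue.popleft()
        return self.queue[0]


q = DistributedQueue("u")
pred_v = q.enqueue("v")     # v's predecessor is u
pred_w = q.enqueue("w")     # w's predecessor is v
holder_1 = q.pass_object()  # u passes the object to v
holder_2 = q.pass_object()  # v passes the object to w
```

In the directory protocol no node stores the whole queue; each request only discovers its predecessor, which is what `enqueue` returns here.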

slide-43
SLIDE 43

Results on Mesh Networks

slide-44
SLIDE 44

Type-1 Mesh Decomposition

2-dimensional mesh

slide-45
SLIDE 45

Type-1 Mesh Decomposition

slide-46
SLIDE 46

Type-1 Mesh Decomposition

slide-47
SLIDE 47

Type-2 Mesh Decomposition

slide-48
SLIDE 48

Type-2 Mesh Decomposition

slide-49
SLIDE 49

Decomposition for a 2^3 x 2^3 2-dimensional mesh

[Figure: hierarchy levels (i,1), (i,2), (i+1,1), (i+1,2)]

slide-50
SLIDE 50

MultiBend Hierarchy

  • Find a predecessor node via multi-bend paths for each leaf node u

– by visiting leaders of all the clusters that contain u, from level 0 to the root level

[Figure: root, leaf nodes u and v, and their multi-bend paths p(u) and p(v)]

slide-51
SLIDE 51

MultiBend Hierarchy (2)

  • The hierarchy guarantees:

(1) For any two nodes u, v, their multi-bend paths p(u) and p(v) meet at level min{h, log(dist(u,v)) + 2}
(2) length(p_i(u)) is at most 2^(i+3)

[Figure: root, leaf nodes u and v, and their multi-bend paths p(u) and p(v)]

slide-52
SLIDE 52

(Canonical) downward Paths

p(v) is a (canonical) downward path

[Figure: root, nodes u and v, and paths p(u) and p(v)]

slide-53
SLIDE 53

Load Balancing

  • Through a leader election procedure

– Every time we access the leader of a sub-mesh, we replace it with another leader chosen uniformly at random among its nodes

  • The directory is updated appropriately by updating parent and child leaders

– Locking may be needed in concurrent executions

  • The update cost is low in comparison to the cost of serving requests
  • This step is necessary to control congestion

– With a fixed leader, edge congestion can be O(l), where l is the number of requests

  • If the congestion requirement can be relaxed by a factor of ρ, the leader change is needed only after every ρ requests
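The leader replacement rule can be sketched as follows. This is a minimal illustration assuming a sub-mesh is just a list of nodes and that the leader is re-drawn uniformly at random after every ρ accesses (ρ = 1 gives replacement on every access); the class `SubMesh`, its fields, and the `rng` parameter are hypothetical, and the update of parent and child leaders in the directory is not modelled.

```python
import random


class SubMesh:
    """Sub-mesh whose leader is re-elected uniformly at random every rho accesses."""

    def __init__(self, nodes, rho=1, rng=None):
        self.nodes = list(nodes)
        self.rho = rho
        self.rng = rng or random.Random()
        self.leader = self.rng.choice(self.nodes)
        self.accesses = 0

    def access_leader(self):
        # Return the leader that serves this request, then rotate if due.
        current = self.leader
        self.accesses += 1
        if self.accesses % self.rho == 0:
            self.leader = self.rng.choice(self.nodes)
        return current


mesh_nodes = [(x, y) for x in range(4) for y in range(4)]
sm = SubMesh(mesh_nodes, rho=3, rng=random.Random(0))
first = sm.leader
served = [sm.access_leader() for _ in range(3)]  # all three served by `first`
```

With a fixed leader every request would traverse the edges next to that one node, which is the O(l) congestion case mentioned on the slide; random rotation spreads that load over the sub-mesh.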

slide-54
SLIDE 54

Analysis of MultiBend

slide-55
SLIDE 55

Analysis on (move) Stretch

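Assume a sequential execution R of l+1 Move requests, where r0 is an initial Publish request.

[Figure: hierarchy levels 1, 2, ..., k, ..., h reached by the requests r0, r1, r2, ..., rl-1, rl issued at nodes u, v, w, x, y]

$A^*(R) \ge \max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}$

$A(R) \le \sum_{k=1}^{h} (S_k - 1)\, 2^{k+3}$

Each term of the sum equals $16\,(S_k - 1)\, 2^{k-1}$, so the sum of h terms is at most 16h times the largest term. Thus,

$\frac{A(R)}{A^*(R)} \le \frac{\sum_{k=1}^{h} (S_k - 1)\, 2^{k+3}}{\max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}} \le \frac{16\, h\, \max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}}{\max_{1 \le k \le h} (S_k - 1)\, 2^{k-1}} = 16h = O(\log n)$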

slide-56
SLIDE 56

Analysis on (Edge) Congestion

Assume M1 is a type-1 submesh

  • A sub-path uses edge e with probability 2/m_l
  • P’: set of paths from M1 to M2 or vice-versa
  • C’(e): congestion caused by P’ on e
  • E[C’(e)] ≤ 2|P’|/m_l
  • B ≥ |P’|/out(M1)
  • out(M1) ≤ 4m_l
  • C* ≥ B

==> E[C’(e)] ≤ 8C*

[Figure: submesh M1 with side length m_l inside submesh M2, and an edge e]
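The bullets above chain together as follows. The reading of the symbols is an assumption, since the slide does not define them: out(M1) is taken to be the number of edges leaving M1, and B a lower bound on the congestion of some boundary edge of M1, which all paths in P’ must cross.

```latex
\begin{align*}
E[C'(e)] &\le \frac{2\,|P'|}{m_l}
  && \text{each sub-path uses } e \text{ with probability } 2/m_l \\
 &\le \frac{2\, B \cdot \mathrm{out}(M_1)}{m_l}
  && \text{since } B \ge |P'| / \mathrm{out}(M_1) \\
 &\le \frac{2\, B \cdot 4\, m_l}{m_l} = 8B
  && \text{since } \mathrm{out}(M_1) \le 4\, m_l \\
 &\le 8\, C^*
  && \text{since } C^* \ge B
\end{align*}
```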

slide-57
SLIDE 57

Analysis on (Edge) Congestion (2)

  • As M1 at level (i,2) is always completely contained in M2 at level (i+1,2)
  • log n + 2 levels
  • E[C(e)] ≤ 8C*(log n + 2)
  • Considering type-2 submeshes

– exactly one type-2 submesh between every two type-1 submeshes
– the type-2 submeshes may not be proper subsets of type-1 submeshes
– 4 possible type-2 submesh choices (a path may bend at most 2 times)
– increases the load by a factor of 4

Thus, using a standard Chernoff bound, C = O(C* log n), w.h.p.

slide-58
SLIDE 58

d-Dimensional Mesh

slide-59
SLIDE 59

Extensions to d-dimensional mesh

  • The 2-dimensional decomposition can be directly generalized to a d-dimensional mesh

– Problem: the congestion becomes O(2^d log n)

  • Another decomposition is used to keep congestion in O(C* d^2 log n) and stretch in O(d log n)
  • We set an appropriate λ and shift the type-1 submeshes by (j-1)λ nodes in each dimension to get type-j submeshes
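The shifting rule can be illustrated by computing which type-j submesh contains a given node. This sketch rests on assumptions the slide does not state: type-1 submeshes at a level are taken to be aligned blocks of a given side length, and `side` and `lam` (for λ) are free parameters chosen for the example. The function name is hypothetical.

```python
def submesh_index(node, side, lam, j):
    """Index of the type-j submesh containing `node`.

    Type-j submeshes are the type-1 blocks shifted by (j - 1) * lam nodes
    in every dimension. A negative index component denotes a partial
    submesh that is clipped by the mesh boundary.
    """
    shift = (j - 1) * lam
    return tuple((x - shift) // side for x in node)


# Two neighbouring nodes of a 3-dimensional mesh, with side 8 and lam 2.
u, v = (7, 0, 0), (8, 0, 0)
type1 = (submesh_index(u, 8, 2, 1), submesh_index(v, 8, 2, 1))
type2 = (submesh_index(u, 8, 2, 2), submesh_index(v, 8, 2, 2))
# u and v fall in different type-1 submeshes but in the same type-2 submesh.
```

The example shows why shifted types help: two adjacent nodes separated by a type-1 boundary share a submesh of another type at the same level.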

slide-60
SLIDE 60

Extensions to d-dimensional mesh (2)

3-dimensional mesh decomposition. Only 2 of the 3 dimensions are shown

O(d)-types of submeshes on each level

O(d) sub-levels in every level with type-1 submesh first and type-O(d) submesh last in the order

slide-61
SLIDE 61

Summary

  • First load balanced distributed directory protocol with both stretch and congestion O(log n) in 2-dimensional mesh
  • In d-dimensional mesh, stretch O(d log n) and (edge) congestion O(d^2 log n)
  • MultiBend is starvation-free and provides linearizability.
  • Future work: extend the results to dynamic networks and make the protocol fault-tolerant

slide-62
SLIDE 62

Thank you for your attention!