Leveraging Heterogeneity to Reduce the Cost of Data Center Upgrades - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Leveraging Heterogeneity to Reduce the Cost of Data Center Upgrades

Andy Curtis

joint work with:

  • S. Keshav
  • Alejandro López-Ortiz
  • Tommy Carpenter
  • Mustafa Elsheikh

University of Waterloo

slide-2
SLIDE 2

Motivation

  • Data centers are a critical part of IT infrastructure

  • Expensive
  • $1000/year/server
  • Data centers change over time
slide-3
SLIDE 3

Data centers constantly evolve

  • 63% of Data Center Knowledge readers are either in the midst of data center expansion projects or have just completed a new facility
  • 59% continue to build and manage their data centers in-house

http://www.datacenterknowledge.com/archives/2010/08/16/data-center-industry-expansion-in-full-swing/


slide-6
SLIDE 6

Network upgrade motivation

  • Several prior solutions for greenfield data centers
  • VL2, flattened butterfly, HyperX, BCube, DCell,

Al-Fares et al., MDCube

  • What about legacy data centers?
slide-7
SLIDE 7

Existing topologies are not flexible enough


slide-10
SLIDE 10

Goal

It should be easy and cost-effective to add capacity to a data center network

slide-11
SLIDE 11

Challenging problem

  • Designing a data center expansion or upgrade isn’t easy
  • Huge design space
  • Many constraints
slide-12
SLIDE 12

Problem 1

  • It’s hard to analyze and understand heterogeneous topologies

Problem 2

  • How to design an upgraded topology?
slide-14
SLIDE 14

Problem 1

  • High-performance network topologies are based on rigid constructions
  • Homogeneous switches
  • Prescribed switch radix
  • Single link rate

Solutions:

  1. Develop a theory of heterogeneous Clos networks
  2. Explore unstructured data center network topologies
slide-15
SLIDE 15

Two solutions:

LEGUP: output is a heterogeneous Clos network

[Curtis, Keshav, López-Ortiz; CoNEXT 2010]

REWIRE: designs unstructured DCN topologies

[Curtis et al.; INFOCOM 2012]



slide-18
SLIDE 18

LEGUP in brief:

LEGUP designs upgraded/expanded networks for legacy data center networks

Input

  • Budget
  • Existing network topology
  • List of switches & line cards
  • Optional: data center model



slide-22
SLIDE 22

Difficult optimization problem

First pass: limit the solution space by considering only heterogeneous Clos networks
slide-23
SLIDE 23

Clos networks

This is a physical realization of a Clos network

[Figure: physical Clos topology with ToR, Aggregation, and Core layers, connected to the Internet]

slide-24
SLIDE 24

Clos networks

We can find a logical topology for this network

[Figure: logical topology with link capacity labels]

slide-25
SLIDE 25

Heterogeneous Clos networks

Logical topology is a forest

[Figure: logical forest with link capacity labels]


slide-30
SLIDE 30

Theoretical contributions

Lemma 1: How to construct all optimal logical

forests for a set of switches

Lemma 2: How to build a physical realization

from a logical forest

Theorem: A characterization of heterogeneous

Clos networks

This is the first optimal heterogeneous topology

*optimal = uses same link capacity an equivalent stage Clos network

slide-31
SLIDE 31

Problem 1

  • It’s hard to analyze and understand heterogeneous topologies

Problem 2

  • How to design an upgraded topology?

more later...

slide-32
SLIDE 32

Problem 1

  • It’s hard to analyze and understand heterogeneous topologies

Problem 2

  • How to design an upgraded topology?

heterogeneous Clos

slide-33
SLIDE 33

Problem 2

Upgraded network should:

  • Maximize performance, minimize cost
  • Be realized in the target data center
  • Incorporate existing network equipment if it makes sense

Approach: use optimization

slide-34
SLIDE 34

LEGUP algorithm

  • Branch and bound search of the solution space
  • Heuristics to map switches to a rack
  • See paper for details
  • Time is the bottleneck in the algorithm
  • Exponential in the number of switch types and (worst-case) in the number of ToRs
  • 760-server data center: 5–10 minutes to run the algorithm
  • 7600-server data center: 1–2 days
  • But can be parallelized
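As a rough illustration of the branch-and-bound search described above, here is a generic best-first skeleton. The node-expansion, bounding, and cost callbacks are hypothetical stand-ins, not LEGUP's actual routines:

```python
import heapq

def branch_and_bound(root, children, lower_bound, value, is_complete):
    """Best-first branch-and-bound: expand partial designs, prune any branch
    whose optimistic bound cannot beat the best complete design found so far."""
    best, best_cost = None, float("inf")
    counter = 0                      # tie-breaker so the heap never compares nodes
    frontier = [(lower_bound(root), counter, root)]
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        if bound >= best_cost:
            continue                 # prune: cannot beat the incumbent
        if is_complete(node):
            cost = value(node)
            if cost < best_cost:
                best, best_cost = node, cost
        else:
            for child in children(node):
                if lower_bound(child) < best_cost:
                    counter += 1
                    heapq.heappush(frontier, (lower_bound(child), counter, child))
    return best, best_cost
```

Parallelizing, as the slide notes, amounts to handing disjoint parts of the frontier to separate workers.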
slide-35
SLIDE 35

LEGUP summary

  • Developed theory of heterogeneous Clos networks
  • Implemented LEGUP design algorithm
  • On our data center, we see substantial cost savings: spend less than half as much money as a fat-tree for the same performance

slide-36
SLIDE 36

Two solutions:

LEGUP: output is a heterogeneous Clos network

[Curtis, Keshav, López-Ortiz; CoNEXT 2010]

REWIRE: designs unstructured DCN topologies

[Curtis et al.; INFOCOM 2012]

slide-37
SLIDE 37

Can we do better with unstructured networks?

slide-39
SLIDE 39

Problem

  • Now we have an even harder network design problem

Approach

  • Use local search heuristics to find a “good enough” solution

slide-41
SLIDE 41

REWIRE

Uses simulated annealing to find a network that:

  • Maximizes performance

Subject to:

  • The budget
  • Physical constraints of the data center model (thermal, power, space)
  • No topology restrictions

Performance = bisection bandwidth − diameter

slide-42
SLIDE 42

REWIRE

Uses simulated annealing to find a network that:

  • Maximizes performance

Subject to:

  • The budget
  • Physical constraints of the data center model (thermal, power, space)

  • No topology restrictions

Costs = new cables + moved cables + new switches

slide-44
SLIDE 44

Simulated annealing algorithm

  • At each iteration:
  • Compute the performance of the candidate solution
  • Decide whether to accept this solution
  • Compute the next neighbor to consider
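The loop above can be sketched as a standard simulated-annealing skeleton. The `neighbor` and `score` callbacks are placeholders for REWIRE's candidate-generation and performance functions, which the slides do not spell out:

```python
import math, random

def simulated_annealing(initial, neighbor, score, steps=10000, t0=1.0, cooling=0.999):
    """Standard simulated-annealing loop: score a candidate, accept it
    (always if better, with Boltzmann probability if worse), then move on
    to a neighboring solution while the temperature decays."""
    rng = random.Random(0)                      # fixed seed for reproducibility
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    temp = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        cand_score = score(candidate)
        if (cand_score > current_score
                or rng.random() < math.exp((cand_score - current_score) / temp)):
            current, current_score = candidate, cand_score
        if current_score > best_score:          # track the best ever seen
            best, best_score = current, current_score
        temp *= cooling
    return best, best_score
```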

No known algorithm to find the bisection bandwidth of an arbitrary network!

slide-45
SLIDE 45

Bisection bandwidth computation

Easy for a single cut


slide-47
SLIDE 47

Bisection bandwidth computation

bw(S, S’) = link cap(S, S’) / min { server rates(S), server rates(S’) }

slide-48
SLIDE 48

Bisection bandwidth computation

bw(S, S’) = 4 / min { 2, 6 } = 2

slide-49
SLIDE 49

Bisection bandwidth computation

Then the bisection bandwidth is the min over all cuts
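Taken literally, this definition gives an exponential-time baseline. A brute-force sketch, assuming for illustration a flat host-to-host capacity map rather than a full switch topology:

```python
from itertools import combinations

def bisection_bandwidth(hosts, rate, cap):
    """Minimum over every cut (S, S') of
    capacity crossing the cut / min(total server rate in S, in S').
    Enumerates all cuts, so it is only usable on toy inputs."""
    hosts = list(hosts)
    best = float("inf")
    for k in range(1, len(hosts) // 2 + 1):
        for S in combinations(hosts, k):
            s = set(S)
            cross = sum(c for (u, v), c in cap.items() if (u in s) != (v in s))
            denom = min(sum(rate[h] for h in s),
                        sum(rate[h] for h in hosts if h not in s))
            best = min(best, cross / denom)
    return best
```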

slide-50
SLIDE 50

Bisection bandwidth computation

  • Easy on tree-like topologies because there are O(n) cuts



slide-56
SLIDE 56

Bisection bandwidth computation

Exponentially many cuts on arbitrary topologies

Need: a min-cut, max-flow type theorem for multicommodity flow



slide-61
SLIDE 61

Bisection bandwidth computation

Theorem [Curtis and López-Ortiz, INFOCOM 2009]: A network can feasibly route all traffic matrices feasible under the server NIC rates using multipath routing iff all its cuts have bandwidth ≥ a sum dependent on αi for all nodes i

We can compute the αi values using linear programming [Kodialam et al., INFOCOM 2006]

These two theoretical results give us a polynomial-time algorithm to find the bisection bandwidth of an arbitrary network
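The slides leave the exact cut condition implicit. As an illustration only, assuming it takes a hose-model form where each cut must carry at least min(Σ αi over S, Σ αi over S'), a toy feasibility checker looks like this (still enumerating cuts; the theorem plus the LP is what makes the real check polynomial-time):

```python
from itertools import combinations

def feasible_under_hose(nodes, alpha, cap):
    """Check that every cut (S, S') carries capacity >= min(sum of alpha
    over S, sum of alpha over S').
    The exact form of the bound is an assumption here; the slides only say
    it depends on the alpha_i values. Exponential enumeration, toy use only."""
    nodes = list(nodes)
    for k in range(1, len(nodes) // 2 + 1):
        for S in combinations(nodes, k):
            s = set(S)
            cross = sum(c for (u, v), c in cap.items() if (u in s) != (v in s))
            need = min(sum(alpha[n] for n in s),
                       sum(alpha[n] for n in nodes if n not in s))
            if cross < need:
                return False
    return True
```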

slide-62
SLIDE 62

Evaluation

How much performance do we gain with heterogeneous network equipment?

slide-63
SLIDE 63

Evaluation

  • University of Waterloo School of Computer Science data center as input

  • Three scenarios:
  • Upgrading the network (see paper)
  • Expansion by adding servers
  • Greenfield data center
slide-64
SLIDE 64

Evaluation: input

  • SCS data center topology
  • 19 edge switches, 760 servers
  • Heterogeneous edge switches
  • All aggregation switches are HP 5406 models


slide-65
SLIDE 65

Evaluation: input

  • The data center handles air poorly, so we add thermal constraints modeling this

[Figure: cold/hot aisle airflow with chiller]

slide-66
SLIDE 66

Evaluation: cost model

Cable costs:

  Rate          Short ($)   Medium ($)   Long ($)
  1 Gb          5           10           20
  10 Gb         50          100          200
  Install cost  10          20           50

Switch costs:

  1 Gb ports   10 Gb ports   Watts   Cost ($)
  24           -             100     250
  48           -             150     1,500
  48           4             235     5,000
  -            24            300     6,000
  -            48            600     10,000
  -            144           5,000   75,000

slide-67
SLIDE 67

Evaluation: comparison methods

  • Generalized fat-tree
  • Bounded best-case performance
  • Greedy algorithm
  • Finds the link addition that improves performance the most, adds it, and repeats
  • Random graph
  • Proposed by Singla et al., HotCloud 2011 as a data center network topology
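The greedy baseline can be sketched in a few lines. The `performance` scoring function stands in for the bisection-bandwidth computation, and all names are illustrative, not the evaluation's actual code:

```python
def greedy_upgrade(candidate_links, link_cost, performance, budget):
    """Sketch of the GREEDY baseline: repeatedly add the affordable
    candidate link whose addition improves performance the most."""
    chosen, spent = set(), 0.0
    while True:
        current = performance(chosen)
        best_gain, best_link = 0.0, None
        for link in candidate_links:
            if link in chosen or spent + link_cost[link] > budget:
                continue                      # already added or unaffordable
            gain = performance(chosen | {link}) - current
            if gain > best_gain:
                best_gain, best_link = gain, link
        if best_link is None:                 # no affordable link improves things
            return chosen, spent
        chosen.add(best_link)
        spent += link_cost[best_link]
```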

slide-68
SLIDE 68

Expanding the Waterloo SCS data center

[Chart: oversubscription ratio (bisection bandwidth) vs. cumulative servers added, 0–640, starting from 760 servers; series: Original, Fat-tree 1 Gb, GREEDY, LEGUP, REWIRE, with network diameter annotated for each design]


slide-71
SLIDE 71

Greenfield network design

  • 1920 servers
  • Edge switches have 48 gigabit ports
  • Assume 24 servers per rack
slide-72
SLIDE 72

Greenfield network design

[Chart: oversubscription ratio for Fat-tree, Random, LEGUP, and REWIRE at budgets of $125, $250, $500, and $1000 per rack, with network diameter annotated for each design]

slide-76
SLIDE 76

Greenfield network design

  • Expanding a greenfield network
  • 1600 servers initially
  • Grow by increments of 400 servers (10 racks)
  • $6000/rack budget
slide-77
SLIDE 77

Expanding a greenfield network

[Chart: oversubscription ratio (bisection bandwidth) vs. total servers in the data center, 1600–3200; series: Fat-tree 1 Gb, Fat-tree 10 Gb, LEGUP, REWIRE, with network diameter annotated for each design]


slide-80
SLIDE 80

Are unstructured topologies worth it?

  • Higher performance
  • Up to 10x more bisection bandwidth than heterogeneous Clos for the same cost
  • Lower latency (can get 2 hops between racks instead of 4)
  • But difficult to manage
  • Cost to build/manage is unclear
  • Need Multipath TCP [Raiciu et al., SIGCOMM 2011] or SPAIN [Mudigonda et al., NSDI 2010] to effectively use the available bandwidth

slide-81
SLIDE 81

REWIRE future work

  • Structural constraints on topology
  • Generalize the greenfield topology design framework of Mudigonda et al., USENIX ATC 2011
  • Bisection bandwidth computation algorithm is numerically unstable
  • Scale the local search approach to larger networks
  • Relationship between spectral gap and bisection bandwidth?

slide-82
SLIDE 82

Conclusions

  • Best practices are not enough for data center upgrades
  • Need theory to understand and effectively build heterogeneous networks
  • Implemented LEGUP and REWIRE, optimization algorithms to design heterogeneous DCNs