Algorithms at Scale (Week 2) Puzzle of the Day: A bag contains a - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Algorithms at Scale

(Week 2)

Puzzle of the Day:

A bag contains a collection of blue and red balls. Repeat:

  • Take two balls from the bag.
  • If they are the same color, discard them both and add a blue ball.
  • If they are different colors, discard the blue ball and put the red ball back.

What do you know about the color of the final ball?
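One way to explore the puzzle is to simulate it. The sketch below is only an illustration (the function name `final_ball` and the random draw order are assumptions, not part of the slides). It suggests that the final color depends only on the parity of the initial number of red balls, since every rule changes the red count by 0 or 2.

```python
import random

def final_ball(blue: int, red: int, rng: random.Random) -> str:
    """Play the game until one ball is left and return its color.

    Assumes blue + red >= 1.
    """
    bag = ["blue"] * blue + ["red"] * red
    while len(bag) > 1:
        rng.shuffle(bag)              # draw two balls at random
        a, b = bag.pop(), bag.pop()
        if a == b:
            bag.append("blue")        # same color: discard both, add a blue ball
        else:
            bag.append("red")         # different colors: discard blue, put red back
    return bag[0]
```

For example, `final_ball(4, 3, random.Random(0))` starts with an odd number of red balls, so the last ball is red regardless of the draw order.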

slide-2
SLIDE 2

Summary

Today:

Number of connected components in a graph.

  • Additive approximation algorithm.

Weight of MST

  • Multiplicative approximation algorithm.

Last Week:

Toy example 1: array all 0’s?

  • Gap-style question: All 0’s or far from all 0’s?

Toy example 2: Fraction of 1’s?

  • Additive ± 𝜁 approximation
  • Hoeffding Bound

Is the graph connected?

  • Gap-style question.
  • O(1) time algorithm.
  • Correct with probability 2/3.

9 dots 4 lines

slide-3
SLIDE 3

Announcements / Reminders

Problem sets:

Problem Set 1 was due today. Problem Set 2 will be released tonight.

slide-4
SLIDE 4

Announcements / Reminders

Next Week: Guest Lecture

Arnab Bhattacharyya

Arnab’s research: “My research area is theoretical computer science, in a broad sense. More specifically, I am interested in algorithms for big data, computational complexity, analysis and extremal combinatorics on finite fields, and algorithmic models for natural systems.”

slide-5
SLIDE 5

Today’s Problem: Connected Components

Assumptions:

Graph G = (V,E)

  • Undirected
  • n nodes
  • m edges
  • maximum degree d

Error term: 𝜁

Output:

Number of connected components.

Example: output 3

A B c

slide-6
SLIDE 6

Today’s Problem: Connected Components

Approximation:

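Output C such that: |C – CC(G)| ≤ 𝜁n, where CC(G) is the true number of connected components.
Alternate form: CC(G) – 𝜁n ≤ C ≤ CC(G) + 𝜁n
Correct output: w.p. > 2/3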

Example: 𝜁 = 1/10 Output ∊ {2,3,4}

A B c

slide-7
SLIDE 7

Today’s Problem: Connected Components

When is this useful?

What are trivial values of 𝜁? What are hard values of 𝜁? What sort of applications is this useful for?

slide-8
SLIDE 8

Approximate Connected Components

When is this useful?

What are interesting values of 𝜁?

  • What happens when 𝜁 = 1?
  • What happens when 𝜁 = 1/(2n)?

What sort of applications is this useful for?

  • Large graphs?
  • Large social networks?
  • The internet?
  • Networks with many connected components?
  • Number of components follows a heavy tail distribution?
slide-9
SLIDE 9

Approximate Connected Components

Define: per-node cost

Let n(u) = number of nodes in the connected component containing node u.

w x y z

Key Idea 1:

n(w) = 6 n(x) = 6 n(y) = 3 n(z) = 1

A B c

slide-10
SLIDE 10

Approximate Connected Components

Define: per-node cost

Let n(u) = number of nodes in the connected component containing node u. Let cost(u) = 1/n(u).

Key Idea 1:

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-11
SLIDE 11

Approximate Connected Components

Why is this useful?

Key Idea 1:

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-15
SLIDE 15

sum = 0
for each u in V:
    sum = sum + cost(u)
return sum

Approximate Connected Components Algorithm 1

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-16
SLIDE 16

sum = 0
for each u in V:
    sum = sum + cost(u)
return sum

Comments:

  • Need a way to efficiently compute cost(u).
  • Runs in O(n) time.

Approximate Connected Components Algorithm 1

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c
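Algorithm 1 works because a component with k nodes contributes k costs of 1/k, which sum to 1. The following minimal sketch checks this on a hypothetical 10-node graph with components of size 6, 3 and 1 (the adjacency-dict representation and all function names are assumptions). It computes cost(u) naively with a full BFS, so it does not achieve the O(n) bound by itself.

```python
from collections import deque

def component_size(adj, u):
    """Return n(u): the number of nodes reachable from u, found by BFS."""
    seen = {u}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen)

def count_components_exact(adj):
    """Algorithm 1: sum cost(u) = 1/n(u) over all nodes."""
    return sum(1.0 / component_size(adj, u) for u in adj)

def example_graph():
    """Hypothetical graph: a 6-node path, a 3-node path and one isolated node."""
    adj = {u: [] for u in range(10)}
    for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7), (7, 8)]:
        adj[a].append(b)
        adj[b].append(a)
    return adj
```

Up to floating-point rounding, `count_components_exact(example_graph())` equals 3.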

slide-17
SLIDE 17

Sample

  • Choose a small random subset S of V.
  • For each node u in S, compute cost(u).
  • Use the sample to estimate the average cost of all the nodes.

Approximate Connected Components Key Idea 2: Sampling

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-18
SLIDE 18

Worries?

Approximate Connected Components Key Idea 2: Sampling

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-19
SLIDE 19

Worries?

  • Big components are sampled more often than small components?
  • Small components may never be sampled?
  • Bad examples? 1 component of size 90, 10 components of size 1

Approximate Connected Components Key Idea 2: Sampling

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-20
SLIDE 20

sum = 0
for j = 1 to s:
    Choose u uniformly at random.
    sum = sum + cost(u)
return n∙(sum/s)

Comments:

  • (sum/s) is average cost of sample.
  • Efficiently compute cost(u)?
  • Runs in O(s) time.

Approximate Connected Components Algorithm 2

y z cost(y) = 1/3 cost(z) = 1 w x cost(w) = 1/6 cost(x) = 1/6

A B c
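A minimal sketch of Algorithm 2 follows, run on the "bad example" from the slides (one component of size 90 and ten of size 1). The graph representation, the function names and the sample count used in the usage note are assumptions. cost(u) is still computed exactly here with a full BFS.

```python
import random
from collections import deque

def component_size(adj, u):
    """Return n(u) via a full BFS from u."""
    seen = {u}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen)

def estimate_components(adj, s, rng):
    """Algorithm 2: n times the average cost of s uniformly sampled nodes."""
    nodes = list(adj)
    total = 0.0
    for _ in range(s):
        u = rng.choice(nodes)                 # sample with replacement
        total += 1.0 / component_size(adj, u)
    return len(nodes) * (total / s)

def bad_example():
    """One path component on 90 nodes plus 10 isolated nodes (11 components)."""
    adj = {u: [] for u in range(100)}
    for a in range(89):
        adj[a].append(a + 1)
        adj[a + 1].append(a)
    return adj
```

With a few thousand samples, `estimate_components(bad_example(), 2000, random.Random(0))` should land close to the true value 11, even though most samples hit the big component.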

slide-21
SLIDE 21

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

Define random variables: Y1, Y2, …, Ys

Approximate Connected Components Algorithm 2 Analysis

slide-27
SLIDE 27

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

Notice:

Output of algorithm is:

Approximate Connected Components Algorithm 2 Analysis

slide-28
SLIDE 28

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

Notice:

Expected output of algorithm is:

Approximate Connected Components Algorithm 2 Analysis

slide-29
SLIDE 29

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

Important step:

Expected output is the number of connected components! (Algorithm is an unbiased estimator.)

Approximate Connected Components Algorithm 2 Analysis
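The equations on these analysis slides did not survive extraction. A sketch of the unbiasedness argument, assuming Y_j denotes the cost of the j-th sampled node and CC(G) the true number of components, could read:

```latex
\begin{align*}
\mathbb{E}[Y_j]
  &= \sum_{u \in V} \frac{1}{n} \cdot \mathrm{cost}(u)
   = \frac{1}{n} \sum_{u \in V} \frac{1}{n(u)}
   = \frac{CC(G)}{n}, \\
\mathbb{E}[\text{output}]
  &= \mathbb{E}\!\left[\frac{n}{s} \sum_{j=1}^{s} Y_j\right]
   = \frac{n}{s} \cdot s \cdot \frac{CC(G)}{n}
   = CC(G).
\end{align*}
```

The middle step uses the fact that a component with k nodes contributes k terms of 1/k, so the costs of each component sum to 1.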

slide-30
SLIDE 30

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

Notice:

Goal:

Approximate Connected Components Algorithm 2 Analysis

slide-31
SLIDE 31

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

Notice:

Goal:

Approximate Connected Components Algorithm 2 Analysis

slide-32
SLIDE 32

Given: independent random variables Y1, Y2, …, Ys Assume: each Yj ∊ [0,1]

Then:

Approximate Connected Components Reminder: Hoeffding Bound
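The statement of the bound is missing from the extracted slide. One standard two-sided form of the Hoeffding bound, which may differ in constants from the form used in the lecture, is:

```latex
\[
\Pr\!\left[\,\left|\frac{1}{s}\sum_{j=1}^{s} Y_j
      - \mathbb{E}\!\left[\frac{1}{s}\sum_{j=1}^{s} Y_j\right]\right| \ge t\,\right]
\;\le\; 2\,e^{-2 s t^{2}}
\qquad \text{for independent } Y_j \in [0,1],\; t > 0 .
\]
```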

slide-33
SLIDE 33

Given: independent random variables Y1, Y2, …, Ys Assume: each Yj ∊ [0,1]

Then:

Approximate Connected Components Reminder: Hoeffding Bound

Goal:

slide-34
SLIDE 34

Derivation:

Approximate Connected Components Algorithm 2 Analysis
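The derivation itself was lost in extraction. A sketch of how the sample size could be obtained, assuming the standard two-sided Hoeffding bound and the goal of additive error 𝜁n/2 with failure probability below 1/3:

```latex
\begin{align*}
\Pr\!\left[\,|\text{output} - CC(G)| > \tfrac{\zeta n}{2}\,\right]
  &= \Pr\!\left[\,\left|\frac{1}{s}\sum_{j=1}^{s} Y_j - \frac{CC(G)}{n}\right| > \frac{\zeta}{2}\,\right] \\
  &\le 2\,e^{-2 s (\zeta/2)^{2}} = 2\,e^{-s\zeta^{2}/2} \\
  &< \tfrac{1}{3}
  \quad\text{whenever}\quad s > \frac{2\ln 6}{\zeta^{2}} .
\end{align*}
```

So s = O(1/𝜁²) samples suffice under these assumptions.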

slide-44
SLIDE 44

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

We have shown: W.p. > 2/3, output is equal to: CC(G) ± 𝜁n/2

Approximate Connected Components Algorithm 2

y z cost(y) = 1/3 cost(z) = 1 w x cost(w) = 1/6 cost(x) = 1/6

A B c

slide-45
SLIDE 45

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

We have shown: Time: O(1/𝜁²)

Approximate Connected Components Algorithm 2

y z cost(y) = 1/3 cost(z) = 1 w x cost(w) = 1/6 cost(x) = 1/6

A B c

slide-46
SLIDE 46

Key problem:

How to efficiently compute cost(u).

Approximate Connected Components Key Idea 2: Sampling

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-47
SLIDE 47

Key problem:

How to efficiently compute cost(u).

Key idea 3:

Approximate cost(u).

Approximate Connected Components Key Idea 2: Sampling

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-48
SLIDE 48

Approximate low cost components:

If cost(u) is small, round up.

Approximate Connected Components Key Idea 3: Approximate Cost

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

How small is small enough?

slide-49
SLIDE 49

Approximate low cost components:

If cost(u) < 𝜁/2, round up.

Approximate Connected Components Key Idea 3: Approximate Cost

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-50
SLIDE 50

Ignore low cost components:

If cost(u) < 𝜁/2, round up. Total added cost ≤ 𝜁n/2.

Approximate Connected Components Key Idea 3: Approximate Cost

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-51
SLIDE 51

Approximate Connected Components

Define: per-node cost

Let n(u) = number of nodes in the connected component containing node u.
Let ñ(u) = min(n(u), 2/𝜁).
Let cost(u) = max(1/n(u), 𝜁/2) = 1/ñ(u).

Key Idea 3: Approximate Cost

w x y z cost(w) = 1/6 cost(x) = 1/6 cost(y) = 1/3 cost(z) = 1

A B c

slide-52
SLIDE 52

Approximate Connected Components

Define: per-node cost

Let n(u) = number of nodes in the connected component containing node u.
Let ñ(u) = min(n(u), 2/𝜁).
Let cost(u) = max(1/n(u), 𝜁/2) = 1/ñ(u).

Key Idea 3: Approximate Cost

Define: Note:

slide-54
SLIDE 54

Approximate Connected Components Close enough approximation:

Intuition: By rounding cost(u) up to 𝜁/2, we increase the error at most 𝜁n/2.
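The intuition can be made precise with a short calculation. This is a sketch using the rounded definition cost(u) = max(1/n(u), 𝜁/2); the slide's own derivation is not recoverable from the extraction.

```latex
\begin{align*}
\frac{1}{n(u)} \;\le\; \mathrm{cost}(u) \;&\le\; \frac{1}{n(u)} + \frac{\zeta}{2}
  \qquad \text{for every } u \in V, \\
CC(G) \;=\; \sum_{u \in V} \frac{1}{n(u)}
  \;\le\; \sum_{u \in V} \mathrm{cost}(u)
  \;&\le\; CC(G) + \frac{\zeta n}{2} .
\end{align*}
```

The rounding only ever increases the sum, and by at most 𝜁/2 per node.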

slide-60
SLIDE 60

sum = 0 for j = 1 to s: Choose u uniformly at random. sum = sum + cost(u) return n∙(sum/s)

We have shown: Sufficient to approximate cost(u) by rounding up.

Approximate Connected Components Algorithm 3

y z cost(y) = 1/3 cost(z) = 1 w x cost(w) = 1/6 cost(x) = 1/6

A B c

slide-61
SLIDE 61

Approximate Connected Components

Define: per-node cost

Let n(u) = number of nodes in the connected component containing node u.
Let ñ(u) = min(n(u), 2/𝜁).
Let cost(u) = max(1/n(u), 𝜁/2) = 1/ñ(u).

Algorithm 3

How to efficiently compute cost(u)?

slide-63
SLIDE 63

sum = 0
for j = 1 to s:
    Choose u uniformly at random.
    Perform a BFS from u; stop after seeing 2/𝜁 nodes.
    if BFS found > 2/𝜁 nodes:
        sum = sum + 𝜁/2
    else if BFS found n(u) nodes:
        sum = sum + 1/n(u)
return n∙(sum/s)

Approximate Connected Components Algorithm 3
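Algorithm 3 can be sketched in Python as below. The names `truncated_cost` and `approx_cc`, the adjacency-dict graph representation, and the default sample count ⌈2 ln 6 / 𝜁²⌉ (taken from a standard Hoeffding bound) are assumptions for illustration. The BFS is cut off at ⌈2/𝜁⌉ nodes, and a BFS that reaches the cutoff is treated as a large component.

```python
import math
import random
from collections import deque

def truncated_cost(adj, u, eps):
    """Return max(1/n(u), eps/2) using a BFS that stops after ceil(2/eps) nodes."""
    cap = math.ceil(2 / eps)
    seen = {u}
    queue = deque([u])
    while queue and len(seen) < cap:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
                if len(seen) >= cap:
                    break
    if len(seen) >= cap:
        return eps / 2            # large component: cost is rounded up to eps/2
    return 1.0 / len(seen)        # BFS exhausted the component: len(seen) = n(u)

def approx_cc(adj, eps, rng, s=None):
    """Estimate the number of connected components to within about eps * n."""
    if s is None:
        # Assumed sample count, derived from a standard Hoeffding bound.
        s = math.ceil(2 * math.log(6) / eps ** 2)
    nodes = list(adj)
    total = sum(truncated_cost(adj, rng.choice(nodes), eps) for _ in range(s))
    return len(nodes) * (total / s)
```

Each call to `truncated_cost` touches at most ⌈2/𝜁⌉ nodes and their incident edges, which is what keeps the total running time independent of n.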

slide-64
SLIDE 64

Goal:

Approximate Connected Components Analysis

slide-65
SLIDE 65

Goal: Implies:

Approximate Connected Components Analysis

slide-66
SLIDE 66

Define random variables: Y1, Y2, …, Ys

Approximate Connected Components Algorithm 3 Analysis

Rounded up cost

slide-67
SLIDE 67

Define random variables: Y1, Y2, …, Ys

Approximate Connected Components Algorithm 3 Analysis

slide-68
SLIDE 68

Unbiased estimator:

Approximate Connected Components Algorithm 3 Analysis

slide-69
SLIDE 69

Notice:

Expected output of algorithm is:

Approximate Connected Components Algorithm 3 Analysis

slide-70
SLIDE 70

Goal:

Approximate Connected Components Algorithm 3 Analysis

slide-71
SLIDE 71

Derivation:

Approximate Connected Components Algorithm 3 Analysis

slide-72
SLIDE 72

Approximate Connected Components Algorithm 3 Analysis

Derivation:

slide-73
SLIDE 73

Goal: Implies:

Approximate Connected Components Analysis

slide-74
SLIDE 74

sum = 0 for j = 1 to s: Choose u uniformly at random. Perform a BFS from u; stop after seeing 2/𝜁 nodes. if BFS found > 2/𝜁 nodes: sum = sum + 𝜁/2 else if BFS found n(u) nodes: sum = sum + 1/n(u) return n∙(sum/s)

Approximate Connected Components Algorithm 3

slide-75
SLIDE 75

We have shown: With probability > 2/3, output is equal to: CC(G) ± 𝜁n

Approximate Connected Components Algorithm 3

y z cost(y) = 1/3 cost(z) = 1 w x cost(w) = 1/6 cost(x) = 1/6

A B c

slide-76
SLIDE 76

sum = 0 for j = 1 to s: Choose u uniformly at random. Perform a BFS from u; stop after seeing 2/𝜁 nodes. if BFS found > 2/𝜁 nodes: sum = sum + 𝜁/2 else if BFS found n(u) nodes: sum = sum + 1/n(u) return n∙(sum/s)

Approximate Connected Components Algorithm 3

Cost of BFS: O((2 / 𝜁)∙d)

slide-77
SLIDE 77

sum = 0 for j = 1 to s: Choose u uniformly at random. Perform a BFS from u; stop after seeing 2/𝜁 nodes. if BFS found > 2/𝜁 nodes: sum = sum + 𝜁/2 else if BFS found n(u) nodes: sum = sum + 1/n(u) return n∙(sum/s)

Approximate Connected Components Algorithm 3

Cost of BFS: O((2/𝜁)∙d)
Total cost: O(s∙(2/𝜁)∙d) = O((1/𝜁²)(2/𝜁)d) = O(d/𝜁³)

slide-78
SLIDE 78

We have shown: With probability > 2/3, output is equal to: CC(G) ± 𝜁n
Running time: O(d/𝜁³)

Approximate Connected Components Algorithm 3

y z cost(y) = 1/3 cost(z) = 1 w x cost(w) = 1/6 cost(x) = 1/6

A B c

slide-79
SLIDE 79

We have shown: With probability > 1 - δ, output is equal to: CC(G) ± 𝜁n
Running time:

Approximate Connected Components Algorithm 3

y z cost(y) = 1/3 cost(z) = 1 w x cost(w) = 1/6 cost(x) = 1/6

A B c

slide-80
SLIDE 80

Summary

Today:

Number of connected components in a graph.

  • Approximation algorithm.

Weight of MST

  • Approximation algorithm.

Last Week:

Toy example 1: array all 0’s?

  • Gap-style question: All 0’s or far from all 0’s?

Toy example 2: Fraction of 1’s?

  • Additive ± 𝜁 approximation
  • Hoeffding Bound

Is the graph connected?

  • Gap-style question.
  • O(1) time algorithm.
  • Correct with probability 2/3.

9 dots 4 lines

slide-81
SLIDE 81

Today’s Problem: Minimum Spanning Tree

Assumptions:

Graph G = (V,E)

  • Undirected
  • Weighted, max weight W
  • Connected
  • n nodes
  • m edges
  • maximum degree d

Error term: 𝜁 < 1/2

Output:

Weight of MST.

Example: output 16 1 1 1 2 2 2 2 2 3 3 3 3

slide-82
SLIDE 82

Today’s Problem: Minimum Spanning Tree

Approximation:

Output M such that: |M – MST(G)| ≤ 𝜁∙MST(G)
Alternate form: MST(G)(1 – 𝜁) ≤ M ≤ MST(G)(1 + 𝜁)
Correct output: w.p. > 2/3

1 1 1 2 2 2 2 2 3 3 3 3 Example: 𝜁 = 1/4 Output ∊ [12,20]

slide-83
SLIDE 83

Today’s Problem: Minimum Spanning Tree

When is this useful?

What are trivial values of 𝜁? What are hard values of 𝜁? What sort of applications is this useful for?

Why multiplicative approximation for MST and additive approximation for connected components?

slide-84
SLIDE 84

Simple Minimum Spanning Tree

Which edges must be in MST? How many weight-2 edges in MST? Best (exact) algorithm?

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

slide-85
SLIDE 85

Simple Minimum Spanning Tree

Let G1 = graph containing only edges of weight 1.

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

slide-86
SLIDE 86

Simple Minimum Spanning Tree

Let G1 = graph containing only edges of weight 1. Let C1 = number of connected components in G1.

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

Ex: C1 = 6

slide-87
SLIDE 87

Simple Minimum Spanning Tree

Let G1 = graph containing only edges of weight 1.
Let C1 = number of connected components in G1.
Claim: MST contains exactly C1-1 edges of weight 2.

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

Ex: C1 = 6

slide-88
SLIDE 88

Simple Minimum Spanning Tree

Claim: MST contains exactly C1-1 edges of weight 2.

Basic MST Property:

For any cut, minimum weight edge across cut is in MST.

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

Ex: C1 = 6

slide-89
SLIDE 89

Simple Minimum Spanning Tree

Claim: MST contains exactly C1-1 edges of weight 2.

Algorithm:

For any connected component, add minimum weight outgoing edge. Here all the edges have weight 2, so add C1-1 edges of weight 2.

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

Ex: C1 = 6

slide-91
SLIDE 91

Simple Minimum Spanning Tree

Claim: MST contains exactly C1-1 edges of weight 2.

Weight of MST?

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

Ex: C1 = 6 Ex: 10 + 6 – 2 = 14

slide-92
SLIDE 92

Simple Minimum Spanning Tree

Weight of MST: n + C1 - 2 Algorithm idea?

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

Ex: C1 = 6
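The formula n + C1 - 2 can be checked against an exact MST computation. The sketch below uses Kruskal's algorithm with a small union-find on a hypothetical 6-node graph whose weights are all 1 or 2. The example graph and all function names are assumptions, not taken from the slides.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def kruskal_weight(n, edges):
    """Exact MST weight; edges are (weight, a, b) tuples on nodes 0..n-1."""
    parent = list(range(n))
    total = 0
    for w, a, b in sorted(edges):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
            total += w
    return total

def count_components(n, edges):
    """Number of connected components of the graph with the given edges."""
    parent = list(range(n))
    comps = n
    for _, a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
            comps -= 1
    return comps

# Hypothetical connected graph with weights in {1, 2}.
N = 6
EDGES = [(1, 0, 1), (1, 1, 2), (1, 3, 4), (2, 2, 3), (2, 4, 5), (2, 0, 5), (2, 1, 4)]
C1 = count_components(N, [e for e in EDGES if e[0] == 1])
```

Here G1 has components {0, 1, 2}, {3, 4} and {5}, so C1 = 3 and both `kruskal_weight(N, EDGES)` and N + C1 - 2 evaluate to 7.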

slide-93
SLIDE 93

Simple Minimum Spanning Tree

Weight of MST: n + C1 - 2 Algorithm idea: Approximate connected components of G1.

Assume all weights 1 or 2

1 1 1 2 2 2 2 2 2 2 2 2 1

Ex: C1 = 6

slide-94
SLIDE 94

Approximate Minimum Spanning Tree

Let G1 = graph containing only edges of weight 1. Let G2 = graph containing only edges of weight {1, 2}.

Let Gj = graph containing only edges of weights {1, 2, …, j}.

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2

slide-95
SLIDE 95

Approximate Minimum Spanning Tree

Let C1 = number CC in G1. Let C2 = number CC in G2.

Let Cj = number CC in Gj.

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2

slide-96
SLIDE 96

Approximate Minimum Spanning Tree

Claim: MST(G) contains Cj – 1 edges of weight > j.

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2

slide-97
SLIDE 97

Approximate Minimum Spanning Tree

Claim: MST(G) contains Cj – 1 edges of weight > j.

Why? There are Cj connected components in Gj. There must be Cj – 1 edges connecting them, and each must have weight > j.

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2

slide-98
SLIDE 98

Approximate Minimum Spanning Tree

Lemma: MST(G) = n – W + (C1 + C2 + … + CW-1)

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2

slide-99
SLIDE 99

Approximate Minimum Spanning Tree

Edges of weight 1:
  n – 1 edges total in MST
  C1 – 1 edges of weight > 1
  ⇒ (n – 1) – (C1 – 1) edges of weight 1.
  ⇒ (n – C1) edges of weight 1.

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2

slide-100
SLIDE 100

Approximate Minimum Spanning Tree

Edges of weight j+1:
  Cj – 1 edges of weight > j
  Cj+1 – 1 edges of weight > j+1
  ⇒ (Cj – 1) – (Cj+1 – 1) edges of weight j+1.
  ⇒ (Cj – Cj+1) edges of weight j+1.

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2 Note: Cj ≥ Cj+1

slide-101
SLIDE 101

Approximate Minimum Spanning Tree

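Sum the weights:

MST(G) = (n – C1) + Σ (Cj – Cj+1)∙(j+1)

Weights {1, 2, …, W}

Here (n – C1) is the number of edges of weight 1, (Cj – Cj+1) is the number of edges of weight j+1, and (j+1) is the weight of an edge of weight j+1.

Note: sum is from j = 1 to W-1.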

slide-102
SLIDE 102

Approximate Minimum Spanning Tree

Sum the weights:

Weights {1, 2, …, W}

slide-105
SLIDE 105

Approximate Minimum Spanning Tree

Lemma: MST(G) = n – W + (C1 + C2 + … + CW-1)

Weights {1, 2, …, W}

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2
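The intermediate steps were lost in extraction. A sketch of how the edge counts above telescope, assuming integer weights in {1, …, W} and C_W = 1 because G is connected:

```latex
\begin{align*}
MST(G)
  &= (n - C_1) + \sum_{j=1}^{W-1} (j+1)\,(C_j - C_{j+1}) \\
  &= n - C_1 + \sum_{j=1}^{W-1} (j+1)\,C_j - \sum_{j=2}^{W} j\,C_j \\
  &= n - C_1 + 2C_1 + \sum_{j=2}^{W-1} C_j - W\,C_W \\
  &= n - W + \sum_{j=1}^{W-1} C_j .
\end{align*}
```

For W = 2 this reduces to n + C1 - 2, matching the weights-1-or-2 case.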

slide-106
SLIDE 106

Approximate Minimum Spanning Tree

sum = n – W
for j = 1 to W – 1:
    Xj = AproxCC(Gj, d, 𝜁’, δ)
    sum = sum + Xj
return sum

Algorithm ApproxMST

1 1 1 2 2 2 2 2 3 3 3 3

Ex: G2
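The structure of ApproxMST can be sketched as follows. To keep the example deterministic, an exact component counter (`exact_cc`, a hypothetical stand-in) is plugged in where the slides call AproxCC, and the result is compared with Kruskal's algorithm. The example graph and all names are assumptions.

```python
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def exact_cc(n, edges):
    """Exact number of components; stands in for the approximate subroutine."""
    parent = list(range(n))
    comps = n
    for _, a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
            comps -= 1
    return comps

def mst_weight_via_components(n, edges, W, cc=exact_cc):
    """sum = n - W + sum over j = 1..W-1 of (components of G_j)."""
    total = n - W
    for j in range(1, W):
        g_j = [(w, a, b) for (w, a, b) in edges if w <= j]   # edges of weight <= j
        total += cc(n, g_j)
    return total

def kruskal_weight(n, edges):
    """Exact MST weight, used only as a reference."""
    parent = list(range(n))
    total = 0
    for w, a, b in sorted(edges):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
            total += w
    return total

# Hypothetical connected graph on 5 nodes with weights in {1, 2, 3}.
EXAMPLE_EDGES = [(1, 0, 1), (1, 1, 2), (2, 2, 3), (3, 3, 4), (3, 0, 4), (2, 0, 2)]
```

On this graph C1 = 3 and C2 = 2, so the formula gives 5 - 3 + 3 + 2 = 7, which agrees with `kruskal_weight(5, EXAMPLE_EDGES)`. Passing an approximate counter as `cc` gives the sampling-based version, with the error behaviour discussed on the following slides.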

slide-107
SLIDE 107

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Error Calculation

Set: 𝜁’ = 𝜁/W Sum of errors: ≤ W(𝜁n/W) ≤ 𝜁n

slide-108
SLIDE 108

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Error Calculation

Guarantee for each AproxCC:

slide-109
SLIDE 109

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Error Calculation

Guarantee for each AproxCC:

Not good enough: Pr{all correct} ≅ (2/3)^W

slide-110
SLIDE 110

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Error Calculation

Set 𝜁’ = 𝜁/W, δ = 1/(3W) Error probability:

slide-111
SLIDE 111

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Error Calculation

Set 𝜁’ = 𝜁/W, δ = 1/(3W) Guarantee for each AproxCC:

slide-112
SLIDE 112

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Error Calculation

Set: 𝜁’ = 𝜁/W, δ = 1/(3W)
Sum of errors: ≤ W(𝜁n/W) ≤ 𝜁n

slide-113
SLIDE 113

Approximate Minimum Spanning Tree Error Calculation
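The calculation on these slides is not recoverable from the extraction. A sketch of how the pieces could fit together, assuming 𝜁’ = 𝜁/W, δ = 1/(3W), integer weights of at least 1, and n ≥ 2:

```latex
\begin{align*}
\Pr[\text{some call fails}]
  &\le \sum_{j=1}^{W-1} \delta = \frac{W-1}{3W} < \frac{1}{3}
  && \text{(union bound)} \\
\Big|\sum_{j=1}^{W-1} X_j - \sum_{j=1}^{W-1} C_j\Big|
  &\le (W-1)\,\zeta' n \le \zeta n
  && \text{(if every call succeeds)} \\
MST(G) &\ge n - 1 \ge \frac{n}{2}
  \;\Longrightarrow\; \zeta n \le 2\zeta \cdot MST(G)
  && \text{(additive to multiplicative)}
\end{align*}
```

Under these assumptions the additive guarantee becomes a multiplicative one, up to rescaling 𝜁 by a constant factor.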

slide-118
SLIDE 118

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Running time

Set 𝜁’ = 𝜁/W, δ = 1/(3W) Running time:

slide-119
SLIDE 119

Approximate Minimum Spanning Tree

sum = n – W for j = 1 to W – 1: Xj = AproxCC(Gj, d, 𝜁’, δ) sum = sum + Xj return sum

Running Time

Set 𝜁’ = 𝜁/W, δ = 1/(3W) Running time:

slide-120
SLIDE 120

We have shown: With probability > 2/3, output is equal to: MST(G)(1 ± 𝜁)
Running time:

Approximate MST Summary

slide-121
SLIDE 121

Note: Impossible to do better than: Best known:

Approximate MST Summary

See: Chazelle, Rubinfeld, Trevisan

slide-122
SLIDE 122

Summary

Today:

Number of connected components in a graph.

  • Approximation algorithm.

Weight of MST

  • Approximation algorithm.

Last Week:

Toy example 1: array all 0’s?

  • Gap-style question: All 0’s or far from all 0’s?

Toy example 2: Fraction of 1’s?

  • Additive ± 𝜁 approximation
  • Hoeffding Bound

Is the graph connected?

  • Gap-style question.
  • O(1) time algorithm.
  • Correct with probability 2/3.

9 dots 4 lines

slide-123
SLIDE 123

Today’s Problem: Maximum Matching

Matching:

Output set of edges M such that no two edges in M are adjacent.

Size of Maximum Matching:

Output the largest value v where there is a matching M of size v.

Example: Size of matching: 5

slide-124
SLIDE 124

Today’s Problem: Maximal Matching

Maximal Matching:

Output set of edges M such that no two edges in M are adjacent, and no more edges can be added to M.

Size of Maximal Matching:

Output the largest value v where there is a maximal matching M of size v.

Example: Size of matching: 5

slide-125
SLIDE 125

Today’s Problem: Maximal Matching

Size of Maximal Matching:

Output the largest value v where there is a maximal matching M of size v.

Note: The maximum matching is at most twice as big as the maximal matching.
⇒ Maximal is a 2-approximation of maximum.

Example: Size of matching: 5

slide-126
SLIDE 126

Today’s Problem: Maximal Matching

Algorithm for maximal matching:

1) Assign each edge a random number. (Equivalent: choose a random permutation of the edges.)

1 2 3 4 5 6 7 8 9 10 11 12

slide-127
SLIDE 127

Today’s Problem: Maximal Matching

Algorithm for maximal matching:

1) Assign each edge a random number. (Equivalent: choose a random permutation of the edges.)
2) Greedily, in order, try to add each edge to the matching.

1 2 3 4 5 6 7 8 9 10 11 12

slide-129
SLIDE 129

Today’s Problem: Maximal Matching

Algorithm for maximal matching:

1) Assign each edge a random number. (Equivalent: choose a random permutation of the edges.)
2) Greedily, in order, try to add each edge to the matching.
⇒ Each random permutation defines a unique maximal matching.

1 2 3 4 5 6 7 8 9 10 11 12
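The two steps above can be sketched as follows. Edges are represented as pairs of node ids, and `rank` maps each edge to its position in the permutation. These representations and the function names are assumptions made for illustration.

```python
import random

def random_ranks(edges, rng):
    """Step 1: choose a random permutation of the edges."""
    order = list(edges)
    rng.shuffle(order)
    return {e: i for i, e in enumerate(order)}

def greedy_maximal_matching(edges, rank):
    """Step 2: scan edges by increasing rank; keep an edge if both endpoints are free."""
    matched_nodes = set()
    matching = []
    for e in sorted(edges, key=lambda e: rank[e]):
        a, b = e
        if a not in matched_nodes and b not in matched_nodes:
            matching.append(e)
            matched_nodes.update(e)
    return matching
```

On the path 0-1-2-3, ranking the middle edge first yields the matching `[(1, 2)]`, while ranking the edges left to right yields `[(0, 1), (2, 3)]`. Both are maximal, but only the second is maximum.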

slide-130
SLIDE 130

Today’s Problem: Maximal Matching

To solve via sampling:

1) Choose a random permutation for the edges (e.g., a hash function).
2) Choose s edges at random.
3) Decide if they are in the matching for the chosen permutation.

1 2 3 4 5 6 7 8 9 10 11 12

slide-131
SLIDE 131

Today’s Problem: Maximal Matching

To decide if an edge is in the matching:

1 2 3 4 5 6 7 8 9 10 11 12

query(e):
    for all neighbors e’ of e:
        if query(e’) = true
            return false
    return true

slide-132
SLIDE 132

Today’s Problem: Maximal Matching

To decide if an edge is in the matching:

1 2 3 4 5 6 7 8 9 10 11 12

query(e):
    for all neighbors e’ of e:
        if query(e’) = true
            return false
    return true

Oops… That doesn’t exactly work!

slide-133
SLIDE 133

Today’s Problem: Maximal Matching

To decide if an edge is in the matching:

1 2 3 4 5 6 7 8 9 10 11 12

query(e):
    for all neighbors e’ of e:
        if hash(e’) < hash(e)
            if query(e’) = true
                return false
    return true

hash(e) returns the number chosen for edge e. Only query smaller edges. Larger edges do not matter.
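The corrected query can be sketched in Python and checked against the global greedy matching. Here `rank` plays the role of hash(e), and `neighbors` is a hypothetical helper that scans the whole edge list (a real sublinear implementation would use adjacency lists instead). All names are assumptions.

```python
def neighbors(e, edges):
    """Edges that share an endpoint with e (naive scan, for illustration only)."""
    return [f for f in edges if f != e and (set(e) & set(f))]

def query(e, edges, rank):
    """Return True iff e is in the greedy matching defined by rank."""
    for f in neighbors(e, edges):
        # Only lower-ranked neighbors can block e; higher ones do not matter.
        if rank[f] < rank[e] and query(f, edges, rank):
            return False
    return True

def greedy_matching(edges, rank):
    """Reference: the full greedy matching, used to check query()."""
    used, chosen = set(), set()
    for e in sorted(edges, key=rank.get):
        if not (set(e) & used):
            chosen.add(e)
            used.update(e)
    return chosen
```

The recursion terminates because every recursive call moves to an edge of strictly smaller rank.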

slide-149
SLIDE 149

Today’s Problem: Maximal Matching

To decide if an edge is in the matching:

[Figure: example graph with edges numbered 1 to 12]

query(e):
    for all neighbors e’ of e:
        if hash(e’) < hash(e)
            if query(e’) = true
                return false
    return true

hash(e) returns the number chosen for edge e. Only query smaller edges. Larger edges do not matter.

 FALSE
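The query procedure above can be sketched in Python. This is a minimal sketch under assumptions, not the lecture's implementation: edges are tuples `(u, v)`, the helper names `build_query`, `rank`, and `incident` are hypothetical, and a shuffled list of ranks stands in for `hash(e)`. The function `greedy_matching` is included only as a reference to check that `query` agrees with the greedy matching that processes edges in increasing rank order.

```python
import random


def build_query(edges, seed=0):
    """Return (rank, query) for a random edge order on `edges`.

    rank[e] plays the role of hash(e): a distinct random number per edge.
    """
    rng = random.Random(seed)
    ranks = list(range(1, len(edges) + 1))
    rng.shuffle(ranks)
    rank = dict(zip(edges, ranks))

    # Index edges by endpoint so the neighbors of an edge can be listed locally.
    incident = {}
    for e in edges:
        for v in e:
            incident.setdefault(v, []).append(e)

    def query(e):
        # e is in the matching unless a smaller-ranked neighboring edge is.
        for f in incident[e[0]] + incident[e[1]]:
            if f != e and rank[f] < rank[e] and query(f):
                return False
        return True

    return rank, query


def greedy_matching(edges, rank):
    """Reference: add edges in increasing rank order when both endpoints are free."""
    used, matching = set(), set()
    for e in sorted(edges, key=rank.get):
        if e[0] not in used and e[1] not in used:
            matching.add(e)
            used.update(e)
    return matching
```

The recursion terminates because every recursive call is on an edge of strictly smaller rank. The sketch does no memoization, matching the pseudocode on the slide.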

slide-150
SLIDE 150

Today’s Problem: Maximal Matching

Key question: How expensive is a query?

[Figure: example graph with edges numbered 1 to 12]

query(e):
    for all neighbors e’ of e:
        if hash(e’) < hash(e)
            if query(e’) = true
                return false
    return true

slide-151
SLIDE 151

Today’s Problem: Maximal Matching

Some simple analysis: If graph has maximum degree d, then there are at most $2d^k$ paths of length k starting from the query edge.

[Figure: example graph with edges numbered 1 to 12]

slide-152
SLIDE 152

Today’s Problem: Maximal Matching

Some simple analysis: If graph has maximum degree d, then there are at most $2d^k$ paths of length k starting from the query edge. Each path of length k defines a random permutation of hash values.

[Figure: example graph with edges numbered 1 to 12]

Permutation: [6,1,11,10,3]

slide-153
SLIDE 153

Today’s Problem: Maximal Matching

Some simple analysis: If graph has maximum degree d, then there are at most $2d^k$ paths of length k starting from the query edge. Each path of length k defines a random permutation of hash values. There are k! possible permutations.

[Figure: example graph with edges numbered 1 to 12]

Permutation: [6,1,11,10,3]

slide-154
SLIDE 154

Today’s Problem: Maximal Matching

Some simple analysis: If graph has maximum degree d, then there are at most $2d^k$ paths of length k starting from the query edge. Each path of length k defines a random permutation of hash values. There are k! possible permutations. Pr[path is all decreasing] = 1/k!

[Figure: example graph with edges numbered 1 to 12]

Permutation: [6,1,11,10,3]

slide-155
SLIDE 155

Today’s Problem: Maximal Matching

Conclusion: The expected number of paths traversed of length k is at most:

$$\frac{d^k}{k!}$$

[Figure: example graph with edges numbered 1 to 12]

Permutation: [6,1,11,10,3]

slide-156
SLIDE 156

Today’s Problem: Maximal Matching

Conclusion: The expected number of paths traversed of length k is at most:

$$\frac{d^k}{k!}$$

The expected total cost of a query is:

$$\sum_{k=1}^{\infty} \frac{d^k}{k!} = O(e^d)$$

[Figure: example graph with edges numbered 1 to 12]

Permutation: [6,1,11,10,3]
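The step from the path count to the total cost can be written out as follows. This is a sketch that reads the path-count bound as $2d^k$ (an assumption about the slide's notation) and keeps the constant factor that the slide absorbs into the $O(\cdot)$. It uses linearity of expectation over paths and the Taylor series $e^d = \sum_{k \ge 0} d^k / k!$.

```latex
\begin{align*}
\mathbb{E}[\text{number of decreasing paths of length } k]
  &\le (\text{number of paths of length } k) \cdot \Pr[\text{path is all decreasing}] \\
  &\le 2d^k \cdot \frac{1}{k!}, \\
\mathbb{E}[\text{cost}]
  &\le \sum_{k=1}^{\infty} \frac{2d^k}{k!}
   = 2\left(e^d - 1\right)
   = O(e^d).
\end{align*}
```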

slide-157
SLIDE 157

Today’s Problem: Maximal Matching

Key question: How expensive is a query? E[cost] = $O(e^d)$

[Figure: example graph with edges numbered 1 to 12]

query(e):
    for all neighbors e’ of e:
        if hash(e’) < hash(e)
            if query(e’) = true
                return false
    return true

slide-158
SLIDE 158

Today’s Problem: Maximal Matching

To solve via sampling:

1) Choose a random permutation for the edges (e.g., a hash function).
2) Choose s edges at random.
3) Decide if they are in the matching for the chosen permutation via query operation.

[Figure: example graph with edges numbered 1 to 12]

slide-159
SLIDE 159

Approximate Maximal Matching: MaxMatch-Sampling

sum = 0
for j = 1 to s:
    Choose edge e uniformly at random.
    if (query(e) = true) then sum = sum + 1
return m∙(sum/s)

slide-160
SLIDE 160

Approximate Maximal Matching: MaxMatch-Sampling

sum = 0
for j = 1 to s:
    Choose edge e uniformly at random.
    if (query(e) = true) then sum = sum + 1
return m∙(sum/s)

Claim: returns size of maximal matching ± εm

slide-161
SLIDE 161

Approximate Maximal Matching: MaxMatch-Sampling

sum = 0
for j = 1 to s:
    Choose edge e uniformly at random.
    if (query(e) = true) then sum = sum + 1
return m∙(sum/s)

Claim: returns size of maximal matching ± εm

Claim: Runs in time $O(e^d / \varepsilon^2)$
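MaxMatch-Sampling can be sketched in Python as below. This is a sketch under assumptions: `query` is any membership oracle for the matching (passed in as a function), and the sample size `s` is chosen from the Hoeffding bound mentioned in the summary, $s \ge \ln(2/\delta) / (2\varepsilon^2)$, which is one standard choice rather than a value given on the slide. The names `sample_size`, `max_match_sampling`, and `delta` are hypothetical.

```python
import math
import random


def sample_size(eps, delta):
    """Smallest s with 2 * exp(-2 * s * eps^2) <= delta (Hoeffding bound)."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def max_match_sampling(edges, query, eps, delta=1 / 3, rng=random):
    """Estimate the matching size to within +/- eps * m with probability >= 1 - delta."""
    s = sample_size(eps, delta)
    total = 0
    for _ in range(s):
        e = rng.choice(edges)  # choose an edge uniformly at random
        if query(e):
            total += 1
    return len(edges) * (total / s)


# Usage with a stand-in oracle: a path on 7 vertices with a fixed matching.
edges = [(i, i + 1) for i in range(6)]
matching = {(0, 1), (2, 3), (4, 5)}
estimate = max_match_sampling(edges, matching.__contains__, eps=0.1)
```

With s = O(1/ε²) samples and expected cost $O(e^d)$ per query, the expected running time matches the claimed $O(e^d / \varepsilon^2)$.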

slide-162
SLIDE 162

Today’s Problem: Maximal Matching

Two improvements:

1) Reduce error from ± εm to ± εn.

[Figure: example graph with edges numbered 1 to 12]

slide-163
SLIDE 163

Today’s Problem: Maximal Matching

Two improvements: 1) Reduce error from ± εm to ± εn.

(Hint: each node is either matched or unmatched, and you can compute the size of the matching from the number of matched nodes.)

[Figure: example graph with edges numbered 1 to 12]

slide-164
SLIDE 164

Today’s Problem: Maximal Matching

Two improvements: 1) Reduce error from ± εm to ± εn.

(Hint: each node is either matched or unmatched, and you can compute the size of the matching from the number of matched nodes.)

2) Reduce the running time from exponential to $O(d^4 / \varepsilon^2)$.

[Figure: example graph with edges numbered 1 to 12]

slide-165
SLIDE 165

Today’s Problem: Maximal Matching

Two improvements: 1) Reduce error from ± εm to ± εn.

(Hint: each node is either matched or unmatched, and you can compute the size of the matching from the number of matched nodes.)

2) Reduce the running time from exponential to $O(d^4 / \varepsilon^2)$.

(Hint: In query, explore neighboring edges in order of smallest weight first. Analysis is not simple!)

[Figure: example graph with edges numbered 1 to 12]
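One possible reading of the hint for improvement 1, as a sketch under assumptions: sample nodes instead of edges, test whether each sampled node is matched by querying its (at most d) incident edges, and halve the estimated number of matched nodes, since every matching edge covers exactly two nodes. The names `matching_size_from_nodes`, `incident`, and `delta` are hypothetical, `query` is any membership oracle for the matching, and the sample size again comes from the Hoeffding bound.

```python
import math
import random


def matching_size_from_nodes(nodes, incident, query, eps, delta=1 / 3, rng=random):
    """Estimate the matching size as n * (fraction of matched nodes) / 2.

    incident[v] lists the edges touching node v; query(e) says whether
    edge e is in the matching.
    """
    s = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    matched = 0
    for _ in range(s):
        v = rng.choice(nodes)  # choose a node uniformly at random
        # v is matched iff some incident edge is in the matching.
        if any(query(e) for e in incident.get(v, ())):
            matched += 1
    return len(nodes) * (matched / s) / 2
```

If the matched fraction is estimated to within ± ε, the number of matched nodes is off by at most εn, so the matching size is off by at most εn/2.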

slide-166
SLIDE 166

Questions to think about:

1) Show that the sampling algorithm works as claimed (if the query operation is correct).

2) Reduce error from ± εm to ± εn.

(Hint: each node is either matched or unmatched, and you can compute the size of the matching from the number of matched nodes.)

3) Can you find a multiplicative (instead of additive) approximation? Why not?

(Hint: Think about a graph where the maximal matching is very small.)

[Figure: example graph with edges numbered 1 to 12]

slide-167
SLIDE 167

Two more questions:

1) Give an algorithm for deciding if the black pixels are connected or ε-far from connected in an n by n square of pixels.

2) Give an algorithm for deciding if the black pixels are a rectangle or ε-far from a rectangle in an n by n square of pixels.

[Figure: two example pixel images, labeled "connected" and "rectangle"]

Hint: imagine querying a grid of pixels distance εn apart.

slide-168
SLIDE 168

Summary

Today:

Number of connected components in a graph.

  • Approximation algorithm.

Weight of MST

  • Approximation algorithm.

Size of maximal matching

  • Approximation algorithm.

Last Week:

Toy example 1: array all 0’s?

  • Gap-style question: All 0’s or far from all 0’s?

Toy example 2: Fraction of 1’s?

  • Additive ± ε approximation
  • Hoeffding Bound

Is the graph connected?

  • Gap-style question.
  • O(1) time algorithm.
  • Correct with probability 2/3.