CSC373 Week 11: Randomized Algorithms
373F19 - Nisarg Shah & Karan Singh 1
(Diagram: a deterministic algorithm maps an input to an output; a randomized algorithm additionally takes a source of randomness as input.)
➢ Sometimes, we want the algorithm to always take a small
amount of time
➢ Sometimes, we want the algorithm to take a small
amount of time in expectation
➢ We want the algorithm to return a solution that is, in
expectation, close to the optimum according to the objective function
➢ Informally, the randomized algorithm is making some
random choices, and sometimes they turn out to be good
➢ Can we make these “good” choices deterministically?
➢ Discrete distributions (e.g., a fair die roll: values 1 through 6 with
probability 1/6 each)
➢ Continuous distributions (e.g., the uniform
distribution over [0,1], …)
➢ Conditional probabilities
➢ Independence among random variables
➢ Conditional expectations
➢ Moments of random variables
➢ Standard discrete distributions: uniform over a finite set,
Bernoulli, binomial, geometric, Poisson, …
➢ Standard continuous distributions: uniform over intervals,
Gaussian/normal, exponential, …
➢ E[Y + Z] = E[Y] + E[Z]
➢ This does not require any independence assumptions
➢ E.g., if you want to find out how many people will attend
your party on average, just ask each person the probability with which they will attend and add these up
➢ This works even if guests' decisions are correlated, e.g., some of them
attend together or not attend together
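The party example can be sketched in Python. The guest list and probabilities below are made up; the point is that the expectation equals the sum of the individual probabilities even when two guests' decisions are perfectly correlated:

```python
import random

# Linearity of expectation: E[Y + Z] = E[Y] + E[Z], with no independence
# needed. Hypothetical party: per-guest attendance probabilities below.
probs = [0.9, 0.5, 0.5, 0.2]       # assumed probabilities, one per guest
expected = sum(probs)              # expected attendance, by linearity

# Monte Carlo check. Guests 1 and 2 share a coin (they attend together or
# not at all), yet the expected total is unchanged.
random.seed(0)
trials = 200_000
total = 0
for _ in range(trials):
    u = random.random()            # shared randomness for guests 1 and 2
    total += ((random.random() < probs[0]) + (u < probs[1])
              + (u < probs[2]) + (random.random() < probs[3]))
empirical = total / trials
assert abs(empirical - expected) < 0.02
```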
➢ For any two events B and C, Pr[B ∪ C] ≤ Pr[B] + Pr[C]
➢ "Probability that at least one of the n events B_1, …, B_n
will occur is at most Σ_j Pr[B_j]"
➢ Typically, B_1, …, B_n are "bad events"
➢ If Pr[B_j] ≤ 1/(2n) for each j, then the
probability that at least one of them occurs is ≤ 1/2
➢ So, with probability at least 1/2, none of the bad events will occur
➢ Read up!
➢ Input: An exact k-SAT formula φ = C_1 ∧ C_2 ∧ ⋯ ∧ C_m,
where each clause C_j has exactly k literals, and a weight w_j ≥ 0 for each clause C_j
➢ Output: A truth assignment τ maximizing the number (or
total weight) of clauses satisfied under τ
➢ Let us denote by W(τ) the total weight of clauses satisfied under τ
➢ N_d(τ) = set of all truth assignments which can be obtained
by changing the value of at most d variables in τ
➢ Recall: local search over the 1-flip neighborhood gives a 2/3-apx for Exact Max-2-SAT
➢ A refined local search gives a 3/4-apx for Exact Max-2-SAT.
➢ N_d(τ) = set of all truth assignments which can be obtained
by changing the value of at most d variables in τ
➢ Local search can be extended to give a (2^k − 1)/2^k-apx for Exact Max-k-SAT
➢ The algorithm becomes slightly more complicated
➢ We have a formula φ = C_1 ∧ C_2 ∧ ⋯ ∧ C_m
➢ Variables = x_1, …, x_n; literals = variables or their negations
➢ Each clause contains exactly k literals
➢ Algorithm: set each variable to TRUE with probability ½ and to FALSE
with probability ½, independently
➢ Pr[C_j is not satisfied] = 1/2^k, since all k of its literals must
evaluate to FALSE, each independently with probability ½
➢ Hence, Pr[C_j is satisfied] = (2^k − 1)/2^k
➢ By linearity of expectation,
E[W(τ)] = Σ_{j=1}^{m} w_j ⋅ Pr[C_j is satisfied] = (2^k − 1)/2^k ⋅ Σ_{j=1}^{m} w_j ≥ (2^k − 1)/2^k ⋅ OPT
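The guarantee can be checked by brute force on a small instance. A Python sketch; the toy formula, its weights, and the signed-integer literal encoding are made up for illustration:

```python
from itertools import product

# Exact Max-k-SAT instance: each clause is k literals; literal +i means
# variable i, -i means its negation (a hypothetical toy instance).
k = 2
clauses = [([1, 2], 1.0), ([-1, 3], 2.0), ([2, -3], 3.0)]  # (literals, weight)

def weight_satisfied(assign, clauses):
    """Total weight of clauses satisfied under assignment {var: bool}."""
    total = 0.0
    for lits, w in clauses:
        if any(assign[abs(l)] == (l > 0) for l in lits):
            total += w
    return total

# A uniformly random assignment satisfies each k-clause with probability
# (2^k - 1)/2^k; by linearity, E[W] = (2^k - 1)/2^k * (total weight).
n_vars = 3
exp_w = 0.0
for bits in product([False, True], repeat=n_vars):
    assign = {i + 1: bits[i] for i in range(n_vars)}
    exp_w += weight_satisfied(assign, clauses) / 2 ** n_vars

total_w = sum(w for _, w in clauses)
assert abs(exp_w - (2**k - 1) / 2**k * total_w) < 1e-9
```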
➢ What are the choices made by the algorithm?
➢ How do we know which set of choices is good?
➢ Do not think about all the choices at once.
➢ Think about them one by one.
➢ Fix the value of x_1; the choices of x_2, …, x_n are still random
➢ By the law of total expectation,
E[W(τ)] = Pr[x_1 = T] ⋅ E[W(τ) | x_1 = T] + Pr[x_1 = F] ⋅ E[W(τ) | x_1 = F] = ½ ⋅ E[W(τ) | x_1 = T] + ½ ⋅ E[W(τ) | x_1 = F]
➢ This means at least one of E[W(τ) | x_1 = T] and
E[W(τ) | x_1 = F] must be at least E[W(τ)]: take the
better of the two!
➢ Similarly,
E[W(τ) | x_1 = T] = ½ ⋅ E[W(τ) | x_1 = T, x_2 = T] + ½ ⋅ E[W(τ) | x_1 = T, x_2 = F]
➢ And then we can pick the choice that leads to a better
conditional expectation
➢ Derandomized algorithm: for j = 1, …, n, set x_j = v_j, where v_j = T if
E[W(τ) | x_1 = v_1, …, x_{j−1} = v_{j−1}, x_j = T] ≥ E[W(τ) | x_1 = v_1, …, x_{j−1} = v_{j−1}, x_j = F], and v_j = F otherwise
➢ If we're happy when making a choice at random, we
should be at least as happy conditioned on at least one of the possible values of that choice
➢ For j = 1, …, n: set x_j = v_j, where v_j = T if
E[W(τ) | x_1 = v_1, …, x_{j−1} = v_{j−1}, x_j = T] ≥ E[W(τ) | x_1 = v_1, …, x_{j−1} = v_{j−1}, x_j = F], and v_j = F otherwise
➢ How do we compare the two conditional expectations?
➢ E[W(τ) | x_1 = v_1, …, x_{j−1} = v_{j−1}, x_j = T] = Σ_s w_s ⋅ Pr[C_s is satisfied | x_1 = v_1, …, x_{j−1} = v_{j−1}, x_j = T]
➢ Set the values of x_1, …, x_{j−1}, x_j in each clause:
➢ If C_s resolves to TRUE already, the corresponding probability is 1
➢ If C_s resolves to FALSE already, the corresponding probability is 0
➢ Otherwise, if there are ℓ literals left in C_s after setting x_1, …, x_{j−1}, x_j,
the corresponding probability is (2^ℓ − 1)/2^ℓ
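Putting these pieces together, here is a minimal Python sketch of the derandomized algorithm. The clause encoding and toy instance are assumptions for illustration, not from the lecture:

```python
# Method of conditional expectations for Exact Max-k-SAT (sketch).
# Literal +i / -i encodes variable i / its negation.

def cond_exp(clauses, fixed):
    """E[W | variables in `fixed` are set]: 1 if a clause is already true,
    0 if already false, else (2^l - 1)/2^l with l unset literals left."""
    total = 0.0
    for lits, w in clauses:
        unset = [l for l in lits if abs(l) not in fixed]
        if any(abs(l) in fixed and fixed[abs(l)] == (l > 0) for l in lits):
            total += w                      # clause resolves to TRUE
        elif unset:
            total += w * (2 ** len(unset) - 1) / 2 ** len(unset)
        # else: clause resolves to FALSE, contributes 0
    return total

def derandomized_max_sat(clauses, n_vars):
    fixed = {}
    for j in range(1, n_vars + 1):
        # Keep the value whose conditional expectation is at least as
        # large; the expectation never decreases along the way, so the
        # final weight is >= (2^k - 1)/2^k * (total weight).
        if cond_exp(clauses, {**fixed, j: True}) >= cond_exp(clauses, {**fixed, j: False}):
            fixed[j] = True
        else:
            fixed[j] = False
    return fixed

k = 2
clauses = [([1, 2], 1.0), ([-1, 3], 2.0), ([2, -3], 3.0), ([-2, -3], 1.5)]
tau = derandomized_max_sat(clauses, 3)
achieved = cond_exp(clauses, tau)           # all variables fixed => actual weight
total_w = sum(w for _, w in clauses)
assert achieved >= (2**k - 1) / 2**k * total_w
```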
➢ We obtained a (2^k − 1)/2^k-approximation for Exact Max-k-SAT
➢ Max-3-SAT ⇒ 7/8
➢ Max-2-SAT ⇒ 3/4 = 0.75
(better ratios are known via semidefinite programming and randomized rounding)
➢ Max-SAT ⇒ 1/2
(better ratios are known via semidefinite programming and randomized rounding)
➢ Semidefinite programming is out of the scope of this course
➢ But we will see the simpler "LP + randomized rounding"
approach that gives a 1 − 1/e ≈ 0.6321 approximation
➢ Input: φ = C_1 ∧ C_2 ∧ ⋯ ∧ C_m, where each clause C_j has
a weight w_j ≥ 0 (clauses may have any number of literals)
➢ Output: Truth assignment that approximately maximizes
the total weight of clauses satisfied
➢ Variables:
y_j = 1 iff variable x_j is set to TRUE
z_k = 1 iff clause C_k is satisfied

Maximize Σ_k w_k ⋅ z_k
s.t. Σ_{x_j ∈ C_k} y_j + Σ_{x̄_j ∈ C_k} (1 − y_j) ≥ z_k   ∀k ∈ {1, …, m}
y_j, z_k ∈ {0, 1}   ∀j ∈ {1, …, n}, k ∈ {1, …, m}
➢ LP relaxation: same program, with the integrality constraints relaxed
y_j = 1 iff variable x_j is set to TRUE
z_k = 1 iff clause C_k is satisfied

Maximize Σ_k w_k ⋅ z_k
s.t. Σ_{x_j ∈ C_k} y_j + Σ_{x̄_j ∈ C_k} (1 − y_j) ≥ z_k   ∀k ∈ {1, …, m}
y_j, z_k ∈ [0, 1]   ∀j ∈ {1, …, n}, k ∈ {1, …, m}
➢ Find an optimal solution (y*, z*) of the LP relaxation
➢ Randomized rounding: compute a random integral solution ŷ such that
ŷ_j = 1 with probability y*_j and ŷ_j = 0 with probability 1 − y*_j,
independently across the ŷ_j's
➢ What is Pr[C_k is satisfied] if C_k has ℓ literals?

Pr[C_k is satisfied] = 1 − Π_{x_j ∈ C_k} (1 − y*_j) ⋅ Π_{x̄_j ∈ C_k} y*_j
≥ 1 − [ (Σ_{x_j ∈ C_k} (1 − y*_j) + Σ_{x̄_j ∈ C_k} y*_j) / ℓ ]^ℓ   (AM-GM inequality)
≥ 1 − [ (ℓ − z*_k) / ℓ ]^ℓ   (LP constraint)
➢ Fact: 1 − (1 − z/ℓ)^ℓ ≥ [1 − (1 − 1/ℓ)^ℓ] ⋅ z for all z ∈ [0,1] and ℓ ∈ ℕ

Pr[C_k is satisfied] ≥ 1 − [(ℓ − z*_k)/ℓ]^ℓ ≥ [1 − (1 − 1/ℓ)^ℓ] ⋅ z*_k ≥ (1 − 1/e) ⋅ z*_k
(the last step uses the standard inequality (1 − 1/ℓ)^ℓ ≤ 1/e)

E[weight of clauses satisfied] ≥ (1 − 1/e) ⋅ Σ_k w_k ⋅ z*_k ≥ (1 − 1/e) ⋅ OPT
➢ Fact: 1 − (1 − z/ℓ)^ℓ ≥ [1 − (1 − 1/ℓ)^ℓ] ⋅ z for all z ∈ [0,1] and ℓ ∈ ℕ
➢ True at z = 0 and z = 1 (same quantity on both sides)
➢ For 0 ≤ z ≤ 1: the left-hand side is concave in z while the
right-hand side is linear, so the inequality holds throughout [0,1]
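The fact can also be spot-checked numerically in Python (a sanity check over a grid, not a substitute for the concavity argument):

```python
import math

# Spot-check: 1 - (1 - z/l)^l >= (1 - (1 - 1/l)^l) * z for z in [0, 1]
# and natural l; the slope is in turn at least 1 - 1/e.
for l in range(1, 8):
    slope = 1 - (1 - 1 / l) ** l
    assert slope >= 1 - 1 / math.e - 1e-12     # since (1 - 1/l)^l <= 1/e
    for step in range(101):
        z = step / 100
        assert 1 - (1 - z / l) ** l >= slope * z - 1e-12
```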
➢ Running both "LP + randomized rounding" and the "naïve
randomized algorithm", and returning the better of the two
solutions, gives a 3/4 = 0.75 approximation!
➢ This algorithm can be derandomized.
➢ Recall: the naïve algorithm sets each variable to
TRUE/FALSE with probability 0.5 each, which only gives a 1/2 = 0.5
approximation by itself
➢ "Given a 2-CNF formula, check whether all clauses can be
satisfied simultaneously."
➢ Eliminate all unit clauses, setting the corresponding literals.
➢ Create a graph with the 2n literals as vertices.
➢ For every clause (x ∨ y), add two edges: x̄ → y and ȳ → x.
➢ The formula is satisfiable iff there is no path from x to x̄ or from x̄ to x for any variable x
➢ Solve these s-t connectivity problems in polynomial time
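The implication-graph test can be sketched in Python. This uses plain DFS reachability per variable rather than the linear-time strongly-connected-components version, and the signed-integer literal encoding is an assumption:

```python
# Poly-time 2-SAT via the implication graph (sketch): for each clause
# (a v b) add edges (not a) -> b and (not b) -> a; the formula is
# unsatisfiable iff some variable x reaches (not x) AND (not x) reaches x.

def two_sat_satisfiable(clauses, n_vars):
    # Node encoding: literal +i -> 2i, literal -i -> 2i+1 (i is 1-based).
    def node(l):
        return 2 * abs(l) + (0 if l > 0 else 1)

    adj = {v: [] for v in range(2, 2 * n_vars + 2)}
    for a, b in clauses:
        adj[node(-a)].append(node(b))
        adj[node(-b)].append(node(a))

    def reaches(src, dst):
        seen, stack = {src}, [src]
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    return not any(
        reaches(node(i), node(-i)) and reaches(node(-i), node(i))
        for i in range(1, n_vars + 1)
    )

assert two_sat_satisfiable([(1, 2), (-1, 2), (-2, 3)], 3)
assert not two_sat_satisfiable([(1, 2), (1, -2), (-1, 2), (-1, -2)], 2)
```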
➢ Start with an arbitrary assignment.
➢ While there is an unsatisfied clause C = (x ∨ y): pick one of its two
literals uniformly at random and flip the value of the corresponding variable
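This walk can be sketched in a few lines of Python. The toy instance, the iteration cap, and the literal encoding are made up for illustration:

```python
import random

# Random walk for 2-SAT (sketch): while some clause is unsatisfied, pick
# one of its two literals uniformly at random and flip it. If the formula
# is satisfiable, this takes O(n^2) expected iterations.
def random_walk_2sat(clauses, n_vars, max_iters, rng):
    assign = {i: rng.random() < 0.5 for i in range(1, n_vars + 1)}
    for _ in range(max_iters):
        unsat = [c for c in clauses
                 if not any(assign[abs(l)] == (l > 0) for l in c)]
        if not unsat:
            return assign                      # satisfying assignment found
        lit = rng.choice(rng.choice(unsat))    # random literal of a random unsat clause
        assign[abs(lit)] = not assign[abs(lit)]
    return None                                # gave up; formula may be unsatisfiable

rng = random.Random(373)
clauses = [(1, 2), (-1, 3), (-2, -3), (2, 3)]
tau = random_walk_2sat(clauses, 3, max_iters=1000, rng=rng)
assert tau is not None
assert all(any(tau[abs(l)] == (l > 0) for l in c) for c in clauses)
```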
➢ Claim: if there is a satisfying assignment τ*, then the expected
time to reach some satisfying assignment is at most O(n²).
➢ Fix τ*. Let τ_0 be the starting assignment, and let τ_j be the
assignment after j iterations.
➢ Consider the Hamming distance d_j between τ_j and τ*
➢ To show: in expectation, we will hit d_j = 0 within O(n²)
iterations, unless the algorithm stops before that (at some
other satisfying assignment).
➢ In each iteration, d_{j+1} ∈ {d_j − 1, d_j + 1}, because we change exactly one variable.
➢ Iteration j considers an unsatisfied clause C = (x ∨ y)
➢ τ* satisfies at least one of x or y, while τ_j satisfies neither
➢ Because we pick a literal uniformly at random, with probability at least ½ we
flip a variable on which τ_j and τ* differ, decreasing the distance
➢ Q: Why did we need an unsatisfied clause? What if we
pick one of the n variables at random, and flip it?
➢ We want the distance to decrease with probability at
least ½, no matter how close or far we are from τ*.
➢ If we are already close, choosing a variable at random will
likely choose one where τ and τ* already match.
➢ Flipping this variable will increase the distance with high
probability.
➢ An unsatisfied clause narrows the choice down to two variables
such that τ and τ* differ on at least one of them
(Diagram: a walk on distances 0, 1, 2, 3, …, n; each step moves toward 0 with probability ≥ 1/2 and away from 0 with probability ≤ 1/2.)
➢ Can view this as a Markov chain and use hitting-time results
➢ But let's prove it with elementary methods.
➢ T_{a,b} = expected number of iterations it takes to hit distance b
for the first time when you start at distance a
➢ From distance j + 1, one iteration moves to distance j with probability at least ½:
T_{j+1,j} ≤ ½ ⋅ (1) + ½ ⋅ (1 + T_{j+2,j}) = ½ ⋅ (1) + ½ ⋅ (1 + T_{j+2,j+1} + T_{j+1,j})
➢ Rearranging:
T_{j+1,j} ≤ 2 + T_{j+2,j+1} ≤ 4 + T_{j+3,j+2} ≤ ⋯ ≤ O(n) + T_{n,n−1} ≤ O(n),
using T_{n,n−1} = 1 (Why?)
➢ Summing over the at most n levels of distance gives O(n²) iterations in expectation
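The telescoping bound can be computed exactly. A Python sketch: solving T_{n,n−1} = 1 and T_{j+1,j} = 2 + T_{j+2,j+1} and summing over all n levels gives 1 + 3 + 5 + ⋯ + (2n − 1) = n², the precise constant behind the O(n²):

```python
# Solve the hitting-time recurrence numerically: step[j] bounds the
# expected time to go from distance j to distance j-1.
def expected_bound(n):
    step = [0.0] * (n + 1)
    step[n] = 1.0                    # at distance n, every flip moves closer
    for j in range(n - 1, 0, -1):
        step[j] = 2 + step[j + 1]    # T[j -> j-1] <= 2 + T[j+1 -> j]
    # Worst case: start at distance n; total is the sum of per-level bounds.
    return sum(step[1:n + 1])

# The bounds are the odd numbers 1, 3, ..., 2n-1, summing to exactly n^2.
for n in range(1, 20):
    assert expected_bound(n) == n * n
```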
➢ We are searching the local neighborhood
➢ But we don't ensure that we necessarily improve.
➢ We just ensure that in expectation, we aren't hurt.
➢ Hope to reach a feasible solution in polynomial time
➢ For 3-SAT, Schöning's algorithm no longer runs in polynomial time,
but it still improves upon the naïve 2^n exhaustive search
➢ It was later derandomized by Moser and Scheder [2011]
➢ Choose a random assignment τ.
➢ Repeat 3n times (n = #variables): if τ satisfies the formula, return it;
otherwise pick an unsatisfied clause and flip a uniformly random literal
in the clause.
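A Python sketch of the procedure with random restarts; the 3-CNF instance, the restart count, and the literal encoding are made up for illustration:

```python
import random

# Schoening's algorithm for 3-SAT (sketch): random restarts, each doing 3n
# random-walk steps; repeating enough times finds a satisfying assignment
# of a satisfiable formula with high probability.
def satisfied(assign, clauses):
    return all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

def schoening(clauses, n_vars, restarts, rng):
    for _ in range(restarts):
        assign = {i: rng.random() < 0.5 for i in range(1, n_vars + 1)}
        for _ in range(3 * n_vars):
            unsat = [c for c in clauses
                     if not any(assign[abs(l)] == (l > 0) for l in c)]
            if not unsat:
                return assign
            lit = rng.choice(rng.choice(unsat))  # random literal of an unsat clause
            assign[abs(lit)] = not assign[abs(lit)]
        if satisfied(assign, clauses):
            return assign
    return None   # found nothing; formula is likely unsatisfiable

rng = random.Random(373)
clauses = [(1, 2, 3), (-1, 2, 3), (1, -2, 3), (1, 2, -3), (-1, -2, -3)]
sol = schoening(clauses, 3, restarts=200, rng=rng)
assert sol is not None and satisfied(sol, clauses)
```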
➢ If the CNF is satisfiable, each run finds a satisfying assignment with
probability at least ½ ⋅ (k/(2(k − 1)))^n
➢ If the CNF is unsatisfiable, it surely does not find an
assignment.
➢ Repeating the process O((2(k − 1)/k)^n) times succeeds with high probability
➢ For k = 3, this gives O(1.3333^n)
➢ For k = 4, this gives O(1.5^n)
➢ Derandomized Schöning's algorithm: O(1.3333^n)
➢ Best known deterministic: O(1.3303^n) [HSSW]
➢ Nothing better is known without one-sided error
➢ With one-sided error, the best known is O(1.30704^n)
[Modified PPSZ]
➢ WalkSAT is a practical SAT algorithm
➢ At each iteration, pick an unsatisfied clause at random
➢ Pick a variable in the unsatisfied clause to flip: e.g., one whose flip
makes the fewest previously satisfied clauses unsatisfied, or one whose
flip satisfies the most clauses
➢ Restart a few times (avoids being stuck in local minima)
➢ Let G be a connected undirected graph with n vertices and m edges. Then a random
walk starting from any vertex will cover the entire graph (visit each vertex at least once) in O(mn) steps in expectation.
➢ In the limit, the random walk will spend a d_i/(2m) fraction of
the time on a vertex with degree d_i
➢ These results generalize to directed (possibly infinite) graphs with
unequal edge probabilities
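The long-run behaviour is easy to see empirically. A Python sketch on a small hand-made graph (the graph and step count are arbitrary choices); the fraction of time at each vertex approaches degree/(2m):

```python
import random

# Simulate a random walk and compare empirical visit frequencies against
# the stationary distribution deg(v) / (2m) on a small non-bipartite graph.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
m = sum(len(nb) for nb in adj.values()) // 2     # number of edges = 5

rng = random.Random(42)
counts = {v: 0 for v in adj}
v = 0
steps = 500_000
for _ in range(steps):
    v = rng.choice(adj[v])                       # move to a uniform neighbor
    counts[v] += 1

for u in adj:
    assert abs(counts[u] / steps - len(adj[u]) / (2 * m)) < 0.01
```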
➢ q(n) examples: log n, √n, n^0.999, n/log n, …
➢ The algorithm doesn't even get to read the full input!
➢ There are four possibilities:
➢ The algorithm always returns the
correct/optimal solution, or only does so with high probability (or gives some approximation)
➢ The algorithm
always takes q(n) time, or only does so in expectation (but still on every instance)
➢ Imagine you have an array B with O(1) access to B[i]
➢ B[i] is a tuple (x_i, p_i, n_i), where p_i and n_i are the indices of the
previous and next elements in sorted order
➢ Sorted: x_{p_i} ≤ x_i ≤ x_{n_i}
➢ Often we deal with large datasets that are stored in a
large file on disk, or possibly broken into multiple files
➢ Creating a new, sorted version of the dataset is expensive
➢ It is often preferred to "implicitly sort" the data by simply
adding previous-next pointers along with each element
➢ We would like algorithms that can operate on such implicitly
sorted data
➢ Select √n random indices S
➢ Access x_k for each k ∈ S
➢ Find the accessed x_k nearest to x in either direction
➢ If you take the largest x_k ≤ x, start from there and keep
going "next" until you find x or go past its value
➢ If you take the smallest x_k ≥ x, start from there and keep
going "previous" until you find x or go past its value
➢ Suppose you find the largest x_k ≤ x and keep going
"next"
➢ Let x_i be the smallest value ≥ x
➢ The algorithm stops when it hits x_i
➢ The algorithm throws √n random "darts" on the sorted list
➢ Chernoff bound: with high probability, one of the darts lands within
O(√n) positions of x_i in sorted order
➢ Hence, the algorithm only follows "next" O(√n) times in expectation
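The whole √n search can be sketched in Python. The array layout and helper names are made up; `search` reports whether x occurs in the list (a singly linked "next" pointer suffices here, as noted below):

```python
import math
import random

# O(sqrt(n)) search in a sorted linked list laid out in an array (sketch).
# B[i] = (value, next_index); next_index is -1 at the largest element.
def build(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    nxt = [-1] * len(values)
    for a, b in zip(order, order[1:]):
        nxt[a] = b
    return [(values[i], nxt[i]) for i in range(len(values))], order[0]

def search(B, head, x, rng):
    n = len(B)
    sample = [rng.randrange(n) for _ in range(int(math.isqrt(n)) + 1)]
    # Start from the sampled element with the largest value <= x (or the head).
    start = max((i for i in sample if B[i][0] <= x),
                key=lambda i: B[i][0], default=head)
    i = start
    while i != -1 and B[i][0] < x:   # follow "next" until we reach x or pass it
        i = B[i][1]
    return i != -1 and B[i][0] == x

rng = random.Random(1)
vals = [7, 3, 11, 1, 9, 5, 13, 2]
B, head = build(vals)
assert search(B, head, 9, rng)
assert not search(B, head, 8, rng)
```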
➢ We don't really require the list to be doubly linked. Just the
"next" pointer suffices if we have a pointer to the first
element of the list (a.k.a. an "anchored list").
➢ A matching Ω(√n) lower bound can be proved using Yao's minimax principle
➢ Beyond the scope of the course, but this is a fundamental
result with wide-ranging applications
➢ Their main focus was to generalize these ideas to come
up with sublinear algorithms for geometric problems
➢ Polygon intersection: Given two convex polyhedra, check
if they intersect.
➢ Point location: Given a Delaunay triangulation (or a Voronoi
diagram), find the cell containing a given query point.
➢ They provided optimal O(√n) algorithms for both of these problems.
➢ ε is constant ⇒ the running time is sublinear in the input size n
➢ Isn't this equivalent to "given an array of n numbers
between 1 and n − 1, estimate their average"?
➢ No! That requires Ω(n) time for a constant-factor approximation!
➢ Consider an array that is almost all 1's: you may not discover any n − 1 until you query Ω(n) numbers
➢ Why are degree sequences more special?
➢ Any degree sequence d_1 ≥ d_2 ≥ ⋯ ≥ d_n satisfies: the
sum is even and Σ_{j=1}^{k} d_j ≤ k(k − 1) + Σ_{j=k+1}^{n} d_j for every k.
➢ Intuitively, a high-degree vertex has many neighbors: even if our random
sample misses the few high-degree vertices, we still
find their neighbors, and thus account for their edges anyway!
➢ Take 8/ε random subsets S_j ⊆ V with |S_j| = s
➢ Compute the average degree d_{S_j} in each S_j.
➢ Return
d̃ = min_j d_{S_j}
➢ The analysis doesn't use anything other than Hoeffding's
inequality, Markov's inequality, linearity of expectation, and the union bound
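A Python sketch of this estimator; the subset size s is left as a parameter, since the slides do not fix it here. On a regular graph every subset average equals the true average degree, which gives a deterministic sanity check:

```python
import random

# Average-degree estimator (sketch): take 8/eps random vertex subsets,
# average the degrees within each, and return the minimum of the averages.
def estimate_avg_degree(degrees, eps, s, rng):
    n = len(degrees)
    estimates = []
    for _ in range(int(8 / eps)):
        S = rng.sample(range(n), s)
        estimates.append(sum(degrees[v] for v in S) / s)
    return min(estimates)

# Sanity check: on a cycle (2-regular) every subset average is exactly 2.
rng = random.Random(0)
cycle_degrees = [2] * 50
assert estimate_avg_degree(cycle_degrees, eps=0.5, s=10, rng=rng) == 2.0
```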