Randomized Algorithms Lecture 3: Occupancy, Moments and Deviations, Randomized Selection


SLIDE 1

Randomized Algorithms Lecture 3: “Occupancy, Moments and Deviations, Randomized Selection”

Sotiris Nikoletseas Associate Professor

CEID - ETY Course 2013 - 2014


SLIDE 2
  • 1. Some basic inequalities (I)

(i) $\left(1+\frac{1}{n}\right)^n \le e$

Proof: It is: $\forall x \ge 0$: $1 + x \le e^x$. For $x = \frac{1}{n}$ we get
$$\left(1+\frac{1}{n}\right)^n \le \left(e^{\frac{1}{n}}\right)^n = e$$

(ii) $\left(1-\frac{1}{n}\right)^{n-1} \ge \frac{1}{e}$

Proof: It suffices that $\left(\frac{n-1}{n}\right)^{n-1} \ge \frac{1}{e} \Leftrightarrow \left(\frac{n}{n-1}\right)^{n-1} \le e$. But $\frac{n}{n-1} = 1 + \frac{1}{n-1}$, so it suffices that $\left(1+\frac{1}{n-1}\right)^{n-1} \le e$, which is true by (i).


SLIDE 3
  • 1. Some basic inequalities (II)

(iii) $n! \ge \left(\frac{n}{e}\right)^n$

Proof: It is obviously $\frac{n^n}{n!} \le \sum_{i=0}^{\infty}\frac{n^i}{i!}$. But $\sum_{i=0}^{\infty}\frac{n^i}{i!} = e^n$ from Taylor's expansion of $f(x) = e^x$, so $n! \ge \frac{n^n}{e^n}$.

(iv) For any $k \le n$: $\left(\frac{n}{k}\right)^k \le \binom{n}{k} \le \left(\frac{ne}{k}\right)^k$

Proof: Indeed, $k \le n \Rightarrow \frac{n}{k} \le \frac{n-1}{k-1}$. Inductively, $k \le n \Rightarrow \frac{n}{k} \le \frac{n-i}{k-i}$ $(1 \le i \le k-1)$. Thus
$$\left(\frac{n}{k}\right)^k \le \frac{n}{k}\cdot\frac{n-1}{k-1}\cdots\frac{n-(k-1)}{k-(k-1)} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \binom{n}{k}$$
For the right inequality we obviously have $\binom{n}{k} \le \frac{n^k}{k!}$, and by (iii) it is $k! \ge \left(\frac{k}{e}\right)^k$, so $\binom{n}{k} \le \left(\frac{ne}{k}\right)^k$.
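
As a quick numerical sanity check (our addition, not part of the lecture), the following Python sketch verifies inequalities (i)–(iv) for several values of n and k; (iii) and (iv) are compared in log space so that large factorials do not overflow floats.

```python
import math

# Check (i)-(iv) on sample values; ln n! and ln C(n, k) via lgamma.
for n in (2, 10, 100, 1000):
    assert (1 + 1/n) ** n <= math.e                        # (i)
    assert (1 - 1/n) ** (n - 1) >= 1 / math.e              # (ii)
    assert math.lgamma(n + 1) >= n * (math.log(n) - 1)     # (iii): ln n! >= n ln(n/e)
    for k in (1, n // 2, n):
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))             # ln C(n, k)
        assert k * math.log(n / k) <= log_binom + 1e-9     # (iv), left inequality
        assert log_binom <= k * (math.log(n / k) + 1) + 1e-9  # (iv), right inequality
print("inequalities (i)-(iv) hold on all tested (n, k)")
```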


SLIDE 4
  • 2. Preliminaries

(i) Boole's inequality (or union bound). Let $E_1, E_2, \ldots, E_n$ be random events. Then
$$\Pr\left\{\bigcup_{i=1}^{n} E_i\right\} = \Pr\{E_1 \cup E_2 \cup \cdots \cup E_n\} \le \sum_{i=1}^{n}\Pr\{E_i\}$$
Note: If the events are disjoint, then we get equality.


SLIDE 5
  • 2. Preliminaries

(ii) Expectation (or Mean). Let X be a random variable. If X is discrete, its expectation is
$$\mu_X = E[X] = \sum_{x} x \cdot \Pr\{X = x\}$$
If X is continuous with probability density function (pdf) $f(x)$, then
$$\mu_X = \int_{-\infty}^{\infty} x f(x)\,dx$$


SLIDE 6
  • 2. Preliminaries

(ii) Expectation (or Mean) — Properties:

∀ $X_i$ $(i = 1, 2, \ldots, n)$: $E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$. This important property is called "linearity of expectation" (note that it requires no independence).

$E[cX] = cE[X]$, where c is a constant.

If X, Y are stochastically independent, then $E[X \cdot Y] = E[X] \cdot E[Y]$.

Let $f(X)$ be a real-valued function of X. Then $E[f(X)] = \sum_{x} f(x)\Pr\{X = x\}$.
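
A small empirical illustration (our addition): linearity of expectation holds regardless of dependence, whereas the product rule $E[XY] = E[X]E[Y]$ needs independence.

```python
import random

random.seed(0)
N = 200_000
xs = [random.randint(1, 6) for _ in range(N)]  # die roll X
ys = [random.randint(1, 6) for _ in range(N)]  # independent die roll Y
mean = lambda a: sum(a) / len(a)

# Linearity: E[X + Y] = E[X] + E[Y] (~7.0 both ways).
print(mean([x + y for x, y in zip(xs, ys)]), mean(xs) + mean(ys))
# Independence: E[X * Y] = E[X] * E[Y] (~12.25 both ways).
print(mean([x * y for x, y in zip(xs, ys)]), mean(xs) * mean(ys))
# Dependent case Y = X: E[X^2] = 91/6 ~ 15.17 != E[X]^2 ~ 12.25.
print(mean([x * x for x in xs]), mean(xs) ** 2)
```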


SLIDE 7
  • 2. Preliminaries

(iii) Markov's inequality

Theorem: Let X be a non-negative random variable. Then, ∀ t > 0:
$$\Pr\{X \ge t\} \le \frac{E[X]}{t}$$

Proof:
$$E[X] = \sum_{x} x\Pr\{X = x\} \ge \sum_{x \ge t} x\Pr\{X = x\} \ge \sum_{x \ge t} t\Pr\{X = x\} = t\sum_{x \ge t}\Pr\{X = x\} = t\Pr\{X \ge t\}$$

Note: Markov is a (rather weak) concentration inequality, e.g. $\Pr\{X \ge 2E[X]\} \le \frac{1}{2}$, $\Pr\{X \ge 3E[X]\} \le \frac{1}{3}$, etc.
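
An empirical illustration of the bound (our sketch, not from the slides): for a non-negative r.v. the tail $\Pr\{X \ge t\}$ never exceeds $E[X]/t$, though the gap can be large.

```python
import random

random.seed(1)
# X ~ Exponential(1), so E[X] = 1 and the exact tail is e^(-t).
samples = [random.expovariate(1.0) for _ in range(100_000)]
ex = sum(samples) / len(samples)
for t in (1, 2, 3, 5):
    tail = sum(s >= t for s in samples) / len(samples)
    print(f"t={t}: empirical Pr{{X>=t}} = {tail:.4f} <= Markov bound {ex/t:.4f}")
```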


SLIDE 8
  • 2. Preliminaries

(iv) Variance (or second moment)

Definition: $Var(X) = E[(X-\mu)^2]$, where $\mu = E[X]$, i.e. it measures (statistically) deviations from the mean.

Properties:
  • $Var(X) = E[X^2] - E[X]^2$
  • $Var(cX) = c^2 Var(X)$, where c is a constant.
  • If X, Y are independent, then $Var(X+Y) = Var(X) + Var(Y)$.

Note: We call $\sigma = \sqrt{Var(X)}$ the standard deviation of X.


SLIDE 9
  • 2. Preliminaries

(v) Chebyshev's inequality

Theorem: Let X be a r.v. with mean $\mu = E[X]$. It is:
$$\Pr\{|X - \mu| \ge t\} \le \frac{Var(X)}{t^2} \quad \forall t > 0$$

Proof: $\Pr\{|X - \mu| \ge t\} = \Pr\{(X-\mu)^2 \ge t^2\}$. From Markov's inequality:
$$\Pr\{(X-\mu)^2 \ge t^2\} \le \frac{E[(X-\mu)^2]}{t^2} = \frac{Var(X)}{t^2}$$

Note: Chebyshev's inequality provides stronger (than Markov's) concentration bounds, e.g. $\Pr\{|X - \mu| \ge 2\sigma\} \le \frac{1}{4}$, $\Pr\{|X - \mu| \ge 3\sigma\} \le \frac{1}{9}$, etc.
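
The following sketch (our addition) compares the Chebyshev bound with the true two-sided tail of a binomial, confirming that the bound holds but is far from tight.

```python
import math, random

random.seed(2)
n, p = 1000, 0.5
mu, var = n * p, n * p * (1 - p)   # mu = 500, sigma ~ 15.8
samples = [sum(random.random() < p for _ in range(n)) for _ in range(10_000)]
for c in (2, 3):
    t = c * math.sqrt(var)
    tail = sum(abs(s - mu) >= t for s in samples) / len(samples)
    print(f"Pr{{|X-mu| >= {c}sigma}}: empirical {tail:.4f} <= Chebyshev {1/c**2:.3f}")
```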


SLIDE 10
  • 3. Occupancy - importance
  • Occupancy procedures are actually stochastic processes (i.e., random processes in time). In particular, the occupancy process consists in placing balls randomly into bins, one at a time.
  • Occupancy problems/processes are of fundamental importance for the analysis of randomized algorithms, e.g. for data structures (such as hash tables), routing, etc.


SLIDE 11
  • 3. Occupancy - definition and basic questions

general occupancy process: we uniformly at random and independently put, one at a time, m distinct objects ("balls") each into one of n distinct classes ("bins").

basic questions:
  • what is the maximum number of balls in any bin?
  • how many balls are needed so that no bin remains empty, with high probability?
  • what is the number of empty bins?
  • what is the number of bins with k balls in them?

Note: in the next lecture we will study the coupon collector's problem, a variant of occupancy.


SLIDE 12
  • 3. Occupancy - the case m = n

Let us randomly place m = n balls into n bins.

Question: What is the maximum number of balls in any bin?

Remark: Let us first estimate the expected number of balls in any bin. For any bin i $(1 \le i \le n)$ let $X_i$ = # balls in bin i. Clearly $X_i \sim B(m, \frac{1}{n})$ (binomial), so
$$E[X_i] = m\cdot\frac{1}{n} = n\cdot\frac{1}{n} = 1$$
We however expect this "mean" (expected) behaviour to be highly improbable, i.e., some bins get no balls at all, while some bins get many balls.


SLIDE 13
  • 3. Occupancy - the case m = n

Theorem 1. With probability at least $1 - \frac{1}{n}$, no bin gets more than $k^* = \frac{3\ln n}{\ln\ln n}$ balls.

Proof: Let $E_j(k)$ be the event "bin j gets k or more balls". Because of symmetry, we first focus on a given bin (say bin 1). It is
$$\Pr\{\text{bin 1 gets exactly } i \text{ balls}\} = \binom{n}{i}\left(\frac{1}{n}\right)^i\left(1-\frac{1}{n}\right)^{n-i}$$
since we have a binomial $B(n, \frac{1}{n})$. But
$$\binom{n}{i}\left(\frac{1}{n}\right)^i\left(1-\frac{1}{n}\right)^{n-i} \le \binom{n}{i}\left(\frac{1}{n}\right)^i \le \left(\frac{ne}{i}\right)^i\left(\frac{1}{n}\right)^i = \left(\frac{e}{i}\right)^i$$
(from basic inequality (iv)). Thus
$$\Pr\{E_1(k)\} \le \sum_{i=k}^{n}\left(\frac{e}{i}\right)^i \le \left(\frac{e}{k}\right)^k\left(1 + \frac{e}{k} + \left(\frac{e}{k}\right)^2 + \cdots\right) = \left(\frac{e}{k}\right)^k \frac{1}{1-\frac{e}{k}}$$


SLIDE 14
  • 3. Occupancy - the case m = n

Now, let $k^* = \left\lceil\frac{3\ln n}{\ln\ln n}\right\rceil$. Then:
$$\Pr\{E_1(k^*)\} \le \left(\frac{e}{k^*}\right)^{k^*}\frac{1}{1-\frac{e}{k^*}} \le 2\left(\frac{e}{\frac{3\ln n}{\ln\ln n}}\right)^{k^*}$$
since it suffices that $\frac{1}{1-\frac{e}{k^*}} \le 2 \Leftrightarrow \frac{k^*}{k^*-e} \le 2 \Leftrightarrow k^* \le 2k^* - 2e \Leftrightarrow k^* \ge 2e$, which is true. But
$$2\left(\frac{e\ln\ln n}{3\ln n}\right)^{k^*} = 2\left(e^{1-\ln 3-\ln\ln n+\ln\ln\ln n}\right)^{k^*} \le 2\left(e^{-\ln\ln n+\ln\ln\ln n}\right)^{k^*} \le$$
$$\le 2\exp\left(-3\ln n + \frac{6\ln n\ln\ln\ln n}{\ln\ln n}\right) \le 2\exp(-3\ln n + 0.5\ln n) = 2\exp(-2.5\ln n) \le \frac{1}{n^2}$$
for n large enough.


SLIDE 15
  • 3. Occupancy - the case m = n

Thus,
$$\Pr\{\text{any bin gets more than } k^* \text{ balls}\} = \Pr\left\{\bigcup_{j=1}^{n} E_j(k^*)\right\} \le \sum_{j=1}^{n}\Pr\{E_j(k^*)\} \le n\Pr\{E_1(k^*)\} \le n\cdot\frac{1}{n^2} = \frac{1}{n}$$
(by the union bound and symmetry).
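
A simulation sketch of Theorem 1 (our addition): throw m = n balls into n bins and compare the observed maximum load with $\frac{3\ln n}{\ln\ln n}$. It also reports the number of empty bins, which comes out close to $n/e$, matching the remark that the "mean behaviour" of one ball per bin is highly improbable.

```python
import math, random
from collections import Counter

random.seed(3)
for n in (10**3, 10**4, 10**5):
    loads = Counter(random.randrange(n) for _ in range(n))  # n balls, n bins
    bound = 3 * math.log(n) / math.log(math.log(n))
    print(f"n={n}: max load = {max(loads.values())}, "
          f"bound = {bound:.1f}, "
          f"empty bins = {n - len(loads)} (n/e = {n/math.e:.0f})")
```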


SLIDE 16
  • 3. Occupancy - the case m = n log n

We showed that when m = n the mean number of balls in any bin is 1, but the maximum can be as high as $k^* = \frac{3\ln n}{\ln\ln n}$.

The next theorem shows that when $m = n\log n$ the maximum number of balls in any bin is more or less the same as the expected number of balls in any bin.

Theorem 2. When $m = n\ln n$, then with probability $1 - o(1)$ every bin has $O(\log n)$ balls.


SLIDE 17
  • 3. Occupancy - the case m = n - An improvement

If at each iteration we randomly pick d bins and throw the ball into the bin with the smallest number of balls, we can do much better than in Theorem 2:

Theorem 3. We place m = n balls sequentially into n bins as follows: for each ball, $d \ge 2$ bins are chosen uniformly at random (and independently), and the ball is placed in the least full of the d bins (ties broken randomly). When all balls are placed, the maximum load of any bin is at most $\frac{\ln\ln n}{\ln d} + O(1)$, with probability at least $1 - o(1)$ (in other words, a more balanced distribution of balls is achieved).
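
A simulation sketch of Theorem 3 (our addition): each ball inspects d random bins and goes to the least loaded one. Already for d = 2 the maximum load collapses from $\Theta\left(\frac{\ln n}{\ln\ln n}\right)$ to roughly $\frac{\ln\ln n}{\ln d} + O(1)$.

```python
import math, random

random.seed(4)
n = 100_000
for d in (1, 2, 3):
    load = [0] * n
    for _ in range(n):
        choices = [random.randrange(n) for _ in range(d)]
        least = min(choices, key=load.__getitem__)  # least full of the d bins
        load[least] += 1
    print(f"d={d}: max load = {max(load)}")
# d = 1 is the setting of Theorem 1; for d >= 2 compare with ln ln n / ln d:
print("ln ln n / ln 2 =", round(math.log(math.log(n)) / math.log(2), 2))
```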


SLIDE 18
  • 3. Occupancy - tightness of Theorem 1

Theorem 1 shows that when m = n the maximum load of any bin is $O\left(\frac{\ln n}{\ln\ln n}\right)$, with high probability. We now show that this result is tight:

Lemma 1: There is a $k = \Omega\left(\frac{\ln n}{\ln\ln n}\right)$ such that bin 1 has k balls with probability at least $\frac{1}{\sqrt{n}}$.

Proof:
$$\Pr[k \text{ balls in bin 1}] = \binom{n}{k}\left(\frac{1}{n}\right)^k\left(1-\frac{1}{n}\right)^{n-k} \ge \left(\frac{n}{k}\right)^k\frac{1}{n^k}\left(1-\frac{1}{n}\right)^{n-k}$$
(from basic inequality (iv))
$$= \left(\frac{1}{k}\right)^k\left(1-\frac{1}{n}\right)^{n-k} \ge \left(\frac{1}{k}\right)^k\cdot\frac{1}{2e} = \frac{1}{2e}\left(\frac{1}{k}\right)^k \quad \text{(for } n \ge 2\text{)}$$


SLIDE 19
  • 3. Occupancy - tightness of Theorem 1

By putting $k = \frac{c\ln n}{\ln\ln n}$ we get
$$\Pr\left\{\frac{c\ln n}{\ln\ln n} \text{ balls in bin 1}\right\} \ge \frac{1}{2e}\left(\frac{\ln\ln n}{c\ln n}\right)^{\frac{c\ln n}{\ln\ln n}} \ge \left(\frac{1}{c\ln n}\right)^{\frac{c\ln n}{\ln\ln n}}$$
(for n large enough)
$$= \left(\frac{1}{c\,e^{\ln\ln n}}\right)^{\frac{c\ln n}{\ln\ln n}} = c^{-\frac{c\ln n}{\ln\ln n}}\,e^{-c\ln n} \ge n^{-c} = \Omega(n^{-c})$$
(for any constant $c \le 1$, since then $c^{-\frac{c\ln n}{\ln\ln n}} \ge 1$).

Setting $c = \frac{1}{2}$ we get $\Pr\left\{\frac{c\ln n}{\ln\ln n} \text{ balls in bin 1}\right\} \ge \Omega\left(\frac{1}{\sqrt{n}}\right)$.


SLIDE 20
  • 3. Occupancy - the case m = n log n

Towards a proof of Theorem 2, we use the following bound.

Theorem (Chernoff bound). Let X be a r.v. $X = \sum_{i=1}^{n} X_i = X_1 + \cdots + X_n$, where the $X_i$'s are independent for all i $(1 \le i \le n)$ and
$$X_i = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } 1-p \end{cases}$$
Let $E[X] = np = \mu$. Then, ∀ δ > 0:
$$\Pr\{X \ge \mu(1+\delta)\} \le \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^{\mu}$$

When placing $m = n\ln n$ balls into n bins, let
$$X_i = \begin{cases} 1, & \text{if ball } i \text{ lands in bin 1 (prob} = \frac{1}{n}\text{)} \\ 0, & \text{else} \end{cases}$$
and $X = \sum_{i=1}^{m} X_i$ = # of balls in bin 1. Then $\mu = E[X] = m\cdot\frac{1}{n} = \ln n$.


SLIDE 21
  • 3. Occupancy - the case m = n log n

Let us estimate the probability that bin 1 receives more than, e.g., $10\ln n$ balls:

By Markov's inequality:
$$\Pr\{X \ge 10\ln n\} \le \frac{\ln n}{10\ln n} = \frac{1}{10} \quad \text{(the bound is not strong)}$$

By Chebyshev's inequality: X is actually binomial, i.e. $X \sim B(m, \frac{1}{n})$, thus its variance is
$$Var(X) = m\left(\frac{1}{n}\right)\left(1-\frac{1}{n}\right) = \frac{m}{n} - \frac{m}{n^2} \le \frac{m}{n}$$
Thus
$$\Pr\left\{X \ge \frac{m}{n} + k\right\} \le \Pr\left\{\left|X - \frac{m}{n}\right| \ge k\right\} \le \frac{Var(X)}{k^2} \le \frac{m}{nk^2}$$
For $m = n\ln n \Rightarrow \frac{m}{n} = \ln n$, and for $k = 9\ln n$ we have
$$\Pr\{X \ge 10\ln n\} = \Pr\{X \ge \ln n + 9\ln n\} \le \frac{n\ln n}{n\cdot 81\ln^2 n} = \frac{1}{81\ln n}$$
(a bound which is better than the one by Markov's inequality)


SLIDE 22
  • 3. Occupancy - the case m = n log n

Let us estimate the probability that bin 1 receives more than, e.g., $10\ln n$ balls by the Chernoff bound:
$$\Pr\{X \ge 10\ln n\} = \Pr\{X \ge (1+9)\ln n\} \le \left(\frac{e^9}{10^{10}}\right)^{\ln n} \le \frac{1}{n^{10}}$$
(much stronger). Thus,
$$\Pr\{\exists \text{ bin with more than } 10\ln n \text{ balls}\} \le n\cdot\frac{1}{n^{10}} = n^{-9}$$
$$\Rightarrow \Pr\{\text{all bins have fewer than } 10\ln n \text{ balls}\} \ge 1 - n^{-9}$$
A similar bound applies to the "low tail", i.e. the probability that there exists a bin with fewer than, say, $\frac{1}{10}\ln n$ balls tends to zero as n tends to infinity. Overall, there is high concentration around the mean value of ln n balls per bin.
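
The three bounds for $\Pr\{X \ge 10\ln n\}$ side by side, evaluated numerically (our sketch): Markov gives a constant, Chebyshev decays only logarithmically, and Chernoff decays polynomially in n.

```python
import math

for n in (10**3, 10**6):
    mu = math.log(n)                     # expected load of bin 1
    markov = 1 / 10                      # mu / (10 mu)
    chebyshev = 1 / (81 * mu)            # m/(n k^2) with k = 9 ln n
    delta = 9
    chernoff = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
    print(f"n={n}: Markov {markov:.3g}, Chebyshev {chebyshev:.3g}, "
          f"Chernoff {chernoff:.3g} (vs 1/n^10 = {n**-10:.3g})")
```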


SLIDE 23
  • 3. Occupancy - the case m = n log n

Note: The corresponding bounds over all n bins obtained via Markov's and Chebyshev's inequalities are trivial:
  • by Markov we get $\le \frac{n}{10}$
  • by Chebyshev we get $\le \frac{n}{81\ln n}$


SLIDE 24
  • 3. Occupancy - all balls in distinct bins

Consider the experiment of sequentially putting m balls randomly into n bins.

Problem: How large can m be so that the probability of all balls being placed in distinct bins remains high?

For $2 \le i \le m$, let $E_i$ = "the ith ball lands in a bin not occupied by the first i − 1 balls". The desired probability is:
$$\Pr\left\{\bigcap_{i=2}^{m} E_i\right\} = \prod_{i=2}^{m}\Pr\left\{E_i \,\middle|\, \bigcap_{j=2}^{i-1} E_j\right\} = \Pr\{E_2\}\Pr\{E_3|E_2\}\Pr\{E_4|E_2E_3\}\cdots\Pr\{E_m|E_2\ldots E_{m-1}\}$$
But $\Pr\left\{E_i \,\middle|\, \bigcap_{j=2}^{i-1} E_j\right\} = 1 - \frac{i-1}{n} \le e^{-\frac{i-1}{n}}$, so
$$\Pr\left\{\bigcap_{i=2}^{m} E_i\right\} \le \prod_{i=2}^{m} e^{-\frac{i-1}{n}} = e^{-\sum_{i=2}^{m}\frac{i-1}{n}} = e^{-\frac{1}{n}\sum_{i=1}^{m-1} i} = e^{-\frac{m(m-1)}{2n}}$$
Thus, when $m = \lceil\sqrt{2n} + 1\rceil$ this probability is at most $\frac{1}{e}$, while as m increases further the probability decreases rapidly.

Note: This is similar to the classic "birthday paradox" in probability theory.
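
A simulation sketch of the threshold (our addition, using the classic birthday setting n = 365): with $m = \lceil\sqrt{2n}+1\rceil$ balls, the probability that all land in distinct bins is already at most $1/e$.

```python
import math, random

random.seed(5)
n = 365
m = math.ceil(math.sqrt(2 * n) + 1)          # m = 29 for n = 365
trials = 100_000
distinct = sum(
    len({random.randrange(n) for _ in range(m)}) == m for _ in range(trials)
)
print(f"n={n}, m={m}: empirical Pr(all distinct) = {distinct/trials:.3f}, "
      f"bound exp(-m(m-1)/2n) = {math.exp(-m*(m-1)/(2*n)):.3f}, "
      f"1/e = {1/math.e:.3f}")
```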


SLIDE 25
  • 4. The Randomized Selection Algorithm

The problem: We are given a set S of n distinct elements (e.g. numbers) and we are asked to find the kth smallest.

Notation:
  • $r_S(t)$: the rank of element t (e.g. the smallest element has rank 1, the largest rank n, and the kth smallest rank k).
  • $S_{(i)}$ denotes the ith smallest element of S (clearly, we seek $S_{(k)}$, and $r_S(S_{(k)}) = k$).

Remark: the fastest known deterministic algorithm needs 3n time and is quite complex. Also, any deterministic algorithm requires 2n time (a tight lower bound).


SLIDE 26
  • 4. The basic idea: random sampling

We will randomly sample a subset of elements from S, trying to optimize the following trade-off:
  • the sample should be small enough to be processed (e.g. ordered) in small time
  • the sample should be large enough to contain the kth smallest element, with high probability


SLIDE 27
  • 4. The Lazy Select Algorithm

1. Pick randomly, uniformly, with replacement, a subset R of $n^{3/4}$ elements from S.
2. Sort R using an optimal deterministic sorting algorithm.
3. Let $x = k\cdot n^{-1/4}$. Set $l = \max\{\lfloor x - \sqrt{n}\rfloor, 1\}$ and $h = \min\{\lceil x + \sqrt{n}\rceil, n^{3/4}\}$, and let $a = R_{(l)}$ and $b = R_{(h)}$. By comparing a and b to every element of S, determine $r_S(a)$ and $r_S(b)$.
4. If $k \in [n^{1/4}, n - n^{1/4}]$, let $P = \{y \in S : a \le y \le b\}$. Check whether $S_{(k)} \in P$ and $|P| \le 4n^{3/4} + 2$. If not, repeat steps 1-3 until such a P is found.
5. By sorting P, identify $P_{(k - r_S(a) + 1)} = S_{(k)}$.
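
A runnable sketch of Lazy Select (our illustration; the slides give no code). It covers only the central case $n^{1/4} \le k \le n - n^{1/4}$ of Step 4, and checks $S_{(k)} \in P$ via the rank of a and the size of P, which is equivalent since P is a contiguous block of the sorted order.

```python
import random

def lazy_select(S, k):
    """Return the k-th smallest (1-indexed) element of the distinct list S."""
    n = len(S)
    while True:
        r = round(n ** 0.75)
        R = sorted(random.choice(S) for _ in range(r))  # sample with replacement
        x = k * n ** (-0.25)
        l = max(int(x - n ** 0.5), 1)
        h = min(int(x + n ** 0.5) + 1, r)
        a, b = R[l - 1], R[h - 1]                       # a = R_(l), b = R_(h)
        rank_a = sum(y < a for y in S) + 1              # r_S(a), one linear pass
        P = [y for y in S if a <= y <= b]
        # Success: S_(k) lies in P and P is small enough to sort cheaply.
        if rank_a <= k <= rank_a + len(P) - 1 and len(P) <= 4 * r + 2:
            return sorted(P)[k - rank_a]                # P_(k - r_S(a) + 1)
        # Otherwise resample; this happens with probability O(n^(-1/4)).

random.seed(6)
S = random.sample(range(10**6), 10**5)   # 10^5 distinct elements
k = 12_345
assert lazy_select(S, k) == sorted(S)[k - 1]
print("Lazy Select found the correct order statistic")
```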


SLIDE 28
  • 4. Remarks on the Lazy Select Algorithm

In Step 1, sampling is done with replacement to simplify the analysis. Sampling without replacement is marginally faster but more complex to implement.

Step 2 takes $O(n^{3/4}\log n)$ time (which is $o(n)$). Step 3 clearly takes 2n time (2n comparisons).

An example: assume $r_S(a) = 3$ and we want $S_{(7)}$. In the sorted list of P's elements, $S_{(7)} = P_{(k - r_S(a) + 1)} = P_{(7-3+1)} = P_{(5)}$, i.e. the 5th element indeed.


SLIDE 29
  • 4. Remarks on the Lazy Select Algorithm

In Step 4, it is easy to check (in constant time) whether $S_{(k)} \in P$ by comparing k to the (now known) $r_S(a)$, $r_S(b)$.

In Step 5, sorting P takes $O(n^{3/4}\log n) = o(n)$ time.

Note: we skip in Step 4 the (less interesting) cases where $k < n^{1/4}$ and $k > n - n^{1/4}$. Their analysis is similar.

SLIDE 30
  • 4. When does Lazy Select fail?

The algorithm may fail in Step 4, either because $S_{(k)} \notin P$ or because |P| is too large. We will show that the probability of failure is very small.

Lemma 1. The probability that $S_{(k)} \notin P$ is $O(n^{-1/4})$.

Proof: This happens if (i) $S_{(k)} < a$ or (ii) $S_{(k)} > b$.

(i) $S_{(k)} < a$ ⇔ fewer than l (where $l = k\cdot n^{-1/4} - \sqrt{n}$) of the samples in R are less than or equal to $S_{(k)}$. Let:
$$X_i = \begin{cases} 1, & \text{if the ith random sample is at most } S_{(k)} \\ 0, & \text{otherwise} \end{cases}$$
Clearly, $E[X_i] = \Pr\{X_i = 1\} = \frac{k}{n}$ and $Var(X_i) = \frac{k}{n}\left(1-\frac{k}{n}\right)$.

Let $X = \sum_{i=1}^{|R|} X_i$ = # samples in R that are at most $S_{(k)}$. Then:


SLIDE 31
  • 4. When does Lazy Select fail?

$$\mu_X = E[X] = |R|\cdot E[X_i] = n^{3/4}\cdot\frac{k}{n} = kn^{-1/4}$$
and
$$\sigma_X^2 = Var[X] = \sum_{i=1}^{|R|} Var(X_i) = n^{3/4}\cdot\frac{k}{n}\left(1-\frac{k}{n}\right) \le \frac{n^{3/4}}{4}$$
(since the samples are independent). Thus, by Chebyshev's inequality:
$$\Pr\{|X - \mu_X| \ge \sqrt{n}\} \le \frac{\sigma_X^2}{n} \le \frac{n^{3/4}}{4n} = O(n^{-1/4})$$
$$\Rightarrow \Pr\{X - \mu_X < -\sqrt{n}\} \le O(n^{-1/4})$$
$$\Rightarrow \Pr\{X < \mu_X - \sqrt{n}\} = \Pr\{X < \underbrace{kn^{-1/4} - \sqrt{n}}_{l}\} \le O(n^{-1/4})$$

SLIDE 32
  • 4. When does Lazy Select fail?

(ii) The case $S_{(k)} > b$ is essentially symmetric (at least h of the random samples should be smaller than $S_{(k)}$), so $\Pr\{S_{(k)} > b\} = O(n^{-1/4})$.

Overall:
$$\Pr\{S_{(k)} \notin P\} = \Pr\{S_{(k)} < a \,\cup\, S_{(k)} > b\} = O(n^{-1/4}) + O(n^{-1/4}) = O(n^{-1/4})$$


SLIDE 33
  • 4. The Lazy Select Algorithm

Lemma 2. The probability that P contains more than $4n^{3/4} + 2$ elements is $O(n^{-1/4})$.

Proof: Very similar to the proof of Lemma 1. Let $k_l = \max\{1, k - 2n^{3/4}\}$ and $k_h = \min\{k + 2n^{3/4}, n\}$.

If $S_{(k_l)} < a$ or $S_{(k_h)} > b$ then P contains more than $4n^{3/4} + 2$ elements. For simplicity, let $k_l = k - 2n^{3/4}$ and $k_h = k + 2n^{3/4}$.

Then, it suffices to "simulate" the proof of Lemma 1 first for $k = k_l$ and then for $k = k_h$.


SLIDE 34
  • 4. The Lazy Select Algorithm

Theorem. The algorithm Lazy Select finds the correct solution with probability $1 - O(n^{-1/4})$, performing 2n + o(n) comparisons.

Proof: By Lemmata 1 and 2, the algorithm finds $S_{(k)}$ on the first pass through steps 1-5 with probability $1 - O(n^{-1/4})$ (i.e., it does not fail in Step 4, avoiding a loop back to Step 1). Step 1 obviously takes o(n) time. Step 2 requires $O(n^{3/4}\log n) = o(n)$ time, and Step 3 clearly needs 2n comparisons (comparing each of the n elements of S to a and b). Overall the time needed is thus 2n + o(n).
