Concentration inequalities and tail bounds, Prof. John Duchi

SLIDE 1

Concentration inequalities and tail bounds

John Duchi

  • Prof. John Duchi
SLIDE 2

Outline

I. Basics and motivation
  1. Law of large numbers
  2. Markov inequality
  3. Chernoff bounds

II. Sub-Gaussian random variables
  1. Definitions
  2. Examples
  3. Hoeffding inequalities

III. Sub-exponential random variables
  1. Definitions
  2. Examples
  3. Chernoff/Bernstein bounds

SLIDE 3

Motivation

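  • Often in this class, the goal is to argue that a sequence of random variables (vectors) $X_1, X_2, \ldots$ satisfies

$$\frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{p} \mathbb{E}[X].$$

  • Law of large numbers: if $\mathbb{E}[\|X\|] < \infty$, then

$$\mathbb{P}\left(\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i \neq \mathbb{E}[X]\right) = 0.$$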

SLIDE 4

Markov inequalities

Theorem (Markov’s inequality)

Let $X$ be a non-negative random variable. Then for any $t > 0$,

$$\mathbb{P}(X \ge t) \le \frac{\mathbb{E}[X]}{t}.$$
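To see how loose Markov's inequality can be, here is a minimal sketch that compares the bound with an exact tail. The choice of an Exponential(1) variable and the helper names `markov_bound` and `exponential_tail` are illustrative assumptions, not part of the slides.

```python
import math

def markov_bound(mean, t):
    """Markov's inequality: P(X >= t) <= E[X] / t for non-negative X and t > 0."""
    if mean < 0 or t <= 0:
        raise ValueError("need E[X] >= 0 and t > 0")
    return min(1.0, mean / t)  # a probability never exceeds 1

def exponential_tail(rate, t):
    """Exact tail P(X >= t) = exp(-rate * t) for X ~ Exponential(rate), t >= 0."""
    return math.exp(-rate * t)

for t in (1.0, 2.0, 5.0, 10.0):
    exact = exponential_tail(1.0, t)
    bound = markov_bound(1.0, t)  # E[X] = 1 / rate = 1 (assumed example)
    print(f"t={t:5.1f}  exact={exact:.5f}  Markov bound={bound:.5f}")
```

In this example the exact tail decays exponentially in $t$ while the Markov bound decays only like $1/t$.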

SLIDE 5

Chebyshev inequalities

Theorem (Chebyshev’s inequality)

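Let $X$ be a real-valued random variable with $\mathbb{E}[X^2] < \infty$. Then for any $t > 0$,

$$\mathbb{P}(X - \mathbb{E}[X] \ge t) \le \frac{\mathbb{E}[(X - \mathbb{E}[X])^2]}{t^2} = \frac{\mathrm{Var}(X)}{t^2}.$$

Example: i.i.d. sampling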
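The slide names i.i.d. sampling as an example without giving details. One natural reading, assuming $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2 < \infty$ (symbols introduced here for illustration), applies Chebyshev's inequality to the sample mean:

```latex
\begin{align*}
\bar{X}_n &:= \frac{1}{n} \sum_{i=1}^n X_i, \qquad
\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \sum_{i=1}^n \mathrm{Var}(X_i) = \frac{\sigma^2}{n}, \\
\mathbb{P}\left(\bar{X}_n - \mu \ge t\right) &\le \frac{\mathrm{Var}(\bar{X}_n)}{t^2} = \frac{\sigma^2}{n t^2}
\quad \text{for } t > 0.
\end{align*}
```

Under these assumptions the bound decays only polynomially, like $1/n$.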

SLIDE 6

Chernoff bounds

Moment generating function: for a random variable $X$, the MGF is

$$M_X(\lambda) := \mathbb{E}[e^{\lambda X}].$$

Example: Normally distributed random variables
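The normal example is not worked out on the slide. A sketch of the standard computation, assuming $X \sim N(\mu, \sigma^2)$ (the symbol $\mu$ is introduced here), completes the square in the exponent:

```latex
\begin{align*}
M_X(\lambda)
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}
   \exp\left(\lambda x - \frac{(x - \mu)^2}{2\sigma^2}\right) dx \\
&= \exp\left(\lambda\mu + \frac{\lambda^2\sigma^2}{2}\right)
   \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}
   \exp\left(-\frac{(x - \mu - \lambda\sigma^2)^2}{2\sigma^2}\right) dx \\
&= \exp\left(\lambda\mu + \frac{\lambda^2\sigma^2}{2}\right).
\end{align*}
```

The last integral equals 1 because the integrand is the density of $N(\mu + \lambda\sigma^2, \sigma^2)$.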

SLIDE 7

Chernoff bounds

Theorem (Chernoff bound)

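For any random variable $X$ and $t \ge 0$,

$$\mathbb{P}(X - \mathbb{E}[X] \ge t) \le \inf_{\lambda \ge 0} M_{X - \mathbb{E}[X]}(\lambda) e^{-\lambda t} = \inf_{\lambda \ge 0} \mathbb{E}\left[e^{\lambda (X - \mathbb{E}[X])}\right] e^{-\lambda t}.$$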
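The infimum in the Chernoff bound can be evaluated numerically when the MGF is known. The sketch below assumes a Gaussian $X$ with variance `sigma2`, whose centered log-MGF is $\lambda^2\sigma^2/2$, and minimizes over a finite grid of $\lambda$. The function name `chernoff_bound` and the grid are illustrative choices.

```python
import math

def chernoff_bound(log_mgf, t, lambdas):
    """Minimize log_mgf(lam) - lam * t over a grid of lam >= 0 and return the bound."""
    best_exponent = min(log_mgf(lam) - lam * t for lam in lambdas)
    return math.exp(best_exponent)

sigma2 = 2.0  # assumed variance for the example
t = 3.0

def gaussian_log_mgf(lam):
    """Log-MGF of a centered N(0, sigma2) variable."""
    return lam * lam * sigma2 / 2.0

grid = [k / 1000.0 for k in range(0, 5001)]  # lam in [0, 5]
numeric = chernoff_bound(gaussian_log_mgf, t, grid)
closed_form = math.exp(-t * t / (2.0 * sigma2))  # minimizer is lam = t / sigma2
print(numeric, closed_form)
```

A grid search can only overestimate the infimum, so the numeric value is still a valid (if slightly weaker) bound for other distributions where no closed form is available.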

SLIDE 8

Sub-Gaussian random variables

Definition (Sub-Gaussianity)

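A mean-zero random variable $X$ is $\sigma^2$-sub-Gaussian if

$$\mathbb{E}\left[e^{\lambda X}\right] \le \exp\left(\frac{\lambda^2 \sigma^2}{2}\right) \quad \text{for all } \lambda \in \mathbb{R}.$$

Example: $X \sim N(0, \sigma^2)$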

SLIDE 9

Properties of sub-Gaussians

Proposition (sums of sub-Gaussians)

Let $X_i$ be independent, mean-zero $\sigma_i^2$-sub-Gaussian. Then $\sum_{i=1}^n X_i$ is $\sum_{i=1}^n \sigma_i^2$-sub-Gaussian.
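The proposition follows in one line from independence and the sub-Gaussian MGF bound. A sketch of the argument, for any $\lambda \in \mathbb{R}$:

```latex
\begin{align*}
\mathbb{E}\left[\exp\left(\lambda \sum_{i=1}^n X_i\right)\right]
&= \prod_{i=1}^n \mathbb{E}\left[e^{\lambda X_i}\right]
&& \text{(independence)} \\
&\le \prod_{i=1}^n \exp\left(\frac{\lambda^2 \sigma_i^2}{2}\right)
&& \text{(each $X_i$ is $\sigma_i^2$-sub-Gaussian)} \\
&= \exp\left(\frac{\lambda^2 \sum_{i=1}^n \sigma_i^2}{2}\right).
\end{align*}
```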

SLIDE 10

Concentration inequalities

Theorem

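Let $X$ be $\sigma^2$-sub-Gaussian. Then for $t \ge 0$,

$$\mathbb{P}(X - \mathbb{E}[X] \ge t) \le \exp\left(-\frac{t^2}{2\sigma^2}\right),$$

$$\mathbb{P}(X - \mathbb{E}[X] \le -t) \le \exp\left(-\frac{t^2}{2\sigma^2}\right).$$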
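For a Gaussian variable the exact tail is available, so the quality of the sub-Gaussian bound can be checked directly. A minimal sketch, assuming $X \sim N(0, \sigma^2)$ and using hypothetical helper names:

```python
import math

def subgaussian_tail_bound(t, sigma2):
    """Upper bound exp(-t^2 / (2 sigma^2)) on P(X - E[X] >= t)."""
    return math.exp(-t * t / (2.0 * sigma2))

def gaussian_upper_tail(t, sigma2):
    """Exact P(X >= t) for X ~ N(0, sigma2), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0 * sigma2))

sigma2 = 1.0
for t in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(f"t={t:.1f}  exact={gaussian_upper_tail(t, sigma2):.3e}  "
          f"bound={subgaussian_tail_bound(t, sigma2):.3e}")
```

The bound captures the correct $e^{-t^2/(2\sigma^2)}$ decay; the exact tail is smaller by a factor that shrinks further as $t$ grows.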

SLIDE 11

Concentration: convergence of an independent sum

Corollary

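Let $X_i$ be independent $\sigma_i^2$-sub-Gaussian. Then for $t \ge 0$,

$$\mathbb{P}\left(\frac{1}{n} \sum_{i=1}^n X_i \ge t\right) \le \exp\left(-\frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^n \sigma_i^2}\right).$$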
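The corollary combines two facts: an independent sum of $\sigma_i^2$-sub-Gaussian variables is $\sum_{i=1}^n \sigma_i^2$-sub-Gaussian, and a $\sigma^2$-sub-Gaussian variable has upper tail at most $\exp(-t^2/(2\sigma^2))$. A sketch of the rescaling step, assuming the $X_i$ are mean-zero as in the definition:

```latex
\begin{align*}
\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n X_i \ge t\right)
= \mathbb{P}\left(\sum_{i=1}^n X_i \ge nt\right)
\le \exp\left(-\frac{(nt)^2}{2\sum_{i=1}^n \sigma_i^2}\right)
= \exp\left(-\frac{n t^2}{2 \cdot \frac{1}{n}\sum_{i=1}^n \sigma_i^2}\right).
\end{align*}
```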

SLIDE 12

Example: bounded random variables

Proposition

Let $X \in [a, b]$, with $\mathbb{E}[X] = 0$. Then

$$\mathbb{E}\left[e^{\lambda X}\right] \le \exp\left(\frac{\lambda^2 (b - a)^2}{8}\right).$$
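The bound for bounded variables can be spot-checked on two-point distributions, whose MGFs have a closed form. This is a numerical sanity check on a grid of $\lambda$, not a proof; the helper names and the specific endpoints are assumptions made for the example.

```python
import math

def two_point_mgf(lam, a, b):
    """MGF of the mean-zero variable taking only the values a < 0 < b."""
    p_b = -a / (b - a)  # P(X = b), chosen so that E[X] = 0
    p_a = b / (b - a)   # P(X = a)
    return p_a * math.exp(lam * a) + p_b * math.exp(lam * b)

def bounded_mgf_bound(lam, a, b):
    """The bound exp(lam^2 (b - a)^2 / 8) for a mean-zero X in [a, b]."""
    return math.exp(lam * lam * (b - a) ** 2 / 8.0)

def bound_holds(a, b, lambdas, tol=1e-12):
    """Check the MGF bound at every lam in the grid, up to rounding error."""
    return all(
        two_point_mgf(lam, a, b) <= bounded_mgf_bound(lam, a, b) * (1.0 + tol)
        for lam in lambdas
    )

lambdas = [k / 10.0 for k in range(-50, 51)]
print(bound_holds(-1.0, 1.0, lambdas))  # Rademacher case: MGF is cosh(lam)
print(bound_holds(-0.5, 3.0, lambdas))  # an asymmetric two-point law
```

In the Rademacher case the check reduces to the classical inequality $\cosh(\lambda) \le e^{\lambda^2/2}$, so such a variable is 1-sub-Gaussian.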

SLIDE 13

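Maxima of sub-Gaussian random variables (in expectation)

For $\sigma^2$-sub-Gaussian $X_1, \ldots, X_n$ (not necessarily independent),

$$\mathbb{E}\left[\max_{j \le n} X_j\right] \le \sqrt{2\sigma^2 \log n}.$$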
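The slide states the expectation bound without proof. A sketch of the standard argument, assuming each $X_j$ is mean-zero and $\sigma^2$-sub-Gaussian and $n \ge 2$, bounds the maximum by a sum inside the exponential and then optimizes over $\lambda > 0$:

```latex
\begin{align*}
\exp\left(\lambda \, \mathbb{E}\left[\max_{j \le n} X_j\right]\right)
&\le \mathbb{E}\left[\exp\left(\lambda \max_{j \le n} X_j\right)\right]
&& \text{(Jensen's inequality)} \\
&\le \sum_{j=1}^n \mathbb{E}\left[e^{\lambda X_j}\right]
\le n \exp\left(\frac{\lambda^2 \sigma^2}{2}\right), \\
\mathbb{E}\left[\max_{j \le n} X_j\right]
&\le \frac{\log n}{\lambda} + \frac{\lambda \sigma^2}{2}
= \sqrt{2 \sigma^2 \log n}
\quad \text{at } \lambda = \frac{\sqrt{2 \log n}}{\sigma}.
\end{align*}
```

No independence is used, only the MGF bound for each $X_j$.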

SLIDE 14

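Maxima of sub-Gaussian random variables (in probability)

For $\sigma^2$-sub-Gaussian $X_1, \ldots, X_n$ (not necessarily independent) and $t \ge 0$,

$$\mathbb{P}\left(\max_{j \le n} X_j \ge \sqrt{2\sigma^2 (\log n + t)}\right) \le e^{-t}.$$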
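The high-probability bound on the maximum can be compared against a simulation. The sketch below assumes i.i.d. Gaussian $X_j$ (one particular sub-Gaussian case) and a fixed random seed; `simulate_max` and the parameter values are illustrative choices.

```python
import math
import random

def simulate_max(n, sigma, trials, seed=0):
    """Draw `trials` independent copies of max_j X_j with X_j ~ N(0, sigma^2) i.i.d."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [max(rng.gauss(0.0, sigma) for _ in range(n)) for _ in range(trials)]

n, sigma, t = 100, 1.0, 1.0
maxima = simulate_max(n, sigma, trials=2000)

threshold = math.sqrt(2.0 * sigma ** 2 * (math.log(n) + t))
empirical_tail = sum(m >= threshold for m in maxima) / len(maxima)
print("empirical P(max >= threshold):", empirical_tail)
print("bound e^{-t}:", math.exp(-t))
```

The bound comes from a union bound over the $n$ variables, so for independent Gaussians the empirical frequency is typically well below $e^{-t}$.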

SLIDE 15

Hoeffding’s inequality

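If $X_i$ are independent and bounded in $[a_i, b_i]$, then for $t \ge 0$,

$$\mathbb{P}\left(\frac{1}{n} \sum_{i=1}^n (X_i - \mathbb{E}[X_i]) \ge t\right) \le \exp\left(-\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2}\right),$$

$$\mathbb{P}\left(\frac{1}{n} \sum_{i=1}^n (X_i - \mathbb{E}[X_i]) \le -t\right) \le \exp\left(-\frac{2 n t^2}{\frac{1}{n} \sum_{i=1}^n (b_i - a_i)^2}\right).$$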
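Hoeffding's inequality is often used to choose a sample size. A minimal sketch of the one-sided bound and its inversion follows; the function names are hypothetical, and the sample-size helper assumes every variable has the same range width.

```python
import math

def hoeffding_bound(t, ranges):
    """One-sided Hoeffding bound for the mean of independent X_i in [a_i, b_i].

    `ranges` is a list of (a_i, b_i) pairs, one per variable.
    """
    n = len(ranges)
    avg_sq_width = sum((b - a) ** 2 for a, b in ranges) / n
    return math.exp(-2.0 * n * t * t / avg_sq_width)

def hoeffding_sample_size(t, failure_prob, width=1.0):
    """Smallest n with exp(-2 n t^2 / width^2) <= failure_prob (common width)."""
    return math.ceil(width ** 2 * math.log(1.0 / failure_prob) / (2.0 * t * t))

print(hoeffding_bound(0.1, [(0.0, 1.0)] * 200))
print(hoeffding_sample_size(0.05, 0.01))
```

For variables in $[0, 1]$, deviation $t = 0.05$, and failure probability $0.01$, the inversion asks for 922 samples; a two-sided guarantee would double the failure probability by a union bound.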

SLIDE 16

Equivalent definitions of sub-Gaussianity

Theorem

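The following are equivalent (up to constants):
  i. $\mathbb{E}[\exp(X^2/\sigma^2)] \le e$
  ii. $\mathbb{E}[|X|^k]^{1/k} \le \sigma \sqrt{k}$
  iii. $\mathbb{P}(|X| \ge t) \le \exp\left(-\frac{t^2}{2\sigma^2}\right)$

If in addition $X$ is mean-zero, then the following is also equivalent to i–iii above:
  iv. $X$ is $\sigma^2$-sub-Gaussian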

SLIDE 17

Sub-exponential random variables

Definition (Sub-exponential)

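A mean-zero random variable $X$ is $(\tau^2, b)$-sub-exponential if

$$\mathbb{E}[\exp(\lambda X)] \le \exp\left(\frac{\lambda^2 \tau^2}{2}\right) \quad \text{for } |\lambda| \le \frac{1}{b}.$$

Example: Exponential RV, density $p(x) = \beta e^{-\beta x}$ for $x \ge 0$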
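The slide does not give the parameters for the exponential example. One valid choice (an assumption here, and not the only possible pair) is $\tau^2 = 2/\beta^2$ and $b = 2/\beta$ for the centered variable $X - 1/\beta$, as the following sketch shows:

```latex
\begin{align*}
\mathbb{E}\left[e^{\lambda (X - 1/\beta)}\right]
&= \frac{e^{-\lambda/\beta}}{1 - \lambda/\beta}
\quad \text{for } \lambda < \beta, \\
\log \mathbb{E}\left[e^{\lambda (X - 1/\beta)}\right]
&= -u - \log(1 - u) \le u^2
\quad \text{for } u := \lambda/\beta, \ |u| \le \tfrac{1}{2}, \\
\text{so } \mathbb{E}\left[e^{\lambda (X - 1/\beta)}\right]
&\le \exp\left(\frac{\lambda^2 \tau^2}{2}\right)
\quad \text{with } \tau^2 = \frac{2}{\beta^2}, \ b = \frac{2}{\beta}, \ |\lambda| \le \frac{1}{b}.
\end{align*}
```

The MGF is infinite for $\lambda \ge \beta$, so the variable cannot be sub-Gaussian; the restriction $|\lambda| \le 1/b$ is what makes the definition applicable.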

SLIDE 18

Sub-exponential random variables

Example: $\chi^2$-random variable. Let $Z \sim N(0, \sigma^2)$ and $X = Z^2$. Then

$$\mathbb{E}\left[e^{\lambda X}\right] = \frac{1}{\left[1 - 2\lambda\sigma^2\right]_+^{1/2}}.$$
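From this MGF one can check numerically which parameters make the centered variable $Z^2 - \sigma^2$ sub-exponential. The sketch below tests the pair $(\tau^2, b) = (4\sigma^4, 4\sigma^2)$, a commonly used choice that is assumed here rather than taken from the slide, on a grid of $|\lambda| \le 1/b$. The function names are hypothetical.

```python
import math

def centered_chi2_log_mgf(lam, sigma2):
    """log E[exp(lam * (Z^2 - sigma2))] for Z ~ N(0, sigma2)."""
    if 2.0 * lam * sigma2 >= 1.0:
        return math.inf  # the MGF is infinite once 1 - 2 lam sigma2 <= 0
    return -lam * sigma2 - 0.5 * math.log(1.0 - 2.0 * lam * sigma2)

def is_sub_exponential(sigma2, tau2, b, grid_size=2001, tol=1e-12):
    """Check log-MGF <= lam^2 tau2 / 2 on an even grid of |lam| <= 1 / b."""
    for k in range(grid_size):
        lam = (-1.0 + 2.0 * k / (grid_size - 1)) / b
        if centered_chi2_log_mgf(lam, sigma2) > lam * lam * tau2 / 2.0 + tol:
            return False
    return True

sigma2 = 1.5
print(is_sub_exponential(sigma2, tau2=4.0 * sigma2 ** 2, b=4.0 * sigma2))
print(is_sub_exponential(sigma2, tau2=1.0 * sigma2 ** 2, b=4.0 * sigma2))
```

The second call uses a deliberately small $\tau^2$ and fails the check, since the log-MGF behaves like $\lambda^2\sigma^4$ near zero.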

SLIDE 19

Concentration of sub-exponentials

Theorem

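Let $X$ be $(\tau^2, b)$-sub-exponential. Then

$$\mathbb{P}(X \ge \mathbb{E}[X] + t) \le \begin{cases} e^{-\frac{t^2}{2\tau^2}} & \text{if } 0 \le t \le \frac{\tau^2}{b} \\ e^{-\frac{t}{2b}} & \text{if } t \ge \frac{\tau^2}{b} \end{cases} = \max\left\{e^{-\frac{t^2}{2\tau^2}}, e^{-\frac{t}{2b}}\right\}.$$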
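The two forms of the sub-exponential tail bound (piecewise and maximum) agree because $t^2/(2\tau^2) \le t/(2b)$ exactly when $t \le \tau^2/b$. A minimal sketch that evaluates both, with hypothetical function names and example parameters:

```python
import math

def sub_exponential_tail_bound(t, tau2, b):
    """Piecewise bound on P(X >= E[X] + t) for a (tau2, b)-sub-exponential X."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if t <= tau2 / b:
        return math.exp(-t * t / (2.0 * tau2))  # small t: sub-Gaussian regime
    return math.exp(-t / (2.0 * b))             # large t: exponential regime

def max_form(t, tau2, b):
    """The same bound written as a maximum of the two exponentials."""
    return max(math.exp(-t * t / (2.0 * tau2)), math.exp(-t / (2.0 * b)))

tau2, b = 4.0, 4.0  # example parameters; the regimes switch at t = tau2 / b = 1
for t in (0.5, 1.0, 2.0, 8.0):
    print(t, sub_exponential_tail_bound(t, tau2, b), max_form(t, tau2, b))
```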
SLIDE 20

Sums of sub-exponential random variables

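Let $X_i$ be independent $(\tau_i^2, b_i)$-sub-exponential random variables. Then $\sum_{i=1}^n X_i$ is $\left(\sum_{i=1}^n \tau_i^2, b_*\right)$-sub-exponential, where $b_* = \max_i b_i$.

Corollary: If $X_i$ satisfy the above, then

$$\mathbb{P}\left(\left|\frac{1}{n} \sum_{i=1}^n (X_i - \mathbb{E}[X_i])\right| \ge t\right) \le 2 \exp\left(-\min\left\{\frac{n t^2}{2 \cdot \frac{1}{n} \sum_{i=1}^n \tau_i^2}, \frac{n t}{2 b_*}\right\}\right).$$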

SLIDE 21

Bernstein conditions and sub-exponentials

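Suppose $X$ is mean-zero with

$$|\mathbb{E}[X^k]| \le \frac{1}{2} k! \, \sigma^2 b^{k-2} \quad \text{for } k = 2, 3, \ldots$$

Then

$$\mathbb{E}\left[e^{\lambda X}\right] \le \exp\left(\frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)}\right) \quad \text{for } |\lambda| < \frac{1}{b}.$$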
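A sketch of why the moment condition yields this MGF bound: expand the exponential as a power series (assuming the moment condition holds for every integer $k \ge 2$), apply the condition term by term, and sum the resulting geometric series.

```latex
\begin{align*}
\mathbb{E}\left[e^{\lambda X}\right]
&= 1 + \lambda \mathbb{E}[X] + \sum_{k=2}^\infty \frac{\lambda^k \mathbb{E}[X^k]}{k!}
\le 1 + \frac{\lambda^2 \sigma^2}{2} \sum_{k=2}^\infty (b|\lambda|)^{k-2} \\
&= 1 + \frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)}
\le \exp\left(\frac{\lambda^2 \sigma^2}{2(1 - b|\lambda|)}\right)
\quad \text{for } |\lambda| < \frac{1}{b}.
\end{align*}
```

In particular, for $|\lambda| \le 1/(2b)$ the denominator satisfies $1 - b|\lambda| \ge 1/2$, so the bound is at most $\exp(\lambda^2 \sigma^2)$ and $X$ is $(2\sigma^2, 2b)$-sub-exponential.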

SLIDE 22

Johnson-Lindenstrauss and high-dimensional embedding

Question: Let $u_1, \ldots, u_m \in \mathbb{R}^d$ be arbitrary. Can we find a mapping $F : \mathbb{R}^d \to \mathbb{R}^n$, $n \ll d$, such that

$$(1 - \delta) \|u_i - u_j\|_2^2 \le \|F(u_i) - F(u_j)\|_2^2 \le (1 + \delta) \|u_i - u_j\|_2^2 \ ?$$

Theorem (Johnson-Lindenstrauss embedding)

For $n \gtrsim \frac{1}{\delta^2} \log m$ such a mapping exists.

SLIDE 23

Proof of Johnson-Lindenstrauss continued

$$\mathbb{P}\left(\left|\frac{\|Xu\|_2^2}{n \|u\|_2^2} - 1\right| \ge t\right) \le 2 \exp\left(-\frac{n t^2}{8}\right) \quad \text{for } t \in [0, 1].$$
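The tail bound above turns into a target dimension by a union bound over all pairs of points. The sketch below assumes the standard construction, which the slides do not spell out here: $X$ is an $n \times d$ matrix with i.i.d. $N(0, 1)$ entries and $F(u) = Xu/\sqrt{n}$. With $m(m-1)/2$ pairs, each failing with probability at most $2\exp(-n\delta^2/8)$, it suffices that $n \ge 8 \log(m(m-1)/p)/\delta^2$ for total failure probability $p$. All function names, the seeds, and the test points are illustrative.

```python
import itertools
import math
import random

def jl_dimension(m, delta, failure_prob):
    """Dimension n from the union bound m (m - 1) exp(-n delta^2 / 8) <= failure_prob."""
    return math.ceil(8.0 * math.log(m * (m - 1) / failure_prob) / delta ** 2)

def random_projection(points, n, seed=0):
    """Map each point u to X u / sqrt(n), with X an n x d matrix of N(0, 1) entries."""
    rng = random.Random(seed)
    d = len(points[0])
    rows = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    scale = 1.0 / math.sqrt(n)
    return [[scale * sum(r * x for r, x in zip(row, u)) for row in rows] for u in points]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def max_distortion(points, projected):
    """Largest relative error in squared distance over all pairs of points."""
    worst = 0.0
    for i, j in itertools.combinations(range(len(points)), 2):
        ratio = sq_dist(projected[i], projected[j]) / sq_dist(points[i], points[j])
        worst = max(worst, abs(ratio - 1.0))
    return worst

m, d, delta = 5, 1000, 0.5
point_rng = random.Random(1)
points = [[point_rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(m)]

n = jl_dimension(m, delta, failure_prob=0.01)
projected = random_projection(points, n)
print("target dimension:", n)
print("max distortion:", max_distortion(points, projected))
```

The target dimension depends on $m$ and $\delta$ but not on the ambient dimension $d$, which is the point of the theorem.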

SLIDE 24

Reading and bibliography

  1. S. Boucheron, O. Bousquet, and G. Lugosi. Concentration inequalities. In O. Bousquet, U. Luxburg, and G. Rätsch, editors, Advanced Lectures in Machine Learning, pages 208–240. Springer, 2004.

  2. V. Buldygin and Y. Kozachenko. Metric Characterization of Random Variables and Random Processes, volume 188 of Translations of Mathematical Monographs. American Mathematical Society, 2000.

  3. M. Ledoux. The Concentration of Measure Phenomenon. American Mathematical Society, 2001.

  4. S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: a Nonasymptotic Theory of Independence. Oxford University Press, 2013.
