

SLIDE 1

On Nonparametric Estimation of the Fisher Information

Wei Cao¹, Alex Dytso², Michael Fauß², H. Vincent Poor², and Gang Feng¹

¹University of Electronic Science and Technology of China   ²Princeton University

IEEE International Symposium on Information Theory (ISIT) June 2020

SLIDE 2

Presentation Outline

1. Introduction
2. The Bhattacharya Estimator
3. A Clipped Estimator
4. The Gaussian Noise Case
5. Conclusion

SLIDE 3

Introduction

Fisher information for location of a pdf f:

I(f) = \int_{\mathbb{R}} \frac{(f'(t))^2}{f(t)} \, dt,   (1)

where f' is the derivative of f.

▶ An important quantity providing fundamental performance bounds
▶ In practice: no closed-form solutions, and distributions are rarely known exactly
▶ In Gaussian noise: the Fisher information of the received signal allows for optimal power allocation at the transmitter

Problem: Estimating I(f) based on n random samples Y1, . . . , Yn independently drawn from f.
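As an illustration (not from the slides), the integral in (1) can be evaluated numerically for a known density; a standard Gaussian has I(f) = 1, which gives a convenient sanity check. A minimal Python sketch, assuming a simple trapezoidal-rule approximation:

```python
import numpy as np

def fisher_information(f, f_prime, lo=-10.0, hi=10.0, num=20001):
    """Approximate I(f) = int (f'(t))^2 / f(t) dt by the trapezoidal rule."""
    t = np.linspace(lo, hi, num)
    return np.trapz(f_prime(t) ** 2 / f(t), t)

# Standard Gaussian: f'(t) = -t * f(t), and the true value is I(f) = 1.
f = lambda t: np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)
print(fisher_information(f, lambda t: -t * f(t)))  # ~1.0
```

The estimation problem above is harder precisely because neither f nor f' is available, only samples.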


SLIDE 4

Introduction

Available estimators

▶ Bhattacharya estimator¹: kernel-based, straightforward and easy to implement
▶ Donoho estimator²: lower-bounds the FI over a neighborhood of the empirical CDF

Both yield asymptotic results with unspecified constants.

Main contributions

▶ Explicit and tighter non-asymptotic results for the Bhattacharya estimator
▶ A new estimator with better bounds on the convergence rate
▶ Evaluation for the case of a r.v. contaminated by Gaussian noise, and a consistent estimator for the MMSE

¹ P. K. Bhattacharya, "Estimation of a probability density function and its derivatives," Sankhyā: The Indian Journal of Statistics, Series A, 29.4 (1967), pp. 373–382.

² D. L. Donoho, "One-sided inference about functionals of a density," The Annals of Statistics, 16.4 (1988), pp. 1390–1420.


SLIDE 5

Presentation Outline

1. Introduction
2. The Bhattacharya Estimator
3. A Clipped Estimator
4. The Gaussian Noise Case
5. Conclusion

SLIDE 6

The Bhattacharya Estimator

Kernel density estimator

f_n(t) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{a} K\!\left(\frac{t - Y_i}{a}\right),   (2)

where a > 0 is the bandwidth parameter. The kernel K(·) is assumed to be a continuously differentiable pdf.

The Bhattacharya estimator

I_n(f_n) = \int_{|t| \le k_n} \frac{(f_n'(t))^2}{f_n(t)} \, dt,   (3)

for some k_n ≥ 0.
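A minimal implementation sketch of (2) and (3), assuming a Gaussian kernel and trapezoidal integration. Function names and parameter values are illustrative; the figure slides later use separate bandwidths a_0, a_1 for f_n and f_n', while this sketch uses a single a:

```python
import numpy as np

def kde_and_deriv(t, samples, a):
    """Gaussian-kernel estimates of f_n(t) and f_n'(t) as in (2)."""
    u = (t[:, None] - samples[None, :]) / a          # (grid, n) matrix
    k = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)     # K((t - Y_i)/a)
    f_n = k.sum(axis=1) / (len(samples) * a)
    df_n = (-u * k).sum(axis=1) / (len(samples) * a ** 2)
    return f_n, df_n

def bhattacharya(samples, a, k_n, num=2001):
    """Bhattacharya estimator (3): integrate (f_n')^2 / f_n over [-k_n, k_n]."""
    t = np.linspace(-k_n, k_n, num)
    f_n, df_n = kde_and_deriv(t, samples, a)
    return np.trapz(df_n ** 2 / np.maximum(f_n, 1e-300), t)

rng = np.random.default_rng(0)
y = rng.standard_normal(2000)              # samples from f = N(0, 1)
print(bhattacharya(y, a=0.3, k_n=5.0))     # roughly I(f) = 1
```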


SLIDE 7

The Bhattacharya Estimator

Estimating density and its derivatives

Theorem 1

Let r ∈ {0, 1}, v_r = \int |K^{(r+1)}(t)| \, dt, and \delta_{r,a} = \sup_{t \in \mathbb{R}} \big| \mathbb{E}\big[f_n^{(r)}(t)\big] - f^{(r)}(t) \big|. Then, for any ϵ > δ_{r,a} and any n ≥ 1, the following bound holds:

P\Big[ \sup_{t \in \mathbb{R}} \big| f_n^{(r)}(t) - f^{(r)}(t) \big| > \epsilon \Big] \le 2 \exp\!\left( -\frac{2 n a^{2r+2} (\epsilon - \delta_{r,a})^2}{v_r^2} \right).   (4)

▶ Based on the proof by Schuster³
▶ Using the best possible constant for the DKW inequality⁴

³ E. F. Schuster, "Estimation of a probability density function and its derivatives," The Annals of Mathematical Statistics, 40.4 (1969), pp. 1187–1195.

⁴ P. Massart, "The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality," The Annals of Probability (1990), pp. 1269–1283.
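As a quick empirical companion to Theorem 1 (my illustration, not from the talk), one can Monte Carlo the sup-norm error of a Gaussian-kernel KDE; the r = 0 case of (4) says this error concentrates. The sup over t is approximated on a finite grid, an approximation made only for this sketch:

```python
import numpy as np

# Monte Carlo illustration of the r = 0 case of (4): the sup-norm KDE error
# concentrates around a small typical value.
rng = np.random.default_rng(3)
n, a, trials = 2000, 0.3, 100
t = np.linspace(-5, 5, 1001)
f = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)       # true density: N(0, 1)

sup_err = np.empty(trials)
for i in range(trials):
    y = rng.standard_normal(n)
    u = (t[:, None] - y[None, :]) / a
    f_n = (np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)).sum(axis=1) / (n * a)
    sup_err[i] = np.abs(f_n - f).max()
print(np.quantile(sup_err, [0.5, 0.99]))           # typical vs. near-worst error
```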


SLIDE 8

The Bhattacharya Estimator

Analysis of the Bhattacharya Estimator

Theorem 2

Assume there exists a function ϕ such that \sup_{|t| \le x} \frac{1}{f(t)} \le \phi(x) for all x. Then, provided that \sup_{|t| \le k_n} | f_n^{(i)}(t) - f^{(i)}(t) | \le \epsilon_i, i ∈ {0, 1}, and ϵ_0 ϕ(k_n) < 1, the following bound holds:

|I(f) - I_n(f_n)| \le \frac{4 \epsilon_1 k_n \rho_{\max}(k_n) + 2 \epsilon_1^2 k_n \phi(k_n) + \epsilon_0 \phi(k_n) I(f)}{1 - \epsilon_0 \phi(k_n)} + c(k_n),   (5)

where \rho_{\max}(k_n) = \sup_{|t| \le k_n} \left| \frac{f'(t)}{f(t)} \right| and c(k_n) = \int_{|t| \ge k_n} \frac{(f'(t))^2}{f(t)} \, dt.

▶ A non-asymptotic refinement of the result in [1, Theorem 3], which contains ϵ_0 ϕ⁴(k_n)
▶ ϕ(k_n) increases with k_n, usually very fast (e.g., exponentially for a r.v. contaminated by Gaussian noise), preventing the estimator from being practical
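For intuition on the last bullet (an illustration of my own, assuming f is a standard Gaussian rather than the noise-contaminated density): 1/f(t) = √(2π) e^{t²/2}, so the tightest choice of ϕ is ϕ(x) = √(2π) e^{x²/2}, which explodes even for moderate x:

```python
import numpy as np

# For standard Gaussian f: sup over |t| <= x of 1/f(t) = sqrt(2*pi)*exp(x**2/2).
for x in (1.0, 2.0, 4.0, 8.0):
    print(f"phi({x:.0f}) = {np.sqrt(2 * np.pi) * np.exp(x ** 2 / 2):.2e}")
# phi(8) is about 2e14, so the epsilon_0 * phi(k_n) terms in (5) dominate
# unless epsilon_0 decays extremely fast -- motivating the clipped estimator.
```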


SLIDE 9

Presentation Outline

1. Introduction
2. The Bhattacharya Estimator
3. A Clipped Estimator
4. The Gaussian Noise Case
5. Conclusion

SLIDE 10

A Clipped Estimator

Assume there exists a function \bar{\rho} such that

\left| \frac{f'(t)}{f(t)} \right| \le |\bar{\rho}(t)|,   (6)

for all t ∈ ℝ, and let

\rho_n(t) = \frac{f_n'(t)}{f_n(t)}.   (7)

The clipped estimator:

I_n^c(f_n) = \int_{-k_n}^{k_n} \min\{ |\rho_n(t)|, |\bar{\rho}(t)| \} \, |f_n'(t)| \, dt.   (8)

▶ We can set \bar{\rho}(k_n) = \rho_{\max}(k_n), where \rho_{\max}(k_n) = \sup_{|t| \le k_n} \left| \frac{f'(t)}{f(t)} \right|
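A self-contained sketch of (8) under the same Gaussian-kernel assumptions as before. The choice ρ̄(t) = |t|, which dominates the standard Gaussian score |f'(t)/f(t)| = |t|, is an illustrative assumption:

```python
import numpy as np

def clipped(samples, a, k_n, rho_bar, num=2001):
    """Clipped estimator (8): integrate min{|f_n'/f_n|, |rho_bar|} * |f_n'|."""
    t = np.linspace(-k_n, k_n, num)
    u = (t[:, None] - samples[None, :]) / a
    k = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel, as in (2)
    f_n = k.sum(axis=1) / (len(samples) * a)
    df_n = (-u * k).sum(axis=1) / (len(samples) * a ** 2)
    rho_n = np.abs(df_n) / np.maximum(f_n, 1e-300)
    return np.trapz(np.minimum(rho_n, np.abs(rho_bar(t))) * np.abs(df_n), t)

# Standard Gaussian samples: the true score is -t, so rho_bar(t) = |t| is a
# valid dominating function and the true Fisher information is 1.
rng = np.random.default_rng(1)
y = rng.standard_normal(2000)
print(clipped(y, a=0.3, k_n=5.0, rho_bar=np.abs))  # roughly 1
```

Clipping removes the division by a possibly tiny f_n in (3), which is exactly the term that forced ϕ(k_n) into the bound (5).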
SLIDE 11

A Clipped Estimator

Analysis of the clipped estimator

Theorem 3

Under the assumptions \sup_{|t| \le k_n} | f_n^{(i)}(t) - f^{(i)}(t) | \le \epsilon_i, i ∈ {0, 1}, it holds that

|I(f) - I_n^c(f_n)| \le 4 \epsilon_1 \Phi_{\max}^{1}(k_n) + 2 \epsilon_0 \Phi_{\max}^{2}(k_n) + c(k_n),   (9)

where

c(k_n) = \int_{|t| \ge k_n} \frac{(f'(t))^2}{f(t)} \, dt   (10)

\Phi_{\max}^{m}(x) = \int_{-x}^{x} |\bar{\rho}^m(t)| \, dt.   (11)

The proof is based on two auxiliary estimators that under- and overestimate I_n^c.

SLIDE 12

Presentation Outline

1. Introduction
2. The Bhattacharya Estimator
3. A Clipped Estimator
4. The Gaussian Noise Case
5. Conclusion

SLIDE 13

Estimation of the FI of a R.V. Contaminated by Gaussian Noise

Let f_Y denote the pdf of the random variable

Y = \sqrt{\mathrm{snr}}\, X + Z,   (12)

where:

▶ X has a finite second moment but is otherwise an arbitrary random variable (only a very mild assumption)
▶ Z is a standard Gaussian random variable
▶ X and Z are independent

Goal: estimate the Fisher information of f_Y. A Gaussian kernel is used; Lemma 1 evaluates the quantities appearing in Theorems 2 and 3 for this setting.


SLIDE 14

Estimation of the FI of a R.V. Contaminated by Gaussian Noise

Convergence of the Bhattacharya estimator

Theorem 4

If a = n^{-w}, where w ∈ (0, 1/6), and k_n = \sqrt{u \log(n)}, where u ∈ (0, w), then

P\big[ |I_n(f_n) - I(f_Y)| \ge \varepsilon_n \big] \le 2 e^{-c_1 n^{1-4w}} + 2 e^{-c_2 n^{1-6w}},   (13)

where

\varepsilon_n \le n^{-w} \sqrt{u \log(n)} \left( c_3 + 12 \sqrt{u \log(n)} + \frac{2 c_5 n^{u-w}}{1 - n^{u-w}} \right) + \frac{c_4 \sqrt{u \log(n)} + c_5}{n^{w-u} - 1}.   (14)

▶ I_n(f_n) converges to I(f_Y) with probability 1.
▶ u and w control a trade-off between the convergence rate and the precision.


SLIDE 15

Estimation of the FI of a R.V. Contaminated by Gaussian Noise

Convergence of the clipped estimator

Theorem 5

If a = n^{-w}, where w ∈ (0, 1/6), and k_n = n^u, where u ∈ (0, w/3), then

P\big[ |I_n^c(f_n) - I(f_Y)| \ge \varepsilon_n \big] \le 2 e^{-c_1 n^{1-4w}} + 2 e^{-c_2 n^{1-6w}},   (15)

where

\varepsilon_n \le 12 n^{u-w} \big( c_3 + 2 n^u + n^{2u} \big) + c_4 n^{-u}.   (16)

▶ Improved precision: decaying polynomially in n instead of logarithmically
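The two parameter schedules can be made concrete in a few lines. The particular w and u below are illustrative choices inside the ranges stated in Theorems 4 and 5, not values from the paper:

```python
import numpy as np

def bhattacharya_schedule(n, w=0.15, u=0.10):
    """Theorem 4: a = n**(-w), k_n = sqrt(u*log(n)), with 0 < u < w < 1/6."""
    return n ** (-w), np.sqrt(u * np.log(n))

def clipped_schedule(n, w=0.15, u=0.04):
    """Theorem 5: a = n**(-w), k_n = n**u, with 0 < u < w/3 and w < 1/6."""
    return n ** (-w), n ** u

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    (a_b, k_b), (a_c, k_c) = bhattacharya_schedule(n), clipped_schedule(n)
    print(f"n={n:>6}: Bhattacharya k_n={k_b:.2f}, clipped k_n={k_c:.2f}, a={a_b:.3f}")
# The clipped k_n grows polynomially while its precision bound (16) still
# decays polynomially; the Bhattacharya k_n must stay logarithmic.
```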


SLIDE 16

Estimation of the MMSE

Brown's identity:

I(f_Y) = 1 - \mathrm{snr} \cdot \mathrm{mmse}(X|Y),   (17)

where

\mathrm{mmse}(X|Y) = \mathbb{E}\big[ (X - \mathbb{E}[X|Y])^2 \big].   (18)

An estimator for the MMSE:

\mathrm{mmse}_n(X|Y) = \frac{1 - I_n^c(f_n)}{\mathrm{snr}}.   (19)
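A self-contained sketch of (19), repeating the clipped-estimator helper from above. The Gaussian input is an illustrative choice because then mmse(X|Y) = 1/(1 + snr) is known in closed form, so the output can be checked:

```python
import numpy as np

def clipped_fi(samples, a, k_n, rho_bar, num=2001):
    """Clipped Fisher-information estimate (8) with a Gaussian kernel."""
    t = np.linspace(-k_n, k_n, num)
    u = (t[:, None] - samples[None, :]) / a
    k = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
    f_n = k.sum(axis=1) / (len(samples) * a)
    df_n = (-u * k).sum(axis=1) / (len(samples) * a ** 2)
    rho_n = np.abs(df_n) / np.maximum(f_n, 1e-300)
    return np.trapz(np.minimum(rho_n, np.abs(rho_bar(t))) * np.abs(df_n), t)

rng = np.random.default_rng(2)
snr, n = 4.0, 2000
x = rng.standard_normal(n)                        # Gaussian input (illustrative)
y = np.sqrt(snr) * x + rng.standard_normal(n)     # Y = sqrt(snr)*X + Z, as in (12)

# f_Y is N(0, 1 + snr), so |f_Y'(t)/f_Y(t)| = |t|/(1 + snr) <= |t|.
i_hat = clipped_fi(y, a=0.3, k_n=8.0, rho_bar=np.abs)
print((1.0 - i_hat) / snr, 1.0 / (1.0 + snr))     # estimator (19) vs. true 0.2
```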

Proposition 1

If a = n^{-w}, where w ∈ (0, 1/6), and k_n = n^u, where u ∈ (0, w/3), then

P\left[ |\mathrm{mmse}_n(X|Y) - \mathrm{mmse}(X|Y)| \ge \frac{\varepsilon_n}{\mathrm{snr}} \right] \le 2 e^{-c_1 n^{1-4w}} + 2 e^{-c_2 n^{1-6w}}.   (20)


SLIDE 17

Examples

Figure 1: Fisher information I(f_Y) and its estimates Î (Bhattacharya) and Î^c (clipped) versus snr, when n = 10⁴ and k_n = 10, with: (a) Gaussian input, bandwidths a_0 = a_1 ∈ {0.3, 0.6}; and (b) binary input, bandwidths a_0 = a_1 = 0.3 and a_0 = 0.3, a_1 = 0.15.

SLIDE 18

Examples

Figure 2: Sample complexity log₁₀(n) with Gaussian input for the Bhattacharya and clipped estimators, versus: (a) varying ε_n at fixed P_err = 0.2; and (b) varying P_err at fixed ε_n = 0.5.

SLIDE 19

Presentation Outline

1. Introduction
2. The Bhattacharya Estimator
3. A Clipped Estimator
4. The Gaussian Noise Case
5. Conclusion

SLIDE 20

Conclusion

Estimation of the Fisher information of a random variable

▶ Bhattacharya estimator: new, sharper convergence results
▶ A clipped estimator: better bounds on convergence rates
▶ The case of a Gaussian-noise-contaminated random variable:
  – Specialization of the results of both estimators
  – A consistent estimator for the MMSE

Interesting future directions

▶ Studying the Gaussian noise case under further assumptions
▶ Applications in power allocation problems⁵

⁵ W. Cao, A. Dytso, M. Fauß, G. Feng, and H. V. Poor, "Robust Power Allocation for Parallel Gaussian Channels with Approximately Gaussian Input Distributions," IEEE Transactions on Wireless Communications (Early Access), 2020.

SLIDE 21

A full version can be found at:

W. Cao, A. Dytso, M. Fauß, H. V. Poor, and G. Feng, "Nonparametric estimation of the Fisher information and its applications." Available: https://arxiv.org/pdf/2005.03622.pdf

Email: clarissa.cao@hotmail.com, {adytso, mfauss, poor}@princeton.edu, fenggang@uestc.edu.cn

Thank You