A general procedure to combine estimators

Frédéric Lavancier and Paul Rochet
Laboratoire de Mathématiques Jean Leray, University of Nantes
Outline

1. Introduction
2. The method
3. Theoretical results
4. Estimation of the MSE matrix Σ
5. Generalization to several parameters
6. Simulations
7. Conclusion
The problem

Let θ be an unknown quantity in a statistical model. Consider a collection of k estimators T1, ..., Tk of θ.

Aim: combine these estimators to obtain a better estimate.

Natural approach: choose a suitable combination

θ̂_λ = ∑_{j=1}^k λ_j T_j = λ⊤T,  λ ∈ Λ ⊆ R^k,

where T = (T1, ..., Tk)⊤. This amounts to finding λ̂.

Standard settings:
- Selection: Λ = {(1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1)}
- Convex: Λ = {λ ∈ R^k : λ_j ≥ 0, ∑_j λ_j = 1}
- Affine: Λ = {λ ∈ R^k : ∑_j λ_j = 1}
- Linear: Λ = R^k
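As a minimal numerical illustration (all values hypothetical), θ̂_λ is a single inner product; a sketch in R, the language the talk later refers to for kernel density estimation:

```r
# Minimal sketch: an affine combination of k = 3 hypothetical estimates of
# the same parameter theta (the weights sum to 1).
T_vec  <- c(0.98, 1.03, 1.10)     # T = (T1, T2, T3)
lambda <- c(0.5, 0.3, 0.2)        # affine setting: sum(lambda) = 1
theta_hat <- sum(lambda * T_vec)  # theta_hat_lambda = lambda^T T
```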
Existing works: Aggregation and Averaging

Aggregation: T1, ..., Tk are not random (in practice: built from an independent training sample).

- Non-parametric regression: Y = θ(X) + ε,
  λ̂ = argmin_{λ∈Λ} { ‖Y − θ̂_λ(X)‖² + pen(λ) }  (Juditsky, Nemirovski 2000).
- Density estimation: X1, ..., Xn iid with density θ,
  λ̂ = argmin_{λ∈Λ} { ‖θ̂_λ‖² − (2/n) ∑_{i=1}^n θ̂_λ(Xi) }  (Rigollet, Tsybakov 2007).

Flexibility in the choice of Λ; strong results (oracle inequalities, minimax rates, lower bounds...).
Averaging

- Forecast averaging (time series): X1, ..., Xt with predictors T1(t), ..., Tk(t),
  λ̂ = argmin_{λ∈Λ} ∑_{i=1}^t (Xi − λ⊤T(i))²  (Bates, Granger 1969).
- Model averaging (between misspecified models). Regression: Yi = µ(Xi) + εi;
  λ̂ minimizes an estimator of the risk: compromise estimator (Hjort, Claeskens 2003), Jackknife (Hansen, Racine 2012), Mallows' Cp (Benito 2012).
- Bayesian model averaging. Likelihood: Yi ∼ f(y, θ, γ);
  Jackknife (Ando, Li 2014), AIC (Hjort, Claeskens 2003).
Other examples

Example 1: mean and median

Let x1, ..., xn be n i.i.d. realisations of an unknown distribution on the real line. Assume this distribution is symmetric around some parameter θ ∈ R. Two natural choices to estimate θ:
- the mean T1 = x̄_n
- the median T2 = x_(n/2)

The idea to combine these two estimators goes back to Pierre Simon de Laplace. In the Second Supplement of the Théorie Analytique des Probabilités (1812), he wrote: "En combinant les résultats de ces deux méthodes, on peut obtenir un résultat dont la loi de probabilité des erreurs soit plus rapidement décroissante." [By combining the results of these two methods, one can obtain a result whose probability law of error decreases more rapidly.]
Laplace considered the combination λ1 x̄_n + λ2 x_(n/2) with λ1 + λ2 = 1.

1. He proved that the asymptotic law of this combination is Gaussian.
2. Minimizing the asymptotic variance in λ1, λ2, he concluded that:
   - if the underlying distribution is Gaussian, the best combination is λ1 = 1 and λ2 = 0;
   - for other distributions, the best combination depends on the distribution: "L'ignorance où l'on est de la loi de probabilité des erreurs des observations rend cette correction impraticable." [When one does not know the distribution of the errors of observation, this correction is not feasible.]
Example 2: Weibull model

Let x1, ..., xn be i.i.d. from the Weibull distribution

f(x) = (β/η) (x/η)^(β−1) e^(−(x/η)^β),  x > 0.

We consider 3 standard methods to estimate β and η:
- the maximum likelihood estimator (ML)
- the method of moments (MM)
- the ordinary least squares method, or Weibull plot (OLS)
Distribution of β̂ when β = 0.5 (left) and β = 3 (right), with η = 10 and n = 20; simulations based on 10⁴ replications. [Figure: boxplots of the ML, MM and OLS estimates.]
Example 3: Boolean model (germ-grain model)

- The germs (centers of the discs) are drawn from a homogeneous Poisson process on R² with intensity ρ.
- The grains (discs) are independent, with radius distributed according to a probability law µ ∼ B(1, α), α > 0.

Figure: Samples from a Boolean model on [0, 1]² with intensity, from left to right, ρ = 25, 50, 100, 150, and law of radii B(1, α) on [0, 0.1] with α = 1.
We do not observe the individual grains: likelihood-based inference is impossible. Let
- A_obs and P_obs be the observed area and perimeter per unit area of the set,
- N(u) the number of tangent lines orthogonal to u with convex boundary in direction u,
- |W| the area of the observation window.

Estimator of α:

α̂1 = P_obs / [10 (A_obs − 1) log(1 − A_obs)] − 2.

Estimators of ρ:

ρ̂1 = 5 (α̂1 + 1) P_obs / [π (1 − A_obs)],   ρ̂2 = (1/k) ∑_{i=1}^k N(u_i) / [|W| (1 − A_obs)].
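As a hedged sketch, the two plug-in formulas above transcribe directly into R (A_obs and P_obs are assumed to have been measured on the observed set; ρ̂2 would additionally require the tangent counts N(u_i)):

```r
# Area/perimeter-based estimators for the Boolean model with radii
# B(1, alpha) on [0, 0.1], transcribing the formulas above.
boolean_estimates <- function(A_obs, P_obs) {
  alpha1 <- P_obs / (10 * (A_obs - 1) * log(1 - A_obs)) - 2
  rho1   <- 5 * (alpha1 + 1) * P_obs / (pi * (1 - A_obs))
  c(alpha1 = alpha1, rho1 = rho1)
}
```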
2. The method
General approach

Assume θ ∈ R. We consider the quadratic loss

R(λ) := E(λ⊤T − θ)²,  λ ∈ Λ.

The oracle is θ̂* = λ*⊤T, where λ* = argmin_{λ∈Λ} R(λ).

General pattern to construct the averaging estimator:
1. Estimate the error: R̂(λ).
2. Compute λ̂ = argmin_{λ∈Λ} R̂(λ).
3. Build the averaging estimator: θ̂ = λ̂⊤T.

Two important choices:
1. the constraint set Λ,
2. the estimate R̂(λ).

Rule of thumb: these choices must imply |θ̂ − θ̂*| ≪ |θ̂* − θ| in probability.
Choice of the constraint set Λ

Clearly: the larger Λ, the better the oracle.

Linear: Λ = R^k. The oracle is θ̂* = λ*⊤T with

λ* = argmin_{λ∈R^k} E(λ⊤T − θ)² = θ [E(TT⊤)]⁻¹ E(T),

where E(TT⊤) denotes the matrix with entries E(T_i T_j). This gives

θ̂* = λ*⊤T = θ × E(T)⊤ [E(TT⊤)]⁻¹ T = θ × 1̂,

where 1̂ := E(T)⊤[E(TT⊤)]⁻¹T estimates the constant 1.

⇒ If Λ = R^k, estimating λ* is at least as difficult as estimating θ itself: Λ = R^k is not a good choice.
Affine: Λ = {λ ∈ R^k : λ⊤1 = 1}. The oracle seems more accessible:

θ̂ − θ̂* = (λ̂ − λ*)⊤T = (λ̂ − λ*)⊤(T − θ1).

The risk writes in terms of the MSE matrix Σ = E[(T − θ1)(T − θ1)⊤]:

R(λ) = E(λ⊤T − θ)² = E[λ⊤(T − θ1)]² = λ⊤Σλ.

Explicit formula for the oracle:

λ* = argmin_{λ∈R^k : λ⊤1=1} λ⊤Σλ = Σ⁻¹1 / (1⊤Σ⁻¹1).
Our framework

Maximal constraint set (affine): Λmax = {λ ∈ R^k : λ⊤1 = 1}. Conditions on Λ:
1. Λ ⊆ Λmax,
2. Λ closed and non-empty (existence of the solution).

Then, for Σ = E[(T − θ1)(T − θ1)⊤] and Σ̂ an estimate of Σ:
- the oracle is θ̂* = λ*⊤T, where λ* = argmin_{λ∈Λ} λ⊤Σλ;
- the averaging estimator is θ̂ = λ̂⊤T, where λ̂ = argmin_{λ∈Λ} λ⊤Σ̂λ.

If Λ = Λmax, explicit formulas:

λ* = Σ⁻¹1 / (1⊤Σ⁻¹1),   λ̂ = Σ̂⁻¹1 / (1⊤Σ̂⁻¹1).
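Over Λmax the whole procedure reduces to two linear solves; a minimal sketch (function and argument names hypothetical), assuming an estimate Σ̂ of the MSE matrix is available:

```r
# Averaging estimator over Lambda_max:
# lambda_hat = Sigma_hat^{-1} 1 / (1^T Sigma_hat^{-1} 1), theta_hat = lambda_hat^T T.
average_estimator <- function(T_vec, Sigma_hat) {
  w <- solve(Sigma_hat, rep(1, length(T_vec)))  # Sigma_hat^{-1} 1
  lambda_hat <- w / sum(w)                      # normalize: lambda_hat^T 1 = 1
  list(lambda = lambda_hat, theta = sum(lambda_hat * T_vec))
}
```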
3. Theoretical results
Error bound

For Λ ⊂ Λmax, set

δΛ(Σ̂, Σ) = sup_{λ∈Λ} |λ⊤Σ̂λ − λ⊤Σλ| / (λ⊤Σλ).

Proposition. Let Λ be a non-empty closed convex subset of Λmax. Then

(θ̂ − θ̂*)² ≤ E(θ̂* − θ)² [2 δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²] ‖Σ^(−1/2)(T − θ1)‖².

- The first factor is the MSE of the oracle.
- The last factor plays the role of a constant, in view of E‖Σ^(−1/2)(T − θ1)‖² = k.
- The middle factor is small, provided Σ̂ is "close" to Σ. Lemma: δΛ(Σ̂, Σ) ≤ ‖Σ̂Σ⁻¹ − I‖.
Asymptotic results

Let n denote the size of the sample used to produce T, and set

α_n := E(θ̂*_n − θ)² = λ*_n⊤ Σ_n λ*_n,   α̂_n := λ̂_n⊤ Σ̂_n λ̂_n.

Corollary. If Σ̂_n Σ_n⁻¹ → I in probability, then

(θ̂_n − θ)² = (θ̂*_n − θ)² + o_P(α_n).

Moreover, if α_n^(−1/2) (θ̂*_n − θ) → N(0, 1) in distribution, then α̂_n^(−1/2) (θ̂_n − θ) → N(0, 1) in distribution.

- The condition Σ̂_n Σ_n⁻¹ → I is in between Σ̂_n − Σ_n → 0 and Σ̂_n⁻¹ − Σ_n⁻¹ → 0 (in probability).
- No independence assumption on T_n and Σ̂_n.
- Optimality in L² holds under stronger assumptions.
- The last statement allows one to construct asymptotic confidence intervals for θ without further approximation (since α̂_n is already computed to get θ̂).
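Written out, the interval of the last bullet takes the usual form below (a is the nominal level and z_{1−a/2} the standard Gaussian quantile; no extra computation is needed since α̂_n is a by-product of the minimization):

```latex
\[
  \Big[\,\hat\theta_n - z_{1-a/2}\sqrt{\hat\alpha_n}\,,\;
         \hat\theta_n + z_{1-a/2}\sqrt{\hat\alpha_n}\,\Big],
  \qquad \hat\alpha_n = \hat\lambda_n^\top \hat\Sigma_n \hat\lambda_n .
\]
```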
Sketch of proof

Recall the bound: (θ̂ − θ̂*)² ≤ E(θ̂* − θ)² [2 δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²] ‖Σ^(−1/2)(T − θ1)‖², where δΛ(Σ̂, Σ) = sup_{λ∈Λ} |λ⊤Σ̂λ − λ⊤Σλ| / (λ⊤Σλ).

1. (θ̂ − θ̂*)² = [(λ̂ − λ*)⊤(T − θ1)]² = [(λ̂ − λ*)⊤ Σ^(1/2) Σ^(−1/2) (T − θ1)]²
   ≤ ‖Σ^(1/2)(λ̂ − λ*)‖² ‖Σ^(−1/2)(T − θ1)‖²  (Cauchy-Schwarz).

2. ‖Σ^(1/2)(λ̂ − λ*)‖² = λ̂⊤Σλ̂ − λ*⊤Σλ* − 2 λ*⊤Σ(λ̂ − λ*). Since R(λ) = λ⊤Σλ is convex and Λ is convex, ∇R(λ*)·(λ − λ*) ≥ 0 for any λ ∈ Λ, hence λ*⊤Σ(λ̂ − λ*) ≥ 0.

3. Therefore
   ‖Σ^(1/2)(λ̂ − λ*)‖² ≤ λ̂⊤Σλ̂ − λ*⊤Σλ*
   = (λ̂⊤Σλ̂ − λ̂⊤Σ̂λ̂) + (λ̂⊤Σ̂λ̂ − λ*⊤Σλ*)
   ≤ (λ̂⊤Σλ̂ − λ̂⊤Σ̂λ̂) + (λ*⊤Σ̂λ* − λ*⊤Σλ*)
   ≤ ... ≤ λ*⊤Σλ* [2 δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²].
4. Estimation of the MSE matrix Σ
Estimation of Σ

Parametric model: Σ_n = Σ_n(θ).
1) Plug-in: Σ̂_n = Σ_n(θ̂0).
   - Requires a consistent initial guess θ̂0.
   - Computable with only T_n (no need to observe the sample X1, ..., Xn).
   - Condition Σ̂_n Σ_n⁻¹ → I in probability: OK under regularity conditions on Σ_n. Example: Σ_n(θ) = a_n W(θ) + o(a_n) with a_n → 0 and W continuous.
2) Parametric bootstrap: the same.

Semi- or non-parametric model:
1) Asymptotic plug-in, when an asymptotic form Σ_n(θ, η) = a_n W(θ, η) + o(a_n) is known, η being a nuisance parameter. Condition Σ̂_n Σ_n⁻¹ → I: OK if W is continuous.
2) Bootstrap. Condition Σ̂_n Σ_n⁻¹ → I: ?
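A minimal parametric-bootstrap sketch (all names hypothetical: rsample draws a sample of size n from the model fitted at θ̂0, and estimators returns the vector T computed on such a sample):

```r
# Parametric bootstrap estimate of Sigma = E[(T - theta 1)(T - theta 1)^T]:
# recompute T on B samples drawn from the fitted model and average the outer
# products of the errors, centered at the bootstrap truth theta0.
bootstrap_Sigma <- function(theta0, n, rsample, estimators, B = 100) {
  E <- replicate(B, estimators(rsample(n, theta0)) - theta0)  # k x B matrix
  tcrossprod(E) / B                                           # (E %*% t(E)) / B
}
```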
5. Generalization to several parameters
Generalization to several parameters

Assume θ = (θ1, ..., θd)⊤ and we have access to several collections of estimators T1, ..., Td, one for each θj (the Tj may have different sizes kj). To estimate, say, θ1:

- We can consider the simple combination θ̂1 = λ̂1⊤T1. This is the previous setting, with the constraint λ̂1⊤1 = 1.
- Or we can consider the full combination θ̂1 = λ̂1⊤T1 + ··· + λ̂d⊤Td, with the constraints λ̂1⊤1 = 1 and λ̂j⊤1 = 0 for all j ≠ 1.

→ The oracle then depends on the MSE block matrix Σ_n, with blocks E[(Tj − θj1)(Tj′ − θj′1)⊤].
→ The theory is the same, i.e. "optimality" whenever Σ̂_n Σ_n⁻¹ → I in probability.
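Under the equality constraints above, the minimizer of λ⊤Σλ has the closed form λ = Σ⁻¹A(A⊤Σ⁻¹A)⁻¹b, where the columns of A indicate the blocks and b selects the target parameter. A sketch under these conventions (names hypothetical):

```r
# Weights for the full combination: minimize lambda^T Sigma lambda subject to
# A^T lambda = b. Column j of A flags the estimators of theta_j; b is 1 for
# the target parameter's block and 0 for the others.
full_combination_weights <- function(Sigma_hat, blocks, target) {
  ids   <- sort(unique(blocks))
  A     <- sapply(ids, function(j) as.numeric(blocks == j))
  b     <- as.numeric(ids == target)
  SinvA <- solve(Sigma_hat, A)                   # Sigma_hat^{-1} A
  drop(SinvA %*% solve(crossprod(A, SinvA), b))  # Sigma^{-1}A (A^T Sigma^{-1}A)^{-1} b
}
# e.g. blocks = c(1, 1, 1, 2) and target = 1 in the Weibull example below
# (three estimators of beta, one of eta).
```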
6. Simulations
Example 1: Position of a symmetric distribution

X1, ..., Xn iid with density f symmetric around θ ∈ R. Estimators of θ:

T1 = X̄_n = (1/n) ∑_{i=1}^n Xi   and   T2 = X_(n/2) = median(X1, ..., Xn).

d = 1, k = 2, Λ = Λmax. Estimation of Σ_n:
- Asymptotic plug-in with θ̂0 = X_(n/2) (always consistent): Σ_n = (1/n) W + o(1/n), with

  W = [ σ²                    E|X − θ| / (2 f(θ)) ]
      [ E|X − θ| / (2 f(θ))   1 / (4 f(θ)²)       ]

- Bootstrap (100 replications).
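A sketch of the resulting plug-in procedure; the only assumption beyond the slide is how f(θ) is estimated, here by a Gaussian-kernel density value at θ̂0 with Silverman's bandwidth:

```r
# Plug-in averaging of mean and median: estimate W at theta0 = median(x),
# then lambda_hat = W^{-1} 1 / (1^T W^{-1} 1) and theta_hat = lambda_hat^T T.
combine_mean_median <- function(x) {
  t0  <- median(x)
  h   <- bw.nrd0(x)
  f0  <- mean(dnorm((x - t0) / h)) / h   # kernel estimate of f(theta)
  c12 <- mean(abs(x - t0)) / (2 * f0)    # off-diagonal entry of W
  W   <- matrix(c(var(x), c12, c12, 1 / (4 * f0^2)), 2, 2)
  w   <- solve(W, c(1, 1))
  sum((w / sum(w)) * c(mean(x), t0))     # the averaging estimate
}
```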
Example 1: Position of a symmetric distribution

Estimated MSE (10⁴ replications), standard errors in parentheses, for several distributions f with θ = 0. MEAN: X̄_n; MED: X_(n/2); AV: θ̂ from plug-in; AVB: θ̂ from bootstrap.

n = 30
           MEAN            MED           AV            AVB
Cauchy     2·10⁶ (1·10⁶)   9 (0.14)      8.95 (0.15)   8.99 (0.15)
St(4)      6.68 (0.1)      5.71 (0.08)   5.4 (0.08)    5.43 (0.08)
St(7)      4.8 (0.07)      5.51 (0.08)   4.6 (0.07)    4.64 (0.07)
Logistic   10.89 (0.16)    12.7 (0.18)   10.76 (0.16)  10.87 (0.16)
Gauss      3.39 (0.05)     5.11 (0.07)   3.53 (0.05)   3.61 (0.05)
Mix        16.79 (0.23)    87 (0.82)     15.03 (0.29)  13.41 (0.3)

n = 50
           MEAN            MED           AV            AVB
Cauchy     4·10⁷ (4·10⁷)   5.07 (0.08)   4.92 (0.08)   4.9 (0.08)
St(4)      4.12 (0.06)     3.53 (0.05)   3.33 (0.05)   3.34 (0.05)
St(7)      2.82 (0.04)     3.32 (0.05)   2.74 (0.04)   2.8 (0.04)
Logistic   6.64 (0.09)     7.93 (0.11)   6.52 (0.09)   6.6 (0.09)
Gauss      2.04 (0.03)     3.1 (0.04)    2.1 (0.03)    2.15 (0.03)
Mix        10.08 (0.14)    66.53 (0.64)  7.57 (0.15)   6.68 (0.18)

n = 100
           MEAN            MED           AV            AVB
Cauchy     2·10⁷ (2·10⁷)   2.56 (0.04)   2.49 (0.04)   2.49 (0.04)
St(4)      1.99 (0.03)     1.74 (0.02)   1.61 (0.02)   1.62 (0.02)
St(7)      1.42 (0.02)     1.67 (0.02)   1.37 (0.02)   1.38 (0.02)
Logistic   3.3 (0.05)      4 (0.06)      3.2 (0.05)    3.26 (0.05)
Gauss      1 (0.01)        1.51 (0.02)   1.02 (0.01)   1.06 (0.01)
Mix        5.05 (0.07)     42.35 (0.43)  3.09 (0.06)   2.36 (0.07)
Example 2: Weibull model

X1, ..., Xn iid with Weibull density f(x) = (β/η)(x/η)^(β−1) exp(−(x/η)^β), x > 0.

Three estimators are considered for β: T1 = MLE, T2 = MM, T3 = OLS. η is estimated by MLE: η̂_ML.

d = 2, k1 = 3, k2 = 1, Λ = Λmax:
- β̂_AV = λ1T1 + λ2T2 + λ3T3 with λ1 + λ2 + λ3 = 1,
- η̂_AV = η̂_ML + λ1T1 + λ2T2 + λ3T3 with λ1 + λ2 + λ3 = 0.

Σ_n estimated by parametric bootstrap (100 replications).
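For concreteness, a sketch of the OLS (Weibull-plot) estimator; the plotting positions use Bernard's median-rank approximation, which is an assumption here since several conventions exist:

```r
# Weibull plot: if X ~ Weibull(beta, eta) then log(-log(1 - F(x))) is linear
# in log(x), with slope beta and intercept -beta*log(eta).
weibull_ols <- function(x) {
  x <- sort(x)
  n <- length(x)
  p <- (seq_len(n) - 0.3) / (n + 0.4)        # median-rank positions (assumption)
  ab <- coef(lm(log(-log(1 - p)) ~ log(x)))  # intercept, slope
  beta <- unname(ab[2])
  c(beta = beta, eta = exp(-unname(ab[1]) / beta))
}
```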
Example 2: Weibull model

Distribution of β̂ when β = 0.5 (left) and β = 3 (right), with η = 10 and n = 20. [Figure: boxplots of the ML, MM, OLS and averaging (AG) estimates.]
Example 2: Weibull model

Simulations for several values of β, η = 10, and different sample sizes n. Estimated MSE (10⁴ replications) with standard errors in parentheses. For the estimation of β:

n = 10
          ML             MM             OLS            AV
β = 0.5   35.53 (0.91)   76.95 (1.27)   24.41 (0.40)   25.27 (0.64)
β = 1     152.4 (3.8)    131.6 (3.1)    98.1 (1.5)     85.5 (1.7)
β = 2     596.4 (14.4)   444.6 (11.9)   399.4 (6.3)    355.5 (6.7)
β = 3     1369 (34.6)    1080 (29.7)    905 (14.6)     770 (18.1)

n = 20
          ML             MM             OLS            AV
β = 0.5   12.06 (0.26)   35.57 (0.52)   13.74 (0.19)   10.5 (0.19)
β = 1     49.2 (1.1)     53.6 (1.1)     54.2 (0.7)     36.9 (0.7)
β = 2     194.5 (3.8)    164.5 (3.3)    218 (2.8)      163.3 (2.7)
β = 3     452 (9.8)      394 (8.9)      486 (6.7)      343 (6.2)

n = 50
          ML             MM             OLS            AV
β = 0.5   3.7 (0.07)     14.19 (0.20)   6.04 (0.08)    3.52 (0.06)
β = 1     14.4 (0.2)     19.3 (0.3)     23.9 (0.3)     12.8 (0.2)
β = 2     57.9 (1.0)     53.9 (0.9)     94.8 (1.3)     54.3 (0.9)
β = 3     128 (2.2)      122 (2.0)      211 (2.7)      120 (1.9)
Example 2: Weibull model

For the estimation of η:

          n = 10                        n = 20                       n = 50
          ML             AV             ML            AV             ML             AV
β = 0.5   60.59 (1.60)   55.61 (1.48)   25.96 (0.53)  24.56 (0.5)    9.57 (0.17)    9.38 (0.17)
β = 1     11.15 (0.18)   10.88 (0.17)   5.53 (0.08)   5.43 (0.08)    2.23 (0.03)    2.22 (0.03)
β = 2     2.71 (0.04)    2.74 (0.04)    1.36 (0.02)   1.37 (0.02)    0.55 (0.01)    0.56 (0.01)
β = 3     1.21 (0.02)    1.23 (0.02)    0.61 (0.01)   0.61 (0.01)    0.247 (0.003)  0.248 (0.004)
Example 3: Boolean model

Germ-grain model: germs follow a homogeneous Poisson process with intensity ρ; grains are balls with radii distributed according to B(1, α). Two parameters: ρ and α.
- ρ̂1: parametric estimator of ρ (based on area and perimeter fraction)
- ρ̂2: non-parametric estimator of ρ (based on tangent lines)
- α̂1: parametric estimator of α (based on area and perimeter fraction)

d = 2, k1 = 2, k2 = 1, Λ = Λmax:
- ρ̂_AV = λ1ρ̂1 + λ2ρ̂2 with λ1 + λ2 = 1,
- α̂_AV = α̂1 + λ1ρ̂1 + λ2ρ̂2 with λ1 + λ2 = 0.

Σ_n estimated by parametric bootstrap (100 replications). Estimated MSE, standard errors in parentheses:

          ρ̂1             ρ̂2            ρ̂AV           α̂1            α̂AV
ρ = 25    34.15 (0.55)    14.63 (0.22)   14.60 (0.22)   8.09 (0.15)    6.70 (0.13)
ρ = 50    131.63 (2.26)   47.41 (0.72)   45.65 (0.67)   4.69 (0.067)   3.24 (0.048)
ρ = 100   949 (21.8)      272 (4.9)      223 (3.6)      5.70 (0.086)   2.29 (0.034)
ρ = 150   7606 (341)      1656 (46.5)    1005 (24.4)    14.7 (0.34)    4.1 (0.11)
Example 4: Kernel density estimation

Estimation of a density f based on a sample of size n. We choose the Gaussian kernel and consider 4 choices of bandwidth in the R function density (option bw):
- h1: nrd0 (Silverman's rule of thumb),
- h2: nrd (a variation),
- h3: ucv (unbiased cross-validation),
- h4: SJ (Sheather and Jones method).

Denoting the initial estimators by T = (f̂_{n,h1}, ..., f̂_{n,h4})⊤, the average estimator over Λmax is

f̂_AV = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T,

where Σ is the MISE matrix with entries ∫ E[(f̂_{n,hi}(x) − f(x))(f̂_{n,hj}(x) − f(x))] dx.
Example 4: Kernel density estimation

d = 1, k = 4, Λ = Λmax. Σ_n estimated by asymptotic plug-in as in Jones and Sheather (1991).

          n = 250                        n = 500                        n = 1000
          h1    h2    h3    h4    AV     h1    h2    h3    h4    AV     h1    h2    h3    h4    AV
Gauss     29.9  27.2  26.8  29.9  24.9   17.7  16.2  16.2  17.3  14.4   10.5  9.7   9.8   10.1  8.4
Mix       24.0  27.5  27.1  25.2  26.7   14.8  17.6  15.3  14.9  14.2   9.1   11.1  8.9   8.8   7.4
Gamma     28.0  32.7  29.5  28.9  27.9   17.1  20.6  17.0  17.2  15.8   10.3  12.7  10.0  10.3  9.0
Cauchy    31.2  37.0  830   132   32.8   18.9  23.2  945   180   18.7   11.4  14.4  1068  226   10.6

Table: The MISE is estimated by the mean over 10⁴ replications of the integrated squared error, obtained by summing the squared errors at 100 equally spaced points on the support of f.
Example 4: Kernel density estimation

Estimated MSE of f̂(x) as a function of x, for n = 500. Left: Gaussian law; right: mixture distribution. [Figure: f̂_h1, f̂_h2, f̂_h3, f̂_h4 shown as crosses (black, red, green, blue resp.); f̂_AV as black circles.]
Example 5: Quantile estimation under misspecified models

X1, ..., Xn iid with unknown distribution µ. The estimation of the p-quantile q depends on the tail of µ when p ≈ 1 (here p = 0.99).

Estimators of q:
- the non-parametric estimator q̂_NP = x_(⌊np⌋),
- the parametric estimator q̂_W associated with the Weibull distribution,
- the parametric estimator q̂_G associated with the Gamma distribution,
- the parametric estimator q̂_B associated with the Burr distribution.

The parametric estimators of q are obtained from three parametric models with different tail indexes: Weibull (light-tailed), Gamma (heavy-tailed) and Burr (fat-tailed). Most of these estimators are built from misspecified models and are not consistent.
Example 5: Quantile estimation under misspecified models

d = 1, k = 5, convex combination: Λ = {λ ∈ R^k : λj ≥ 0, λ⊤1 = 1}. Σ_n estimated by bootstrap with θ̂0 = q̂_NP. MSE estimation based on 10⁴ replications, with p = 0.99 and n = 1000. True distribution in rows; standard errors in parentheses.

            q̂W            q̂G            q̂B              q̂NP           q̂AV
Weibull     21 (0.30)      2340 (6.1)     1·10⁶ (1·10³)    57 (0.77)     47 (0.70)
Gamma       171 (1.07)     18 (0.26)      3·10⁸ (5·10⁶)    62 (0.85)     60 (0.83)
Burr        896 (4.3)      1274 (5.7)     65 (0.96)        243 (4.26)    182 (2.81)
Lognormal   72.9 (0.27)    98.7 (0.26)    133.9 (0.86)     13.8 (0.22)   13.8 (0.20)
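The convex-weight step is a small quadratic program. A sketch using the quadprog package (an assumption: the talk does not name a solver), minimizing λ⊤Σ̂λ subject to λ⊤1 = 1 and λ ≥ 0 elementwise:

```r
# solve.QP minimizes (1/2) x^T D x - d^T x subject to A^T x >= b, with the
# first meq constraints treated as equalities.
library(quadprog)
convex_weights <- function(Sigma_hat) {
  k <- nrow(Sigma_hat)
  solve.QP(Dmat = 2 * Sigma_hat, dvec = rep(0, k),
           Amat = cbind(rep(1, k), diag(k)),   # columns: sum-to-one, positivity
           bvec = c(1, rep(0, k)), meq = 1)$solution
}
```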
7. Conclusion
Conclusion

- The oracle θ̂* = ∑_{i=1}^k λiTi with ∑_{i=1}^k λi = 1, i.e. λ ∈ Λmax, is
  θ̂* = (1⊤Σ⁻¹ / (1⊤Σ⁻¹1)) T.
  For Λ ⊂ Λmax, θ̂* = λ*⊤T with λ* = argmin_{λ∈Λ} λ⊤Σλ.
- The average estimator θ̂ approximates the oracle in that Σ is replaced by an estimate Σ̂.
- The estimation of Σ can be carried out with the same data as those used to compute T. Simplest case: parametric bootstrap.
- If Σ̂_n Σ_n⁻¹ → I in probability, θ̂ is (in some sense) asymptotically equivalent to θ̂*, and in our examples the approximation works well for moderate sample sizes.
- Once θ̂ is obtained, an asymptotic confidence interval comes for free.
- Open questions: theory when Σ̂ is obtained by bootstrap? when Λ is not convex? much more...