A general procedure to combine estimators

Frédéric Lavancier and Paul Rochet
Laboratoire de Mathématiques Jean Leray, University of Nantes
Outline

1. Introduction
2. The method
3. Theoretical results
4. Estimation of the MSE matrix Σ
5. Generalization to several parameters
6. Simulations
7. Conclusion
The problem

Let θ be an unknown quantity in a statistical model. Consider a collection of k estimators T1, ..., Tk of θ.

Aim: combine these estimators to obtain a better estimate.

Natural approach: choose a suitable combination

θ̂_λ = ∑_{j=1}^k λ_j T_j = λ⊤T,  λ ∈ Λ ⊆ R^k,

where T = (T1, ..., Tk)⊤. This amounts to finding λ̂.

Standard settings:
- Selection: Λ = {(1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1)}
- Convex: Λ = {λ ∈ R^k : λ_j ≥ 0, ∑_j λ_j = 1}
- Affine: Λ = {λ ∈ R^k : ∑_j λ_j = 1}
- Linear: Λ = R^k
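As a minimal numerical illustration (all values hypothetical), θ̂_λ is a single inner product; a sketch in R, the language the talk later refers to for kernel density estimation:

```r
# Minimal sketch: an affine combination of k = 3 hypothetical estimates of
# the same parameter theta (the weights sum to 1).
T_vec  <- c(0.98, 1.03, 1.10)     # T = (T1, T2, T3)
lambda <- c(0.5, 0.3, 0.2)        # affine setting: sum(lambda) = 1
theta_hat <- sum(lambda * T_vec)  # theta_hat_lambda = lambda^T T
```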
Existing works: Aggregation and Averaging

Aggregation: T1, ..., Tk are not random (in practice: built from an independent training sample).

- Non-parametric regression: Y = θ(X) + ε,
  λ̂ = argmin_{λ∈Λ} { ‖Y − θ̂_λ(X)‖² + pen(λ) }  (Juditsky, Nemirovski 2000).
- Density estimation: X1, ..., Xn iid with density θ,
  λ̂ = argmin_{λ∈Λ} { ‖θ̂_λ‖² − (2/n) ∑_{i=1}^n θ̂_λ(Xi) }  (Rigollet, Tsybakov 2007).

Flexibility in the choice of Λ; strong results (oracle inequalities, minimax rates, lower bounds...).
Averaging

- Forecast averaging (time series): X1, ..., Xt with predictors T1(t), ..., Tk(t),
  λ̂ = argmin_{λ∈Λ} ∑_{i=1}^t (Xi − λ⊤T(i))²  (Bates, Granger 1969).
- Model averaging (between misspecified models). Regression: Yi = µ(Xi) + εi;
  λ̂ minimizes an estimator of the risk: compromise estimator (Hjort, Claeskens 2003), Jackknife (Hansen, Racine 2012), Mallows' Cp (Benito 2012).
- Bayesian model averaging. Likelihood: Yi ∼ f(y, θ, γ);
  Jackknife (Ando, Li 2014), AIC (Hjort, Claeskens 2003).
Other examples

Example 1: mean and median

Let x1, ..., xn be n i.i.d. realisations of an unknown distribution on the real line. Assume this distribution is symmetric around some parameter θ ∈ R. Two natural choices to estimate θ:
- the mean T1 = x̄_n
- the median T2 = x_(n/2)

The idea to combine these two estimators goes back to Pierre Simon de Laplace. In the Second Supplement of the Théorie Analytique des Probabilités (1812), he wrote: "En combinant les résultats de ces deux méthodes, on peut obtenir un résultat dont la loi de probabilité des erreurs soit plus rapidement décroissante." [By combining the results of these two methods, one can obtain a result whose probability law of error decreases more rapidly.]
Laplace considered the combination λ1 x̄_n + λ2 x_(n/2) with λ1 + λ2 = 1.

1. He proved that the asymptotic law of this combination is Gaussian.
2. Minimizing the asymptotic variance in λ1, λ2, he concluded that:
   - if the underlying distribution is Gaussian, the best combination is λ1 = 1 and λ2 = 0;
   - for other distributions, the best combination depends on the distribution: "L'ignorance où l'on est de la loi de probabilité des erreurs des observations rend cette correction impraticable." [When one does not know the distribution of the errors of observation, this correction is not feasible.]
Example 2: Weibull model

Let x1, ..., xn be i.i.d. from the Weibull distribution

f(x) = (β/η) (x/η)^(β−1) e^(−(x/η)^β),  x > 0.

We consider 3 standard methods to estimate β and η:
- the maximum likelihood estimator (ML)
- the method of moments (MM)
- the ordinary least squares method, or Weibull plot (OLS)
Distribution of β̂ when β = 0.5 (left) and β = 3 (right), with η = 10 and n = 20; simulations based on 10⁴ replications. [Figure: boxplots of the ML, MM and OLS estimates.]
Example 3: Boolean model (germ-grain model)

- The germs (centers of the discs) are drawn from a homogeneous Poisson process on R² with intensity ρ.
- The grains (discs) are independent, with radius distributed according to a probability law µ ∼ B(1, α), α > 0.

Figure: Samples from a Boolean model on [0, 1]² with intensity, from left to right, ρ = 25, 50, 100, 150, and law of radii B(1, α) on [0, 0.1] with α = 1.
We do not observe the individual grains: likelihood-based inference is impossible. Let
- A_obs and P_obs be the observed area and perimeter per unit area of the set,
- N(u) the number of tangent lines orthogonal to u with convex boundary in direction u,
- |W| the area of the observation window.

Estimator of α:

α̂1 = P_obs / [10 (A_obs − 1) log(1 − A_obs)] − 2.

Estimators of ρ:

ρ̂1 = 5 (α̂1 + 1) P_obs / [π (1 − A_obs)],   ρ̂2 = (1/k) ∑_{i=1}^k N(u_i) / [|W| (1 − A_obs)].
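As a hedged sketch, the two plug-in formulas above transcribe directly into R (A_obs and P_obs are assumed to have been measured on the observed set; ρ̂2 would additionally require the tangent counts N(u_i)):

```r
# Area/perimeter-based estimators for the Boolean model with radii
# B(1, alpha) on [0, 0.1], transcribing the formulas above.
boolean_estimates <- function(A_obs, P_obs) {
  alpha1 <- P_obs / (10 * (A_obs - 1) * log(1 - A_obs)) - 2
  rho1   <- 5 * (alpha1 + 1) * P_obs / (pi * (1 - A_obs))
  c(alpha1 = alpha1, rho1 = rho1)
}
```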
2. The method
General approach

Assume θ ∈ R. We consider the quadratic loss

R(λ) := E(λ⊤T − θ)²,  λ ∈ Λ.

The oracle is θ̂* = λ*⊤T, where λ* = argmin_{λ∈Λ} R(λ).

General pattern to construct the averaging estimator:
1. Estimate the error: R̂(λ).
2. Compute λ̂ = argmin_{λ∈Λ} R̂(λ).
3. Build the averaging estimator: θ̂ = λ̂⊤T.

Two important choices:
1. the constraint set Λ,
2. the estimate R̂(λ).

Rule of thumb: these choices must imply |θ̂ − θ̂*| ≪ |θ̂* − θ| in probability.
Choice of the constraint set Λ

Clearly: the larger Λ, the better the oracle.

Linear: Λ = R^k. The oracle is θ̂* = λ*⊤T with

λ* = argmin_{λ∈R^k} E(λ⊤T − θ)² = θ [E(TT⊤)]⁻¹ E(T),

where E(TT⊤) denotes the matrix with entries E(T_i T_j). This gives

θ̂* = λ*⊤T = θ × E(T)⊤ [E(TT⊤)]⁻¹ T = θ × 1̂,

where 1̂ := E(T)⊤[E(TT⊤)]⁻¹T estimates the constant 1.

⇒ If Λ = R^k, estimating λ* is at least as difficult as estimating θ itself: Λ = R^k is not a good choice.
Affine: Λ = {λ ∈ R^k : λ⊤1 = 1}. The oracle seems more accessible:

θ̂ − θ̂* = (λ̂ − λ*)⊤T = (λ̂ − λ*)⊤(T − θ1).

The risk writes in terms of the MSE matrix Σ = E[(T − θ1)(T − θ1)⊤]:

R(λ) = E(λ⊤T − θ)² = E[λ⊤(T − θ1)]² = λ⊤Σλ.

Explicit formula for the oracle:

λ* = argmin_{λ∈R^k : λ⊤1=1} λ⊤Σλ = Σ⁻¹1 / (1⊤Σ⁻¹1).
Our framework

Maximal constraint set (affine): Λmax = {λ ∈ R^k : λ⊤1 = 1}. Conditions on Λ:
1. Λ ⊆ Λmax,
2. Λ closed and non-empty (existence of the solution).

Then, for Σ = E[(T − θ1)(T − θ1)⊤] and Σ̂ an estimate of Σ:
- the oracle is θ̂* = λ*⊤T, where λ* = argmin_{λ∈Λ} λ⊤Σλ;
- the averaging estimator is θ̂ = λ̂⊤T, where λ̂ = argmin_{λ∈Λ} λ⊤Σ̂λ.

If Λ = Λmax, explicit formulas:

λ* = Σ⁻¹1 / (1⊤Σ⁻¹1),   λ̂ = Σ̂⁻¹1 / (1⊤Σ̂⁻¹1).
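Over Λmax the whole procedure reduces to two linear solves; a minimal sketch (function and argument names hypothetical), assuming an estimate Σ̂ of the MSE matrix is available:

```r
# Averaging estimator over Lambda_max:
# lambda_hat = Sigma_hat^{-1} 1 / (1^T Sigma_hat^{-1} 1), theta_hat = lambda_hat^T T.
average_estimator <- function(T_vec, Sigma_hat) {
  w <- solve(Sigma_hat, rep(1, length(T_vec)))  # Sigma_hat^{-1} 1
  lambda_hat <- w / sum(w)                      # normalize: lambda_hat^T 1 = 1
  list(lambda = lambda_hat, theta = sum(lambda_hat * T_vec))
}
```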
3. Theoretical results
Error bound

For Λ ⊂ Λmax, set

δΛ(Σ̂, Σ) = sup_{λ∈Λ} |λ⊤Σ̂λ − λ⊤Σλ| / (λ⊤Σλ).

Proposition. Let Λ be a non-empty closed convex subset of Λmax. Then

(θ̂ − θ̂*)² ≤ E(θ̂* − θ)² [2 δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²] ‖Σ^(−1/2)(T − θ1)‖².

- The first factor is the MSE of the oracle.
- The last factor plays the role of a constant, in view of E‖Σ^(−1/2)(T − θ1)‖² = k.
- The middle factor is small, provided Σ̂ is "close" to Σ. Lemma: δΛ(Σ̂, Σ) ≤ ‖Σ̂Σ⁻¹ − I‖.
Asymptotic results

Let n denote the size of the sample used to produce T, and set

α_n := E(θ̂*_n − θ)² = λ*_n⊤ Σ_n λ*_n,   α̂_n := λ̂_n⊤ Σ̂_n λ̂_n.

Corollary. If Σ̂_n Σ_n⁻¹ → I in probability, then

(θ̂_n − θ)² = (θ̂*_n − θ)² + o_P(α_n).

Moreover, if α_n^(−1/2) (θ̂*_n − θ) → N(0, 1) in distribution, then α̂_n^(−1/2) (θ̂_n − θ) → N(0, 1) in distribution.

- The condition Σ̂_n Σ_n⁻¹ → I is in between Σ̂_n − Σ_n → 0 and Σ̂_n⁻¹ − Σ_n⁻¹ → 0 (in probability).
- No independence assumption on T_n and Σ̂_n.
- Optimality in L² holds under stronger assumptions.
- The last statement allows one to construct asymptotic confidence intervals for θ without further approximation (since α̂_n is already computed to get θ̂).
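Written out, the interval of the last bullet takes the usual form below (a is the nominal level and z_{1−a/2} the standard Gaussian quantile; no extra computation is needed since α̂_n is a by-product of the minimization):

```latex
\[
  \Big[\,\hat\theta_n - z_{1-a/2}\sqrt{\hat\alpha_n}\,,\;
         \hat\theta_n + z_{1-a/2}\sqrt{\hat\alpha_n}\,\Big],
  \qquad \hat\alpha_n = \hat\lambda_n^\top \hat\Sigma_n \hat\lambda_n .
\]
```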
Sketch of proof

Recall the bound: (θ̂ − θ̂*)² ≤ E(θ̂* − θ)² [2 δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²] ‖Σ^(−1/2)(T − θ1)‖², where δΛ(Σ̂, Σ) = sup_{λ∈Λ} |λ⊤Σ̂λ − λ⊤Σλ| / (λ⊤Σλ).

1. (θ̂ − θ̂*)² = [(λ̂ − λ*)⊤(T − θ1)]² = [(λ̂ − λ*)⊤ Σ^(1/2) Σ^(−1/2) (T − θ1)]²
   ≤ ‖Σ^(1/2)(λ̂ − λ*)‖² ‖Σ^(−1/2)(T − θ1)‖²  (Cauchy-Schwarz).

2. ‖Σ^(1/2)(λ̂ − λ*)‖² = λ̂⊤Σλ̂ − λ*⊤Σλ* − 2 λ*⊤Σ(λ̂ − λ*). Since R(λ) = λ⊤Σλ is convex and Λ is convex, ∇R(λ*)·(λ − λ*) ≥ 0 for any λ ∈ Λ, hence λ*⊤Σ(λ̂ − λ*) ≥ 0.

3. Therefore
   ‖Σ^(1/2)(λ̂ − λ*)‖² ≤ λ̂⊤Σλ̂ − λ*⊤Σλ*
   = (λ̂⊤Σλ̂ − λ̂⊤Σ̂λ̂) + (λ̂⊤Σ̂λ̂ − λ*⊤Σλ*)
   ≤ (λ̂⊤Σλ̂ − λ̂⊤Σ̂λ̂) + (λ*⊤Σ̂λ* − λ*⊤Σλ*)
   ≤ ... ≤ λ*⊤Σλ* [2 δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²].
4. Estimation of the MSE matrix Σ
Estimation of Σ

Parametric model: Σ_n = Σ_n(θ).
1) Plug-in: Σ̂_n = Σ_n(θ̂0).
   - Requires a consistent initial guess θ̂0.
   - Computable with only T_n (no need to observe the sample X1, ..., Xn).
   - Condition Σ̂_n Σ_n⁻¹ → I in probability: OK under regularity conditions on Σ_n. Example: Σ_n(θ) = a_n W(θ) + o(a_n) with a_n → 0 and W continuous.
2) Parametric bootstrap: the same.

Semi- or non-parametric model:
1) Asymptotic plug-in, when an asymptotic form Σ_n(θ, η) = a_n W(θ, η) + o(a_n) is known, η being a nuisance parameter. Condition Σ̂_n Σ_n⁻¹ → I: OK if W is continuous.
2) Bootstrap. Condition Σ̂_n Σ_n⁻¹ → I: ?
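A minimal parametric-bootstrap sketch (all names hypothetical: rsample draws a sample of size n from the model fitted at θ̂0, and estimators returns the vector T computed on such a sample):

```r
# Parametric bootstrap estimate of Sigma = E[(T - theta 1)(T - theta 1)^T]:
# recompute T on B samples drawn from the fitted model and average the outer
# products of the errors, centered at the bootstrap truth theta0.
bootstrap_Sigma <- function(theta0, n, rsample, estimators, B = 100) {
  E <- replicate(B, estimators(rsample(n, theta0)) - theta0)  # k x B matrix
  tcrossprod(E) / B                                           # (E %*% t(E)) / B
}
```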
5. Generalization to several parameters
Generalization to several parameters

Assume θ = (θ1, ..., θd)⊤ and we have access to several collections of estimators T1, ..., Td, one for each θj (the Tj may have different sizes kj). To estimate, say, θ1:

- We can consider the simple combination θ̂1 = λ̂1⊤T1. This is the previous setting, with the constraint λ̂1⊤1 = 1.
- Or we can consider the full combination θ̂1 = λ̂1⊤T1 + ··· + λ̂d⊤Td, with the constraints λ̂1⊤1 = 1 and λ̂j⊤1 = 0 for all j ≠ 1.

→ The oracle then depends on the MSE block matrix Σ_n, with blocks E[(Tj − θj1)(Tj′ − θj′1)⊤].
→ The theory is the same, i.e. "optimality" whenever Σ̂_n Σ_n⁻¹ → I in probability.
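Under the equality constraints above, the minimizer of λ⊤Σλ has the closed form λ = Σ⁻¹A(A⊤Σ⁻¹A)⁻¹b, where the columns of A indicate the blocks and b selects the target parameter. A sketch under these conventions (names hypothetical):

```r
# Weights for the full combination: minimize lambda^T Sigma lambda subject to
# A^T lambda = b. Column j of A flags the estimators of theta_j; b is 1 for
# the target parameter's block and 0 for the others.
full_combination_weights <- function(Sigma_hat, blocks, target) {
  ids   <- sort(unique(blocks))
  A     <- sapply(ids, function(j) as.numeric(blocks == j))
  b     <- as.numeric(ids == target)
  SinvA <- solve(Sigma_hat, A)                   # Sigma_hat^{-1} A
  drop(SinvA %*% solve(crossprod(A, SinvA), b))  # Sigma^{-1}A (A^T Sigma^{-1}A)^{-1} b
}
# e.g. blocks = c(1, 1, 1, 2) and target = 1 in the Weibull example below
# (three estimators of beta, one of eta).
```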
6. Simulations
Example 1: Position of a symmetric distribution

X1, ..., Xn iid with density f symmetric around θ ∈ R. Estimators of θ:

T1 = X̄_n = (1/n) ∑_{i=1}^n Xi   and   T2 = X_(n/2) = median(X1, ..., Xn).

d = 1, k = 2, Λ = Λmax. Estimation of Σ_n:
- Asymptotic plug-in with θ̂0 = X_(n/2) (always consistent): Σ_n = (1/n) W + o(1/n), with

  W = [ σ²                    E|X − θ| / (2 f(θ)) ]
      [ E|X − θ| / (2 f(θ))   1 / (4 f(θ)²)       ]

- Bootstrap (100 replications).
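A sketch of the resulting plug-in procedure; the only assumption beyond the slide is how f(θ) is estimated, here by a Gaussian-kernel density value at θ̂0 with Silverman's bandwidth:

```r
# Plug-in averaging of mean and median: estimate W at theta0 = median(x),
# then lambda_hat = W^{-1} 1 / (1^T W^{-1} 1) and theta_hat = lambda_hat^T T.
combine_mean_median <- function(x) {
  t0  <- median(x)
  h   <- bw.nrd0(x)
  f0  <- mean(dnorm((x - t0) / h)) / h   # kernel estimate of f(theta)
  c12 <- mean(abs(x - t0)) / (2 * f0)    # off-diagonal entry of W
  W   <- matrix(c(var(x), c12, c12, 1 / (4 * f0^2)), 2, 2)
  w   <- solve(W, c(1, 1))
  sum((w / sum(w)) * c(mean(x), t0))     # the averaging estimate
}
```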
Example 1: Position of a symmetric distribution

Estimated MSE (10⁴ replications), standard errors in parentheses, for several distributions f with θ = 0. MEAN: X̄_n; MED: X_(n/2); AV: θ̂ from plug-in; AVB: θ̂ from bootstrap.

n = 30
           MEAN            MED           AV            AVB
Cauchy     2·10⁶ (1·10⁶)   9 (0.14)      8.95 (0.15)   8.99 (0.15)
St(4)      6.68 (0.1)      5.71 (0.08)   5.4 (0.08)    5.43 (0.08)
St(7)      4.8 (0.07)      5.51 (0.08)   4.6 (0.07)    4.64 (0.07)
Logistic   10.89 (0.16)    12.7 (0.18)   10.76 (0.16)  10.87 (0.16)
Gauss      3.39 (0.05)     5.11 (0.07)   3.53 (0.05)   3.61 (0.05)
Mix        16.79 (0.23)    87 (0.82)     15.03 (0.29)  13.41 (0.3)

n = 50
           MEAN            MED           AV            AVB
Cauchy     4·10⁷ (4·10⁷)   5.07 (0.08)   4.92 (0.08)   4.9 (0.08)
St(4)      4.12 (0.06)     3.53 (0.05)   3.33 (0.05)   3.34 (0.05)
St(7)      2.82 (0.04)     3.32 (0.05)   2.74 (0.04)   2.8 (0.04)
Logistic   6.64 (0.09)     7.93 (0.11)   6.52 (0.09)   6.6 (0.09)
Gauss      2.04 (0.03)     3.1 (0.04)    2.1 (0.03)    2.15 (0.03)
Mix        10.08 (0.14)    66.53 (0.64)  7.57 (0.15)   6.68 (0.18)

n = 100
           MEAN            MED           AV            AVB
Cauchy     2·10⁷ (2·10⁷)   2.56 (0.04)   2.49 (0.04)   2.49 (0.04)
St(4)      1.99 (0.03)     1.74 (0.02)   1.61 (0.02)   1.62 (0.02)
St(7)      1.42 (0.02)     1.67 (0.02)   1.37 (0.02)   1.38 (0.02)
Logistic   3.3 (0.05)      4 (0.06)      3.2 (0.05)    3.26 (0.05)
Gauss      1 (0.01)        1.51 (0.02)   1.02 (0.01)   1.06 (0.01)
Mix        5.05 (0.07)     42.35 (0.43)  3.09 (0.06)   2.36 (0.07)
Example 2: Weibull model

X1, ..., Xn iid with Weibull density f(x) = (β/η)(x/η)^(β−1) exp(−(x/η)^β), x > 0.

Three estimators are considered for β: T1 = MLE, T2 = MM, T3 = OLS. η is estimated by MLE: η̂_ML.

d = 2, k1 = 3, k2 = 1, Λ = Λmax:
- β̂_AV = λ1T1 + λ2T2 + λ3T3 with λ1 + λ2 + λ3 = 1,
- η̂_AV = η̂_ML + λ1T1 + λ2T2 + λ3T3 with λ1 + λ2 + λ3 = 0.

Σ_n estimated by parametric bootstrap (100 replications).
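For concreteness, a sketch of the OLS (Weibull-plot) estimator; the plotting positions use Bernard's median-rank approximation, which is an assumption here since several conventions exist:

```r
# Weibull plot: if X ~ Weibull(beta, eta) then log(-log(1 - F(x))) is linear
# in log(x), with slope beta and intercept -beta*log(eta).
weibull_ols <- function(x) {
  x <- sort(x)
  n <- length(x)
  p <- (seq_len(n) - 0.3) / (n + 0.4)        # median-rank positions (assumption)
  ab <- coef(lm(log(-log(1 - p)) ~ log(x)))  # intercept, slope
  beta <- unname(ab[2])
  c(beta = beta, eta = exp(-unname(ab[1]) / beta))
}
```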
Example 2: Weibull model

Distribution of β̂ when β = 0.5 (left) and β = 3 (right), with η = 10 and n = 20. [Figure: boxplots of the ML, MM, OLS and averaging (AG) estimates.]
Example 2: Weibull model

Simulations for several values of β, η = 10, and different sample sizes n. Estimated MSE (10⁴ replications) with standard errors in parentheses. For the estimation of β:

n = 10
          ML             MM             OLS            AV
β = 0.5   35.53 (0.91)   76.95 (1.27)   24.41 (0.40)   25.27 (0.64)
β = 1     152.4 (3.8)    131.6 (3.1)    98.1 (1.5)     85.5 (1.7)
β = 2     596.4 (14.4)   444.6 (11.9)   399.4 (6.3)    355.5 (6.7)
β = 3     1369 (34.6)    1080 (29.7)    905 (14.6)     770 (18.1)

n = 20
          ML             MM             OLS            AV
β = 0.5   12.06 (0.26)   35.57 (0.52)   13.74 (0.19)   10.5 (0.19)
β = 1     49.2 (1.1)     53.6 (1.1)     54.2 (0.7)     36.9 (0.7)
β = 2     194.5 (3.8)    164.5 (3.3)    218 (2.8)      163.3 (2.7)
β = 3     452 (9.8)      394 (8.9)      486 (6.7)      343 (6.2)

n = 50
          ML             MM             OLS            AV
β = 0.5   3.7 (0.07)     14.19 (0.20)   6.04 (0.08)    3.52 (0.06)
β = 1     14.4 (0.2)     19.3 (0.3)     23.9 (0.3)     12.8 (0.2)
β = 2     57.9 (1.0)     53.9 (0.9)     94.8 (1.3)     54.3 (0.9)
β = 3     128 (2.2)      122 (2.0)      211 (2.7)      120 (1.9)
Example 2: Weibull model

For the estimation of η:

          n = 10                        n = 20                       n = 50
          ML             AV             ML            AV             ML             AV
β = 0.5   60.59 (1.60)   55.61 (1.48)   25.96 (0.53)  24.56 (0.5)    9.57 (0.17)    9.38 (0.17)
β = 1     11.15 (0.18)   10.88 (0.17)   5.53 (0.08)   5.43 (0.08)    2.23 (0.03)    2.22 (0.03)
β = 2     2.71 (0.04)    2.74 (0.04)    1.36 (0.02)   1.37 (0.02)    0.55 (0.01)    0.56 (0.01)
β = 3     1.21 (0.02)    1.23 (0.02)    0.61 (0.01)   0.61 (0.01)    0.247 (0.003)  0.248 (0.004)
Example 3: Boolean model

Germ-grain model: germs follow a homogeneous Poisson process with intensity ρ; grains are balls with radii distributed according to B(1, α). Two parameters: ρ and α.
- ρ̂1: parametric estimator of ρ (based on area and perimeter fraction)
- ρ̂2: non-parametric estimator of ρ (based on tangent lines)
- α̂1: parametric estimator of α (based on area and perimeter fraction)

d = 2, k1 = 2, k2 = 1, Λ = Λmax:
- ρ̂_AV = λ1ρ̂1 + λ2ρ̂2 with λ1 + λ2 = 1,
- α̂_AV = α̂1 + λ1ρ̂1 + λ2ρ̂2 with λ1 + λ2 = 0.

Σ_n estimated by parametric bootstrap (100 replications). Estimated MSE, standard errors in parentheses:

          ρ̂1             ρ̂2            ρ̂AV           α̂1            α̂AV
ρ = 25    34.15 (0.55)    14.63 (0.22)   14.60 (0.22)   8.09 (0.15)    6.70 (0.13)
ρ = 50    131.63 (2.26)   47.41 (0.72)   45.65 (0.67)   4.69 (0.067)   3.24 (0.048)
ρ = 100   949 (21.8)      272 (4.9)      223 (3.6)      5.70 (0.086)   2.29 (0.034)
ρ = 150   7606 (341)      1656 (46.5)    1005 (24.4)    14.7 (0.34)    4.1 (0.11)
Example 4: Kernel density estimation

Estimation of a density f based on a sample of size n. We choose the Gaussian kernel and consider 4 choices of bandwidth in the R function density (option bw):
- h1: nrd0 (Silverman's rule of thumb),
- h2: nrd (a variation),
- h3: ucv (unbiased cross-validation),
- h4: SJ (Sheather and Jones method).

Denoting the initial estimators by T = (f̂_{n,h1}, ..., f̂_{n,h4})⊤, the average estimator over Λmax is

f̂_AV = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T,

where Σ is the MISE matrix with entries ∫ E[(f̂_{n,hi}(x) − f(x))(f̂_{n,hj}(x) − f(x))] dx.
Example 4: Kernel density estimation

d = 1, k = 4, Λ = Λmax. Σ_n estimated by asymptotic plug-in as in Jones and Sheather (1991).

          n = 250                        n = 500                        n = 1000
          h1    h2    h3    h4    AV     h1    h2    h3    h4    AV     h1    h2    h3    h4    AV
Gauss     29.9  27.2  26.8  29.9  24.9   17.7  16.2  16.2  17.3  14.4   10.5  9.7   9.8   10.1  8.4
Mix       24.0  27.5  27.1  25.2  26.7   14.8  17.6  15.3  14.9  14.2   9.1   11.1  8.9   8.8   7.4
Gamma     28.0  32.7  29.5  28.9  27.9   17.1  20.6  17.0  17.2  15.8   10.3  12.7  10.0  10.3  9.0
Cauchy    31.2  37.0  830   132   32.8   18.9  23.2  945   180   18.7   11.4  14.4  1068  226   10.6

Table: The MISE is estimated by the mean over 10⁴ replications of the integrated squared error, obtained by summing the squared errors at 100 equally spaced points on the support of f.
Example 4: Kernel density estimation

Estimated MSE of f̂(x) as a function of x, for n = 500. Left: Gaussian law; right: mixture distribution. [Figure: f̂_h1, f̂_h2, f̂_h3, f̂_h4 shown as crosses (black, red, green, blue resp.); f̂_AV as black circles.]
Example 5: Quantile estimation under misspecified models

X1, ..., Xn iid with unknown distribution µ. The estimation of the p-quantile q depends on the tail of µ when p ≈ 1 (here p = 0.99).

Estimators of q:
- the non-parametric estimator q̂_NP = x_(⌊np⌋),
- the parametric estimator q̂_W associated with the Weibull distribution,
- the parametric estimator q̂_G associated with the Gamma distribution,
- the parametric estimator q̂_B associated with the Burr distribution.

The parametric estimators of q are obtained from three parametric models with different tail indexes: Weibull (light-tailed), Gamma (heavy-tailed) and Burr (fat-tailed). Most of these estimators are built from misspecified models and are not consistent.
Example 5: Quantile estimation under misspecified models

d = 1, k = 5, convex combination: Λ = {λ ∈ R^k : λj ≥ 0, λ⊤1 = 1}. Σ_n estimated by bootstrap with θ̂0 = q̂_NP. MSE estimation based on 10⁴ replications, with p = 0.99 and n = 1000. True distribution in rows; standard errors in parentheses.

            q̂W            q̂G            q̂B              q̂NP           q̂AV
Weibull     21 (0.30)      2340 (6.1)     1·10⁶ (1·10³)    57 (0.77)     47 (0.70)
Gamma       171 (1.07)     18 (0.26)      3·10⁸ (5·10⁶)    62 (0.85)     60 (0.83)
Burr        896 (4.3)      1274 (5.7)     65 (0.96)        243 (4.26)    182 (2.81)
Lognormal   72.9 (0.27)    98.7 (0.26)    133.9 (0.86)     13.8 (0.22)   13.8 (0.20)
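The convex-weight step is a small quadratic program. A sketch using the quadprog package (an assumption: the talk does not name a solver), minimizing λ⊤Σ̂λ subject to λ⊤1 = 1 and λ ≥ 0 elementwise:

```r
# solve.QP minimizes (1/2) x^T D x - d^T x subject to A^T x >= b, with the
# first meq constraints treated as equalities.
library(quadprog)
convex_weights <- function(Sigma_hat) {
  k <- nrow(Sigma_hat)
  solve.QP(Dmat = 2 * Sigma_hat, dvec = rep(0, k),
           Amat = cbind(rep(1, k), diag(k)),   # columns: sum-to-one, positivity
           bvec = c(1, rep(0, k)), meq = 1)$solution
}
```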
7. Conclusion
Conclusion

- The oracle θ̂* = ∑_{i=1}^k λiTi with ∑_{i=1}^k λi = 1, i.e. λ ∈ Λmax, is
  θ̂* = (1⊤Σ⁻¹ / (1⊤Σ⁻¹1)) T.
  For Λ ⊂ Λmax, θ̂* = λ*⊤T with λ* = argmin_{λ∈Λ} λ⊤Σλ.
- The average estimator θ̂ approximates the oracle in that Σ is replaced by an estimate Σ̂.
- The estimation of Σ can be carried out with the same data as those used to compute T. Simplest case: parametric bootstrap.
- If Σ̂_n Σ_n⁻¹ → I in probability, θ̂ is (in some sense) asymptotically equivalent to θ̂*, and in our examples the approximation works well for moderate sample sizes.
- Once θ̂ is obtained, an asymptotic confidence interval comes for free.
- Open questions: theory when Σ̂ is obtained by bootstrap? when Λ is not convex? much more...