SLIDE 1

Introduction Adaptive LASSO Consistency Distributions Other PMLEs Simulations CDF Estimation Conclusion

On the Distribution of the Adaptive LASSO Estimator

U. Schneider

(joint with B. M. Pötscher)

Universität Wien

Workshop on Current Trends and Challenges in Model Selection, Vienna, July 24, 2008

SLIDE 2

Penalized ML Estimators

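Linear regression model $y = X\theta + u$; consider an estimator $\hat\theta$ for $\theta$:

$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^k} \; \underbrace{\|y - X\theta\|^2}_{\text{likelihood (LS) part}} + \underbrace{\lambda_n\, p(\theta)}_{\text{penalty}}$$

$\lambda_n$ is a tuning parameter.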

Bridge estimators ($l_p$-type penalties, Frank and Friedman, 1993; LASSO for $p = 1$, Tibshirani, 1996).

Hard- and soft-thresholding estimators.

Smoothly clipped absolute deviation (SCAD) estimator (Fan and Li, 2001).

Adaptive LASSO estimator (Zou, 2006).

These estimators can be viewed as performing model selection and parameter estimation simultaneously ($p \le 1$ for Bridge estimators).

SLIDE 3

Some terminology

Conservative model selection – Zero coefficients are found with asymptotic probability less than 1.

Consistent model selection – Zero coefficients are found with asymptotic probability equal to 1.

Oracle property – Asymptotic distribution coincides with the one of the unpenalized estimator of the true model.

Consistent vs. conservative model selection is in our context driven by the asymptotic choice of tuning parameters λn. (“Sparsely” vs. “non-sparsely” tuned procedures).

SLIDE 4

Some literature on distributional properties of PMLEs

Knight and Fu, 2000. Moving-parameter asymptotics for (non-sparsely tuned) LASSO and Bridge estimators in general.

Fan and Li, 2001. Fixed-parameter asymptotics for SCAD.

Zou, 2006. Fixed-parameter asymptotics for LASSO and adaptive LASSO.

Pötscher and Leeb, 2007. Finite-sample distribution, moving-parameter asymptotics for hard-thresholding, LASSO, and SCAD. Impossibility result for the estimation of the cdf.

Pötscher and Schneider, 2007. Analogous results for the adaptive LASSO.

Pötscher and Schneider, 2008. Finite-sample and asymptotic coverage probabilities of confidence sets for hard-thresholding, LASSO, and adaptive LASSO.

. . .

SLIDE 5

Definition of the adaptive LASSO estimator $\hat\theta_{AL}$

Linear regression model $y = X\theta + u$.

$X$ is $n \times k$, non-stochastic, $\mathrm{rk}(X) = k$. $u \sim N_n(0, \sigma^2 I_n)$.

Adaptive LASSO estimator, Zou, 2006 (random penalty weights):

$$\hat\theta_{AL} = \arg\min_{\theta \in \mathbb{R}^k} \; \|y - X\theta\|^2 + 2n\mu_n^2 \sum_{j=1}^{k} \frac{|\theta_j|}{|\hat\theta_{OLS,j}|}, \qquad \mu_n > 0$$

For the theoretical analysis, assume that $\sigma^2$ is known and that $X'X$ is diagonal, in particular $X'X = nI_k$.

Remove these assumptions for simulation results concerning the finite-sample distribution.

SLIDE 6

Explicit solution in the simplified model

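Wlog consider the Gaussian location model $y_1, \dots, y_n \sim N(\theta, 1)$. Then $\hat\theta_{OLS} = \bar y$ and

$$\hat\theta_{AL} = \begin{cases} 0 & \text{if } |\bar y| \le \mu_n \\ \bar y - \mu_n^2/\bar y & \text{if } |\bar y| > \mu_n \end{cases}$$

[Figure: $\hat\theta_{AL}$ as a function of $\bar y$, with the threshold $\mu_n$ marked.]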
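
The explicit solution can be sketched in a few lines. This is a minimal illustration for the location model only; the function name `adaptive_lasso_location` is hypothetical and not from the talk.

```python
import numpy as np

def adaptive_lasso_location(y_bar, mu_n):
    """Closed-form adaptive LASSO in the Gaussian location model.

    Returns 0 where |y_bar| <= mu_n and y_bar - mu_n^2 / y_bar otherwise.
    """
    y_bar = np.asarray(y_bar, dtype=float)
    keep = np.abs(y_bar) > mu_n
    # Replace thresholded entries by 1.0 so the division never hits zero;
    # those entries are overwritten with 0.0 by the outer np.where anyway.
    safe = np.where(keep, y_bar, 1.0)
    return np.where(keep, y_bar - mu_n**2 / safe, 0.0)

# Small sample means are set to zero, larger ones are shrunk towards zero.
print(adaptive_lasso_location([0.03, 0.2, -0.5], mu_n=0.1))
```

The estimator is continuous in the sample mean, and the shrinkage term $\mu_n^2/\bar y$ vanishes as $|\bar y|$ grows.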

SLIDE 7

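Consistency of $\hat\theta_{AL}$

Estimation consistency:

The condition $\mu_n \to 0$ is equivalent to the consistency of $\hat\theta_{AL}$. In that case $\hat\theta_{AL}$ is also uniformly consistent for $\theta$, i.e. for all $\varepsilon > 0$

$$\lim_{n \to \infty} \sup_{\theta \in \mathbb{R}} P_{n,\theta}\left(|\hat\theta_{AL} - \theta| > \varepsilon\right) = 0$$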

Model selection consistency: two possible regimes arise.

1. The case $\mu_n \to 0$ and $n^{1/2}\mu_n \to m$, $0 \le m < \infty$, corresponds to conservative model selection (non-sparsely tuned).

2. The case $\mu_n \to 0$ and $n^{1/2}\mu_n \to \infty$ corresponds to consistent model selection (sparsely tuned).
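
The two regimes can be illustrated numerically in the location model, where the estimator is zero exactly when $|\bar y| \le \mu_n$ and $\bar y \sim N(\theta, 1/n)$. The sketch below evaluates that probability at $\theta = 0$; the helper names `norm_cdf` and `prob_zero` and the two tuning sequences are choices made here for illustration.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_zero(n, theta, mu_n):
    """P(theta_hat_AL = 0) = P(|y_bar| <= mu_n) with y_bar ~ N(theta, 1/n)."""
    r = sqrt(n)
    return norm_cdf(r * (mu_n - theta)) - norm_cdf(r * (-mu_n - theta))

for n in (100, 10_000, 1_000_000):
    conservative = prob_zero(n, 0.0, 1.0 / sqrt(n))     # n^{1/2} mu_n = 1 for all n
    consistent = prob_zero(n, 0.0, n ** (-1.0 / 3.0))   # n^{1/2} mu_n = n^{1/6}
    print(n, round(conservative, 4), round(consistent, 4))
```

With $n^{1/2}\mu_n = 1$ the probability of finding the zero stays at $2\Phi(1) - 1$ (about 0.68) for every $n$, whereas with $\mu_n = n^{-1/3}$ it tends to 1.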

SLIDE 8

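The finite-sample distribution of $\hat\theta_{AL}$

$F_{n,\theta}(x) = P_{n,\theta}(n^{1/2}(\hat\theta_{AL} - \theta) \le x)$ is given by

$$F_{n,\theta}(x) = 1(n^{1/2}\theta + x \ge 0)\, \Phi\big(z^{(2)}_{n,\theta}(x)\big) + 1(n^{1/2}\theta + x < 0)\, \Phi\big(z^{(1)}_{n,\theta}(x)\big).$$

$z^{(2)}_{n,\theta}(x)$ and $z^{(1)}_{n,\theta}(x)$ are

$$-(n^{1/2}\theta - x)/2 \pm \sqrt{((n^{1/2}\theta + x)/2)^2 + n\mu_n^2}$$

(with $+$ for $z^{(2)}_{n,\theta}$ and $-$ for $z^{(1)}_{n,\theta}$).

$$dF_{n,\theta}(x) = \big\{ \Phi(n^{1/2}(-\theta + \mu_n)) - \Phi(n^{1/2}(-\theta - \mu_n)) \big\}\, d\delta_{-n^{1/2}\theta}(x) + 0.5 \times \big\{ 1(n^{1/2}\theta + x > 0)\, \phi\big(z^{(2)}_{n,\theta}(x)\big)\, (1 + t_{n,\theta}(x)) + 1(n^{1/2}\theta + x < 0)\, \phi\big(z^{(1)}_{n,\theta}(x)\big)\, (1 - t_{n,\theta}(x)) \big\}\, dx$$

where $t_{n,\theta}(x) := \big((n^{1/2}\theta + x)/2\big) \big( ((n^{1/2}\theta + x)/2)^2 + n\mu_n^2 \big)^{-1/2}$. $\Phi$ and $\phi$ denote the cdf and pdf of $N(0, 1)$, respectively.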
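
As a sanity check, the cdf formula can be compared with a Monte Carlo estimate in the location model. The sketch below is illustrative only: the function names, the number of replications and the seed are assumptions, and the parameter values are those of the plotted example ($n = 40$, $\theta = 0.05$, $\mu_n = 0.05$).

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cdf_al(x, n, theta, mu_n):
    """F_{n,theta}(x) = P(sqrt(n) * (theta_hat_AL - theta) <= x)."""
    u = sqrt(n) * theta + x
    root = sqrt((u / 2.0) ** 2 + n * mu_n ** 2)
    shift = -(sqrt(n) * theta - x) / 2.0
    z = shift + root if u >= 0 else shift - root   # z^(2) or z^(1)
    return norm_cdf(z)

def simulate_cdf(x, n, theta, mu_n, reps=200_000, seed=0):
    """Monte Carlo estimate of the same probability via the closed-form estimator."""
    rng = np.random.default_rng(seed)
    y_bar = theta + rng.standard_normal(reps) / sqrt(n)
    keep = np.abs(y_bar) > mu_n
    est = np.where(keep, y_bar - mu_n**2 / np.where(keep, y_bar, 1.0), 0.0)
    return float(np.mean(sqrt(n) * (est - theta) <= x))

for x in (-1.0, 0.0, 1.0):
    print(x, cdf_al(x, 40, 0.05, 0.05), simulate_cdf(x, 40, 0.05, 0.05))
```

The jump of the cdf at $x = -n^{1/2}\theta$ corresponds to the atom of the distribution, i.e. the event $\hat\theta_{AL} = 0$.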

SLIDE 9

The finite-sample distribution of $\hat\theta_{AL}$

$n = 40$, $\theta = 0.05$, $\mu_n = 0.05$

SLIDE 10

Fixed-parameter asymptotics – both regimes

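1. Conservative case. $F_{n,\theta}$ converges weakly to

$$\begin{cases} 1(x \ge 0)\, \Phi\Big(\tfrac{x}{2} + \sqrt{(\tfrac{x}{2})^2 + m^2}\Big) + 1(x < 0)\, \Phi\Big(\tfrac{x}{2} - \sqrt{(\tfrac{x}{2})^2 + m^2}\Big) & \theta = 0 \\ \Phi(x) & \theta \ne 0 \end{cases}$$

2. Consistent case. $F_{n,\theta}$ converges weakly to

$$\begin{cases} 1(x \ge 0) & \theta = 0 \\ \Phi(x + \rho/\theta) & \theta \ne 0 \text{ and } n^{1/2}\mu_n^2 \to \rho \end{cases}$$

If $n^{1/4}\mu_n \to 0$, $F_{n,\theta}(x) \to \Phi(x)$ for $\theta \ne 0$ (“oracle property”, Zou, 2006).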
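
The shift in the consistent-case limit for $\theta \ne 0$ can be traced back to the explicit form of the estimator in the location model. The following is a short derivation sketch, not taken from the slides; $Z_n$ is a symbol introduced here for $n^{1/2}(\bar y - \theta)$.

```latex
% Fixed \theta \neq 0 and \mu_n \to 0: P(|\bar y| > \mu_n) \to 1, so with
% probability tending to one \hat\theta_{AL} = \bar y - \mu_n^2 / \bar y and
\begin{align*}
n^{1/2}(\hat\theta_{AL} - \theta)
  &= \underbrace{n^{1/2}(\bar y - \theta)}_{Z_n \sim N(0,1)}
     \;-\; \frac{n^{1/2}\mu_n^2}{\bar y},
  &
  \frac{n^{1/2}\mu_n^2}{\bar y} &\xrightarrow{\;p\;} \frac{\rho}{\theta}, \\
P_{n,\theta}\big(n^{1/2}(\hat\theta_{AL} - \theta) \le x\big)
  &\to P\big(Z \le x + \rho/\theta\big) = \Phi(x + \rho/\theta),
  & Z &\sim N(0,1).
\end{align*}
```

For $\rho = 0$, which is the case $n^{1/4}\mu_n \to 0$, the shift disappears and the limit is $\Phi(x)$.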

SLIDE 11

Fixed-parameter asymptotics – consistent case

$n = 1$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 12

Fixed-parameter asymptotics – consistent case

$n = 10$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 13

Fixed-parameter asymptotics – consistent case

$n = 50$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 14

Fixed-parameter asymptotics – consistent case

$n = 100$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 15

Fixed-parameter asymptotics – consistent case

$n = 200$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 16

Fixed-parameter asymptotics – consistent case

$n = 500$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 17

Fixed-parameter asymptotics – consistent case

$n = 1000$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 18

Fixed-parameter asymptotics – consistent case

$n = 2000$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 19

Fixed-parameter asymptotics – consistent case

$n = 5000$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 20

Fixed-parameter asymptotics – consistent case

$n = 10^4$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 21

Fixed-parameter asymptotics – consistent case

$n = 5 \times 10^4$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 22

Fixed-parameter asymptotics – consistent case

$n = 5 \times 10^5$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 23

Fixed-parameter asymptotics – consistent case

$n = 10^6$, $\mu_n = n^{-1/3}$ (consistent case)

SLIDE 24

Fixed-parameter asymptotics – consistent case

$n = 10^6$, $\mu_n = n^{-1/3}$ (consistent case)

Is the non-normality of the finite-sample distribution a transient feature as $n \to \infty$?

SLIDE 25

Fixed-parameter asymptotics – consistent case

$n = 10^6$, $\mu_n = n^{-1/3}$ (consistent case)

Is the non-normality of the finite-sample distribution a transient feature as $n \to \infty$?

Need to look at moving-parameter asymptotics!

SLIDE 26

Moving-parameter asymptotics

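1. Conservative case.

Let $\mu_n \to 0$ and $n^{1/2}\mu_n \to m$, $0 \le m < \infty$. Suppose the true parameter $\theta_n \in \mathbb{R}$ satisfies $n^{1/2}\theta_n \to \nu \in \mathbb{R} \cup \{-\infty, \infty\}$. Then $F_{n,\theta_n}$ converges weakly to

$$\begin{cases} 1(\nu + x \ge 0)\, \Phi\Big(-(\nu - x)/2 + \sqrt{((\nu + x)/2)^2 + m^2}\Big) + 1(\nu + x < 0)\, \Phi\Big(-(\nu - x)/2 - \sqrt{((\nu + x)/2)^2 + m^2}\Big) & \text{if } \nu \in \mathbb{R} \\ \Phi(x) & \text{if } |\nu| = \infty \end{cases}$$

Note: Same as the finite-sample distribution, except that $n^{1/2}\theta_n$ and $n^{1/2}\mu_n$ have settled down to their limiting values.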
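
The note can be checked directly. If the sequences are chosen so that $n^{1/2}\theta_n = \nu$ and $n^{1/2}\mu_n = m$ hold exactly for every $n$, the finite-sample cdf already equals the limit. The sketch below does this in the location model; the function names and the values $\nu = 2$, $m = 1$ are assumptions made for illustration.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cdf_al(x, n, theta, mu_n):
    """Finite-sample cdf of sqrt(n) * (theta_hat_AL - theta) in the location model."""
    u = sqrt(n) * theta + x
    root = sqrt((u / 2.0) ** 2 + n * mu_n ** 2)
    shift = -(sqrt(n) * theta - x) / 2.0
    return norm_cdf(shift + root if u >= 0 else shift - root)

def limit_cdf(x, nu, m):
    """Moving-parameter limit in the conservative case for finite nu."""
    u = nu + x
    root = sqrt((u / 2.0) ** 2 + m ** 2)
    shift = -(nu - x) / 2.0
    return norm_cdf(shift + root if u >= 0 else shift - root)

nu, m = 2.0, 1.0
for n in (10, 1000, 100_000):
    theta_n, mu_n = nu / sqrt(n), m / sqrt(n)
    gap = max(abs(cdf_al(x, n, theta_n, mu_n) - limit_cdf(x, nu, m))
              for x in (-3.0, -1.0, 0.5, 2.0))
    print(n, gap)   # differences are at the level of rounding error
```

The limit therefore inherits the non-normal shape of the finite-sample distribution whenever $\nu$ is finite and $m > 0$.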

SLIDE 27

Moving-parameter asymptotics

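2. Consistent case.

Let $\mu_n \to 0$ and $n^{1/2}\mu_n \to \infty$. Suppose the true parameter $\theta_n \in \mathbb{R}$ satisfies $\theta_n/\mu_n \to \zeta \in \mathbb{R} \cup \{-\infty, \infty\}$ and $n^{1/2}\theta_n \to \nu \in \mathbb{R} \cup \{-\infty, \infty\}$. Then $F_{n,\theta_n}$ converges weakly to the following limits.

If $0 \le |\zeta| < \infty$: pointmass at $-\nu$.

If $|\zeta| = \infty$: $\Phi(\cdot + \rho/\theta)$ where $n^{1/2}\mu_n^2 \to \rho$.

For $|\nu| = \infty$ or $|\rho| = \infty$, the above expressions mean total mass escaping to $\pm\infty$. Depending on $\zeta$ and $\nu$, three possible (weak) limits arise.

Distribution collapses at a point.

Total mass escapes to $\pm\infty$.

Limit distribution is normal.

Non-normality persists!!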
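
The collapse to a pointmass can be made visible by evaluating the finite-sample cdf just below and just above $-\nu$ along a moving-parameter sequence. The sketch uses $\mu_n = n^{-1/3}$ and $\theta_n = 2n^{-1/2}$ (so $\zeta = 0$, $\nu = 2$); the function names and the offset 0.5 are choices made here.

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cdf_al(x, n, theta, mu_n):
    """Finite-sample cdf of sqrt(n) * (theta_hat_AL - theta) in the location model."""
    u = sqrt(n) * theta + x
    root = sqrt((u / 2.0) ** 2 + n * mu_n ** 2)
    shift = -(sqrt(n) * theta - x) / 2.0
    return norm_cdf(shift + root if u >= 0 else shift - root)

nu = 2.0
for n in (1, 100, 10**4, 10**6, 10**8):
    mu_n = n ** (-1.0 / 3.0)      # sparsely tuned: sqrt(n) * mu_n = n^(1/6)
    theta_n = nu / sqrt(n)        # sqrt(n) * theta_n = nu, theta_n / mu_n -> 0
    below = cdf_al(-nu - 0.5, n, theta_n, mu_n)
    above = cdf_al(-nu + 0.5, n, theta_n, mu_n)
    print(n, round(below, 4), round(above, 4))
```

As $n$ grows, the cdf tends to 0 to the left of $-\nu$ and to 1 to the right of it, but only slowly, since $n^{1/2}\mu_n = n^{1/6}$.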

SLIDE 28

Moving-parameter asymptotics – consistent case

$n = 1$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 29

Moving-parameter asymptotics – consistent case

$n = 10$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 30

Moving-parameter asymptotics – consistent case

$n = 50$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 31

Moving-parameter asymptotics – consistent case

$n = 100$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 32

Moving-parameter asymptotics – consistent case

$n = 200$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 33

Moving-parameter asymptotics – consistent case

$n = 500$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 34

Moving-parameter asymptotics – consistent case

$n = 1000$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 35

Moving-parameter asymptotics – consistent case

$n = 2000$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 36

Moving-parameter asymptotics – consistent case

$n = 5000$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 37

Moving-parameter asymptotics – consistent case

$n = 10^4$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 38

Moving-parameter asymptotics – consistent case

$n = 5 \times 10^4$, $\zeta = 0$, $\nu = 2$

($\mu_n = n^{-1/3}$, $\theta_n = 2n^{-1/2}$)

SLIDE 39

Uniform consistency with rate $a_n$

For which rate $a_n$ is $\hat\theta_{AL}$ uniformly $a_n$-consistent, i.e.

$$\lim_{M \to \infty} \sup_{n \in \mathbb{N}} \sup_{\theta \in \mathbb{R}} P_{n,\theta}\left(a_n |\hat\theta_{AL} - \theta| > M\right) = 0 \;?$$

1. Conservative case. Rate $a_n$ is $O(n^{1/2})$ (see previous theorem).

2. Consistent case. Rate $a_n$ is only $O(\mu_n^{-1})$.

(In a moving-parameter framework, the asymptotic distribution of $\mu_n^{-1}(\hat\theta_{AL} - \theta)$ collapses to pointmass.)

SLIDE 40

Other PMLEs

Results are similar for the hard-thresholding, soft-thresholding (LASSO), and SCAD estimators (Pötscher and Leeb, 2007).

Identical consistency results.

Analogous asymptotic results.

SLIDE 41

Confidence sets based on PMLEs

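Based on Pötscher and Schneider, 2008.

Let $C_n = [\hat\theta - a_n, \hat\theta + a_n]$ be a confidence set for $\theta$ with infimal coverage probability of at least $\delta$, i.e. $\inf_{\theta \in \mathbb{R}} P_{n,\theta}(\theta \in C_n) \ge \delta$.

For each $n \in \mathbb{N}$, we have $a_{n,H} > a_{n,L} > a_{n,A} > a_{n,MLE}$ for a given $\delta > 0$ (subscripts: H hard-thresholding, L LASSO, A adaptive LASSO, MLE unpenalized estimator). Asymptotically, the following holds.

1. Conservative case. All quantities are of the same order $n^{-1/2}$.

2. Consistent case. $a_{n,H}$, $a_{n,L}$, and $a_{n,A}$ are one order of magnitude larger than $a_{n,MLE}$.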

SLIDE 42

Confidence sets based on PMLEs

Plot of $n^{1/2}a_n$ against $n^{1/2}\mu_n$ for $\delta = 0.95$.

SLIDE 43

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

$\mu_n = n^{-1/3}$

$\theta_1$
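
One possible way to set up a single draw of this simulation design is sketched below. The slides do not say how $X$ was generated or which algorithm was used, so everything here is an assumption: the construction of $X$ via a QR factorization and a Cholesky factor, unit error variance, the coordinate-descent solver, and the function names `make_design` and `adaptive_lasso`.

```python
import numpy as np

def make_design(n, k, rho, rng):
    """Return X (n x k) with X'X = n * Omega exactly, Omega_ij = rho^|i-j|."""
    idx = np.arange(k)
    omega = rho ** np.abs(idx[:, None] - idx[None, :])
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))  # q has orthonormal columns
    chol = np.linalg.cholesky(omega)                  # omega = chol @ chol.T
    return np.sqrt(n) * q @ chol.T, omega

def adaptive_lasso(X, y, mu_n, sweeps=500, tol=1e-12):
    """Minimize ||y - X t||^2 + 2 n mu_n^2 sum_j |t_j| / |ols_j| by coordinate descent."""
    n, k = X.shape
    ols = np.linalg.solve(X.T @ X, X.T @ y)
    w = n * mu_n**2 / np.abs(ols)          # per-coordinate penalty weights
    col_ss = np.sum(X**2, axis=0)          # x_j' x_j
    theta = ols.copy()
    for _ in range(sweeps):
        old = theta.copy()
        for j in range(k):
            r = y - X @ theta + X[:, j] * theta[j]   # residual without coordinate j
            b = X[:, j] @ r
            theta[j] = np.sign(b) * max(abs(b) - w[j], 0.0) / col_ss[j]  # soft-threshold
        if np.max(np.abs(theta - old)) < tol:
            break
    return theta, ols

rng = np.random.default_rng(0)
n, k = 200, 4
X, omega = make_design(n, k, 0.5, rng)
theta0 = np.array([3.0, 1.5, 0.0, 0.0]) + 2.0 / np.sqrt(n) * np.array([0.0, 0.0, 1.0, 1.0])
y = X @ theta0 + rng.standard_normal(n)    # unit error variance is an assumption
est, ols = adaptive_lasso(X, y, mu_n=n ** (-1.0 / 3.0))
print("OLS:", ols)
print("AL: ", est)
```

Repeating the last block over many independent error draws and collecting $n^{1/2}(\hat\theta_{AL,j} - \theta_j)$ would give a histogram per component. With an orthogonal design the solver reduces to the componentwise closed form of the location model.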

SLIDE 44

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

$\mu_n = n^{-1/3}$

$\theta_2$

SLIDE 45

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

$\mu_n = n^{-1/3}$

$\theta_3$

SLIDE 46

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

$\mu_n = n^{-1/3}$

$\theta_4$

SLIDE 47

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

Choose $\mu_n$ through cross-validation.

$\theta_1$

SLIDE 48

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

Choose $\mu_n$ through cross-validation.

$\theta_2$

SLIDE 49

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

Choose $\mu_n$ through cross-validation.

$\theta_3$

SLIDE 50

Simulations - remove orthogonality assumption

$k = 4$, $n = 200$, $\theta = (3, 1.5, 0, 0)' + (2/n^{1/2})(0, 0, 1, 1)'$, $X'X = n\Omega$ with $\Omega_{ij} = 0.5^{|i-j|}$, 1000 simulations

Choose $\mu_n$ through cross-validation.

$\theta_4$

SLIDE 51

An impossibility result on the estimation of the cdf

Results rest on Leeb and Pötscher, 2006.

Let $\mu_n \to 0$ and $n^{1/2}\mu_n \to m$ with $0 \le m \le \infty$. Then every consistent estimator $\hat F_n(t)$ of $F_{n,\theta}(t)$ satisfies

$$\lim_{n \to \infty} \sup_{|\theta| < c/n^{1/2}} P_{n,\theta}\left(|\hat F_n(t) - F_{n,\theta}(t)| > \varepsilon\right) = 1$$

for each $\varepsilon < (\Phi(t + m) - \Phi(t - m))/2$ and each $c > 1$. In particular, no uniformly consistent estimator for $F_{n,\theta}(t)$ exists.

SLIDE 52

An impossibility result on the estimation of the cdf

Results rest on Leeb and Pötscher, 2006.

Let $\mu_n \to 0$ and $n^{1/2}\mu_n \to m$ with $0 \le m \le \infty$. Then every estimator $\hat F_n(t)$ of $F_{n,\theta}(t)$ satisfies

$$\sup_{|\theta| < c/n^{1/2}} P_{n,\theta}\left(|\hat F_n(t) - F_{n,\theta}(t)| > \varepsilon\right) \ge \frac{1}{2}$$

for each $\varepsilon < (\Phi(t + n^{1/2}\mu_n) - \Phi(t - n^{1/2}\mu_n))/2$, for each $c > |t|$, and for each fixed sample size $n$. This is a finite-sample result that holds for every estimator of $F_{n,\theta}(t)$.

SLIDE 53

Conclusions

The finite-sample distributions of the adaptive LASSO estimator and other PMLEs are highly non-normal.

Non-normality persists in large samples. This can be seen through a “moving-parameter” asymptotic framework.

Fixed-parameter asymptotics (as underlying the oracle property) paint a misleading picture of the performance of the estimator due to the non-uniformity of these results. Relying on fixed-parameter asymptotics in this context is dangerous.

Confidence intervals in the consistent case are larger by one order of magnitude compared to the unpenalized estimator.

Sparsity at all costs?

SLIDE 54

References

J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Ass., 96:1348–1360, 2001.

I. E. Frank and J. H. Friedman. A statistical view of some chemometrics regression tools (with discussion). Technom., 35:109–148, 1993.

K. Knight and W. Fu. Asymptotics of lasso-type estimators. Ann. Stat., 28:1356–1378, 2000.

H. Leeb and B. M. Pötscher. Performance limits for estimators of the risk or distribution of shrinkage-type estimators, and some general lower risk-bound results. Economet. Theory, 22:69–97, 2006.

B. M. Pötscher. Confidence sets based on sparse estimators are necessarily large. Manuscript, 2007. arXiv:0711.1036.

B. M. Pötscher and H. Leeb. On the distribution of penalized maximum likelihood estimators: The LASSO, SCAD, and thresholding. Manuscript, 2007. arXiv:0711.0660.

B. M. Pötscher and U. Schneider. Confidence sets based on penalized maximum likelihood estimators. Manuscript, 2008. arXiv:0806.1652.

B. M. Pötscher and U. Schneider. On the distribution of the adaptive lasso estimator. Manuscript, 2008. arXiv:0801.4627.

R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B, 58:267–288, 1996.

H. Zou. The adaptive lasso and its oracle properties. J. Am. Stat. Ass., 101:1418–1429, 2006.