The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions! (PowerPoint PPT Presentation)


SLIDE 1

The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions!

joint work with Yachong Yang and Weijie Su

Hua Wang

The Wharton School, University of Pennsylvania

June 2, 2020

Hua Wang (Wharton) The Price of Competition June 2, 2020 1 / 29


SLIDE 3

Settings: Model selection in high dimensions

High-dimensional linear regression: y = Xβ + z, where y is n × 1, X is n × p, β is p × 1, and z is n × 1. An important question of great practical value is model selection. How hard is model selection? An intuitive answer: it depends on sparsity (as long as the signals are large enough, e.g. under a beta-min condition).


SLIDE 6

Performance criteria: FDP and TPP

Relevant variables (or signals): S = {j : βj ≠ 0}. Discoveries, or the model selected at λ: Ŝ = {j : β̂j(λ) ≠ 0}.

FDP(λ) := #{j ∈ Ŝ : βj = 0} / #Ŝ
TPP(λ) := #{j ∈ Ŝ : βj ≠ 0} / #{j : βj ≠ 0}

[Diagram: the true and estimated models overlap in 100 variables; the estimated model contains 200 false discoveries and misses 300 signals. Here FDP = 200/(100 + 200) and TPP = 100/(300 + 100).]
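The two criteria translate directly into code. A minimal NumPy sketch (the function name and the max(·, 1) convention for an empty model are ours, not from the talk):

```python
import numpy as np

def fdp_tpp(beta_true, beta_hat):
    """Compute (FDP, TPP) of a selected model.

    beta_true: the true coefficient vector (signals are the nonzero entries).
    beta_hat:  the estimated coefficient vector at some lambda.
    """
    signals = beta_true != 0            # S  = {j : beta_j != 0}
    selected = beta_hat != 0            # S^ = {j : beta^_j(lambda) != 0}
    false_discoveries = np.sum(selected & ~signals)
    true_discoveries = np.sum(selected & signals)
    fdp = false_discoveries / max(selected.sum(), 1)  # convention: FDP = 0 if nothing selected
    tpp = true_discoveries / max(signals.sum(), 1)
    return fdp, tpp

# The toy numbers from the diagram: the estimated model shares 100 variables
# with the true model, contains 200 false discoveries, and misses 300 signals.
beta_true = np.zeros(1000)
beta_true[:400] = 1.0          # 400 signals in total: 100 found + 300 missed
beta_hat = np.zeros(1000)
beta_hat[:100] = 1.0           # 100 true discoveries
beta_hat[400:600] = 1.0        # 200 false discoveries
fdp, tpp = fdp_tpp(beta_true, beta_hat)
# fdp = 200 / (100 + 200) = 2/3, tpp = 100 / (300 + 100) = 1/4
```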


SLIDE 9

Folklore theorem of signal strength

When p > n, the Lasso is a popular method for variable selection.

Belief (some folks, nowadays)

With ‖β‖0 fixed, the stronger all the signals are, the better a model selector (e.g. the Lasso) will perform. Is it really the case?

SLIDE 10

In which setting does the Lasso perform best?

n = 1000, p = 1000, s = 200, with weak noise σ = 0.01. The structure of the signals: Setting 1: strongest. Setting 2: strong. Setting 3: weak. Setting 4: weakest.


SLIDE 14

Surprisingly...

The TPP and FDP are calculated along the Lasso path as λ varies from ∞ to 0. [Figure: TPP-FDP curves for the four settings.]


SLIDE 20

Effect size heterogeneity matters!

Everything (including sparsity) except the strength of the signals is the same, yet the Lasso performs better with weaker signals! Our explanation: the Lasso favors strong signals, as expected, but it also "prefers" signals that differ wildly from one another. We term this diverse structure of signals "effect size heterogeneity". With everything else fixed, the Lasso performs best with the most heterogeneous signals. Effect size heterogeneity matters!

SLIDE 21

In which setting will the Lasso perform best? (Revisited)

Setting 1: Most homogeneous. Setting 2: Homogeneous. Setting 3: Heterogeneous. Setting 4: Most heterogeneous.
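For concreteness, the four structures can be instantiated in NumPy. The slides do not print the exact magnitudes, so the constant M and the spreads below are our illustrative choices; only the ordering from most homogeneous to most heterogeneous matters:

```python
import numpy as np

p, s = 1000, 200   # dimensions and sparsity from the earlier experiment

def make_signals(setting, s=s, p=p, M=2.0):
    """Four signal vectors with identical sparsity s and increasing
    effect size heterogeneity (magnitudes are illustrative)."""
    beta = np.zeros(p)
    if setting == 1:                                   # most homogeneous: one magnitude
        beta[:s] = M
    elif setting == 2:                                 # homogeneous: two magnitudes
        beta[:s] = np.where(np.arange(s) % 2 == 0, M, 2 * M)
    elif setting == 3:                                 # heterogeneous: mild geometric spread
        beta[:s] = M ** (1 + 3 * np.arange(s) / s)
    else:                                              # most heterogeneous: geometric ladder
        beta[:s] = M ** np.linspace(1, 10, s)
    return beta

betas = {t: make_signals(t) for t in (1, 2, 3, 4)}
```

The dynamic range max|βj| / min nonzero |βj| grows from setting 1 to setting 4, which is the notion of heterogeneity the talk formalizes later via the (ε, m, M)-priors.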


SLIDE 23

Theory of the Lasso in the literature

Belief (literature¹, nowadays; informal)

Given k = ‖β‖0 and the structure of X (n, p, RIP conditions, etc.), we can understand the Lasso (as a model selector) well, especially if the signals are sufficiently large (beta-min condition).

Theorem (W., Yang and Su, 2020; informal)

The information in (‖β‖0, X) is not enough; we need to know more about the inner structure of β.

¹ e.g. E. Candès, T. Tao 2007; P. J. Bickel, Y. Ritov, A. B. Tsybakov 2009; M. J. Wainwright 2009...

SLIDE 24

Main results

Assume X has iid N(0, 1/n) entries; σ = 0, i.e. the noise z = 0; the regression coefficients βi are drawn iid from a prior Π with E[Π²] < ∞ and P(Π ≠ 0) = ε ∈ (0, 1); and n/p → δ ∈ (0, ∞). Then:

Theorem (W., Yang and Su, 2020+)

With probability tending to one,
q△(TPP(λ)) − 0.001 ≤ FDP(λ) ≤ q▽(TPP(λ)) + 0.001
uniformly for all λ, where q△(·) = q△(·; δ, ε) > 0 and q▽(·) = q▽(·; δ, ε) < 1 are two deterministic functions.

SLIDE 25

The Lasso Crescent

[Figure: the TPP-FDP plane. The achievable pairs form the "Lasso Crescent", bounded below by q△ and above by q▽; the region outside is the unachievable zone.]

SLIDE 26

The sharpness of the Lasso Crescent

Definition (most favorable prior)

For M > 0 and an integer m > 0, we call the following the (ε, m, M)-prior:
Π△ = 0 w.p. 1 − ε; M w.p. ε/m; M² w.p. ε/m; ...; Mᵐ w.p. ε/m.

Definition (least favorable prior)

For M > 0, we call the following the (ε, M)-prior:
Π▽ = 0 w.p. 1 − ε; M w.p. ε.

Theorem (Effect Size Heterogeneity Matters!)

Π▽ achieves q▽, and Π△ achieves q△, as M, m → ∞.
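Both priors are easy to simulate; a minimal sketch (the parameter values eps = 0.2, m = 5, M = 3 are illustrative choices, not from the talk):

```python
import numpy as np

def sample_most_favorable(size, eps, m, M, rng):
    """iid draws from the (eps, m, M)-prior: 0 w.p. 1 - eps,
    and M^j w.p. eps/m for j = 1, ..., m."""
    levels = np.concatenate(([0.0], M ** np.arange(1, m + 1)))
    probs = np.concatenate(([1.0 - eps], np.full(m, eps / m)))
    return rng.choice(levels, size=size, p=probs)

def sample_least_favorable(size, eps, M, rng):
    """iid draws from the (eps, M)-prior: 0 w.p. 1 - eps, M w.p. eps."""
    return np.where(rng.random(size) < eps, M, 0.0)

rng = np.random.default_rng(0)
beta_het = sample_most_favorable(100_000, eps=0.2, m=5, M=3.0, rng=rng)
beta_hom = sample_least_favorable(100_000, eps=0.2, M=3.0, rng=rng)
```

Both priors place mass ε on nonzero values; they differ only in whether that mass sits on one magnitude or on a geometric ladder of m magnitudes.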

SLIDE 27

The Lasso Crescent (revisited)

[Figure: the Lasso Crescent in the TPP-FDP plane, with q△ attained by the most heterogeneous prior and q▽ by the most homogeneous prior.]


SLIDE 31

Remarks on the results

Theorem (W., Yang and Su, 2020+)

With probability tending to one, q△(TPP(λ)) − 0.001 ≤ FDP(λ) ≤ q▽(TPP(λ)) + 0.001 for all λ > 0.01, where q△(·) and q▽(·) are two deterministic functions. Moreover, Π▽ (absolutely homogeneous) gives q▽, and Π△ (absolutely heterogeneous) gives q△.

• The constant 0.001 can be any small number.
• The lower curve q△(·) was first discovered in (W. Su, M. Bogdan, E. Candès 2017), yet here we first prove it is tight and uniformly achievable by some prior.
• Key tool: approximate message passing (Donoho et al., 2009).

SLIDE 32

The first false variable

Let T denote the number of variables selected up to and including the first false variable, i.e.
T := ‖β̂(λ∗ − 0)‖0 = ‖β̂(λ∗)‖0 + 1,
where λ∗ is the first time along the Lasso path when a false variable is about to be selected:
λ∗ = sup{λ : there exists 1 ≤ i ≤ p such that β̂i(λ) ≠ 0, βi = 0}.
Intuitively, the larger T is, the better the performance as a model selector.


SLIDE 35

The most favorable and least favorable priors (revisited)

Recall the most favorable and least favorable priors we defined. Any critique? One might argue that the actual signal should be fixed!

Definition (most favorable prior): for M > 0 and an integer m > 0, the (ε, m, M)-prior is Π△ = 0 w.p. 1 − ε, and Mʲ w.p. ε/m for j = 1, ..., m.

Definition (least favorable prior): for M > 0, the (ε, M)-prior is Π▽ = 0 w.p. 1 − ε, and M w.p. ε.


SLIDE 37

The best T via heterogeneous signals

The following considers a typical realization of the most favorable prior.

Proposition (the (fixed) most heterogeneous signal)

Consider the fixed signal structure βj = M^(k+1−j) for 1 ≤ j ≤ k, and βj = 0 for j > k. When M is sufficiently large, the rank T satisfies T ≥ (1 + oP(1)) n/(2 log p) a.s.

Theorem (the most favorable is the most favorable)

For arbitrary regression coefficients β with sparsity satisfying k ≤ εp, the rank T of the first false variable selected by the Lasso satisfies T ≤ (1 + oP(1)) n/(2 log p) a.s.

SLIDE 38

The homogeneous signal gives an early false discovery

Proposition (W. Su 2018)

Consider the fixed signal structure βj = M for 1 ≤ j ≤ k, and βj = 0 for j > k. The rank T satisfies log T = (1 + oP(1)) √(2δ log(p/ε)).

This false discovery comes much earlier than for the heterogeneous signal:
e^{(1 + oP(1)) √(2δ log(p/ε))} ≪ (1 + oP(1)) n/(2 log p).

SLIDE 39

Simulation: rank of the first false discovery by the Lasso

[Figure: rank of the first false discovery vs. sparsity, for heterogeneous and homogeneous signals. Left: n = 1000, p = 1000, σ = 1. Right: n = 800, p = 1200, σ = 1. All averaged over 500 replicates.]
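The rank T can be estimated in simulation. The talk computes it along the Lasso path; as a lighter stand-in, the sketch below uses greedy forward selection (results of this type are stated for sequential regression procedures generally), in the noiseless regime of the theory slides. All sizes and the ladder base M = 2 are our choices:

```python
import numpy as np

def first_false_rank(X, y, support, max_steps=None):
    """Rank T of the first false variable under greedy forward selection
    (an OMP-style stand-in for the Lasso path)."""
    n, p = X.shape
    max_steps = max_steps or min(n, p)
    active = []
    residual = y.copy()
    for step in range(1, max_steps + 1):
        scores = np.abs(X.T @ residual)
        if active:
            scores[active] = -np.inf      # never re-select an active variable
        j = int(np.argmax(scores))
        if j not in support:
            return step                   # the first false variable has rank T = step
        active.append(j)
        # refit least squares on the active set and update the residual
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        residual = y - X[:, active] @ coef
    return max_steps + 1

rng = np.random.default_rng(1)
n, p, k = 400, 400, 40
X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
support = set(range(k))

beta_hom = np.zeros(p)
beta_hom[:k] = 2.0                            # homogeneous signals
beta_het = np.zeros(p)
beta_het[:k] = 2.0 ** np.arange(k, 0, -1)     # geometric ladder M^(k+1-j), M = 2

t_hom = first_false_rank(X, X @ beta_hom, support)
t_het = first_false_rank(X, X @ beta_het, support)
```

With the geometric ladder, at every step the largest remaining signal dominates the residual, so the true variables tend to enter first; with homogeneous signals, the null correlations are of comparable size to the signals themselves.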


SLIDE 47

Explanation: the price of competition

Well-matched (unselected) signals are the cause of bad performance. Consider β̂(λ) with support Ŝ ⊂ S. The next variable j ∉ S will be falsely selected if it has the largest |Xjᵀ(y − Xβ̂)|. Heuristically,

Xjᵀ(y − Xβ̂) ≈ Xjᵀ(y − Xβ_Ŝ) = Xjᵀ X_{S∖Ŝ} β_{S∖Ŝ} ∼ N(0, ‖β_{S∖Ŝ}‖²/n).

When the signals are homogeneous, the standard deviation is ‖β_{S∖Ŝ}‖/√n ≈ √(k/n) · sup_{j∈S∖Ŝ} |βj|. When they are heterogeneous, it is ‖β_{S∖Ŝ}‖/√n ≈ (1/√n) · sup_{j∈S∖Ŝ} |βj|.
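This heuristic can be checked numerically: the null correlation Xjᵀ X_{S∖Ŝ} β_{S∖Ŝ} should have standard deviation about ‖β_{S∖Ŝ}‖/√n, i.e. roughly √(k/n) · sup|βj| for homogeneous unselected signals versus (1/√n) · sup|βj| for a geometric ladder. A NumPy sketch (sizes and magnitudes are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 400, 100, 1000

# Unselected signals with the same sup-norm but different heterogeneity.
beta_hom = np.ones(k)             # homogeneous: all equal to 1
beta_het = 0.1 ** np.arange(k)    # heterogeneous: geometric decay, sup = 1

def null_correlation_std(beta):
    """Empirical std of x_j^T (X_unselected @ beta) over fresh draws,
    with iid N(0, 1/n) design entries."""
    vals = np.empty(trials)
    for t in range(trials):
        Xs = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, k))  # unselected signal columns
        xj = rng.normal(0.0, 1.0 / np.sqrt(n), size=n)       # a null variable
        vals[t] = xj @ (Xs @ beta)
    return vals.std()

std_hom = null_correlation_std(beta_hom)  # theory: ~ sqrt(k/n) * sup = 0.5
std_het = null_correlation_std(beta_het)  # theory: ~ (1/sqrt(n)) * sup = 0.05
```

The homogeneous case produces null correlations an order of magnitude larger here (√k = 10), which is exactly the "price of competition" paid by many well-matched unselected signals.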

SLIDE 48

Reflections on the assumptions

Assume X has iid N(0, 1/n) entries; σ = 0, i.e. the noise z = 0; the regression coefficients βi are drawn iid from a prior Π with E[Π²] < ∞ and P(Π ≠ 0) = ε ∈ (0, 1); and n/p → δ ∈ (0, ∞). Then:

Theorem (W., Yang and Su, 2020+)

With probability tending to one, q△(TPP(λ)) − 0.001 ≤ FDP(λ) ≤ q▽(TPP(λ)) + 0.001 for all λ > 0.01, where q△(·) = q△(·; δ, ε) > 0 and q▽(·) = q▽(·; δ, ε) < 1 are two deterministic functions.

SLIDE 49

The four settings (revisited)

Setting 1: Most homogeneous. Setting 2: Homogeneous. Setting 3: Heterogeneous. Setting 4: Most heterogeneous.

SLIDE 50

Non-Gaussian design matrices: the same phenomenon

[Figure: TPP vs. FDP for the four signal structures (most heterogeneous, heterogeneous, homogeneous, most homogeneous). n = 1000, p = 1000, k = 200, σ = 0. Left: autoregressive design matrix with ρ = 0.5; right: Bernoulli design matrix with success probability 0.5.]

SLIDE 51

Real data as design matrix: still the same phenomenon

[Figure: TPP vs. FDP for the four signal structures on the HIV real data, n = 634, p = 463, σ = 0. Left: the original HIV data as the design matrix X; right: X perturbed with unit Gaussian noise and then re-normalized.]


SLIDE 55

From noiseless to noisy: still a similar phenomenon!

[Figures: TPP vs. FDP for the four signal structures (most heterogeneous to most homogeneous) under Setting 1: no noise; Setting 2: small noise; Setting 3: moderate noise; Setting 4: large noise.]


SLIDE 66

Take-home messages

Our results: effect size heterogeneity always matters.

Perspectives:
• The TPP-FDP tradeoff curve
• The rank of the first false selection

Occurs when:
• Non-vanishing sparsity ratio
• Locally orthogonal Gaussian design

Cause:
• The price of competition in the Lasso
• The inherent shrinkage of ℓ1 methods

Possible future work: quantify the effect; new methodologies; general designs; the same phenomenon in other methods?

Thanks! :)