

SLIDE 1

Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses

Amit Zeisel, Or Zuk, Eytan Domany

W.I.S.

June 15, 2009

Amit Zeisel, Or Zuk, Eytan Domany (W.I.S.)Improving the Performance of the FDR Procedure Using an Estimator for the Number of T June 15, 2009 1 / 17

SLIDE 2

Introduction/Motivation

The widespread use of high-throughput technologies involves testing thousands of hypotheses simultaneously. The field of multiple testing develops methods for determining the level of significance in such a scenario.


SLIDE 3

Multiple testing

$m$ null hypotheses $H_{0,i}$, $i = 1, 2, \ldots, m$. Example: for $m$ variables, $H_{0,i}: \mu_i^A = \mu_i^B$. Calculate p-values $p_i$, and set a threshold for significance.

The set of random variables $(U, V, T, S)$ and parameters $(m, m_0, m_1)$ describes this scenario:

| "ground truth" | non-rejected hypotheses | rejected hypotheses | total |
|---|---|---|---|
| null hypothesis is true | $U$ | $V$ | $m_0$ |
| null hypothesis is false | $T$ | $S$ | $m_1$ |
| total | $m - R$ | $R$ | $m$ |

Fraction of false discoveries $= V / R^{+}$, with $R^{+} = \max(R, 1)$.
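The bookkeeping in this table can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, the "ground truth" labels are assumed to be known (as in a simulation), and the fraction of false discoveries is taken as V / max(R, 1) so that it is defined when nothing is rejected.

```python
def confusion_counts(is_null, rejected):
    """Return (U, V, T, S) for two boolean lists of equal length."""
    pairs = list(zip(is_null, rejected))
    U = sum(1 for n, r in pairs if n and not r)       # true nulls, not rejected
    V = sum(1 for n, r in pairs if n and r)           # true nulls, rejected
    T = sum(1 for n, r in pairs if not n and not r)   # false nulls, not rejected
    S = sum(1 for n, r in pairs if not n and r)       # false nulls, rejected
    return U, V, T, S


def false_discovery_proportion(is_null, rejected):
    """V / max(R, 1), where R = V + S is the number of rejections."""
    _, V, _, S = confusion_counts(is_null, rejected)
    return V / max(V + S, 1)


# Toy example: m = 5, m0 = 3, m1 = 2.
is_null = [True, True, True, False, False]
rejected = [False, True, False, True, True]
print(confusion_counts(is_null, rejected))            # (2, 1, 0, 2)
print(false_discovery_proportion(is_null, rejected))  # one false discovery out of three
```

In real data the labels in `is_null` are unknown, which is why the FDR is controlled in expectation rather than computed directly.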


SLIDE 4

Control of the FDR $= E(V / R^{+})$

In 1995 Benjamini and Hochberg proposed a procedure (BH95) to control the FDR for a given set of p-values:

1. Sort and re-label the p-values, $p_{(1)} \le p_{(2)} \le \ldots \le p_{(m)}$.
2. Choose $0 \le q \le 1$, the desired FDR level.
3. Define the set of constants $\alpha_i = \frac{i}{m} q$, $i = 1, 2, \ldots, m$.
4. Identify $R = \max\{i : p_{(i)} \le \alpha_i\}$.
5. If $R \ge 1$, reject all hypotheses $(i) = 1, 2, \ldots, R$; otherwise no hypothesis is rejected.

BH proved that $\mathrm{FDR} \le \frac{m_0}{m} q \le q$.

The bound $q$ is not tight when $m_0 < m$, so there is room for improvement.
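The BH95 steps can be sketched as follows. This is a minimal pure-Python illustration; the function name `benjamini_hochberg` and the example p-values are made up for this sketch and do not come from the slides.

```python
def benjamini_hochberg(pvalues, q):
    """Return the (sorted) original indices of the hypotheses rejected by BH95."""
    m = len(pvalues)
    # Sort: order[k-1] is the original index of the k-th smallest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    R = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:   # p_(i) <= alpha_i = (i / m) * q
            R = rank                       # keep the largest rank that passes
    return sorted(order[:R])               # reject the R smallest p-values


# Step-up behaviour: the smallest p-value misses alpha_1 = 0.0125,
# but the third smallest passes alpha_3 = 0.0375, so R = 3.
pvals = [0.02, 0.021, 0.03, 0.9]
print(benjamini_hochberg(pvals, q=0.05))   # [0, 1, 2]
```

Because R is the largest passing rank, a hypothesis can be rejected even when its own p-value exceeds its own constant, as the first entry in the example shows.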


SLIDE 5

Control vs. estimation, and improved procedures

Control (q ⇒ R): significance is preset at a desired level q. The procedure yields a set of rejected hypotheses with FDR ≤ q.

Estimation (R ⇒ q): the threshold is preset at a level that yields a desired number of rejections R. The corresponding FDR is estimated.

There were many attempts to produce tighter bounds on the FDR, using an estimator for $m_0$.

Difficulty: the estimator $\hat{m}_0$ is a fluctuating random variable.


SLIDE 6

Aim

Produce an improved BH procedure using an estimator $\hat{m}_0$ for $m_0$, the number of true null hypotheses.


SLIDE 7

Definitions: monotonic estimator

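An estimator for $m_0$ is a family of functions $\hat{m}_0 \equiv \hat{m}^{(m)} : [0, 1]^m \to \mathbb{R}$, $\hat{m}_0 \equiv \hat{m}_0(p_1, \ldots, p_m)$. $\hat{m}_0$ is a monotonic estimator if it satisfies:

1. $\hat{m}^{(m)}(p_1, \ldots, p_i, \ldots, p_m) \ge \hat{m}^{(m)}(p_1, \ldots, p_i', \ldots, p_m)$ if $p_i \ge p_i'$, $\forall\, i = 1, 2, \ldots, m$, $m \ge 1$

2. $\hat{m}^{(m)}(p_1, \ldots, p_i, \ldots, p_m) \ge \hat{m}^{(m-1)}(p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_m)$, $\forall\, i = 1, 2, \ldots, m$, $m \ge 2$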
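The two conditions can be probed numerically. The sketch below is only a random spot check, not a proof: `looks_monotonic` is a hypothetical helper, and `two_sum` (twice the sum of the p-values) is used purely as an example of an estimator that satisfies both conditions.

```python
import random


def two_sum(p):
    """Example estimator: twice the sum of the p-values."""
    return 2.0 * sum(p)


def looks_monotonic(estimator, m=6, trials=2000, seed=0):
    """Randomly probe both monotonicity conditions; return False on any violation."""
    rng = random.Random(seed)
    for _ in range(trials):
        p = [rng.random() for _ in range(m)]
        i = rng.randrange(m)
        # Condition 1: lowering one p-value must not raise the estimate.
        lowered = list(p)
        lowered[i] = p[i] * rng.random()
        if estimator(p) < estimator(lowered):
            return False
        # Condition 2: dropping one p-value must not raise the estimate.
        dropped = p[:i] + p[i + 1:]
        if estimator(p) < estimator(dropped):
            return False
    return True


print(looks_monotonic(two_sum))              # True
print(looks_monotonic(lambda p: -sum(p)))    # False: decreasing in each p-value
```

A `True` result only means that no violation was found among the sampled inputs.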


SLIDE 8

Definitions: modified BH procedure ($m \to \hat{m}_0$)

Given $m$ hypotheses of which $m_0$ are null, let $p_1, \ldots, p_m$ be the respective p-values. The modified BH procedure with estimator $\hat{m}_0$ is:

1. Compute $\hat{m}_0 \equiv \hat{m}_0(p_1, \ldots, p_m)$.
2. Sort and relabel the p-values $p_{(1)} \le \ldots \le p_{(m)}$.
3. Define the set of constants $q_k = \frac{qk}{\hat{m}_0}$, $k = 1, 2, \ldots, m$.
4. Let $R = \max\{k : p_{(k)} \le q_k\}$.
5. If $R \ge 1$, reject the hypotheses with p-values $p_{(1)}, \ldots, p_{(R)}$; otherwise do not reject any hypothesis.
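The modified procedure differs from BH95 only in the denominator of the constants. A minimal sketch, assuming the estimator is passed in as a plain Python callable that returns a positive number (the name `modified_bh` and the toy estimators are illustrative, not from the slides):

```python
def modified_bh(pvalues, q, estimator):
    """Step-up procedure with constants q_k = q * k / m0_hat."""
    m = len(pvalues)
    m0_hat = estimator(pvalues)                          # estimate m0
    order = sorted(range(m), key=lambda i: pvalues[i])   # sort and relabel
    R = 0
    for k, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * k / m0_hat:               # p_(k) <= q_k
            R = k                                        # largest passing k
    return sorted(order[:R])                             # rejected indices


pvals = [0.02, 0.03, 0.04, 0.9]
# With m0_hat = m = 4 this is plain BH95, which rejects nothing here.
print(modified_bh(pvals, 0.05, estimator=len))               # []
# A smaller estimate (here a fixed toy value of 2) loosens every constant.
print(modified_bh(pvals, 0.05, estimator=lambda p: 2.0))     # [0, 1, 2]
```

The example shows why a good estimate of m0 matters: an estimate below m increases power, but the FDR guarantee then depends on the properties of the estimator.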


SLIDE 9

Theorem for control (see Benjamini et al 2006 (BKY))

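Let $\hat{m}_0 \equiv \hat{m}_0(p_1, \ldots, p_m)$ be a monotonic estimator for $m_0$. Let $\hat{m}_0^{(1)}(p_1, \ldots, p_m) \equiv \hat{m}_0(p_2, \ldots, p_m)$ be the same estimator, but disregarding the first p-value $p_1$. Assume that the null p-values are i.i.d. $U[0, 1]$. Then the modified BH procedure satisfies:

$$\mathrm{FDR} = E\left[\frac{V}{R^{+}}\right] \le m_0\, q\, E\left[\frac{1}{\hat{m}_0^{(1)}}\right]$$

Note: if $E\left[\frac{1}{\hat{m}_0^{(1)}}\right] \le \frac{1}{m_0}$, then $\mathrm{FDR} \le q$.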


SLIDE 10

Two improved procedures

The two procedures are based on the estimators:

IBHsum: $\hat{m}_0 = C(m) \cdot \min\left[m, \max\left(s(m),\ 2\sum_{j=1}^{m} p_j\right)\right]$

$C(m)$, $s(m)$ are universal correction factors.

[Figure: (a) $C$ versus $m$ and (b) $s/m$ versus $m$, for $m$ from $10^2$ to $10^5$.]

IBHlog: $\tilde{m}_0 = 2 - \sum_{i=1}^{m} \log(1 - p_i)$

Both procedures satisfy $E(V/R^{+}) \le q$
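The two estimators can be sketched as plain functions. For uniform null p-values, $p_j$ has mean 1/2 and $-\log(1 - p_i)$ has mean 1, which is why both sums scale with the number of true nulls. The numerical values of $C(m)$ and $s(m)$ are not given in these slides, so in this sketch the caller must supply them; the values used in the example are placeholders only, and the function names are hypothetical.

```python
import math


def m0_sum(pvalues, C, s):
    """IBHsum-style estimate: C * min(m, max(s, 2 * sum of p-values)).

    C and s stand in for C(m) and s(m); they are NOT specified here
    and must be provided by the caller.
    """
    m = len(pvalues)
    return C * min(m, max(s, 2.0 * sum(pvalues)))


def m0_log(pvalues):
    """IBHlog-style estimate: 2 - sum(log(1 - p_i)); needs every p_i < 1."""
    return 2.0 - sum(math.log1p(-p) for p in pvalues)


pvals = [0.5] * 10
print(m0_sum(pvals, C=1.0, s=1.0))   # 10.0 (placeholder C and s)
print(m0_log(pvals))                 # equals 2 + 10 * ln(2)
```

Either function can be used as the estimator in the modified BH procedure; `m0_sum` would first need `C` and `s` fixed, for example with `functools.partial`.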


SLIDE 11

Performance: simulations (IBHsum)

ρ is the correlation between test statistics.

[Figure: four contour panels with axes m0/m (0.2 to 1) and µ1 (1 to 4): q=0.05, ρ=0.8; q=0.2, ρ=0.8; q=0.05, ρ=0; q=0.2, ρ=0.]
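A simulation of this kind can be sketched as follows. The setup is an assumption made for illustration, not the authors' design: independent statistics only (ρ = 0), one-sided z-tests with alternatives distributed as N(µ1, 1), and arbitrary values for m, m0 and the number of repetitions. It uses the log-based estimator $2 - \sum \log(1 - p_i)$ instead of IBHsum, because the correction factors $C(m)$ and $s(m)$ are not available here.

```python
import math
import random
from statistics import NormalDist


def ibh_log_rejections(pvalues, q):
    """Modified BH with m0_hat = 2 - sum(log(1 - p_i)); returns rejected indices."""
    m0_hat = 2.0 - sum(math.log1p(-p) for p in pvalues)
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    R = 0
    for k, idx in enumerate(order, start=1):
        if pvalues[idx] <= q * k / m0_hat:
            R = k
    return order[:R]


def simulate_fdr(m=200, m0=100, mu1=3.0, q=0.05, reps=300, seed=1):
    """Monte Carlo estimate of E[V / max(R, 1)] for independent z-tests."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    total = 0.0
    for _ in range(reps):
        # The first m0 statistics are true nulls N(0, 1); the rest are N(mu1, 1).
        z = [rng.gauss(0.0, 1.0) for _ in range(m0)]
        z += [rng.gauss(mu1, 1.0) for _ in range(m - m0)]
        # One-sided p-values, clipped just below 1 so log(1 - p) stays finite.
        p = [min(1.0 - std_normal.cdf(x), 1.0 - 1e-12) for x in z]
        rejected = ibh_log_rejections(p, q)
        V = sum(1 for i in rejected if i < m0)   # false discoveries
        total += V / max(len(rejected), 1)
    return total / reps


print(simulate_fdr())   # Monte Carlo estimate, to be compared with q = 0.05
```

Repeating the call over a grid of `m0 / m` and `mu1` values would give one panel of the kind shown in the figure; correlated statistics would require a different sampling step.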


SLIDE 12

Compare to other methods

Results from simulations, m = 500, µ1 = 3.5:

[Figure: E(V/R) and E(S)/m1 plotted against m0/m (0.2 to 1) for ρ=0 and ρ=0.8. Methods compared: Oracle, BH (1995), BKY (Benjamini et al 2006), STS (Storey et al 2004), IBHsum, IBHlog.]


SLIDE 13

Applying to 33 gene expression datasets

[Figure: p plotted against i/m (both axes 0.2 to 1) in four panels, labeled "Large number of discoveries" and "Small number of discoveries", "One tailed" and "Two tailed".]

SLIDE 14

| Group | q | BKY | STS | IBHsum | IBHlog |
|---|---|---|---|---|---|
| a. Two tailed, large number of discoveries (10 studies) | 0.05 | 1.110 (0.043) | 1.239 (0.138) | 1.200 (0.110) | 1.222 (0.130) |
| b. Two tailed, small number of discoveries (10 studies) | 0.05 | 1.003 (0.003) | 1.316 (0.197) | 1.231 (0.140) | 1.291 (0.179) |
| c. One tailed, large number of discoveries (8 studies) | 0.05 | 1.049 (0.019) | 1.011 (0.033) | 1.014 (0.026) | 0.108 (0.306) |
| d. One tailed, small number of discoveries (5 studies) | 0.05 | 0.998 (0.020) | 1.027 (0.052) | 1.025 (0.017) | 0.882 (0.123) |


SLIDE 15

Summary and conclusions

We proved a theorem that provides a bound on the FDR for any improved procedure based on a monotonic estimator of $m_0$.

We proposed two improved procedures based on the estimators $2\sum_{j=1}^{m} p_j$ and $\sum_{i=1}^{m} \log(1 - p_i)$.

For the case of independent statistics, all improved procedures provide similar results: saturation of the bound and more power than BH95.

We showed by simulations that even in the case of dependent statistics our procedures provide a reliable bound and improved power.

For real gene expression data, where dependencies are expected, our methods improve, in general, over existing ones.


SLIDE 16

Theorem for monotonicity

Let $\vec{p} = (p_1, \ldots, p_m)$ be a set of independent p-values. Assume that $f$, the marginal probability density function of the alternatives, is monotonically non-increasing and differentiable. Let $B^{(i)}$, $i = 1, 2$, be two threshold FDR procedures rejecting $R^{(i)}(\vec{p})$ hypotheses and each having $\mathrm{FDR}^{(i)}$. Assume that for any $q$, $R^{(1)}(\vec{p}) \le R^{(2)}(\vec{p})$, $\forall\, \vec{p}$. Then it also holds that

$\mathrm{FDR}^{(1)} \le \mathrm{FDR}^{(2)}$.


SLIDE 17

THANKS
