SLIDE 1

An R package for analyzing truncated data

Carla Moreira¹*, Jacobo de Uña-Álvarez¹ and Rosa M. Crujeiras²

¹ Department of Statistics and OR, University of Vigo
² Department of Statistics and OR, University of Santiago de Compostela
* carlamgmm@gmail.com

SLIDE 2

Introduction Algorithms for DTD Package description Conclusions

Outline

1. Introduction
2. Algorithms for DTD
3. Package description
4. Conclusions

Moreira et al. useR! 2009 DTDA package 2/30

SLIDE 3

Motivating examples

Astronomy
Epidemiology
Economics
Survival analysis

In these fields, specialized statistical models and methods are needed to account for information loss in the sample, such as grouping, censoring or truncation.

SLIDE 4

Truncation Scheme

[Figure: truncation scheme, showing the observational window between t1 and t2]

SLIDE 7

Truncation Scheme

Let X* be the ultimate time of interest, with df F.
Let (U*, V*) be the pair of truncation times, with joint df K.
We observe (U*, X*, V*) if and only if U* ≤ X* ≤ V*.
Let (U_i, X_i, V_i), i = 1, ..., n, be the observed data.

Under the assumption of independence between X* and (U*, V*), the full likelihood is given by:

$$L_n(f, k) = \prod_{j=1}^{n} \frac{f_j k_j}{\sum_{i=1}^{n} F_i k_i}$$

SLIDE 8

Truncation Scheme

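Where:

$$f = (f_1, f_2, \ldots, f_n), \qquad k = (k_1, k_2, \ldots, k_n), \qquad F_i = \sum_{m=1}^{n} f_m J_{im}$$

and $J_{im} = I_{[U_i \le X_m \le V_i]}$ equals 1 if $U_i \le X_m \le V_i$, or zero otherwise.

As noted by Shen (2008):

$$L_n(f, k) = \prod_{j=1}^{n} \frac{f_j}{F_j} \times \prod_{j=1}^{n} \frac{F_j k_j}{\sum_{i=1}^{n} F_i k_i} = L_1(f) \times L_2(f, k)$$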
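A small numerical check may make this factorization concrete. The base R sketch below builds the indicator matrix J for a made-up sample of four observations, computes the F_i, and confirms that the full likelihood equals L1(f) × L2(f, k). The data values, the masses placed in f and k, and all variable names are illustrative assumptions; none of them come from the slides or from the DTDA package.

```r
# Made-up doubly truncated sample: every row satisfies U <= X <= V.
X <- c(0.30, 0.45, 0.60, 0.80)
U <- c(0.10, 0.40, 0.20, 0.50)
V <- c(0.70, 0.90, 0.65, 0.95)
n <- length(X)

# J[i, m] = 1 if U_i <= X_m <= V_i, and 0 otherwise.
J <- outer(seq_len(n), seq_len(n),
           function(i, m) as.numeric(U[i] <= X[m] & X[m] <= V[i]))

# Illustrative probability masses (assumed, not estimated from data).
f <- rep(1 / n, n)
k <- c(0.1, 0.2, 0.3, 0.4)

# F_i = sum_m f_m J_im: the mass that f assigns to the interval [U_i, V_i].
Fi <- as.vector(J %*% f)

# Full likelihood and its two factors.
denom <- sum(Fi * k)
Ln <- prod(f * k / denom)
L1 <- prod(f / Fi)
L2 <- prod(Fi * k / denom)

# TRUE when the factorization holds up to floating-point error.
isTRUE(all.equal(Ln, L1 * L2))
```

The vector is named `Fi` rather than `F` because `F` is also R's shorthand for `FALSE`.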

SLIDE 9

Efron-Petrosian estimators

The conditional NPMLE of F (Efron and Petrosian, 1999) is defined as the maximizer of $L_1(f)$. It satisfies

$$\frac{1}{\hat{f}_j} = \sum_{i=1}^{n} J_{ij} \times \frac{1}{\hat{F}_i}, \quad j = 1, \ldots, n \qquad (1)$$

where $\hat{F}_i = \sum_{m=1}^{n} \hat{f}_m J_{im}$.

This equation was used by Efron and Petrosian (1999) to introduce the EM algorithm to compute $\hat{f}$.

SLIDE 10

EM algorithm from Efron and Petrosian (1999)

  • EP1. Compute the initial estimate $\hat{F}^{(0)}$ corresponding to $\hat{f}^{(0)} = (1/n, \ldots, 1/n)$;
  • EP2. Apply (1) to get an improved estimator $\hat{f}^{(1)}$ and compute the $\hat{F}^{(1)}$ pertaining to $\hat{f}^{(1)}$;
  • EP3. Repeat Step EP2 until the convergence criterion is reached.
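Steps EP1 to EP3 can be sketched in a few lines of base R. This is only an illustration of the fixed-point update 1/f_j = Σ_i J_ij / F_i, not the code shipped in DTDA. The function name `ep_sketch`, the arguments `tol` and `max_iter`, the stopping rule (largest change in the masses below `tol`) and the renormalization of the masses to total 1 at each step are all assumptions made here. The renormalization is added because the update fixes the masses only up to a common scale factor.

```r
# Sketch of steps EP1-EP3 (hypothetical helper, not the DTDA implementation).
ep_sketch <- function(X, U, V, tol = 1e-8, max_iter = 1000) {
  n <- length(X)
  # J[i, m] = 1 if U_i <= X_m <= V_i, and 0 otherwise.
  J <- outer(seq_len(n), seq_len(n),
             function(i, m) as.numeric(U[i] <= X[m] & X[m] <= V[i]))
  f <- rep(1 / n, n)                           # EP1: uniform starting masses
  for (iter in seq_len(max_iter)) {
    Fi <- as.vector(J %*% f)                   # F_i = sum_m f_m J_im
    f_new <- 1 / as.vector(t(J) %*% (1 / Fi))  # EP2: 1/f_j = sum_i J_ij / F_i
    f_new <- f_new / sum(f_new)                # assumed: rescale to total mass 1
    converged <- max(abs(f_new - f)) < tol     # assumed stopping rule
    f <- f_new
    if (converged) break                       # EP3: iterate until stable
  }
  list(f = f, iter = iter)
}

# Usage on a made-up sample of four doubly truncated observations.
X <- c(0.30, 0.45, 0.60, 0.80)
U <- c(0.10, 0.40, 0.20, 0.50)
V <- c(0.70, 0.90, 0.65, 0.95)
fit <- ep_sketch(X, U, V)
fit$f                      # estimated masses at the X_i
cumsum(fit$f[order(X)])    # estimated df at the sorted X_i
```

When no observation is truncated (every interval [U_i, V_i] contains every X_m), the sketch returns the empirical masses 1/n after a single pass.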

SLIDE 11

Shen Estimator

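Interchanging the roles of the $X_i$'s and the $(U_i, V_i)$'s:

$$L_n(f, k) = \prod_{j=1}^{n} \frac{k_j}{K_j} \times \prod_{j=1}^{n} \frac{K_j f_j}{\sum_{i=1}^{n} K_i f_i} = L_1(k) \times L_2(k, f)$$

where

$$K_i = \sum_{m=1}^{n} k_m I_{[U_m \le X_i \le V_m]} = \sum_{m=1}^{n} k_m J_{mi}$$

and maximizing $L_1(k)$:

$$\frac{1}{\hat{k}_j} = \sum_{i=1}^{n} J_{ji} \frac{1}{\hat{K}_i}, \quad j = 1, \ldots, n \qquad (2)$$

with $\hat{K}_i = \sum_{m=1}^{n} \hat{k}_m J_{mi}$.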
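The interchanged factorization can be checked numerically in the same spirit. The base R sketch below uses a made-up sample and arbitrary masses (all values and names are assumptions for illustration) to confirm two things: the normalizing sums Σ K_i f_i and Σ F_i k_i coincide, and the full likelihood equals L1(k) × L2(k, f).

```r
# Made-up doubly truncated sample: every row satisfies U <= X <= V.
X <- c(0.30, 0.45, 0.60, 0.80)
U <- c(0.10, 0.40, 0.20, 0.50)
V <- c(0.70, 0.90, 0.65, 0.95)
n <- length(X)

# J[i, m] = 1 if U_i <= X_m <= V_i, and 0 otherwise.
J <- outer(seq_len(n), seq_len(n),
           function(i, m) as.numeric(U[i] <= X[m] & X[m] <= V[i]))

# Illustrative masses (assumed, not estimated from data).
f <- c(0.4, 0.3, 0.2, 0.1)
k <- c(0.1, 0.2, 0.3, 0.4)

Fi <- as.vector(J %*% f)      # F_i = sum_m f_m I[U_i <= X_m <= V_i]
Ki <- as.vector(t(J) %*% k)   # K_i = sum_m k_m I[U_m <= X_i <= V_m]

# Both factorizations share the same normalizing constant.
same_denominator <- isTRUE(all.equal(sum(Fi * k), sum(Ki * f)))

# Full likelihood versus the product L1(k) * L2(k, f).
Ln  <- prod(f * k / sum(Fi * k))
L1k <- prod(k / Ki)
L2k <- prod(Ki * f / sum(Ki * f))
same_likelihood <- isTRUE(all.equal(Ln, L1k * L2k))

c(same_denominator = same_denominator, same_likelihood = same_likelihood)
```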

SLIDE 12

Shen Estimator

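Shen (2008) showed that the solutions of the two conditional problems (maximizing $L_1(f)$ and $L_1(k)$) are the unconditional NPMLE of F and K, respectively, and both estimators can be obtained by:

$$\hat{f}_j = \left[ \sum_{i=1}^{n} \frac{1}{\hat{K}_i} \right]^{-1} \frac{1}{\hat{K}_j}, \quad j = 1, \ldots, n \qquad (3)$$

$$\hat{k}_j = \left[ \sum_{i=1}^{n} \frac{1}{\hat{F}_i} \right]^{-1} \frac{1}{\hat{F}_j}, \quad j = 1, \ldots, n \qquad (4)$$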

SLIDE 13

EM algorithm from Shen (2008)

  • S1. Compute the initial estimate $\hat{F}^{(0)}$ corresponding to $\hat{f}^{(0)} = (1/n, \ldots, 1/n)$;
  • S2. Apply (4) to get the first step estimator $\hat{k}^{(1)}$ and compute the $\hat{K}^{(1)}$ pertaining to $\hat{k}^{(1)}$;
  • S3. Apply (3) to get the first step estimator $\hat{f}^{(1)}$ and its corresponding $\hat{F}^{(1)}$;
  • S4. Repeat Steps S2 and S3 until the convergence criterion is reached.
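Steps S1 to S4 alternate between the two sets of masses, and a base R sketch may help to see the flow. It is not the DTDA code: the function name `shen_sketch`, the arguments `tol` and `max_iter`, and the stopping rule (largest change in the masses f below `tol`) are assumptions made here. The two updates are the normalized inverse weights: k_j proportional to 1/F_j, then f_j proportional to 1/K_j.

```r
# Sketch of steps S1-S4 (hypothetical helper, not the DTDA implementation).
shen_sketch <- function(X, U, V, tol = 1e-8, max_iter = 1000) {
  n <- length(X)
  # J[i, m] = 1 if U_i <= X_m <= V_i, and 0 otherwise.
  J <- outer(seq_len(n), seq_len(n),
             function(i, m) as.numeric(U[i] <= X[m] & X[m] <= V[i]))
  f <- rep(1 / n, n)                  # S1: uniform starting masses
  Fi <- as.vector(J %*% f)            # F_i = sum_m f_m J_im
  for (iter in seq_len(max_iter)) {
    k <- (1 / Fi) / sum(1 / Fi)       # S2: k_j proportional to 1 / F_j
    Ki <- as.vector(t(J) %*% k)       # K_i = sum_m k_m I[U_m <= X_i <= V_m]
    f_new <- (1 / Ki) / sum(1 / Ki)   # S3: f_j proportional to 1 / K_j
    Fi <- as.vector(J %*% f_new)
    converged <- max(abs(f_new - f)) < tol   # assumed stopping rule
    f <- f_new
    if (converged) break              # S4: iterate until stable
  }
  list(f = f, k = k, iter = iter)
}

# Usage on a made-up sample of four doubly truncated observations.
X <- c(0.30, 0.45, 0.60, 0.80)
U <- c(0.10, 0.40, 0.20, 0.50)
V <- c(0.70, 0.90, 0.65, 0.95)
fit <- shen_sketch(X, U, V)
fit$f   # estimated masses of the lifetime distribution at the X_i
fit$k   # estimated masses of the truncation distribution at the (U_i, V_i)
```

Unlike a scheme that updates f alone, each pass here also yields an estimate of the truncation masses k.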

SLIDE 15

DTDA-package

efron.petrosian(X,...)
lynden(X,...)
shen(X,...)

Three example data sets with X ∼ Unif(0,1) and:

Ex.1: U ∼ Unif(0,0.5), V ∼ Unif(0.5,1)
Ex.2: U ∼ Unif(0,0.25), V ∼ Unif(0.75,1)
Ex.3: U ∼ Unif(0,0.67), V ∼ Unif(0.33,1)
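Data of this kind can be simulated with a few lines of base R, which also gives a rough idea of how much truncation each design produces. The sketch below is an assumption about how such samples might be generated: it draws X, U and V independently and keeps only the triples with U ≤ X ≤ V. The helper name `simulate_dtd`, the number of draws and the seed are hypothetical choices, not taken from the slides.

```r
# Hypothetical helper: draw (U*, X*, V*) independently and keep only the
# triples that fall inside the observational window U* <= X* <= V*.
simulate_dtd <- function(n_draws, u_max, v_min) {
  X <- runif(n_draws, 0, 1)
  U <- runif(n_draws, 0, u_max)
  V <- runif(n_draws, v_min, 1)
  keep <- U <= X & X <= V
  list(data = data.frame(X = X[keep], U = U[keep], V = V[keep]),
       truncated = mean(!keep))   # share of draws lost to truncation
}

set.seed(2009)   # arbitrary seed, for reproducibility only
ex1 <- simulate_dtd(10000, u_max = 0.50, v_min = 0.50)
ex2 <- simulate_dtd(10000, u_max = 0.25, v_min = 0.75)
ex3 <- simulate_dtd(10000, u_max = 0.67, v_min = 0.33)

# Empirical proportion of truncated draws in each design.
round(c(ex1 = ex1$truncated, ex2 = ex2$truncated, ex3 = ex3$truncated), 2)
```

In Ex.1 and Ex.2 the windows always satisfy U < V, so the probability of observing a draw is E[V − U], that is 0.5 and 0.75 respectively. The simulated truncation proportions should therefore land near 50% and 25%.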

SLIDE 16

efron.petrosian illustration under double truncation

Ex.1 (50% of truncation): efron.petrosian(X,U,V,...)

Output: iter, f, FF, S, Sob, upperF, lowerF, upperS, lowerS

[Figure: two panels, "EP estimator" and "Survival", each plotted against "Time of interest"]

SLIDE 18

efron.petrosian illustration under left truncation

Ex.1: efron.petrosian(X,U,...)

[Figure: two panels, "EP estimator" and "Survival", each plotted against "Time of interest"]

SLIDE 20

lynden illustration under double truncation

Ex.2 (25% of truncation): lynden(X,U,V,...)

Output: iter, NJ, f, FF, h, S, Sob, upperF, lowerF, upperS, lowerS

[Figure: two panels, "EP estimator" and "Survival", each plotted against "Time of interest"]

SLIDE 22

lynden illustration under right truncation

Ex.2: lynden(X,V,...)

[Figure: one panel, "Survival", plotted against "Time of interest"]

SLIDE 24

shen illustration under double truncation

Ex.3 (67% of truncation): shen(X,U,V,...)

Output: iter, f, FF, S, Sob, k, fU, fV, upperF, lowerF, upperS, lowerS

[Figure: four panels, "Shen estimator", "Survival", "Marginal U" and "Marginal V", each plotted against "Time of interest"]

SLIDE 30

Summary

The DTDA package provides different algorithms for analyzing randomly truncated data; both one-sided and two-sided (i.e. doubly) truncated data are allowed.
This package incorporates the functions efron.petrosian, lynden and shen, which call the iterative methods introduced by Efron and Petrosian (1999) and Shen (2008).
Estimation of the lifetime and truncation time distributions is possible, together with the corresponding pointwise confidence limits based on the bootstrap.
Plots of marginal cumulative distributions and survival functions are provided.
There are no other R packages for the double truncation scheme.

SLIDE 31

Acknowledgments

Work supported by research grants MTM2008-03129 and MTM2008-0310 of the Spanish Ministerio de Ciencia e Innovación
Grant PGIDIT07PXIB300191PR of the Xunta de Galicia

SLIDE 32

References

Efron, B. and Petrosian, V. (1999) Nonparametric methods for doubly truncated data. Journal of the American Statistical Association, 94, 824-834.

Lynden-Bell, D. (1971) A method of allowing for known observational selection in small samples applied to 3CR quasars. Mon. Not. R. Astr. Soc., 155, 95-118.

Moreira, C. and de Uña-Álvarez, J. (Under revision) Bootstrapping the NPMLE for doubly truncated data. Journal of Nonparametric Statistics.

Shen, P.-S. (2008) Nonparametric analysis of doubly truncated data. Annals of the Institute of Statistical Mathematics, DOI 10.1007/s10463-008-0192-2.
