CEM: A Matching Method for Observational Data in the Social Sciences
S.M. Iacus (Univ. of Milan) & G. King (Harvard Univ.) & G. Porro (Univ. of Trieste)
Rennes, useR! 2009, July 8th - 10th
Estimation of TE Matching solutions in R (incomplete list) CEM Overview Infos
1. Start from the original data X.
2. Coarsen the data X into C(X).
3. Do exact matching on the coarsened data C(X); this produces the CEM weights.
4. Pass the original uncoarsened data X, together with the CEM weights, to the analysis stage (lm, glm, randomForest, coxph, etc.).
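The steps above can be sketched in base R without the cem package. This is only an illustration under stated assumptions: the toy data, the cut points and all variable names are made up, and the weight formula (1 for matched treated units, (m_C / m_T) * (m_T^s / m_C^s) for matched controls in stratum s, 0 for unmatched units) is the one given in the CEM papers, not code taken from the package.

```r
# Toy data (made up for illustration): treatment flag, two covariates, outcome
set.seed(1)
n <- 200
dat <- data.frame(
  treated   = rbinom(n, 1, 0.4),
  age       = sample(18:55, n, replace = TRUE),
  education = sample(6:16, n, replace = TRUE)
)
dat$y <- 1000 + 300 * dat$treated + 20 * dat$age + rnorm(n, sd = 200)

# Step 1: coarsen X into C(X) using arbitrary, hand-picked cut points
cx <- data.frame(
  age       = cut(dat$age, breaks = c(17, 25, 35, 45, 55)),
  education = cut(dat$education, breaks = c(5, 8, 12, 16))
)

# Step 2: exact matching on C(X); a stratum is one combination of bins
stratum <- interaction(cx$age, cx$education, drop = TRUE)
n_t <- tapply(dat$treated == 1, stratum, sum)  # treated units per stratum
n_c <- tapply(dat$treated == 0, stratum, sum)  # control units per stratum
ok  <- names(n_t)[n_t > 0 & n_c > 0]           # strata containing both groups
s   <- as.character(stratum)
matched <- s %in% ok

# Step 3: CEM-style weights
m_t <- sum(matched & dat$treated == 1)
m_c <- sum(matched & dat$treated == 0)
w_control <- (m_c / m_t) * as.numeric(n_t[s]) / as.numeric(n_c[s])
w <- ifelse(!matched, 0, ifelse(dat$treated == 1, 1, w_control))

# Step 4: analysis stage on the original, uncoarsened data with the weights
fit <- lm(y ~ treated + age + education, data = dat, weights = w)
```

The covariates enter the final model at full resolution; the coarsened copy `cx` is used only to form the strata and the weights.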
R> library(cem)
R> data(LL) # The Lalonde (1986) benchmark data
R> # initial imbalance
R> imb <- imbalance(LL$treated, LL, drop=c("re78","treated"))
R> imb

Multivariate Imbalance Measure: L1=0.735
Percentage of local common support: LCS=17.8%

Univariate Imbalance Measures:

statistic type L1 min 25% 50% 75% max
age 1.792038e-01 (diff) 4.705882e-03 1 0.00000
education 1.922361e-01 (diff) 9.811844e-02 1 1.00000 1.0000 2.0000
black 1.346801e-03 (diff) 1.346801e-03 0.00000 0.0000 0.0000
married 1.070311e-02 (diff) 1.070311e-02 0.00000 0.0000 0.0000
nodegree ... 0.00000 0.0000 0.0000
re74 ... 69.73096 584.9160 -2139.0195
re75 3.941545e+01 (diff) 5.551115e-17 0 294.18457 660.6865 490.3945
hispanic ... 0.00000 0.0000 0.0000
u74 ... 0.00000 0.0000 0.0000
u75 ... 0.00000 0.0000 0.0000
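The multivariate L1 measure in this output compares the treated and control distributions over a multidimensional histogram: half the sum, over all cells, of the absolute difference between the relative frequencies of the two groups. A minimal sketch of that definition follows. The function name `l1_imbalance` is hypothetical, and the covariates are assumed to be binned already; how the package chooses its own bins is not reproduced here.

```r
# Hypothetical helper: L1 distance between treated and control cell frequencies.
# `treated` is a 0/1 vector, `cx` a data frame of already coarsened covariates.
l1_imbalance <- function(treated, cx) {
  cell <- interaction(cx, drop = TRUE)           # one level per occupied cell
  f <- prop.table(table(cell[treated == 1]))     # treated relative frequencies
  g <- prop.table(table(cell[treated == 0]))     # control relative frequencies
  0.5 * sum(abs(f - g))                          # 0 = identical, 1 = no overlap
}

# Tiny made-up example with two binned covariates
cx <- data.frame(a = factor(c("x", "x", "y", "y")),
                 b = factor(c("p", "p", "q", "q")))
balanced  <- l1_imbalance(c(1, 0, 1, 0), cx)  # both groups occupy the same cells
separated <- l1_imbalance(c(1, 1, 0, 0), cx)  # groups occupy disjoint cells
```

Values near 0 indicate similar histograms and values near 1 indicate almost no overlap, which is how the drop from 0.735 before matching to 0.432 after matching should be read.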
R> mat <- cem("treated", LL, drop="re78", L1.breaks=imb$L1$breaks)
R> mat
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Multivariate Imbalance Measure: L1=0.432
Percentage of local common support: LCS=44.7%

Univariate Imbalance Measures:

statistic type L1 min 25% 50% 75% max
age 1.862046e-01 (diff) 5.551115e-17 0.0000 1.00000 1.000
education 1.022495e-02 (diff) 1.022495e-02 0.0000 0.00000 0.000
black ... 0.0000 0.00000 0.000
married 0.000000e+00 (diff) 5.898060e-17 0.0000 0.00000 0.000
nodegree ... 0.0000 0.00000 0.000
re74 7.197514e+00 (diff) 5.551115e-17 0.0000 -70.85522 416.416
re75 1.220698e+01 (diff) 5.551115e-17 0 234.4843 140.79126 -852.252
hispanic 0.000000e+00 (diff) 5.551115e-17 0.0000 0.00000 0.000
u74 0.000000e+00 (diff) 2.775558e-17 0.0000 0.00000 0.000
u75 0.000000e+00 (diff) 5.551115e-17 0.0000 0.00000 0.000
R> relax.cem(mat, LL)
Executing 42 different relaxations
.......[20%]....[40%].....[60%]....[80%]....[100%]
Pre-relax: 163 matched (54.9 %)

[Figure: plot produced by relax.cem. Axes: number of matched, % matched, running from 163 units (54.9%) before relaxation to 220 units (74.1%). Points carry labels of the form variable(n), from education(9) to age(1).]
R> att(mat, re78 ~ treated, LL) -> TE
R> TE
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Linear regression model on CEM matched data:

SATT point estimate: 550.962564 (p.value=0.368242)
95% conf. interval: [-647.777701, 1749.702830]

R> att(mat, re78 ~ treated, LL, extrapolate=TRUE)
           G0  G1
All       425 297
Matched   222 163
Unmatched 203 134

Linear regression model with extrapolation:

SATT point estimate: 1290.247549 (p.value=0.062168)
95% conf. interval: [391.886467, 2188.608631]

R> plot(TE, mat, LL, vars=c("re75","re74","education","age","hispanic"))
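To make the "linear regression model on matched data" estimate more concrete: with a single 0/1 treatment regressor, a weighted least-squares fit returns the weighted difference in mean outcomes between the two groups. The sketch below checks this on made-up data with made-up weights. It is an assumption that this mirrors what att() computes for the formula re78 ~ treated; the package internals are not shown in the slides.

```r
# Made-up outcome, treatment flag and weights (not the LL data)
set.seed(42)
n <- 100
treated <- rbinom(n, 1, 0.5)
y <- 1000 + 500 * treated + rnorm(n, sd = 300)

# Hypothetical matching weights: 1 for treated, varying for controls,
# and 0 for a handful of units standing in for the unmatched ones
w <- ifelse(treated == 1, 1, runif(n, 0.5, 2))
w[sample(n, 10)] <- 0

# Weighted regression of the outcome on the treatment indicator
fit <- lm(y ~ treated, weights = w)
satt_lm <- unname(coef(fit)["treated"])

# Same quantity as a weighted difference in means
satt_dm <- weighted.mean(y[treated == 1], w[treated == 1]) -
           weighted.mean(y[treated == 0], w[treated == 0])

# Confidence interval for the treatment coefficient
ci <- confint(fit)["treated", ]
```

Adding covariates to the formula changes the estimate from a plain weighted difference in means to a regression-adjusted one.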
[Figure: "Linear regression model on CEM matched data". Treatment effect by CEM strata (axis from -5000 to 20000), with panels over re75, re74, education, age and hispanic on a Min to Max scale; surviving panel labels: zero, positive.]