SLIDE 1 RANDOMIZATION
PMAP 8521: Program Evaluation for Public Service October 14, 2019
Fill out your reading report
SLIDE 2
PLAN FOR TODAY
The “Gold” Standard
The magic of randomization
Running and analyzing RCTs
SLIDE 3 THREATS TO VALIDITY
Internal validity, External validity, Construct validity, Statistical conclusion validity
Omitted variable bias, Trends, Study calibration, Contamination
SLIDE 4 INTERNAL VALIDITY
Omitted variable bias, Trends, Study calibration, Contamination
Selection, Attrition, Maturation, Secular trends, Testing, Regression, Measurement error, Time frame of study, Seasonality, Hawthorne, John Henry, Spillovers, Intervening events
SLIDE 5
THE MAGIC OF RANDOMIZATION
SLIDE 6
WHY RANDOMIZE?
Fundamental problem of causal inference
$\delta_i = Y_i^1 - Y_i^0$
Individual-level effects are impossible to observe
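A minimal Python sketch of why this is a problem. It simulates a hypothetical population where both potential outcomes exist for every person, then shows what real data would look like. Every number here (means, spreads, the effect of about 5) is made up for illustration and is not from the course materials.

```python
import random

rng = random.Random(8521)  # fixed seed so the sketch is reproducible

# Hypothetical population: every person has BOTH potential outcomes.
# All numbers here are made up for illustration.
people = []
for _ in range(1000):
    y0 = rng.gauss(50, 10)      # outcome without the program
    y1 = y0 + rng.gauss(5, 2)   # outcome with the program
    people.append((y0, y1))

# Only a simulation lets us compute delta_i = Y1_i - Y0_i for each person.
individual_effects = [y1 - y0 for y0, y1 in people]
true_ate = sum(individual_effects) / len(individual_effects)

# Real data: each person is either treated or not, so one outcome is always missing.
observed = []
for y0, y1 in people:
    treated = rng.random() < 0.5
    observed.append({"treated": treated, "y": y1 if treated else y0})

print(f"True ATE (knowable only in a simulation): {true_ate:.2f}")
print("What we actually observe:", observed[:3])
```

Each observed row carries a single outcome, so no individual effect can be computed from it; only group averages can be compared.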
SLIDE 7
WHY RANDOMIZE?
Average treatment effect
$\text{ATE} = E(Y^1 - Y^0) = E(Y^1) - E(Y^0)$
$\delta = (\bar{Y} \mid P = 1) - (\bar{Y} \mid P = 0)$
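The difference in group means can be computed in a few lines. This is a sketch with six made-up outcomes; the function name `difference_in_means` and the data are hypothetical.

```python
from statistics import mean

def difference_in_means(outcomes, treated):
    """Mean outcome of the treated group minus mean outcome of the untreated group."""
    y_treated = [y for y, p in zip(outcomes, treated) if p == 1]
    y_control = [y for y, p in zip(outcomes, treated) if p == 0]
    return mean(y_treated) - mean(y_control)

# Made-up outcomes for six people; P = 1 means they were in the program
outcomes = [10, 12, 14, 7, 9, 8]
program = [1, 1, 1, 0, 0, 0]

delta = difference_in_means(outcomes, program)
print(delta)  # 12 - 8 = 4
```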
SLIDE 8
WHY RANDOMIZE?
This only works if subgroups that received/didn’t receive treatment look the same
$\delta = (\bar{Y} \mid P = 1) - (\bar{Y} \mid P = 0)$
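One way to see why the groups need to look the same is the standard decomposition of the difference in group means. The underbrace labels are added here for illustration and are not from the slides.

```latex
\begin{aligned}
E(Y \mid P = 1) - E(Y \mid P = 0)
  &= E(Y^1 \mid P = 1) - E(Y^0 \mid P = 0) \\
  &= \underbrace{E(Y^1 \mid P = 1) - E(Y^0 \mid P = 1)}_{\text{effect for those in the program}}
   + \underbrace{E(Y^0 \mid P = 1) - E(Y^0 \mid P = 0)}_{\text{selection bias}}
\end{aligned}
```

If program participation is unrelated to the potential outcomes, the second term is zero and the first term equals the ATE. If people select into the program, the second term is generally not zero.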
SLIDE 9
WHY RANDOMIZE?
With big enough numbers, the magic of randomization helps make comparison groups comparable
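A small Python simulation can illustrate the "big enough numbers" part. It repeatedly assigns people at random and measures how far apart the two groups end up on a baseline characteristic. The covariate (age), its distribution, and the sample sizes are all assumptions picked for illustration.

```python
import random

def average_baseline_gap(n, reps, rng):
    """Average absolute treatment/control gap in mean age over many random assignments."""
    gaps = []
    for _ in range(reps):
        ages = [rng.gauss(40, 12) for _ in range(n)]  # made-up baseline covariate
        rng.shuffle(ages)                             # random assignment: first half is treated
        treated, control = ages[: n // 2], ages[n // 2:]
        gap = sum(treated) / len(treated) - sum(control) / len(control)
        gaps.append(abs(gap))
    return sum(gaps) / len(gaps)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
gap_by_n = {n: average_baseline_gap(n, reps=200, rng=rng) for n in (20, 200, 2000)}

for n, gap in gap_by_n.items():
    print(f"n = {n:>4}: average gap in mean age = {gap:.2f} years")
```

The gap shrinks as the sample grows. With small samples, a single random draw can still leave the groups noticeably different.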
SLIDE 10
R example
SLIDE 11
How big of a sample?
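One common back-of-the-envelope answer uses the normal-approximation formula for comparing two group means. The sketch below assumes equal group sizes, a known outcome standard deviation, and a two-sided test; the function name `n_per_group` and the example effect sizes are hypothetical. Treat the result as a rough lower bound, since exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect `effect` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# Made-up scenarios: outcome SD of 10, looking for effects of 5 and 2 points
print(n_per_group(effect=5, sd=10))  # 63
print(n_per_group(effect=2, sd=10))  # 393
```

Detecting an effect less than half the size takes more than six times as many people, because the required sample grows with the inverse square of the effect.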
SLIDE 12
THE “GOLD” STANDARD
SLIDE 13 TYPES OF RESEARCH
Experimental studies vs.
Which is better?
SLIDE 14
SLIDE 15 TYPES OF RESEARCH
Experimental studies vs.
Medicine
Social science
Epidemiology
DAGs in RCTs?
SLIDE 16
SLIDE 17
RCTs are great! Super impractical to do all the time though!
SLIDE 18
SLIDE 19
“Gold standard” implies that all causal inferences will be valid if you do the experiment right
We don’t care if studies are experimental or not
We care if our causal inferences are valid
RCTs are a helpful baseline/rubric for other methods
SLIDE 20
Moving to Opportunity
SLIDE 21 RCTS & VALIDITY
Randomization fixes a ton of internal validity issues
Selection
Treatment and control groups are comparable; people don’t self-select
Trends
Maturation, secular trends, seasonality, regression to the mean all generally average out
SLIDE 22 RCTS & VALIDITY
RCTs don’t fix attrition!
Worst threat to internal validity in RCTs
If attrition is correlated with treatment, that’s bad
People might drop out because of the treatment, or because they were (or weren’t) assigned to the control group
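A simulated example of how treatment-related attrition can distort an otherwise clean RCT. The dropout rule (treated people with low outcomes leave half the time, and nobody leaves the control group) and all of the numbers are assumptions made up for this sketch.

```python
import random

rng = random.Random(22)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 5          # assumed effect, identical for everyone

rows = []  # (treated, outcome, dropped_out)
for _ in range(20000):
    treated = rng.random() < 0.5
    y = rng.gauss(50, 10) + (TRUE_EFFECT if treated else 0)
    # Hypothetical dropout rule: treated people with low outcomes leave half the time;
    # nobody leaves the control group.
    dropped = treated and y < 50 and rng.random() < 0.5
    rows.append((treated, y, dropped))

def diff_in_means(data):
    """Treated mean minus control mean for rows of (treated, outcome, dropped_out)."""
    t = [y for treated, y, _ in data if treated]
    c = [y for treated, y, _ in data if not treated]
    return sum(t) / len(t) - sum(c) / len(c)

full_sample = diff_in_means(rows)                            # what we would see with no attrition
stayers_only = diff_in_means([r for r in rows if not r[2]])  # what we see after dropouts leave

print(f"Estimate with everyone:     {full_sample:.2f}")
print(f"Estimate with stayers only: {stayers_only:.2f}")
```

Assignment was random, but the people left in the treatment group are no longer comparable to the control group, so the stayers-only estimate overstates the effect.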
SLIDE 23 ADDRESSING ATTRITION
Recruit as effectively as possible
You don’t just want weird/WEIRD participants
Get people on board
Get participants invested in the experiment
Collect as much baseline information as possible
Check whether attrition is random (unrelated to treatment status)
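A first check is to compare dropout rates across the two arms. The sketch below uses a two-proportion z-test with made-up counts; the function name `attrition_gap_test` is hypothetical. Similar dropout rates do not prove attrition is harmless, since different kinds of people could still be leaving each arm, which is where the baseline information becomes useful.

```python
from math import sqrt
from statistics import NormalDist

def attrition_gap_test(dropped_t, n_t, dropped_c, n_c):
    """Two-proportion z-test for a difference in dropout rates between arms."""
    rate_t, rate_c = dropped_t / n_t, dropped_c / n_c
    pooled = (dropped_t + dropped_c) / (n_t + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (rate_t - rate_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return rate_t, rate_c, z, p_value

# Made-up counts: 90 of 500 treated participants and 50 of 500 controls dropped out
rate_t, rate_c, z, p_value = attrition_gap_test(90, 500, 50, 500)
print(f"Dropout: treatment {rate_t:.0%}, control {rate_c:.0%}, z = {z:.2f}, p = {p_value:.4f}")
```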
SLIDE 24
RCTS & VALIDITY
Randomization failures
Check baseline pre-data
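One way to check baseline data is a standardized mean difference for each pre-treatment covariate. The sketch below compares a hypothetical failed randomization (treating the first half of a sign-up list where older people signed up first) with a proper shuffle. The covariate, the sign-up story, and the rough 0.1 rule of thumb for "balanced" are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean, variance

def standardized_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

rng = random.Random(24)  # fixed seed so the sketch is reproducible

# Hypothetical sign-up list: older people happened to sign up first
ages = sorted((rng.gauss(40, 12) for _ in range(20000)), reverse=True)
half = len(ages) // 2

# Failed "randomization": the first half of the sign-up list gets the treatment
smd_broken = standardized_difference(ages[:half], ages[half:])

# Proper randomization: shuffle first, then split
shuffled = ages[:]
rng.shuffle(shuffled)
smd_random = standardized_difference(shuffled[:half], shuffled[half:])

print(f"Sign-up order split: SMD = {smd_broken:.2f}")
print(f"Random split:        SMD = {smd_random:.2f}")
```

A large standardized difference on a baseline covariate is a warning sign that assignment was not actually random.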
Noncompliance
Intent-to-treat (ITT) vs. treatment-on-the-treated (TOT)
Some people assigned to treatment won’t take it; some people assigned to control will take it
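A simulated sketch of the two estimates under noncompliance. The ITT compares people by what they were assigned. Dividing the ITT by the difference in take-up rescales it to an effect for people who actually took the treatment because of their assignment. That ratio equals the treatment-on-the-treated effect when nobody in the control group gets treated, and is otherwise an effect for compliers. The compliance shares (60% compliers, 10% always-takers, 30% never-takers) and the effect of 5 are made-up assumptions.

```python
import random

rng = random.Random(78)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 5          # assumed effect of actually taking the treatment

rows = []  # (assigned_to_treatment, took_treatment, outcome)
for _ in range(40000):
    assigned = rng.random() < 0.5
    kind = rng.choices(["complier", "always_taker", "never_taker"], weights=[60, 10, 30])[0]
    took = kind == "always_taker" or (kind == "complier" and assigned)
    y = rng.gauss(50, 10) + (TRUE_EFFECT if took else 0)
    rows.append((assigned, took, y))

def mean_by_assignment(column, assigned_value):
    """Average of one column among people with the given assignment."""
    values = [row[column] for row in rows if row[0] == assigned_value]
    return sum(values) / len(values)

# ITT: compare outcomes by ASSIGNMENT, ignoring who actually took the treatment
itt = mean_by_assignment(2, True) - mean_by_assignment(2, False)

# Difference in take-up between arms (booleans average as 0/1)
take_up_gap = mean_by_assignment(1, True) - mean_by_assignment(1, False)

# Rescale the ITT by the take-up difference
scaled_effect = itt / take_up_gap

print(f"ITT: {itt:.2f}, take-up gap: {take_up_gap:.2f}, scaled effect: {scaled_effect:.2f}")
```

The ITT is diluted toward zero by noncompliance (about 3 here), while the rescaled estimate recovers something close to the assumed effect of 5.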
SLIDE 25
OTHER LIMITATIONS
RCTs don’t magically fix construct validity and statistical conclusion validity
RCTs definitely don’t magically fix external validity
SLIDE 26
SLIDE 27
WHEN TO RANDOMLY ASSIGN
Demand for treatment exceeds supply
Treatment will be phased in over time
Treatment is in equipoise
Local culture open to randomization
When you’re a nondemocratic monopolist
When people won’t know (and it’s ethical!)
When lotteries are going to happen anyway
SLIDE 28
WHEN TO NOT RANDOMLY ASSIGN
When you need immediate results
When it’s unethical or illegal
When it’s something that happened in the past
When it involves universal ongoing phenomena
SLIDE 29
RUNNING & ANALYZING RCTS
SLIDE 30
SLIDE 31
RANDOM ASSIGNMENT
Coins
Dice
Unbiased lottery
Atmospheric noise
random.org
Random numbers + threshold
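A minimal Python sketch of the "random numbers + threshold" approach, alongside a shuffle-and-split alternative that guarantees equal group sizes. The participant IDs, the seed, and the 0.5 threshold are hypothetical.

```python
import random

rng = random.Random(1014)  # fixed seed so the assignment can be reproduced and audited

participants = [f"person_{i:03d}" for i in range(1, 201)]  # made-up participant IDs

# Random numbers + threshold: draw a uniform number per person;
# anything below the threshold goes to treatment.
threshold = 0.5
assignment = {
    person: ("treatment" if rng.random() < threshold else "control")
    for person in participants
}

# Alternative: shuffle the list and split it, which gives exactly equal groups.
shuffled = participants[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
balanced = {person: "treatment" for person in shuffled[:half]}
balanced.update({person: "control" for person in shuffled[half:]})

n_treated = sum(group == "treatment" for group in assignment.values())
n_treated_balanced = sum(group == "treatment" for group in balanced.values())
print(f"Threshold method: {n_treated} treated, {len(assignment) - n_treated} control")
print(f"Shuffle method:   {n_treated_balanced} treated, {len(balanced) - n_treated_balanced} control")
```

The threshold method gives group sizes that are only roughly equal. The shuffle method fixes them in advance, which can matter in small samples.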
SLIDE 32
R example
SLIDE 33
RCT with Qualtrics