Randomization (PMAP 8521: Program Evaluation for Public Service)

SLIDE 1

RANDOMIZATION

PMAP 8521: Program Evaluation for Public Service October 14, 2019

Fill out your reading report on iCollege!

SLIDE 2

PLAN FOR TODAY

The “Gold” Standard
The magic of randomization
Running and analyzing RCTs

SLIDE 3

THREATS TO VALIDITY

Internal validity
External validity
Construct validity
Statistical conclusion validity

Omitted variable bias
Trends
Study calibration
Contamination

SLIDE 4

INTERNAL VALIDITY

Omitted variable bias
Trends
Study calibration
Contamination

Selection
Attrition
Maturation
Secular trends
Testing
Regression
Measurement error
Time frame of study
Seasonality
Hawthorne
John Henry
Spillovers
Intervening events

SLIDE 5

THE MAGIC OF RANDOMIZATION

SLIDE 6

WHY RANDOMIZE?

Fundamental problem of causal inference

$\delta_i = Y_i^1 - Y_i^0$

Individual-level effects are impossible to observe

SLIDE 7

WHY RANDOMIZE?

Average treatment effect

$\text{ATE} = E(Y^1 - Y^0) = E(Y^1) - E(Y^0)$

$\delta = (\bar{Y} \mid P = 1) - (\bar{Y} \mid P = 0)$
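A minimal simulation sketch of the idea above, written in Python. Everything in it is an assumption made for illustration (the outcome distribution, the true effect of 2, the sample size, and the function name `simulate_ate`). It generates both potential outcomes for each person, which is only possible in a simulation, then shows that the difference in group means under random assignment lands close to the true ATE even though no individual effect is ever observed.

```python
import random


def simulate_ate(n=10_000, true_effect=2.0, seed=8521):
    """Compare the true ATE with the difference in group means
    when treatment P is assigned at random."""
    rng = random.Random(seed)

    # Each person has two potential outcomes; in real data only one is seen.
    y0 = [rng.gauss(10, 3) for _ in range(n)]
    y1 = [y + true_effect + rng.gauss(0, 1) for y in y0]

    # True ATE: the average of individual effects (knowable only in a simulation).
    true_ate = sum(a - b for a, b in zip(y1, y0)) / n

    # Random assignment: P = 1 with probability 0.5.
    p = [1 if rng.random() < 0.5 else 0 for _ in range(n)]

    # Observed outcomes: Y^1 for the treated, Y^0 for the controls.
    treated = [y1[i] for i in range(n) if p[i] == 1]
    control = [y0[i] for i in range(n) if p[i] == 0]
    estimate = sum(treated) / len(treated) - sum(control) / len(control)
    return true_ate, estimate


true_ate, estimate = simulate_ate()
print(f"true ATE = {true_ate:.2f}, difference in means = {estimate:.2f}")
```

The two numbers will not match exactly; the gap between them is sampling noise that shrinks as `n` grows.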

SLIDE 8

WHY RANDOMIZE?

This only works if subgroups that received/didn’t receive treatment look the same

$\delta = (\bar{Y} \mid P = 1) - (\bar{Y} \mid P = 0)$
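One way to see why the groups need to look the same is to split the observed difference in means into two pieces. This is a standard decomposition, sketched here in potential-outcomes notation with $Y^1$ and $Y^0$ as the outcomes with and without the program and $P$ as program participation; the labels under the braces are added for illustration.

```latex
\begin{aligned}
E(Y \mid P = 1) - E(Y \mid P = 0)
  &= E(Y^1 \mid P = 1) - E(Y^0 \mid P = 0) \\
  &= \underbrace{E(Y^1 - Y^0 \mid P = 1)}_{\text{effect on the treated}}
   + \underbrace{E(Y^0 \mid P = 1) - E(Y^0 \mid P = 0)}_{\text{selection bias}}
\end{aligned}
```

If the two groups would have had the same average outcome without the program, the selection bias term is zero. Random assignment makes $P$ independent of $(Y^0, Y^1)$, so in expectation that term vanishes and the effect on the treated equals the ATE.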

SLIDE 9

WHY RANDOMIZE?

With big enough numbers, the magic of randomization helps make comparison groups comparable
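A small sketch of what "big enough numbers" buys, in Python. The baseline variable (age, mean 40, SD 12), the sample sizes, and the helper names `baseline_gap` and `average_gap` are all hypothetical. The code randomly splits a sample in half many times and reports the typical gap in mean age between the two halves.

```python
import random


def baseline_gap(n, seed):
    """Absolute difference in mean baseline age between randomly
    assigned treatment and control groups (total sample size n)."""
    rng = random.Random(seed)
    ages = [rng.gauss(40, 12) for _ in range(n)]
    rng.shuffle(ages)  # random assignment: first half treated, second half control
    half = n // 2
    treated, control = ages[:half], ages[half:]
    return abs(sum(treated) / len(treated) - sum(control) / len(control))


def average_gap(n, reps=200):
    """Average the gap over many hypothetical randomizations."""
    return sum(baseline_gap(n, seed) for seed in range(reps)) / reps


gaps = {n: average_gap(n) for n in (20, 200, 2000)}
for n, gap in gaps.items():
    print(f"n = {n:>5}: average gap in mean age = {gap:.2f} years")
```

Any single randomization can still come out unbalanced by chance, especially in small samples; the typical imbalance is what shrinks as the sample grows.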

SLIDE 10

R example

SLIDE 11

How big of a sample?
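One common back-of-the-envelope answer uses the normal-approximation formula for comparing two means with equal group sizes, $n = 2\,(z_{1-\alpha/2} + z_{\text{power}})^2\,\sigma^2 / \delta^2$ per group. The sketch below is in Python and assumes the conventional defaults of a 5% two-sided significance level and 80% power; the function name `n_per_group` and the example effect sizes are illustrative choices, not values from the slides.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of
    two means (normal approximation, equal group sizes and variances)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)


# Effects expressed in standard deviation units (sd = 1).
for effect in (0.2, 0.5, 0.8):
    print(f"effect of {effect} SD: about {n_per_group(effect, sd=1.0)} per group")
```

The required sample grows with the square of the inverse effect size, so halving the effect you want to detect roughly quadruples the sample you need.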

SLIDE 12

THE “GOLD” STANDARD

SLIDE 13

TYPES OF RESEARCH

Experimental studies vs. observational studies

Which is better?

SLIDE 14

SLIDE 15

TYPES OF RESEARCH

Experimental studies vs. observational studies

Medicine
Social science
Epidemiology
DAGs in RCTs?

SLIDE 16

SLIDE 17

RCTs are great! Super impractical to do all the time though!

SLIDE 18

SLIDE 19

“Gold standard” implies that all causal inferences will be valid if you do the experiment right

We don’t care if studies are experimental or not
We care if our causal inferences are valid
RCTs are a helpful baseline/rubric for other methods

SLIDE 20

Moving to Opportunity

SLIDE 21

RCTs & VALIDITY

Randomization fixes a ton of internal validity issues

Selection

Treatment and control groups are comparable; people don’t self-select

Trends

Maturation, secular trends, seasonality, regression to the mean all generally average out

SLIDE 22

RCTs & VALIDITY

RCTs don’t fix attrition!

Worst threat to internal validity in RCTs

If attrition is correlated with treatment, that’s bad

People might drop out because of the treatment, or because they did/didn’t end up in the control group
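A hedged sketch of how this plays out, in Python. The dropout rule is an assumption made up for illustration: the program has no effect at all, but treated participants with low outcomes leave the study half the time, while nobody leaves the control group. The function name `attrition_demo` and all numbers are hypothetical.

```python
import random


def attrition_demo(n=20_000, true_effect=0.0, seed=14):
    """Difference in observed group means when dropout depends on
    treatment status and on the outcome itself."""
    rng = random.Random(seed)
    treated_obs, control_obs = [], []
    for _ in range(n):
        y0 = rng.gauss(50, 10)      # outcome without the program
        y1 = y0 + true_effect       # outcome with the program
        if rng.random() < 0.5:      # randomly assigned to treatment
            # Assumed mechanism: treated people with low outcomes
            # drop out of the study half the time.
            if y1 < 45 and rng.random() < 0.5:
                continue
            treated_obs.append(y1)
        else:
            control_obs.append(y0)  # no dropout in the control group
    return sum(treated_obs) / len(treated_obs) - sum(control_obs) / len(control_obs)


biased = attrition_demo()
print(f"true effect = 0.00, estimate after attrition = {biased:.2f}")
```

Even with perfect random assignment, the people who remain in the treatment group are no longer comparable to the control group, so the estimate drifts away from the true effect of zero.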

SLIDE 23

ADDRESSING ATTRITION

Recruit as effectively as possible

You don’t just want weird/WEIRD participants

Get people on board

Get participants invested in the experiment

Collect as much baseline information as possible

Check for randomization of attrition

SLIDE 24

RCTs & VALIDITY

Randomization failures

Check baseline pre-data

Noncompliance

Intent-to-treat (ITT) vs. treatment-on-the-treated (TOT)

Some people assigned to treatment won’t take it; some people assigned to control will take it
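The two estimates can be sketched with simulated data in Python. The take-up rates (70% among those assigned to treatment, 10% among those assigned to control), the constant effect of 5 for anyone who actually takes the treatment, and the function name `itt_and_tot` are all assumptions for illustration. The intent-to-treat estimate compares groups by assignment; dividing it by the difference in take-up rates (the Wald ratio) rescales it to the effect of actually taking the treatment, which matches the treatment-on-the-treated effect here because the simulated effect is the same for everyone.

```python
import random


def itt_and_tot(n=20_000, effect_if_taken=5.0, seed=2019):
    """Return the ITT estimate, the take-up gap, and their ratio."""
    rng = random.Random(seed)
    assigned, took, outcome = [], [], []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0           # random assignment
        # Assumed noncompliance: take-up depends on assignment but is imperfect.
        d = 1 if rng.random() < (0.7 if z else 0.1) else 0
        y = rng.gauss(50, 10) + effect_if_taken * d  # effect only if treatment is taken
        assigned.append(z)
        took.append(d)
        outcome.append(y)

    def mean_by(values, group):
        picked = [v for v, z in zip(values, assigned) if z == group]
        return sum(picked) / len(picked)

    itt = mean_by(outcome, 1) - mean_by(outcome, 0)   # compare by assignment
    takeup_gap = mean_by(took, 1) - mean_by(took, 0)  # share moved by assignment
    return itt, takeup_gap, itt / takeup_gap


itt, takeup_gap, tot = itt_and_tot()
print(f"ITT = {itt:.2f}, take-up gap = {takeup_gap:.2f}, ITT / gap = {tot:.2f}")
```

Comparing people by what they actually took, rather than by assignment, would throw away the randomization, since take-up is a choice.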

SLIDE 25

OTHER LIMITATIONS

RCTs don’t magically fix construct validity and statistical conclusion validity
RCTs definitely don’t magically fix external validity

SLIDE 26

SLIDE 27

WHEN TO RANDOMLY ASSIGN

Demand for treatment exceeds supply
Treatment will be phased in over time
Treatment is in equipoise
Local culture open to randomization
When you’re a nondemocratic monopolist
When people won’t know (and it’s ethical!)
When lotteries are going to happen anyway

SLIDE 28

WHEN TO NOT RANDOMLY ASSIGN

When you need immediate results
When it’s unethical or illegal
When it’s something that happened in the past
When it involves universal ongoing phenomena

SLIDE 29

RUNNING & ANALYZING RCTs

SLIDE 30

SLIDE 31

RANDOM ASSIGNMENT

Coins
Dice
Unbiased lottery
Atmospheric noise

random.org

Random numbers + threshold
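The "random numbers + threshold" approach can be sketched in a few lines of Python. The participant IDs, the 0.5 threshold, the seed, and the function name `assign_by_threshold` are hypothetical. Each participant gets a random draw between 0 and 1, and anyone below the threshold goes to the treatment group.

```python
import random


def assign_by_threshold(participant_ids, threshold=0.5, seed=33):
    """Assign each participant to treatment if their random draw
    falls below the threshold, otherwise to control."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    assignment = {}
    for pid in participant_ids:
        draw = rng.random()    # uniform draw in [0, 1)
        assignment[pid] = "treatment" if draw < threshold else "control"
    return assignment


participants = [f"person_{i:03d}" for i in range(1, 201)]
groups = assign_by_threshold(participants)
n_treated = sum(1 for g in groups.values() if g == "treatment")
print(f"{n_treated} of {len(groups)} participants assigned to treatment")
```

This method does not guarantee equal group sizes; shuffling the list and splitting it in half is an alternative when exactly balanced groups matter.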

SLIDE 32

R example

SLIDE 33

RCT with Qualtrics