Economics for Data Science
Chiara Binelli
Academic year 2019-2020
Email: chiara.binelli@unimib.it
Data Science and Economics
- Economics approach:
1. Have a theory that identifies a relationship of interest (ex. impact of completing college on wages). 2. Estimate the impact of a treatment (ex. completing college) on an outcome variable (ex. wages) holding everything else constant.
- Focus on some coefficients of interest to estimate causal effects.
- Effort to estimate unbiased effects with carefully constructed standard errors.
- Data Science approach – data-driven approach:
1. Predict how a given outcome varies with a large number of potential predictors. 2. May or may not use prior theory to establish which predictors are relevant.
- Data-driven model selection to identify meaningful predictive variables.
- Less attention to statistical uncertainty and standard errors and more to
model uncertainty.
Two main limitations of the Data Science approach:
- 1. Lack of theory: data-driven approach (predictive models are
chosen using data-driven cross-validation methods).
- 2. Lack of statistical significance: focus on predictions that
minimize mean-squared errors without much attention to statistical significance since the exact source of variation identifying the prediction is difficult to assess.
– Thus, bias is allowed in order to reduce variance. – Example: LASSO penalizes the inclusion of covariates so that, if two covariates are correlated, only one will be included and its parameter will reflect the impact of both the included and the excluded covariate: omitted variable bias (OVB)!
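The LASSO point above can be sketched numerically; the data and penalty value below are illustrative assumptions, not from the slides.

```python
# Illustrative sketch (assumed data): two highly correlated covariates,
# each with a true effect of 1.0 on y. LASSO tends to keep only one of
# them, and the surviving coefficient absorbs the impact of both the
# included and the excluded covariate - the omitted variable bias above.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)      # x2 is almost identical to x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)  # both covariates matter

model = Lasso(alpha=0.5).fit(np.column_stack([x1, x2]), y)
print(model.coef_)  # typically one coefficient is shrunk toward 0
```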
Data Science and Economics
- Economists often interested in assessing the
effectiveness of a policy or testing theories that predict a causal relationship
– Main goal is to identify statistically significant causal effects. – A model with high degree of predictive fit is seen as secondary to finding an empirical specification that identifies a causal effect.
- Common Data Science techniques such as classification
and regression trees, LASSO, boosting, and cross-validation have not been much used in Economics.
Data Science and Economics
- Concrete example (Einav and Levin 2014): assess if
taking online classes improves earnings.
- Economics approach:
– Either design an experiment that induces some workers to take online classes for reasons unrelated to their earning potential.
- e.g. change in the price of online classes.
– Or, absent the experiment, use observational data to estimate the impact of online classes on earnings in an unbiased way. – Focus on:
- Obtaining a point estimate of the impact of online classes on earnings
that is precisely estimated.
- Discussing whether there are omitted variables that might confound a
causal interpretation (e.g. workers’ ambition driving a decision to take classes and work harder at the same time).
Data Science and Economics
- Data Science approach:
– Identify which variables predict earnings, given a vast set of predictors in the data, and the potential for building a model that predicts earnings well, both in sample and out of sample. – Focus on:
- Model that predicts earnings both for individuals that have and have
not taken online classes.
– NOTE: focus is not on causal effect and statistical significance but rather on prediction.
Machine Learning and Statistical Inference
- Flexibility of machine learning algorithms means that two
different functions that use different variables can produce similar predictions:
– In traditional estimation, large standard errors express the uncertainty in attributing effects. – In machine learning, there is a lack of consistency in model selection – how should we measure this?
- Computing standard errors in machine learning
algorithms is difficult due to the data-driven approach:
– Leeb and Pötscher (2006, 2008) develop conditions under which it is impossible to compute consistent estimates of model parameters after data-driven selection.
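A minimal simulation of this selection instability (illustrative, with assumed data and penalty): across bootstrap resamples of the same dataset, LASSO can select different variables, and no standard error summarizes that uncertainty.

```python
# Sketch (assumed data): two correlated candidate predictors. Refitting
# LASSO on bootstrap resamples records which variables survive; the set
# of distinct selection patterns measures model-selection instability.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)    # strongly correlated with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

patterns = set()
for b in range(50):                          # 50 bootstrap resamples
    idx = rng.integers(0, n, size=n)
    coef = Lasso(alpha=0.8).fit(X[idx], y[idx]).coef_
    patterns.add(tuple(np.abs(coef) > 1e-6)) # which variables were kept
print(patterns)  # often more than one distinct selection pattern
```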
Big Data and Statistical Inference
- When big data represent all the data for a given set of
variables, should we compute standard errors?
- Very much YES!
– The error of a model comes from two sources: omitted variables and measurement error.
- Omitted variables error: some relevant explanatory
variables are omitted and thus end up in the error term.
- Measurement error: the dependent variable is measured
with error.
Big Data and Statistical Inference
- Sample error is very different from model error: it is the
difference between the sample-based regression results and the results based on the full population.
- Probability theory tells us that for a well constructed
sample, regression coefficients are unbiased estimates of population regression coefficients.
- Tests of statistical significance are relevant both for
samples and for entire data populations.
– To read more on this, see Babones, S. J. 2013. Statistical Modeling with Cross-Sectional Designs, Chapter 5, pp. 107-118.
Data Science and Economics
- Due to theory, the Economics approach is more
interpretable in terms of which variation identifies the impact of interest and its statistical significance.
- The Data Science approach is better for predictions:
– Examples: comparison of the performance of OLS vs machine learning algorithms (regression trees, random forest, LASSO, ensemble) in Mullainathan and Spiess (2017 Journal of Economic Perspectives); advantages of using ensemble methods to improve predictions (Athey et al. 2019). – Intuition: machine learning algorithms easily allow introducing pairwise interactions between all potential predictors.
- The two approaches have mutual benefits.
Economics for Data Science
- From Economics to Data Science: 2 main contributions
1. Provide a theory: theory to ask interesting questions and to analyze complex big datasets. With data complexity, it is crucial to have models to guide the choice of variables, the relationships between variables, the hypotheses to test and the experiments to run. 2. Focus on causality: crucial to answer important questions.
- From Data Science to Economics: 3 main contributions
1. Test robustness to misspecifications (Athey and Imbens 2015). 2. New tools for causal inference. 3. Better predictions.
From Economics to Data Science:
- 1. Provide a Theory
- Example: online advertising auctions.
- Important question for Google or Facebook:
– Which ads to show online and how much to charge for the ads? 1. Machine learning methods to build a predictive model to assess the likelihood that a user will click on an ad. By exploiting the enormous amount of data available online, this predictive model tells us which ads to show. 2. Economic theory to build auction models to set prices.
- Several e-commerce companies have built teams of
economists (often with PhDs in Economics), statisticians and computer scientists.
From Economics to Data Science:
- 1. Provide a Theory
- A theory is a way to investigate the mechanism through which
X affects Y. It is a way to make ML interpretable.
- “Interpretable machine learning”: ML field to go beyond a
“black box” approach to explain the logic behind predictions.
– To interpret a model, we require the following insights: 1. Identify the most important features. 2. For any single prediction, the effect of each feature in the data on that particular prediction. 3. The effect of each feature over a large number of possible predictions.
- Molnar (2019): https://christophm.github.io/interpretable-ml-book/ and the Kaggle crash course on ML explainability: https://www.kaggle.com/learn/machine-learning-explainability
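Insight 1 can be illustrated with permutation importance, one standard technique from this literature (the data and linear model below are assumptions for the example):

```python
# Permutation importance: shuffle one feature at a time and measure how
# much the model's error grows; features whose shuffling hurts most are
# the most important. (Simulated data; feature 2 is pure noise.)
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit a linear model
base_mse = np.mean((y - X @ beta) ** 2)

importance = []
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])        # break feature j's link to y
    importance.append(np.mean((y - Xp @ beta) ** 2) - base_mse)
print(importance)  # feature 0 matters most, feature 2 least
```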
From Economics to Data Science:
- 2. Focus on Causality
- Machine learning algorithms optimize properties of the observed
data: they improve performance by optimizing parameters over a set of inputs. E.g., to build a predictive model we minimize overfitting.
– “As long as we optimize some properties of the observed data, however noble or sophisticated, while making no reference to the world outside the observed data, we are limited to questions of association.” Pearl (2018)
- However, lots of important questions involve cause-and-effect
- relationships. Until recently we had no mathematical framework to
articulate and answer these questions.
- "More has been learned about causal inference in the last few
decades than the sum total of everything that had been learned about it in all prior recorded history" (Gary King, Harvard). "The Causal Revolution" (Pearl and Mackenzie, 2018).
From Economics to Data Science:
- 2. Focus on Causality
(Pearl 2018)
- Human-level AI cannot emerge solely from model-blind
learning machines; it requires the symbiotic collaboration of data and models.
- Data science is only as much of a science as it facilitates
the interpretation of data - a two-body problem, connecting data to reality.
- Data alone are hardly a science, regardless of how big
they get and how skillfully they are manipulated.
– We need a theory to interpret the data.
From Data Science to Economics:
- 1. Test Robustness to Misspecifications
(Athey and Imbens 2015)
- Researchers interested in the effect of a given variable
(x) on an outcome (y) typically report the estimated impact of x on y and a measure of uncertainty of the estimate such as the standard error (se).
- Problem: the measure of uncertainty, that is the
statistical significance of the estimate, depends on the model’s specification:
– Different specifications of the model (variables included, functional forms, etc.) produce different estimates of x on y and associated se.
- Athey and Imbens (2015) propose a simple machine
learning approach to assess the sensitivity of the point estimates to model specification.
From Data Science to Economics:
- 1. Test Robustness to Misspecifications
(Athey and Imbens 2015)
- Idea: compute a measure of how sensitive the estimates
are to a range of alternative models.
- Estimate a series of models and compute the standard
deviation of the point estimate of the effect of x on y over the set of models.
- Each member of the set of model specifications is
generated by splitting the sample into subsamples based on covariate values, estimating the model separately for each subsample, and then combining the results to form a new estimate of the overall effect.
From Data Science to Economics:
- 1. Test Robustness to Misspecifications
- The idea to measure how sensitive the estimates are to
a range of alternative model specifications has been long discussed in Economics starting with Leamer (1983):
– Similar statistical inference results are obtained under different sets of assumptions.
- Machine learning provides effective tools to do this.
– Calculate the standard deviation of the estimated effect of x on y in different samples.
From Data Science to Economics:
- 2. New Tools for Causal Inference
- When estimating the causal effect of a given treatment,
we want to compare the observed outcome with the hypothetical outcome in the absence of the treatment (counterfactual).
- Machine learning methods can be used to build the best
predictive model for the counterfactual without the (sometimes excessive) monetary costs of running a randomized controlled experiment.
- Ex.: compare actual visits to a website following an
advertisement campaign (observed outcome) to the predicted visits absent the advertisement (counterfactual outcome) using time series data on past visits, seasonal effects, data on Google queries (pages 22-24 Varian 2014).
- Flexibly control for a large number of covariates and for
heterogeneous effects.
- Answer causal questions left unanswered (Varian 2016, PNAS).
- Emerging literature combining machine learning methods with
applied econometrics’ techniques to improve the estimation of causal effects (Section 4.2 in Athey and Imbens 2016).
– Random forests, boosting or LASSO to estimate the propensity score in the presence of many covariates (matching). – Improved LASSO through a double-selection method of selecting covariates that are correlated with the outcome, and covariates that are correlated with the treatment so that LASSO can produce causal effects (Belloni et al. 2013). – Wager and Athey (2018): “causal forests” - modification of random forests to produce asymptotically unbiased estimates with CI.
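The first combination can be sketched as follows; the design, penalty level, and use of inverse-propensity weighting are illustrative assumptions, not the exact procedures of the papers cited:

```python
# L1-penalized logistic regression estimates the propensity score
# P(D=1|X) with many candidate covariates (only one truly matters here),
# and inverse-propensity weighting then recovers the treatment effect.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n, p = 5000, 30
X = rng.normal(size=(n, p))
d = rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))    # treatment driven by X[:, 0]
y = 1.0 * X[:, 0] + 2.0 * d + rng.normal(size=n)  # true causal effect = 2

ps = (LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
      .fit(X, d).predict_proba(X)[:, 1])
w = np.where(d, 1 / ps, 1 / (1 - ps))             # inverse-propensity weights
ipw = np.average(y[d], weights=w[d]) - np.average(y[~d], weights=w[~d])
print(ipw)  # close to 2, unlike the confounded difference in raw means
```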
From Data Science to Economics:
- 2. New Tools for Causal Inference
- Provide predictions that can be used for estimations
such as for the first stage of an IV estimation.
- Construct counterfactuals for policy evaluation.
- Answer PREDICTION POLICY PROBLEMS
(Kleinberg et al. 2015).
From Data Science to Economics:
- 3. Better Predictions
Prediction Policy Problems (Kleinberg et al. 2015)
- Many questions where causal inference is not necessary.
– Example 1: is the chance of rain high enough to require an umbrella? The benefits of an umbrella depend on rain. – Example 2: are the benefits of a hip surgery high enough to justify the surgery? The benefits of a hip surgery depend on whether the patient lives long enough after the surgery (Kleinberg et al. 2015). – Example 3: should someone who has been arrested be detained or released before trial? The decision depends on a prediction of the arrestee's probability of committing a crime.
- Therefore, pure prediction problems.
Prediction Policy Problems (Kleinberg et al. 2015)
- OLS focuses on unbiasedness (it is the best linear unbiased
estimator!) but can provide poor predictions out of sample.
- OLS: given a dataset D of n points (y, x), pick a function f̂ that
predicts the y value of a new data point. The goal is to minimize a loss function that we take to be the squared error (y - f̂(x))².
- OLS finds the f̂ that minimizes in-sample error:

min_f̂ Σ_{i=1}^{n} (y_i - f̂(x_i))²

- PROBLEM: ensuring zero bias in sample creates problems out of sample.
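The out-of-sample problem can be shown with a short simulation (data and dimensions are assumptions for the example): an OLS fit with many irrelevant regressors has small in-sample error but much larger error on fresh data.

```python
# OLS minimizes in-sample error; with many regressors relative to data
# points, the zero-bias in-sample fit degrades badly out of sample.
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 40                                  # few observations, many regressors
X_tr, X_te = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y_tr = 2.0 * X_tr[:, 0] + rng.normal(size=n)   # true signal: one variable
y_te = 2.0 * X_te[:, 0] + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)  # in-sample minimizer
mse_in = np.mean((y_tr - X_tr @ beta) ** 2)
mse_out = np.mean((y_te - X_te @ beta) ** 2)
print(mse_in, mse_out)  # out-of-sample error is much larger
```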
Prediction Policy Problems (Kleinberg et al. 2015)
Prediction Policy Problems (Kleinberg et al. 2015)
- Machine learning techniques maximize predictive
performance by exploiting the bias-variance trade-off.
- Instead of minimizing only in-sample error, ML minimizes:

min_f̂ Σ_{i=1}^{n} (y_i - f̂(x_i))² + λ R(f)

- R(f) is a regularizer that penalizes functions that create variance; λ is the price at which we trade off variance for bias.
- With R(f) = Σ_j |β_j|^d: LASSO: d=1; RIDGE: d=2; OLS: λ=0.
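The penalized objective can be compared with plain OLS on the same kind of data (a sketch with assumed data and penalty levels):

```python
# Same wide, noisy design as the OLS example: the d=1 (LASSO) and d=2
# (ridge) penalties accept some bias in exchange for lower variance,
# which typically improves out-of-sample prediction over OLS.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge

rng = np.random.default_rng(4)
n, p = 60, 40
X_tr, X_te = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y_tr = 2.0 * X_tr[:, 0] + rng.normal(size=n)
y_te = 2.0 * X_te[:, 0] + rng.normal(size=n)

results = {}
for name, m in [("OLS", LinearRegression()),
                ("LASSO", Lasso(alpha=0.1)),
                ("Ridge", Ridge(alpha=10.0))]:
    m.fit(X_tr, y_tr)
    results[name] = np.mean((y_te - m.predict(X_te)) ** 2)
    print(name, round(results[name], 2))
```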
Prediction Policy Problems (Kleinberg et al. 2015)
- λ is chosen using cross-validation:
– For a set of lambdas, estimate the algorithm on (f-1) folds and see which value of lambda produces the best prediction in the f-th fold. – Thus, we use the data itself to decide how to make the bias-variance trade-off.
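The fold procedure described above can be hand-rolled in a few lines (assumed data; the λ grid is chosen for illustration):

```python
# For each candidate lambda, fit LASSO on all folds but one, score on
# the held-out fold, and keep the lambda with the lowest average
# out-of-fold error - using the data to set the bias-variance trade-off.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, p, folds = 100, 20, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + rng.normal(size=n)

def cv_mse(lam):
    errs = []
    for k in range(folds):
        held = np.arange(n) % folds == k               # k-th fold held out
        m = Lasso(alpha=lam).fit(X[~held], y[~held])
        errs.append(np.mean((y[held] - m.predict(X[held])) ** 2))
    return np.mean(errs)

lambdas = [0.001, 0.01, 0.1, 1.0, 10.0]
best = min(lambdas, key=cv_mse)
print(best)
```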
- Advantages:
– Allow for more flexible functional forms:
- Higher order interaction terms and decision trees that allow for lots of
interactivity.
– Allow to make predictions using “wide” data, that is datasets that have more variables than data points:
- Example: language data often have ten times as many variables as data points.
Examples of Prediction Policy Problems (Chalfin et al. 2016)
- How to reduce excessive use of force by police in the US? Can we
reduce police excessive use of force by using machine learning techniques to identify high-risk officers and replace them with average- risk officers?
- Predictive model of whether an officer was ever involved in police
shooting or accused of abuse as a function of sociodemographic variables, previous behaviors and polygraph results. Training and test data.
- Significant reduction in predicted shootings if the predicted bottom
percentile of officers is replaced with the middle segment of the predicted distribution.
- On the contrary, replacement using the rank-ordering of applications
from the police hiring system increases predicted shootings.
Examples of Prediction Policy Problems (Chalfin et al. 2016)
- How can districts choose which teachers to retain after a
probationary period?
- Predictive model of students’ test scores (assumed
measure that the schools use to decide if teachers do well and should be retained) as a function of teachers’ variables (sociodemographic, classrooms observations, etc.), students (sociodemographic, test scores, etc.), and principals (survey about school and teachers).
- Much bigger gains from replacing the predicted bottom
10% of teachers with average quality teachers rather than by using the system of principal rating of teachers currently in place.
One More Example of Prediction Policy Problems (Kleinberg et al. 2017)
- After arrest, how can judges decide if the defendant is sent
home rather than to jail while waiting for trial?
- Predictive model of risk of committing a crime using an
algorithm that finds past defendants who are like the ones currently in court, and uses the crime rates of these similar defendants to predict the crime rates of those in court.
- Findings: pre-trial decisions made using the algorithm’s
predictions could reduce crimes committed by released defendants by up to 25%, and reduce incarceration by up to 42% without increasing crime.
Caveats of Prediction Policy Problems (Kleinberg et al. 2016)
- Kleinberg et al. (2016) make some important points on how best to use
machine learning for policy: – Focus on problems that require prediction. – Think about the outcome to be predicted.
- Example: predict overall crime rate rather than rates of specific crimes since
crimes are correlated.
– The outcome to predict has to be easy to measure and not affected by factors that are difficult to measure.
- Example: build a prediction model to inform the decision of whom to sentence
rather than whom to bail.
- Problem: sentencing depends not only on recidivism risk, but also on factors that
are not directly measured such as society’s sense of mercy and redemption.
– Test the algorithm using an experiment on new data.
- Example: we only have data on the defendants that a judge decided to release.
- To test if the prediction is good, run an RCT or a natural experiment to compare
bail decisions made using machine learning to those made by the judges.
Prediction and Causal Questions (Athey 2017)
- Pure prediction problems do not answer the more complex question of
estimating heterogeneous effects.
- Hip surgery example: the benefits of a hip surgery depend on whether the
patient lives long enough after the surgery (Kleinberg et al. 2015).
- We know the effect of the treatment is negative for the patients that will
die, so for them it is easy to decide for no surgery.
- However, an important open question remains: which patients should be
given priority to receive surgery among the ones that are likely to survive more than one year?
- Causal question that requires estimating counterfactuals scenarios of
the effects of alternative policies of assigning patients to hip surgeries.
Prediction and Causal Questions (Athey 2017)
- Prediction and causal inference are different.
- Several research questions can be framed as a prediction or as a
causal question.
- Example: data on hotel prices and occupancy rates.
- Question 1: what is the best prediction of hotels’ occupancy
given unusually high observed prices on a given day?
– Prediction question: the answer is likely to be “high occupancy” since hotels tend to raise prices as they become full.
- Question 2: what is the effect of increasing the prices on a
given day on occupancy rates?
– Causal question: the answer is likely to be “low occupancy” since an increase in prices is unlikely to increase occupancy.
Prediction and Causal Questions (Blake et al. 2015)
- Question: how effective is paid search advertisement?
- From simple regressions of sales on the amount spent on
advertisement, huge returns to advertisement.
- When running randomized control experiments, the returns
are zero or negative since the majority of the clicks did not result in sales.
- Prediction models would lead to misleading results.
Prediction and Causal Questions
- The causal question is the question of a counterfactual, i.e.
the question of “What would happen if?”
- An emerging literature is merging machine learning and
Economics to use machine learning methods for causal inference.
- An important example of this literature is the train-test-treat-compare (TTTC)
estimator (Varian 2016, PNAS), which we will discuss in the next weeks.
- Nowcasting (contemporaneous forecasts of economic
statistics):
– E.g.: proxy measures of unemployment claims and consumer confidence (Choi and Varian 2011).
- Improved decision making by exploiting real time
information and running experiments.
From Data Science to Economics: Other Advantages
- Novel measurement and research design:
– Machine learning can deal with data that are too high-dimensional for standard estimation methods (e.g. image information). – Very useful when reliable data on economic outcomes are missing such as in measuring poverty and wealth (Blumenstock 2018 and 2016, Jean et al. 2016, Blumestock, Cadamuro and On 2015), credit scores and loan repayment (Bjorkegren and Grissen 2018), and when we have datasets with missing data (Athey et al. 2019). – Innovative research designs: Bernheim et al. (2013) use a machine learning algorithm trained on a subset of respondents to a survey to predict actual choices from survey respondents, thus providing a tool to infer actual from reported behavior.
Challenges with Big Data
- Access: much of the novel BIG DATA belong to private companies.
- Data management and computation.
- Asking the right question: a huge amount of time is spent just to open and
describe the data.
- The data often do not come from instruments designed to produce valid and
reliable data (Lazer et al. 2014).
- Search algorithms, such as Google search, are not static, they are
constantly changed to improve performance.
- Search behavior is affected by the service provider, and the
data generating process changes.
- Studies using data from search engines, Facebook or Twitter
may not be replicated using data from earlier or later periods.
Machine Learning and Causal Inference
- The Problem of the “Missing” Counterfactual
- Program Evaluation and Machine Learning
- Randomized and Natural (or Quasi-) Experiments
Starred readings:
- * Angrist and Pischke, Mostly Harmless Econometrics, Princeton
and Oxford University Press, 2009, Chapter 2 pages 11-24.
- * Stock and Watson, Introduction to Econometrics, 3rd edition,
Chapter 13 pages 511-529 and 538-540.
- * Athey S. and G. Imbens. 2015. "Machine Learning Methods in
Economics and Econometrics", AER: Papers and Proceedings, 105(5): 476-480.
- * Varian, H. (2016) “Causal Inference in Economics and Marketing”,
PNAS, Vol. 113, No. 27, pages 7310-7315.
Topic 2
Challenge: Estimating the Causal Effect
- Drawing causal inference such as “What is the causal
effect of college education on earnings?” requires answering counterfactual questions:
1. What would the earnings of individuals who did not go to college have been if they had gone to college? 2. What would the earnings of individuals who did go to college have been if they had not gone to college?
- Problem: we never observe counterfactual outcomes
since we can not simultaneously observe a given person in two different states of the world.
- You either go to college or you do not…
The "Missing" Counterfactual Problem
- We can never observe the same person in two different
states of the world at the same time.
- We may have data on the same unit i in two consecutive
trials before and after the treatment (data on your wages before and after you complete college).
- BUT we can not be sure that the treatment effect for i is
the same we would have measured if we had observed i simultaneously in both states (with and without college):
– Carryover effects (effect of college on wages wears off slowly). – Time trends (“unobserved ability” may change over time).
Framework of Potential Outcomes (Rubin’s Causal Model)
- Each individual has two potential outcomes:
– Y0: potential outcome without treatment – Y1: potential outcome with treatment
Treatment effect for each individual: Y1 - Y0, but only one of the two outcomes is observed
- D=1 if i receives treatment, else D=0
- Observed outcome: Y = D*Y1 + (1-D)*Y0
- If individual is treated:
– Y1 is observed, – Y0 is a counterfactual
- If individual is not treated:
– Y0 is observed, – Y1 is a counterfactual
Selection Problem
Therefore, we have:
Y = Y0 + (Y1 - Y0)*D   (1)
Given (1), the comparison of the average value of Y for people observed in D=1 and D=0 is misleading since:
E(Y|D=1) - E(Y|D=0) =
= [E(Y1|D=1) - E(Y0|D=1)] + [E(Y0|D=1) - E(Y0|D=0)] =
= Average effect of treatment on the treated + Bias
The bias is the difference between the (average) counterfactual Y0 in the two populations (treated and untreated)
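The decomposition above can be verified numerically (simulated population with an assumed selection rule):

```python
# Selection problem: people with high Y0 opt into treatment, so the
# naive difference in means equals the true effect (2) plus the bias
# term E(Y0|D=1) - E(Y0|D=0), exactly as in the decomposition.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
y0 = rng.normal(loc=10.0, size=n)       # potential outcome without treatment
y1 = y0 + 2.0                           # constant treatment effect of 2
d = y0 + rng.normal(size=n) > 10.5      # high-Y0 individuals select in
y = np.where(d, y1, y0)                 # observed outcome Y = Y0 + (Y1-Y0)D

naive = y[d].mean() - y[~d].mean()
bias = y0[d].mean() - y0[~d].mean()     # E(Y0|D=1) - E(Y0|D=0)
print(naive, bias)                      # naive = 2 + bias, not 2
```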
Selection Problem
Random assignment of D solves the selection problem since it makes D independent of potential outcomes. Formally:
E(Y|D=1) - E(Y|D=0) = E(Y1|D=1) - E(Y0|D=0) = E(Y1|D=1) - E(Y0|D=1)
since the independence of Y0 and D implies that E(Y0|D=1) = E(Y0|D=0). And:
E(Y1|D=1) - E(Y0|D=1) = E(Y1 - Y0)
- The effect of the randomly assigned treatment on the
treated is the same as the effect of the treatment on a randomly chosen individual.
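Replacing the selection rule with a coin flip in the same kind of simulated population (assumed setup) shows the independence at work:

```python
# Random assignment makes D independent of (Y0, Y1), so the simple
# difference in means now recovers the treatment effect of 2.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
y0 = rng.normal(loc=10.0, size=n)
y1 = y0 + 2.0
d = rng.random(n) < 0.5            # random assignment, independent of Y0, Y1
y = np.where(d, y1, y0)

diff = y[d].mean() - y[~d].mean()
print(diff)  # close to 2
```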
Why is the bias likely? Example of a simple Roy model: "I will go to college if it is worth it."
Selection Rule: D=1 if Y1 - Y0 > C
Then in general:
E(Y0|D=1) = E(Y0|Y0 < Y1 - C), for those who chose treatment,
which is different from
E(Y0|D=0) = E(Y0|Y0 > Y1 - C), for those who chose not to be treated.
Bias due to comparative advantages in terms of (Y1 - Y0). For example:
- Participants can have larger potential gains.
- Heterogeneity in costs.
- Heterogeneity in preferences.
Parameters of Interest
Most commonly used, given some observables X:
- Average treatment effect (ATE): E(Y1 - Y0 | X)
- Average effect of treatment on the treated (TTE): E(Y1 - Y0 | D=1, X)
- Average effect of treatment on the untreated (UTE): E(Y1 - Y0 | D=0, X)
- ATE is the weighted average of TTE and UTE: ATE = P(D=1|X)*TTE + P(D=0|X)*UTE
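The last identity can be checked in a simulation with heterogeneous gains (assumed data-generating process):

```python
# With heterogeneous treatment effects and self-selection, TTE > UTE,
# and ATE equals the participation-weighted average of the two.
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
gain = rng.normal(loc=2.0, scale=1.0, size=n)   # heterogeneous Y1 - Y0
d = gain + rng.normal(size=n) > 2.5             # larger gains select in

ate = gain.mean()
tte = gain[d].mean()                            # effect on the treated
ute = gain[~d].mean()                           # effect on the untreated
p = d.mean()                                    # share treated
print(ate, p * tte + (1 - p) * ute)             # the two numbers agree
```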
Goal of “Program Evaluation”
Find a “good” comparison group to make up for not knowing counterfactual outcomes
Illustration:
- Identification problem:
we observe E(Y0|D=0), E(Y1|D=1), but not the counterfactuals: E(Y0|D=1), E(Y1|D=0)
- Example: to estimate the TTE, we would need
TTE = E(Y1|D=1) - E(Y0|D=1) (Problem: second term is unobserved)
- Assumption to identify TTE:
E(Y0|D=1)=E(Y0|D=0)=E(Y0), i.e. no selectivity based on outcome in untreated state
Substitute unobserved second term with observed E(Y0|D=0)
(it is satisfied in randomized experiments)
Different Approaches to Program Evaluation
1. Run an experiment and use simple differences estimator. 2. Use observational data to construct the counterfactual:
- a. Selection on observables:
- "Unconfoundedness assumption": we assume to observe all X
variables that affect both the participation decision or treatment (ex. completing college) and the outcome of interest (ex. wages).
- Diff-in-Diff
- Matching
- Regression discontinuity
- b. Selection on unobservables
- Instrumental variable estimation
- Control function approach
Experiments
- Randomized experiment:
- Setting where the assignment of the treatment does not depend on
either observables or unobservables, and the researcher has control over the assignment (Cochran 1972).
- Designed and implemented consciously by human researchers.
- Entails a conscious use of a treatment and a control group with
random assignment (e.g. clinical trials of drugs).
- “Natural” or quasi-experiment:
- Source of randomization that is “as if” randomly assigned, but this
variation was not part of a conscious randomized treatment and control design.
Randomized Experiments
How can randomization solve the problem of not observing counterfactual outcomes?
- Comparison group selected using a randomization device to exclude a
fraction of program applicants from a given treatment; by definition there is no selection into treatment (if randomization worked).
- Main advantage: comparability between program participants and
nonparticipants – the same distribution of observables and unobservables in treatment and control group.
- Formally: randomization leads to (Y0, Y1) independent of D, so that
E(Y0|D=1) = E(Y0|D=0) and the simple comparison of treatment and control means identifies the treatment effect.