A repository of papers that bring machine learning into economics and econometrics, curated by John Coglianese, William Murdock, Ashesh Rambachan, Jonathan Roth, Elizabeth Santorella, and Jann Spiess

Inherent Trade-Offs in the Fair Determination of Risk Scores

Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan (2016): Inherent Trade-Offs in the Fair Determination of Risk Scores. ArXiv.

How do we tell if an algorithm is discriminating on the basis of race or other characteristics? Public concern over discriminatory algorithms is high. Books such as Weapons of Math Destruction and Automating Inequality detail the ways algorithms further disadvantage the disadvantaged. New York City’s City Council has passed an algorithmic accountability bill. Barocas and Selbst warn that machine learning algorithms have the potential to “inherit the prejudices of prior decision makers,” “reflect widespread biases,” and discover “preexisting patterns of exclusion and inequality.”

Kleinberg, Mullainathan, and Raghavan show that even without these nefarious factors at play, it is usually impossible for an algorithm to be fair by all three of their seemingly sensible definitions. They study the case of risk assessments, which they define as “ways of dividing people up into sets […] and then assigning each set a probability estimate that the people in this set belong to the positive class.” For concreteness, consider the COMPAS algorithm, a controversial tool that predicts recidivism: the “positive class” is those who recidivize, and the “negative class” is those who do not. They define three fairness criteria and show that any risk assessment algorithm is unfair by at least one of them, unless the algorithm makes perfect predictions or the groups have the same true rate of belonging to the positive class. More…
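To make the trade-off concrete, here is a minimal Python sketch. The summary above does not name the three criteria; in the paper they are calibration within groups, balance for the positive class, and balance for the negative class. The toy numbers, group labels, and helper names (`make_group`, `positive_rate_by_bin`, `mean_score_by_group`) are hypothetical and chosen only for illustration; they are not COMPAS data.

```python
from collections import defaultdict


def make_group(name, bins):
    """Expand (score, n_people, n_positive) bins into (group, score, outcome) rows."""
    rows = []
    for score, n_people, n_positive in bins:
        rows += [(name, score, 1)] * n_positive
        rows += [(name, score, 0)] * (n_people - n_positive)
    return rows


def mean(values):
    return sum(values) / len(values)


def positive_rate_by_bin(rows):
    """Calibration check: observed share of positives in each (group, score) bin."""
    cells = defaultdict(list)
    for group, score, outcome in rows:
        cells[(group, score)].append(outcome)
    return {cell: mean(outcomes) for cell, outcomes in cells.items()}


def mean_score_by_group(rows, outcome_class):
    """Balance check: average score among members of one outcome class, per group."""
    scores = defaultdict(list)
    for group, score, outcome in rows:
        if outcome == outcome_class:
            scores[group].append(score)
    return {group: mean(vals) for group, vals in scores.items()}


# Hypothetical toy data: both groups are calibrated by construction,
# but group A has base rate 0.50 and group B has base rate 0.32.
rows = make_group("A", [(0.2, 10, 2), (0.8, 10, 8)])
rows += make_group("B", [(0.2, 20, 4), (0.8, 5, 4)])

calibration = positive_rate_by_bin(rows)
balance_positive = mean_score_by_group(rows, 1)
balance_negative = mean_score_by_group(rows, 0)

for (group, score), rate in sorted(calibration.items()):
    print(f"group {group}, score {score}: observed positive rate {rate:.2f}")
print("mean score among positives:", balance_positive)
print("mean score among negatives:", balance_negative)
```

In this toy example the scores are calibrated in both groups, yet the average score among positives differs across groups (0.68 for A versus 0.50 for B), and so does the average score among negatives. That is the pattern the result predicts when base rates differ and prediction is imperfect.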

fairness discrimination risk scoring   

The Costs of Algorithmic Fairness

Sam Corbett-Davies, Sanjog Misra, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq (2017): Algorithmic decision making and the cost of fairness. ArXiv.

Society is increasingly relying on algorithms to make decisions in areas as diverse as the criminal justice system and healthcare, but concerns abound about whether algorithmic decision-making may induce racial or gender bias. This paper formalizes three notions of algorithmic fairness as constraints on the decision rule, and shows what the optimal decision rule looks like subject to these constraints. The authors then apply these rules to the context of bail decisions, and estimate the costs of imposing different notions of algorithmic fairness in terms of the number of additional crimes committed relative to an unconstrained decision rule. More…

algorithmic fairness algorithmic decision making   

An Application of Causal Forests

Jonathan M.V. Davis and Sara B. Heller (2017): Using Causal Forests to Predict Treatment Heterogeneity - An Application to Summer Jobs. American Economic Review.

Athey and Imbens (2016) and Wager and Athey (2017) introduced causal trees and causal forests as new methods for identifying treatment heterogeneity that have potential gains over traditional methods. This paper applies the causal forest method to data from two randomized experiments that evaluated the impact of a summer jobs program on disadvantaged youth in Chicago. More…

causal forests heterogeneous treatment effects causal inference   

Deep Learning for Instrumental Variables

Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy (2017): Deep IV - A Flexible Approach to Counterfactual Prediction. Proceedings of the 34th International Conference on Machine Learning.

Instrumental variables (IV) estimation is one of the most important tools used to identify causal effects in economic research. If we can find a suitable instrument (relevant and plausibly exogenous), we can exploit it to identify the coefficient on the regressor of interest when we are worried about omitted variable bias. As presented in any standard econometrics textbook, the standard IV set-up makes strong linearity assumptions. In particular, we assume that the endogenous regressor relates to the outcome of interest linearly and that the instrument relates to the endogenous regressor linearly. More…
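As a point of reference, the standard linear set-up described above can be sketched in a few lines of Python. This illustrates textbook linear IV with one regressor and one instrument, not the Deep IV method itself. The data-generating process, the coefficient values, and the variable names are all assumptions made for this simulation.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(0)
n = 20_000
true_beta = 2.0  # assumed causal effect of x on y in this simulation

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: shifts x, unrelated to u
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder (omitted variable)
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous regressor
y = [true_beta * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

# OLS slope of y on x: biased because x is correlated with the omitted u.
beta_ols = cov(x, y) / cov(x, x)

# IV (Wald) estimator with a single instrument: cov(z, y) / cov(z, x).
beta_iv = cov(z, y) / cov(z, x)

print(f"true effect: {true_beta:.2f}")
print(f"OLS estimate: {beta_ols:.2f}")
print(f"IV estimate:  {beta_iv:.2f}")
```

Under these assumptions the OLS slope should land near 3 rather than 2, because the confounder pushes both `x` and `y` in the same direction, while the IV estimate should land close to the assumed effect. Both relationships in the simulation are linear, which is exactly the restriction the paragraph above highlights.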

deep learning instrumental variables causal inference   

AlphaGo meets Structural Estimation

Mitsuru Igami (2017): Artificial Intelligence as Structural Estimation - Economic Interpretations of Deep Blue, Bonanza, and AlphaGo. ArXiv.

Artificial intelligence has achieved human-like and, in some cases, superhuman performance in a variety of contexts such as image recognition and natural language processing. Yet the most famous achievements in artificial intelligence have occurred in board games. Events like IBM’s Deep Blue beating chess grandmaster Garry Kasparov and DeepMind’s AlphaGo beating world-class Go player Lee Sedol captured the public imagination about the future of artificial intelligence. More…

alphago reinforcement learning structural estimation   

Sparsity in Economic Predictions

Domenico Giannone, Michele Lenza, and Giorgio Primiceri (2017): Economic Predictions with Big Data - The Illusion of Sparsity. Working Paper.

Underlying many machine learning prediction techniques is an implicit assumption of sparsity. That is, out of the many potential covariates, machine learning techniques typically assume that only a few are actually relevant for the prediction task at hand. More…

prediction sparsity spike and slab   

Data-Driven Tuning, 50 Years Earlier

Willard James and Charles Stein (1960): Estimation with Quadratic Loss. Fourth Berkeley Symposium.

Shrinkage reduces the variance of estimators at the cost of some bias. Whether this regularization improves precision depends on its tuning, i.e. the choice of shrinkage factor. James and Stein show that for the estimation of at least three Normal means there is a data-driven choice of the tuning parameter that always beats the unregularized estimator. More…
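A small simulation can illustrate the claim. The sketch below assumes unit-variance Normal observations and shrinkage toward zero, in which case the James-Stein shrinkage factor is 1 - (p - 2) / ||x||^2 for p means. The true means, the number of replications, and the helper names are arbitrary choices for illustration.

```python
import random


def james_stein(x):
    """Shrink toward zero; assumes unit-variance Normal observations and len(x) >= 3."""
    p = len(x)
    shrink = 1.0 - (p - 2) / sum(xi * xi for xi in x)  # data-driven shrinkage factor
    return [shrink * xi for xi in x]


def squared_error(estimate, truth):
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


rng = random.Random(0)
theta = [1.0] * 10  # assumed true means for this simulation (p = 10)
reps = 5_000

loss_raw = 0.0
loss_js = 0.0
for _ in range(reps):
    x = [rng.gauss(t, 1.0) for t in theta]  # one noisy observation per mean
    loss_raw += squared_error(x, theta)  # unregularized estimator: x itself
    loss_js += squared_error(james_stein(x), theta)

risk_raw = loss_raw / reps
risk_js = loss_js / reps
print(f"average squared error, unregularized: {risk_raw:.2f}")
print(f"average squared error, James-Stein:   {risk_js:.2f}")
```

The comparison is in terms of risk, that is, squared error averaged over repeated samples. James-Stein can do worse on an individual draw, but its average loss is lower for every value of the true means once there are at least three of them.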

shrinkage beta-hat regularization