Identification Politics

June 9th, 2014  |  Published in Statistics

When I first started to learn about the world of quantitative social science, it was approaching the high tide of what I call "identificationism". The basic argument of [this movement](http://orgtheory.wordpress.com/2009/07/30/why-the-identification-movement/) was as follows. Lots of social scientists are crafting elaborate models that basically only show the *correlations* between variables. They then must rely on a lot of assumptions and theoretical arguments in order to claim that an association between X and Y is indicative of X *causing* Y, rather than Y causing X or both being caused by something else. This can lead to a lot of [flimsy and misleading](http://liorpachter.wordpress.com/2014/04/17/does-researching-casual-marijuana-use-cause-brain-abnormalities/) published findings.

Starting in the 1980s, critics of these practices [began to emphasize](http://michaelperelman.wordpress.com/2007/03/31/what-is-the-matter-with-empirical-economics-freak-freakonomics-again/) what is called, in statistical jargon, "clean identification". Clean identification means that your analysis is set up in a way that makes it possible to convincingly determine causal effects, not just correlations.

The most time-tested and well-respected identification strategy is the randomized experiment, of the kind used in medical trials. If you randomly divide people into two groups that differ only in a single treatment, you can be pretty sure that subsequent differences between the two groups are actually caused by the treatment.
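To see why randomization works, here's a minimal simulation sketch in Python (all numbers invented for illustration): even when an unobserved factor drives the outcome, random assignment makes the simple difference in group means a good estimate of the treatment effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# An unobserved factor that affects the outcome.
confounder = rng.normal(size=n)

# Random assignment: treatment is independent of everything else.
treated = rng.integers(0, 2, size=n).astype(bool)

# Outcome: a true treatment effect of 2.0, plus the hidden factor and noise.
outcome = 2.0 * treated + 1.5 * confounder + rng.normal(size=n)

# Because assignment was random, the simple difference in group means
# recovers the causal effect (close to 2.0) without any modeling.
effect = outcome[treated].mean() - outcome[~treated].mean()
print(f"estimated treatment effect: {effect:.2f}")
```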

But most social science questions, especially the big and important ones, aren't ones you can do experiments on. You can't randomly assign one group of countries to have austerity economics, and another group to have Keynesian policies. So as a second-best solution, scholars began looking for so-called "natural experiments". These are situations where, more or less by accident, people find themselves divided into two groups arbitrarily, almost *as if* they had been randomized in an experiment. This allows the identification of causality in non-experimental situations.

A famous early paper using this approach was David Card and Alan Krueger's 1994 [study](http://davidcard.berkeley.edu/papers/njmin-aer.pdf) of the minimum wage. In 1992, New Jersey increased its minimum wage to the highest level in the country. Card and Krueger compared employment in the fast food industry in both New Jersey and eastern Pennsylvania. Their logic was that these stores didn't differ systematically aside from the fact that some of them were subject to the higher New Jersey minimum wage, and some of them weren't. Thus any change in employment after the New Jersey hike could be interpreted as a consequence of the higher minimum wage. In a finding that is still cited by liberal advocates, they concluded that the higher minimum wage did nothing to reduce employment, despite the predictions of textbook neoclassical economics.
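Their design is a textbook example of what's now called "difference-in-differences". Here's a minimal sketch of the arithmetic, with made-up employment numbers rather than Card and Krueger's actual data:

```python
# Difference-in-differences with hypothetical numbers (not the actual
# Card-Krueger data): average employment per store, before and after
# the New Jersey minimum wage increase.
nj_before, nj_after = 20.0, 20.5   # treated group (New Jersey)
pa_before, pa_after = 23.0, 22.0   # comparison group (Pennsylvania)

# The Pennsylvania change stands in for what would have happened in
# New Jersey without the wage hike; subtracting it from the New Jersey
# change isolates the effect attributable to the higher minimum wage.
did = (nj_after - nj_before) - (pa_after - pa_before)
print(f"difference-in-differences estimate: {did:+.1f} jobs per store")
```

With these invented numbers the estimate is +1.5: New Jersey employment grew relative to the Pennsylvania baseline. The causal reading rests on the assumption that, absent the wage hike, the two regions would have moved in parallel.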

This was a useful and important paper, and the early wave of natural experiment analyses produced other useful results as well. But as time went on, the fixation on identification led to a wave of studies obsessed with proper methodology and unconcerned with whether they were studying interesting or important topics. Steve Levitt of "Freakonomics" fame is a product of this environment, someone who would never tackle a big, hard question where an easy, trivial one was available.

With the pool of natural experiments reaching exhaustion, some researchers began to turn toward running their own actual experiments. Hence the rise of the so-called ["randomistas"](http://rupertsimons.blogspot.com/2008/10/deaton-on-randomistas.html). These were people who performed randomized controlled trials, generally in poor countries, to answer small and precisely targeted questions about things like aid policy. This work includes things like Chris Blattman's [study](http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2439488) in which money was randomly distributed to Ugandan women.

But now, if former World Bank lead economist [Branko Milanovic](https://twitter.com/BrankoMilan/status/476026660781711360) is to be believed, the experimental identificationists are having their own [day of crisis](https://chronicle.com/article/Poverty-Under-the-Microscope). As with the natural experiment, the randomized trial sacrifices big questions and generalizable answers in favor of conclusions that are often trivial. With their lavishly funded operations in poor countries, there's an aspect of liberal colonialism as well. It's the Nick Kristof or Bono approach to helping the global poor; as Milanovic [puts it](https://twitter.com/BrankoMilan/status/476029714637656064), "you can play God in poor countries, publish papers, make money and feel good about yourself."

If there's a backlash against the obsession with causal inference, it will be a victory for people who want to use data to answer real questions. Writing about these issues [years ago](http://www.peterfrase.com/2009/10/elster-on-the-social-sciences/), I argued that:

> It is often impossible to find an analytical strategy which is both free of strong assumptions about causality and applicable beyond a narrow and artificial situation. The goal of causal inference, that is, is a noble but often futile pursuit. In place of causal inference, what we must often do instead is causal interpretation, in which essentially descriptive tools (such as regression) are interpreted causally based on prior knowledge, logical argument and empirical tests that persuasively refute alternative explanations.

I still basically stand by that, or by the pithier formulation I added later, "Causal inference where possible, causal interpretation where necessary."
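To make the distinction concrete, here's a sketch in Python of what causal interpretation of a regression involves, using simulated data. The regression itself is purely descriptive; the causal reading of its coefficient hangs entirely on the stated, and untestable, assumption that the measured control captures all of the confounding.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated observational data: z confounds the x -> y relationship.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)             # x was not randomized
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of x is 1.0

# Naive regression of y on x alone absorbs the confounding.
X_naive = np.column_stack([np.ones(n), x])
b_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0]

# Adjusting for z recovers the effect -- but only under the assumed
# premise that z is the whole confounding story. Defending that
# premise with prior knowledge and argument is the "interpretation".
X_adj = np.column_stack([np.ones(n), x, z])
b_adj = np.linalg.lstsq(X_adj, y, rcond=None)[0]

print(f"naive coefficient on x:    {b_naive[1]:.2f}")  # biased upward
print(f"adjusted coefficient on x: {b_adj[1]:.2f}")    # near 1.0
```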
