## Identification Politics

June 9th, 2014  |  Published in Statistics

When I first started to learn about the world of quantitative social science, it was approaching the high tide of what I call “identificationism”. The basic argument of this movement was as follows. Lots of social scientists are crafting elaborate models that basically only show the correlations between variables. They then must rely on a lot of assumptions and theoretical arguments in order to claim that an association between X and Y is indicative of X causing Y, rather than Y causing X or both being caused by something else. This can lead to a lot of flimsy and misleading published findings.

Starting in the 1980’s, critics of these practices started to emphasize what is called, in the statistical jargon, “clean identification”. Clean identification means that your analysis is set up in a way that makes it possible to convincingly determine causal effects, not just correlations.

The most time-tested and well respected identification strategy is the randomized experiment, of the kind used in medical trials. If you randomly divide people into two groups that differ only by a single treatment, you can be pretty sure that subsequent differences between the two groups are actually caused by the treatment.

But most social science questions, especially the big and important ones, aren’t ones you can do experiments on. You can’t randomly assign one group of countries to have austerity economics, and another group to have Keynesian policies. So as a second best solution, scholars began looking for so-called “natural experiments”. These are situations where, more or less by accident, people find themselves divided into two groups arbitrarily, almost as if they had been randomized in an experiment. This allows the identification of causality in non-experimental situations.

A famous early paper using this approach was David Card and Alan Krueger’s 1992 study of the minimum wage. In 1990, New Jersey had increased its minimum wage to be the highest in the country. Card and Krueger compared employment in the fast food industry in both New Jersey and eastern Pennsylvania. Their logic was that these stores didn’t differ systematically aside from the fact that some of them were subject to the higher New Jersey minimum wage, and some of them weren’t. Thus any change in employment after the New Jersey hike could be interpreted as a consequence of the higher minimum wage. In a finding that is still cited by liberal advocates, they concluded that higher minimum wages did nothing to cause lower employment, despite the predictions of textbook neoclassical economics.
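
The comparison Card and Krueger made is, in modern jargon, a difference-in-differences: the change in the treated group minus the change in the untreated group. Just to make the mechanics concrete, here is a minimal sketch of that calculation; the employment figures are invented placeholders, not their actual survey data.

```python
# Illustrative difference-in-differences calculation in the spirit of the
# Card-Krueger design. All employment numbers below are made up for the
# example; they are not the study's data.

def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical average full-time-equivalent employment per fast-food store:
nj_before, nj_after = 20.0, 20.5   # New Jersey (subject to the wage hike)
pa_before, pa_after = 23.0, 21.5   # Eastern Pennsylvania (no change in law)

effect = diff_in_diff(nj_before, nj_after, pa_before, pa_after)
print(f"Estimated effect of the minimum wage increase: {effect:+.1f} FTE per store")
```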

This was a useful and important paper, and the early wave of natural experiment analyses produced other useful results as well. But as time went on, the obsession with identification led to a wave of studies fixated on proper methodology and unconcerned with whether they were studying interesting or important topics. Steve Levitt of “Freakonomics” fame is a product of this environment, someone who would never tackle a big hard question where an easy trivial one was available.

With the pool of natural experiments reaching exhaustion, some researchers began to turn toward running their own actual experiments. Hence the rise of the so-called “randomistas”. These were people who performed randomized controlled trials, generally in poor countries, to answer small and precisely targeted questions about things like aid policy. This work includes things like Chris Blattman’s study in which money was randomly distributed to Ugandan women.

But now, if former World Bank lead economist Branko Milanovic is to be believed, the experimental identificationists are having their own day of crisis. As with the natural experiment, the randomized trial sacrifices big questions and generalizable answers in favor of conclusions that are often trivial. With their lavishly funded operations in poor countries, there’s an added aspect of liberal colonialism as well. It’s the Nick Kristof or Bono approach to helping the global poor; as Milanovic puts it, “you can play God in poor countries, publish papers, make money and feel good about yourself.”

If there’s a backlash against the obsession with causal inference, it will be a victory for people who want to use data to answer real questions. Writing about these issues years ago, I argued that:

It is often impossible to find an analytical strategy which is both free of strong assumptions about causality and applicable beyond a narrow and artificial situation. The goal of causal inference, that is, is a noble but often futile pursuit. In place of causal inference, what we must often do instead is causal interpretation, in which essentially descriptive tools (such as regression) are interpreted causally based on prior knowledge, logical argument and empirical tests that persuasively refute alternative explanations.

I still basically stand by that, or by the pithier formulation I added later, “Causal inference where possible, causal interpretation where necessary.”

## Infotainment Journalism

May 14th, 2014  |  Published in Data, Statistics

We seem, mercifully, to have reached a bit of a backlash to the data journalism/explainer hype typified by sites like Vox and FiveThirtyEight. Nevertheless, editors in search of viral content find it irresistible to crank out clever articles that purport to illuminate or explain the world with “data”.

Now, I am a big partisan of using quantitative data to understand the world. And I think the hostility to quantification in some parts of the academic Left is often misplaced. But what’s so unfortunate about the wave of shoddy data journalism is that it mostly doesn’t use data as a real tool of empirical inquiry. Instead, data becomes something you sprinkle on top of your substanceless linkbait, giving it the added appearance of having some kind of scientific weight behind it.

Some of the crappiest pop-data science comes in the form of viral maps of various kinds. Ben Blatt at Slate goes over a few of these, pertaining to things like baby names and popular bands. He shows how easy it is to craft misleading maps, even leaving aside the inherent problems with using spatial areas to represent facts about populations that occur in wildly different densities.

Having identified the pitfalls, Blatt then decided to try his hand at making his own viral map. And judging by the number of times I’ve seen his maps of the most widely spoken language in each state on Facebook, he succeeded. But in what is either a sophisticated troll or an example of “knowing too little to know what you don’t know”, Blatt’s maps themselves are pretty uninformative and misleading.

The post consists of several maps. The first simply categorizes each state according to the most commonly spoken non-English language, which is almost always Spanish. Blatt calls this map “not too interesting”, but I’d say it’s the best of the bunch. It’s the least misleading while still containing some useful information about the French-speaking clusters in the Northeast and Louisiana, and the holdout German speakers in North Dakota.

The next map, which shows the most common non-English and non-Spanish language, is also decent. It’s when he starts getting down into more and more detailed subcategories that Blatt really gets into trouble. I’ll illustrate this with the most egregious example, the map of “Most Commonly Spoken Native American Language”.

Part of the problem is the familiar statistician’s issue of sample size. The American Community Survey data that Blatt used to make his maps is extremely large, but you can still run into trouble when you’re looking at a small population and dividing it up into 50 states. Native Americans are a tiny part of the population, and those who speak an indigenous language are an even smaller fraction. The more severe issue, though, is that this map would be misleading even if it were based on a complete census of the population.

That’s because the Native American population in the United States is extremely unevenly distributed, due to the way in which the American colonial project of genocide and resettlement played out historically. In some areas, like the southwest and Alaska, there are sizable populations. In much of the east of the country, there are vanishingly small populations of people who still speak Native American languages. And without even going to the original data (although I did do that), you can see that there are some things majorly wrong here. But you need a passing familiarity with the indigenous language families of North America, which is basically what I have from a cursory study of them as a linguistics major over a decade ago.

We see that Navajo is the most commonly spoken native language in New Mexico. That’s a fairly interesting fact, as it reflects a sizeable population of around 63,000 speakers. But then, we could have seen that already from the previous “non-English and Spanish speakers” map.

But now look at the northeast. We find that the most commonly spoken native language in New Hampshire is Hopi; in Connecticut it’s Navajo; in New Jersey it’s Sahaptian. What does this tell us? The answer is, approximately nothing. The Navajo and Hopi languages originate in the southwest, and the Sahaptian languages in the Pacific northwest, so these values just reflect a handful of people who moved to the east coast for whatever reason. And a handful of people it is: do we really learn anything from the fact there are 36 Hopi speakers in New Hampshire, compared to only 24 speaking Muskogee (which originates in the south)? That is, if we could even know these were the right numbers. The standard errors on these estimates are larger than the estimates themselves, meaning that there is a very good chance that Muskogee, or some other language, is actually the most common native language in New Hampshire.
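
A quick back-of-the-envelope calculation shows why a ranking based on counts like these is essentially noise. This is only a sketch: the speaker counts are the ones above, while the standard errors are assumed stand-ins reflecting the point that they are as large as, or larger than, the estimates themselves.

```python
import math

# Comparing two small ACS-style estimates when the standard errors rival the
# estimates. Counts are from the post; the standard errors are hypothetical.
hopi_est, hopi_se = 36, 40          # estimated Hopi speakers in NH, assumed SE
muskogee_est, muskogee_se = 24, 30  # estimated Muskogee speakers, assumed SE

diff = hopi_est - muskogee_est
se_diff = math.sqrt(hopi_se**2 + muskogee_se**2)  # SE of the difference (assuming independence)
z = diff / se_diff

print(f"Difference: {diff} speakers, SE of difference: {se_diff:.0f}, z = {z:.2f}")
# With a z-score this close to zero, the ranking could easily flip in another
# sample, so the "most common language" label is little more than a coin toss.
```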

I suppose this could be regarded as nitpicking, as could the similar things I could say about some of the other maps. Boy, finding out about those 170 Gujarati speakers in Wyoming sure shows me what sets that state apart from its neighbors! OMG, the few hundred Norwegian speakers in Hawaii might slightly outnumber the Swedish speakers! (Or not.) Even the “non-English and Spanish” map, which I generally kind of like, doesn’t quite say as much as it appears—or at least not what it appears to say. The large “German belt” in the plains and mountain west reflects low linguistic diversity more than a preponderance of Krauts. There is a small group of German speakers almost everywhere; in most of these states, the percentage of German speakers isn’t much greater than the national average, which is well under 1 percent. In some, like Idaho and Tennessee, it’s actually lower.

I belabor all this because I take data analysis seriously. The processing and presentation of quantitative data is a key way that facts are manufactured, a source of things people “know” about the world. So it bothers me to see the discursive pollution of things that are essentially vacuous “infotainment” dressed up in fancy terms like “data science” and “data journalism”. I mean, I get it: it’s fun to play with data and make maps! I just wish people would leave their experiments on their hard drives rather than setting them loose onto Facebook where they can mislead the unwary.

## We Have Always Been Rentiers

April 22nd, 2013  |  Published in anti-Star Trek, Political Economy, Statistics

In my periodic discussions of contemporary capitalism and its potential transition into a rentier-dominated economy, I have emphasized the point that an economy based on private property depends upon the state to define and enforce just what counts as property, and what rights come with owning that property. (The point is perhaps made most directly in this essay for The New Inquiry.) Just as capitalism required that the commons in land be enclosed and transformed into the property of individuals, so what I’ve called “rentism” requires the extension of intellectual property: the right to control the copying and modification of patterns, and not just of physical objects.

But the development of rentism entails not just a change in the laws, but in the way the economy itself is measured and defined. Since capitalism is rooted in the quantitative reduction of human action to the accumulation of money, the way in which it quantifies itself has great economic and political significance. To relate this back to my last post: much was made of the empirical and conceptual worthiness of Reinhart and Rogoff’s link between government debt and economic growth, but all such disputations presume agreement about the measurement of economic growth itself.

Which brings us to the United States Bureau of Economic Analysis, and its surprisingly fascinating “Preview of the 2013 Comprehensive Revision of the National Income and Product Accounts”. The paper describes a change in the way the government represents the size of various parts of the economy, and therefore economic growth. The most significant changes are these:

- Recognize expenditures by business, government, and nonprofit institutions serving households (NPISH) on research and development as fixed investment.

- Recognize expenditures by business and NPISH on entertainment, literary, and other artistic originals as fixed investment.

The essential issue is whether spending on Research and Development, and on the production of creative works, should be regarded merely as an input to other production processes, or instead as an investment in the creation of a distinct value-bearing asset. The BEA report observes that “expenditures for R&D have long been recognized as having the characteristics of fixed assets—defined ownership rights, long-lasting, and repeated use and benefit in the production process”, and that therefore the BEA “recogniz[es] that the asset boundary should be expanded to include innovative activities.” Likewise, “some entertainment, literary, and other artistic originals are designed to generate mass reproductions for sale to the general public and to have a useful lifespan of more than one year.” Thus the need for “a new asset category entitled ‘intellectual property products’,” which will encompass both types of property.

What the BEA calls “expanding the asset boundary” is precisely the redefinition of the property form that I’ve written about—only now it is a statistical rather than a legal redefinition. And that change in measurement will be written backwards into the past as well as forwards into the future: national accounts going back to 1929 will be revised to account for the newly expansive view of assets.

Here the statisticians are only following a long legal trend, in which the state treats immaterial patterns as a sort of physical asset. It may be a coincidence, but the BEA’s decision to start its revisionist statistical account in the 1920’s matches the point at which U.S. copyright law became fully disconnected from its original emphasis on limited and temporary protections subordinated to social benefits. Under the Copyright Term Extension Act, creative works made in 1923 and afterwards have remained out of the public domain, perpetually maintaining them as private assets rather than public goods.

A careful reading of the BEA report shows the way in which the very statistical definitions employed in the new accounts rely upon the prior efforts of the state to promote the profitability of the intellectual property form. In its discussion of creative works, the report notes that “entertainment originals are rarely sold in an open market, so it is difficult to observe market prices . . . a common problem with measuring the value of intangible assets.” As libertarian critics like to point out, an economy based on intellectual property must be organized around monopoly rather than direct competition.

In order to measure the value of intangible assets, therefore, the BEA takes a different approach. For R&D, “BEA analyzed the relationship between investment in R&D and future profits . . . in which each period’s R&D investment contributes to the profits in later periods.” Likewise for creative works, BEA will “estimate the value of these as­sets based on the NPV [Net Present Value] of expected future royalties or other revenue obtained from these assets”.
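
To make the arithmetic concrete, here is a minimal sketch of what valuing an asset at the net present value of expected future royalties amounts to. The royalty stream and discount rate are invented for illustration; they are not drawn from the BEA’s actual methodology.

```python
# Minimal NPV sketch: an "entertainment original" valued as the discounted
# stream of royalties it is expected to generate. All numbers are invented.

def net_present_value(cash_flows, discount_rate):
    """Discount a stream of future cash flows back to the present."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1))

expected_royalties = [10.0, 8.0, 6.0, 4.0, 2.0]  # hypothetical $ millions per year
value = net_present_value(expected_royalties, discount_rate=0.07)
print(f"Asset value recorded for the original: ${value:.1f} million")
```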

Here we see the reciprocal operation of state power and statistical measurement. Insofar as the state collaborates with copyright holders to stamp out unauthorized copying (“piracy”), and insofar as the courts uphold stringent patent rights, the potential revenue stream that can be derived from owning IP will grow. And now that the system of national accounts has validated such revenues as a part of the value of intangible assets, the copyright and patent cartels can justly claim to be important contributors to the growth of the Gross Domestic Product.

The BEA also has interesting things to say about how their new definitions will impact different components of the overall national accounts aggregate. They note that the categories of “corporate profits” and “proprietors’ income” will increase—an accounting convention perhaps, but one that accurately reflects the constituencies that stand to benefit from the control of intellectual property. Thus the new economic order being mapped by the BEA fits in neatly with Steve Waldman’s excellent recent post about late capitalism’s “technologically-driven resource curse, coalescing into groups of insiders and outsiders and people fighting at the margins not to be left behind.”

The changes related to R&D and artistic works may be the most significant, but the other three revisions in the report are worth noting as well. One has to do with the costs associated with transferring residential fixed assets (e.g., the closing costs related to buying a house), while another has to do with the accounting applied to pension plans. Only the final one, a technical harmonization, has to do directly with wages and salaries. This is perhaps an accurate reflection of an economic elite more preoccupied with asset values than with the direct returns to wage labor.

Finally, the reception of the BEA report provides another “peril of wonkery”, related to the one I described in my last post. The Wonkblog post about the report makes some effort to acknowledge the socially constructed nature of economic statistics: “the assumptions you make in creating your benchmark economic statistics can create big swings in the reality you see.” And yet the post then moves directly on to claim that in light of the statistical revisions, “the U.S. economy is even more heavily driven by the iPad designers and George Lucases of the world—and proportionally less by the guys who assemble washing machines—than we thought.” This is no doubt how the matter will be described going forward. But the new measurement strategies are only manifestations of a choice to attribute a greater share of our material wealth to designers and directors, and that choice has more to do with class struggle than with statistics.

## The Recession and the Decline in Driving

August 19th, 2011  |  Published in Data, Social Science, Statistical Graphics, Statistics

Jared Bernstein recently posted the graph of U.S. Vehicle Miles Traveled released by the Federal Highway Administration. Bernstein notes that normally, recessions and unemployment don’t affect our driving habits very much–until the recent recession, miles traveled just kept going up. That has changed in recent years, as VMT still hasn’t gotten back to the pre-recession peak. Bernstein:

What you see in the current period is a quite different—a massive decline in driving over the downturn with little uptick since. Again, both high unemployment and high [gas] prices are in play here, so there may be a bounce back out there once the economy gets back on track. But it bears watching—there may be a new behavioral response in play, with people’s driving habits a lot more responsive to these economic changes than they used to be.

Ok, but what’s the big deal? Well, I’ve generally been skeptical of arguments about “the new normal,” thinking that much of what we’re going through is cyclical, not structural, meaning things pretty much revert back to the old normal once we’re growing in earnest again. But it’s worth tracking signals like this that remind one that at some point, if it goes on long enough, cyclical morphs into structural.

What could explain this cultural shift? Maybe more young people are worried about the price of gas or the environment. But—and this is just a theory—technology could play a role, too. Once upon a time, newly licensed teens would pile all their friends into their new car and drive around aimlessly. For young suburban Americans, it was practically a rite of passage. Nowadays, however, teens can socialize via Facebook or texting instead—in the Zipcar survey, more than half of all young adults said they’d rather chat online than drive to meet their friends.

But that’s all just speculation at this point. As Bernstein says, it’s still unclear whether the decline in driving is a structural change or just a cyclical shift that will disappear once (if) the U.S. economy starts growing again.

Is it really plausible to posit this kind of cultural shift, particularly given the evidence about the price elasticity of oil? As it happens, I did a bit of analysis on this point a couple of years ago. Back then, Nate Silver wrote a column in which he tried to use a regression model to address this question of whether the decline in driving was a response to economic factors or an indication of a cultural trend. Silver argued that economic factors–in his model, unemployment and gas prices–couldn’t completely explain the decline in driving. If true, that result would support the “cultural shift” argument against the “cyclical downturn” argument.

I wrote a series of posts in which I argued that with a more complete model–including wealth and the lagged effect of gas prices–the discrepancies in Silver’s model seemed to disappear. That suggests that we don’t need to hypothesize any cultural change to explain the decline in driving. You can go to those older posts for the gory methodological details; in this post, I’m just going to post an updated version of one of my old graphs:

The blue line is the 12-month moving average of Vehicle Miles Traveled–the same thing Bernstein posted. The green and red lines are 12-month moving averages of predicted VMT from two different regression models–the Nate Silver model and my expanded model, as described in the earlier post I linked. The underlying models haven’t changed since my earlier version of this graph, except that I updated the data to include the most recent information, and switched to the 10-city Case Shiller average for my house price measure, rather than the OFHEO House Price Index that I was using before, but which seems to be an inferior measure.

The basic conclusion I draw here is the same as it was before: a complete set of economic covariates does a pretty good job of predicting miles traveled. In fact, even Nate Silver’s simple “gas prices and unemployment” model does fine for recent months, although it greatly overpredicts during the depths of the recession.* So I don’t see any cultural shift away from driving here–much as I would like to, since I personally hate to drive and I wish America wasn’t built around car ownership. Instead, the story seems to be that Americans, collectively, have experienced an unprecedented combination of lost wealth, lost income, and high gas prices. That’s consistent with graphs like these, which look a lot like the VMT graph.

The larger point here is that we can’t count on shifts in individual preferences to get us away from car culture. The entire built environment of the United States is designed around the car–sprawling suburbs, massive highways, meager public transit, and so on. A lot of people can’t afford to live in walkable, bikeable, or transit-accessible places even if they want to. Changing that is going to require a long-term change in government priorities, not just a cultural shift.

Below are the coefficients for my model. The data is here, and the code to generate the models and graph is here.

| Term | Coef. | s.e. |
|----------------|--------:|------:|
| (Intercept) | 111.55 | 2.09 |
| unemp | -1.57 | 0.27 |
| gasprice | -0.08 | 0.01 |
| gasprice_lag12 | -0.03 | 0.01 |
| date | 0.01 | 0.00 |
| stocks | 0.58 | 0.23 |
| housing | 0.10 | 0.01 |
| monthAugust | 17.52 | 1.01 |
| monthDecember | -9.21 | 1.02 |
| monthFebruary | -31.83 | 1.03 |
| monthJanuary | -22.90 | 1.02 |
| monthJuly | 17.84 | 1.02 |
| monthJune | 11.31 | 1.03 |
| monthMarch | -0.09 | 1.03 |
| monthMay | 12.08 | 1.02 |
| monthNovember | -10.46 | 1.01 |
| monthOctober | 5.82 | 1.01 |
| monthSeptember | -2.73 | 1.01 |

n = 234, k = 18; residual sd = 3.16, R-Squared = 0.99

* That’s important, since you could otherwise argue that the housing variable in my model–which has seen an unprecedented drop in recent years–is actually proxying a cultural change. I doubt that for other reasons, though. If housing is removed from the model, it underpredicts VMT during the runup of the bubble, just as Silver’s model does. That suggests that there is some real wealth effect of house prices on driving.
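
For anyone who would rather see the specification than dig through the linked code, here is a rough sketch of the expanded model in Python. The original analysis lives in the data and code linked above; the file name and column names below are assumptions about how such a monthly data set might be laid out, not the actual files.

```python
# Rough sketch of the expanded VMT model described above: unemployment, gas
# prices (current and lagged a year), a time trend, stock and house prices,
# and month dummies. File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("vmt_monthly.csv", parse_dates=["date"])  # hypothetical file
df["month"] = df["date"].dt.month_name()
df["gasprice_lag12"] = df["gasprice"].shift(12)  # gas price lagged 12 months
df["time"] = range(len(df))                      # linear time trend

model = smf.ols(
    "vmt ~ unemp + gasprice + gasprice_lag12 + time + stocks + housing + C(month)",
    data=df.dropna(),
).fit()
print(model.summary())
```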

## What is output?

April 6th, 2011  |  Published in Statistical Graphics, Statistics

I’m going to do a little series on manufacturing, because after doing my last post I got a little sucked into the various data sources that are available. Today’s installment comes with a special attention conservation notice, however: this post will be extremely boring. I’ll get back to my substantive arguments about manufacturing in future posts, and put up some details about trends in productivity in specific sectors, some data that contextualizes the U.S. internationally, and a specific comparison with China. But first, I need to make a detour into definitions and methods, just so that I have it for my own reference. What follows is an attempt to answer a question I’ve often wanted answered but never seen written up in one place: what, exactly, do published measures of real economic growth actually mean?

The two key concepts in my previous post are manufacturing employment and manufacturing output. The first concept is pretty simple–the main difficulty is to define what counts as a manufacturing job, but there are fairly well-accepted definitions that researchers use. In the International Standard Industrial Classification (ISIC), which is used in many cross-national datasets, manufacturing is defined as:

the physical or chemical transformation of materials or components into new products, whether the work is performed by power-driven machines or by hand, whether it is done in a factory or in the worker’s home, and whether the products are sold at wholesale or retail. Included are assembly of component parts of manufactured products and recycling of waste materials.

There is some uncertainty about how to classify workers who are only indirectly involved in manufacturing, but in general it’s fairly clear which workers are involved in manufacturing according to this criterion.

The concept of “output”, however, is much fuzzier. It’s not so hard to figure out what the physical outputs of manufacturing are–what’s difficult is to compare them, particularly over time. My last post was gesturing at some concept of physical product: the idea was that we produce more things than we did a few decades ago, but that we do so with far fewer people. However, there is no simple way to compare present and past products of the manufacturing process, because the things themselves are qualitatively different. If it took a certain number of person-hours to make a black and white TV in the 1950’s, and it takes a certain number of person-hours to make an iPhone in 2011, what does that tell us about manufacturing productivity?

There are multiple sources of data on manufacturing output available. My last post used the Federal Reserve’s Industrial Production data. The Fed says that this series “measures the real output of the manufacturing, mining, and electric and gas utilities industries”. They further explain that this measure is based on “two main types of source data: (1) output measured in physical units and (2) data on inputs to the production process, from which output is inferred.”. Another U.S. government source is the Bureau of Economic Analysis data on value added by industry, which “is equal to an industry’s gross output (sales or receipts and other operating income, commodity taxes, and inventory change) minus its intermediate inputs (consumption of goods and services purchased from other industries or imported).” For international comparisons, the OECD provides a set of numbers based on what they call “indices of industrial production”–which, for the United States, are the same as the Federal Reserve output numbers. And the United Nations presents data for value-added by industry, which covers more countries than the OECD and is supposed to be cross-nationally comparable, but does not quite match up with the BEA numbers.

The first question to ask is: how comparable are all these different measures? Only the Fed/OECD numbers refer to actual physical output; the BEA/UN data appears to be based only on the money value of final output. Here is a comparison of the different measures, for the years in which they are all available (1970-2009). The numbers have all been put on the same scale: percent of the value in the year 2007.

The red line shows the relationship between the BEA value added numbers and the Fed output numbers, while the blue line shows the comparison between the UN value-added data and the Fed output data. The diagonal black line shows where the lines would fall if these two measures were perfectly comparable. While the overall correlation is fairly strong, there are clear discrepancies. In the pre-1990 data, the BEA data shows manufacturing output being much lower than the Fed’s data, while the UN series shows somewhat higher levels of output. The other puzzling result is in the very recent data: according to value-added, manufacturing output has remained steady in the last few years, but according to the Fed output measure it has declined dramatically. It’s hard to know what to make of this, but it does suggest that the Great Recession has created some issues for the models used to create these data series.

What I would generally say about these findings is that these different data sources are sufficiently comparable to be used interchangeably in making the points I want to make about long-term trends in manufacturing, but they are nevertheless different enough that one shouldn’t ascribe unwarranted precision to them. However, the fact that all the data are similar doesn’t address the larger question: how can we trust any of these numbers? Specifically, how do government statistical agencies deal with the problem of comparing qualitatively different outputs over time?

Contemporary National Accounts data tracks changes in GDP using something called a “chained Fisher price index”. Statistics Canada has a good explanation of the method. There are two different problems that this method attempts to solve. The first is the problem of combining all the different outputs of an economy at a single point in time, and the second is to track changes from one time period to another. In both instances, it is necessary to distinguish between the quantity of goods produced, and the prices of those goods. Over time, the nominal GDP–that is, the total money value of everything the economy produces–will grow for two reasons. There is a “price effect” due to inflation, where the same goods just cost more, and a “volume effect” due to what StatCan summarizes as “the change in quantities, quality and composition of the aggregate” of goods produced.

StatCan describes the goal of GDP growth measures as follows: “the total change in quantities can only be calculated by adding the changes in quantities in the economy.” Thus the goal is something approaching a measure of how much physical stuff is being produced. But they go on to say that:

creating such a summation is problematic in that it is not possible to add quantities with physically different units, such as cars and telephones, even two different models of cars. This means that the quantities have to be re-evaluated using a common unit. In a currency-based economy, the simplest solution is to express quantities in monetary terms: once evaluated, that is, multiplied by their prices, quantities can be easily aggregated.

This is an important thing to keep in mind about output growth statistics, such as the manufacturing output numbers I just discussed. Ultimately, they are all measuring things in terms of their price. That is, they are not doing what one might intuitively want, which is to compare the actual amount of physical stuff produced at one point with the amount produced at a later point, without reference to money. This latter type of comparison is simply not possible, or at least it is not done by statistical agencies. (As an aside, this is a recognition of one of Marx’s basic insights about the capitalist economy: it is only when commodities are exchanged on the market, through the medium of money, that it becomes possible to render qualitatively different objects commensurable with one another.)

In practice, growth in output is measured using two pieces of information. The first is the total amount of a given product that is sold in a given period. Total amount, in this context, does not refer to a physical quantity (it would be preferable to use physical quantities, but this data is not usually available), but to the total money value of goods sold. The second piece of information is the price of a product at a given time point, which can be compared to the price in a previous period. The “volume effect”–that is, the actual increase in output–is then defined as the change in total amount sold, “deflated” to account for changes in price. So, for example, say there are $1 billion worth of shoes sold in period 1, and $1.5 billion worth of shoes sold in period 2. Meanwhile, the price of a pair of shoes rises from $50 to $60 between periods 1 and 2. The “nominal” change in shoe production is 50%–that is, sales have increased from $1 billion to $1.5 billion. But the real change in the volume of shoes sold is defined as:

$\frac{\frac{\$50}{\$60} \times \$1.5\ \text{billion}}{\$1\ \text{billion}} = 1.25$

So after correcting for the price increase, the actual increase in the amount of shoes produced is 25 percent. Although the example is a tremendous simplification, it is in essence how growth in output is measured by national statistical agencies.
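
Here is the same shoe example as a calculation, separating the nominal change in sales from the volume effect:

```python
# The worked example above: strip the price effect out of the nominal change
# in shoe sales to recover the real (volume) change.
sales_1, sales_2 = 1.0e9, 1.5e9  # total shoe sales in periods 1 and 2, in dollars
price_1, price_2 = 50.0, 60.0    # price of a pair of shoes in each period

nominal_growth = sales_2 / sales_1 - 1                     # +50% in money terms
real_growth = (sales_2 * price_1 / price_2) / sales_1 - 1  # deflate period 2 to period 1 prices

print(f"Nominal growth: {nominal_growth:.0%}")      # 50%
print(f"Real (volume) growth: {real_growth:.0%}")   # 25%
```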

## Elster on the Social Sciences

October 20th, 2009  |  Published in Social Science, Sociology, Statistics

The present crisis looks as though it may bring about a long-delayed moment of reckoning in the field of economics. Macro-economics has been plunged into turmoil now that many of its leading practitioners stand exposed as dogmatists blithely clinging to absurd pre-Keynesian notions about the impossibility of economic stimulus and the inherent rationality of markets, who have nothing at all to say about the roots of the current turmoil. Micro-economics, meanwhile, has seen Freakonomics run its course, as long-standing criticisms of the obsession with “clean identification” over meaningful questions spill over into a new row over climate-change denialism.

Joining the pile-on, Jon Elster has an article in the electronic journal Capitalism and Society on the “excessive ambitions” of the social sciences. Focusing on economics–but referring to related fields–he criticizes three main lines of inquiry: rational choice theory, behavioral economics, and statistical inference.

Although I agree with most of the article’s arguments, much of it seemed rather under-argued. At various points, Elster’s argument seems to be: “I don’t need to provide an example of this; isn’t it obvious?” And with respect to his claim that “much work in economics and political science is devoid of empirical, aesthetic, or mathematical interest, which means that it has no value at all”, I’m inclined to agree. But it’s hard for me to say that Elster is contributing a whole lot to the discussion. I’m also a bit skeptical of the claim that behavioral economics has “predictive but not prescriptive implications”, given the efforts of people like Cass Sunstein to implement “libertarian paternalist” policies based on an understanding of some of the irrationalities studied in behavioral research.

But the part of the essay closest to my own interests was on data analysis. Here Elster is wading into the well-travelled terrain of complaining about poorly reasoned statistical analysis. He himself admits to being inexpert in these matters, and so relies on others, especially David Freedman. But he still sees fit to proclaim that we are awash in work that is both methodologically suspect and insufficiently engaged with its empirical substance.

The criticisms raised are all familiar. The specter of “data snooping . . . curve-fitting . . . arbitrariness in the measurement of . . . variables”, and so on, all fit under the rubric of what Freedman called “data driven model selection”. And indeed these things are all problems. But much of Elster’s discussion suffers from his lack of familiarity with the debates. He refers repeatedly to the problem of statistical significance testing–both the confusion of statistical and substantive significance, and the arbitrariness of the traditional 5% threshold for detecting effects. While I wouldn’t deny that these abuses persist, I think that years of relentless polemics on this issue from people like Deirdre McCloskey and Jacob Cohen have had an impact, and practice has begun to shift in a more productive direction.

Elster never really moves beyond these technical details to grapple with the larger philosophical issues that arise in applied statistics. For example, all of the problems with statistical significance arise from an over-reliance on the null hypothesis testing model of inference–even though as Andrew Gelman says, the true value of a parameter is never zero in any real social science situation. Simply by moving in the direction of estimating the magnitude of effects and their confidence intervals, we can avoid many of these problems.
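
As a trivial illustration of what that shift looks like in practice, here is the difference between a bare significance verdict and an effect size reported with its uncertainty. The estimate and standard error are invented numbers standing in for any regression output.

```python
# Report the magnitude of an effect and a 95% confidence interval rather than
# a significant/not-significant verdict. Numbers are purely illustrative.
estimate, se = 0.8, 0.35
lower, upper = estimate - 1.96 * se, estimate + 1.96 * se
print(f"Estimated effect: {estimate:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
# The interval says the effect is probably positive and that its plausible
# size runs from small to substantial, information a p-value cutoff discards.
```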

And although Freedman makes a number of very important criticisms of standard practice, the article that Elster relies upon leans very heavily on the weakness of the causal claims made about regression models. As a superior model, Freedman invokes John Snow’s analysis of cholera in the 1850’s, which used simple methods but relied upon identifying a natural experiment in which different houses received their water from different sources. In this respect, the article is redolent of the time it was published (1991), when the obsession with clean identification and natural experiments was still gaining steam, and valid causal inference seemed like the most important goal of social science.

Yet we now see the limitations of that research agenda. It’s rare and fortuitous to find a situation like Snow’s cholera study, in which a vitally important question is illuminated by a clean natural experiment. All too often, the search for identification leads researchers to study obscure topics of little general relevance, thereby gaining internal validity (verifiable causality in a given data set) at the expense of external validity (applicability to broader social situations). This is what has led to the stagnation of Freakonomics-style research. What we have to accept, I think, is that it is often impossible to find an analytical strategy which is both free of strong assumptions about causality and applicable beyond a narrow and artificial situation. The goal of causal inference, that is, is a noble but often futile pursuit. In place of causal inference, what we must often do instead is causal interpretation, in which essentially descriptive tools (such as regression) are interpreted causally based on prior knowledge, logical argument and empirical tests that persuasively refute alternative explanations.**

This is, I think, consistent with the role Elster proposes for data analysis, in the closing of his essay: an enterprise which “turns on substantive causal knowledge of the field in question together with the imagination to concoct testable implications that can establish ‘novel facts’”. And Elster gives some useful practical suggestions for improving results, such as partitioning data sets, fitting models on only one half, and not looking at the other half of the cases until a model is decided upon. But as with many rants against statistical malpractice, it seems to me that the real sociological issue is being sidestepped, which is that the institutional structure of social science strongly incentivizes malpractice. To put it another way, the purpose of academic social science is not, in general, to produce valid inference about the world; it is to produce publications. As long as that is the case, it seems unlikely that bad practices can be definitively stamped out.
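
Elster’s split-sample suggestion is easy enough to operationalize; here is a rough sketch, with a hypothetical data file and model formula standing in for whatever one is actually studying.

```python
# Sketch of split-sample model building: explore specifications on one half
# of the data, then fit the chosen model once on the untouched half.
# The file name, formula, and column names are placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")                  # hypothetical data set
explore = df.sample(frac=0.5, random_state=1)   # half for exploratory model search
holdout = df.drop(explore.index)                # held out until the spec is fixed

# ...iterate on specifications using `explore` only...
final_spec = "outcome ~ x1 + x2"                # whatever specification you settle on

confirm = smf.ols(final_spec, data=holdout).fit()  # the one look you get at the holdout
print(confirm.params)
print(confirm.conf_int())
```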

**Addendum: Fabio Rojas says what I wanted to say, rather more concisely. He notes that “identification is a luxury when you have an abundance of data and a pretty clear idea about what causal effects you care about.” Causal inference where possible, causal interpretation where necessary, ought to be the guiding principle. Via the Social Science Statistics blog, there is also a very interesting paper by Angus Deaton on the problems of causal inference. Of particular note is the difficulty of testing the assumptions behind instrumental variables methods, and the often-elided distinction between an instrument that is external to the process under investigation (that is, not caused by the system being studied) and one that is truly exogenous (that is, uncorrelated with the error term in the regression of the outcome on the predictor of interest.)

## The data seem so much less real once you ask the same person the same question twice

October 12th, 2009  |  Published in Data, Social Science, Statistics

I identify with Jeremy Freese to an unhealthy degree. When the other options are to a) have a life; or b) do something that advances his career, he chooses to concoct a home-brewed match between GSS respondents in 2006 and their 2008 re-interviews. I would totally do this. I still might do this.

And then he drops the brutal insight that provides my title. Context.

UPDATE: And then Kieran Healy drops this:

The real distinction between qualitative and quantitative is not widely appreciated. People think it has something to do with counting versus not counting, but this is a mistake. If the interpretive work necessary to make sense of things is immediately obvious to everyone, it’s qualitative data. If the interpretative work you need to do is immediately obvious only to experts, it’s quantitative data.