## We Have Always Been Rentiers

April 22nd, 2013  |  Published in anti-Star Trek, Political Economy, Statistics

In my periodic discussions of contemporary capitalism and its potential transition into a rentier-dominated economy, I have emphasized the point that an economy based on private property depends upon the state to define and enforce just what counts as property, and what rights come with owning that property. (The point is perhaps made most directly in this essay for The New Inquiry.) Just as capitalism required that the commons in land be enclosed and transformed into the property of individuals, so what I’ve called “rentism” requires the extension of intellectual property: the right to control the copying and modification of patterns, and not just of physical objects.

But the development of rentism entails not just a change in the laws, but in the way the economy itself is measured and defined. Since capitalism is rooted in the quantitative reduction of human action to the accumulation of money, the way in which it quantifies itself has great economic and political significance. To relate this back to my last post: much was made of the empirical and conceptual failings of Reinhart and Rogoff’s link between government debt and economic growth, but all such disputes presume agreement about the measurement of economic growth itself.

Which brings us to the United States Bureau of Economic Analysis, and its surprisingly fascinating “Preview of the 2013 Comprehensive Revision of the National Income and Product Accounts”. The paper describes a change in the way the government represents the size of various parts of the economy, and therefore economic growth. The most significant changes are these:

Recognize expenditures by business, government, and nonprofit institutions serving households (NPISH) on research and development as fixed investment.

Recognize expenditures by business and NPISH on entertainment, literary, and other artistic originals as fixed investment.

The essential issue is whether spending on Research and Development, and on the production of creative works, should be regarded merely as an input to other production processes, or instead as an investment in the creation of a distinct value-bearing asset. The BEA report observes that “expenditures for R&D have long been recognized as having the characteristics of fixed assets—defined ownership rights, long-lasting, and repeated use and benefit in the production process”, and that therefore the BEA “recogniz[es] that the asset boundary should be expanded to include innovative activities.” Likewise, “some entertainment, literary, and other artistic originals are designed to generate mass reproductions for sale to the general public and to have a useful lifespan of more than one year.” Thus the need for “a new asset category entitled ‘intellectual property products’,” which will encompass both types of property.

What the BEA calls “expanding the asset boundary” is precisely the redefinition of the property form that I’ve written about—only now it is a statistical rather than a legal redefinition. And that change in measurement will be written backwards into the past as well as forwards into the future: national accounts going back to 1929 will be revised to account for the newly expansive view of assets.

Here the statisticians are only following a long legal trend, in which the state treats immaterial patterns as a sort of physical asset. It may be a coincidence, but the BEA’s decision to start its revisionist statistical account in the 1920s matches the point at which U.S. copyright law became fully disconnected from its original emphasis on limited and temporary protections subordinated to social benefits. Under the Copyright Term Extension Act, creative works made in 1923 and afterwards have remained out of the public domain, perpetually maintaining them as private assets rather than public goods.

A careful reading of the BEA report shows the way in which the very statistical definitions employed in the new accounts rely upon the prior efforts of the state to promote the profitability of the intellectual property form. In its discussion of creative works, the report notes that “entertainment originals are rarely sold in an open market, so it is difficult to observe market prices . . . a common problem with measuring the value of intangible assets.” As libertarian critics like to point out, an economy based on intellectual property must be organized around monopoly rather than direct competition.

In order to measure the value of intangible assets, therefore, the BEA takes a different approach. For R&D, “BEA analyzed the relationship between investment in R&D and future profits . . . in which each period’s R&D investment contributes to the profits in later periods.” Likewise for creative works, BEA will “estimate the value of these assets based on the NPV [Net Present Value] of expected future royalties or other revenue obtained from these assets”.
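For concreteness, the NPV calculation the BEA describes can be sketched in a few lines of Python. The royalty stream and discount rate below are invented for illustration; they are not taken from the BEA's actual methodology.

```python
def npv(cash_flows, discount_rate):
    """Net present value of a stream of future cash flows.

    cash_flows[t] is the revenue expected t+1 periods from now.
    """
    return sum(cf / (1 + discount_rate) ** (t + 1)
               for t, cf in enumerate(cash_flows))

# Hypothetical example: a film expected to earn declining royalties
# over five years, discounted at 7% per year.
royalties = [10.0, 8.0, 6.0, 4.0, 2.0]  # $ millions per year
asset_value = npv(royalties, 0.07)      # the film's "value" as an asset
```

The point to notice is that the asset's statistical value exists only insofar as the legal regime guarantees the future royalty stream being discounted.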

Here we see the reciprocal operation of state power and statistical measurement. Insofar as the state collaborates with copyright holders to stamp out unauthorized copying (“piracy”), and insofar as the courts uphold stringent patent rights, the potential revenue stream that can be derived from owning IP will grow. And now that the system of national accounts has validated such revenues as a part of the value of intangible assets, the copyright and patent cartels can justly claim to be important contributors to the growth of the Gross Domestic Product.

The BEA also has interesting things to say about how their new definitions will impact different components of the overall national accounts aggregate. They note that the categories of “corporate profits” and “proprietors’ income” will increase—an accounting convention perhaps, but one that accurately reflects the constituencies that stand to benefit from the control of intellectual property. Thus the new economic order being mapped by the BEA fits in neatly with Steve Waldman’s excellent recent post about late capitalism’s “technologically-driven resource curse, coalescing into groups of insiders and outsiders and people fighting at the margins not to be left behind.”

The changes related to R&D and artistic works may be the most significant, but the other three revisions in the report are worth noting as well. One has to do with the costs associated with transferring residential fixed assets (e.g., the closing costs related to buying a house), while another has to do with the accounting applied to pension plans. Only the final one, a technical harmonization, has to do directly with wages and salaries. This is perhaps an accurate reflection of an economic elite more preoccupied with asset values than with the direct returns to wage labor.

Finally, the reception of the BEA report provides another “peril of wonkery”, related to the one I described in my last post. The Wonkblog post about the report makes some effort to acknowledge the socially constructed nature of economic statistics: “the assumptions you make in creating your benchmark economic statistics can create big swings in the reality you see.” And yet the post then moves directly on to claim that in light of the statistical revisions, “the U.S. economy is even more heavily driven by the iPad designers and George Lucases of the world—and proportionally less by the guys who assemble washing machines—than we thought.” This is no doubt how the matter will be described going forward. But the new measurement strategies are only manifestations of a choice to attribute a greater share of our material wealth to designers and directors, and that choice has more to do with class struggle than with statistics.

## The Recession and the Decline in Driving

August 19th, 2011  |  Published in Data, Social Science, Statistical Graphics, Statistics

Jared Bernstein recently posted the graph of U.S. Vehicle Miles Traveled released by the Federal Highway Administration. Bernstein notes that normally, recessions and unemployment don’t affect our driving habits very much–until the recent recession, miles traveled just kept going up. That has changed in recent years, as VMT still hasn’t gotten back to the pre-recession peak. Bernstein:

What you see in the current period is quite different—a massive decline in driving over the downturn with little uptick since. Again, both high unemployment and high [gas] prices are in play here, so there may be a bounce back out there once the economy gets back on track. But it bears watching—there may be a new behavioral response in play, with people’s driving habits a lot more responsive to these economic changes than they used to be.

Ok, but what’s the big deal? Well, I’ve generally been skeptical of arguments about “the new normal,” thinking that much of what we’re going through is cyclical, not structural, meaning things pretty much revert back to the old normal once we’re growing in earnest again. But it’s worth tracking signals like this that remind one that at some point, if it goes on long enough, cyclical morphs into structural.

What could explain this cultural shift? Maybe more young people are worried about the price of gas or the environment. But—and this is just a theory—technology could play a role, too. Once upon a time, newly licensed teens would pile all their friends into their new car and drive around aimlessly. For young suburban Americans, it was practically a rite of passage. Nowadays, however, teens can socialize via Facebook or texting instead—in the Zipcar survey, more than half of all young adults said they’d rather chat online than drive to meet their friends.

But that’s all just speculation at this point. As Bernstein says, it’s still unclear whether the decline in driving is a structural change or just a cyclical shift that will disappear once (if) the U.S. economy starts growing again.

Is it really plausible to posit this kind of cultural shift, particularly given the evidence about the price elasticity of oil? As it happens, I did a bit of analysis on this point a couple of years ago. Back then, Nate Silver wrote a column in which he tried to use a regression model to address this question of whether the decline in driving was a response to economic factors or an indication of a cultural trend. Silver argued that economic factors–in his model, unemployment and gas prices–couldn’t completely explain the decline in driving. If true, that result would support the “cultural shift” argument against the “cyclical downturn” argument.

I wrote a series of posts in which I argued that with a more complete model–including wealth and the lagged effect of gas prices–the discrepancies in Silver’s model seemed to disappear. That suggests that we don’t need to hypothesize any cultural change to explain the decline in driving. You can go to those older posts for the gory methodological details; in this post, I’m just going to post an updated version of one of my old graphs:

The blue line is the 12-month moving average of Vehicle Miles Traveled–the same thing Bernstein posted. The green and red lines are 12-month moving averages of predicted VMT from two different regression models–the Nate Silver model and my expanded model, as described in the earlier post I linked. The underlying models haven’t changed since my earlier version of this graph, except that I updated the data to include the most recent information, and switched to the 10-city Case-Shiller average for my house price measure, rather than the OFHEO House Price Index that I was using before, but which seems to be an inferior measure.
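For readers who want the mechanics, the fitting and smoothing steps look roughly like this. The covariates here are synthetic stand-ins, not the actual FHWA or Case-Shiller data, and the variable names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # ten years of monthly observations

# Synthetic stand-ins for the real covariates.
unemp = rng.uniform(4, 10, n)          # unemployment rate
gasprice = rng.uniform(150, 400, n)    # gas price, cents per gallon
trend = np.arange(n)                   # linear time trend
vmt = 110 - 1.5 * unemp - 0.08 * gasprice + 0.01 * trend + rng.normal(0, 3, n)

# Ordinary least squares fit, then in-sample predictions.
X = np.column_stack([np.ones(n), unemp, gasprice, trend])
beta, *_ = np.linalg.lstsq(X, vmt, rcond=None)
predicted = X @ beta

# 12-month trailing moving average, as in the plotted series.
def moving_average(x, window=12):
    return np.convolve(x, np.ones(window) / window, mode="valid")

smoothed_actual = moving_average(vmt)
smoothed_predicted = moving_average(predicted)
```

The real models add more covariates (wealth, lagged gas prices, month dummies), but the structure is the same: regress, predict, smooth, and compare the smoothed prediction to the smoothed actual series.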

The basic conclusion I draw here is the same as it was before: a complete set of economic covariates does a pretty good job of predicting miles traveled. In fact, even Nate Silver’s simple “gas prices and unemployment” model does fine for recent months, although it greatly overpredicts during the depths of the recession.* So I don’t see any cultural shift away from driving here–much as I would like to, since I personally hate to drive and I wish America wasn’t built around car ownership. Instead, the story seems to be that Americans, collectively, have experienced an unprecedented combination of lost wealth, lost income, and high gas prices. That’s consistent with graphs like these, which look a lot like the VMT graph.

The larger point here is that we can’t count on shifts in individual preferences to get us away from car culture. The entire built environment of the United States is designed around the car–sprawling suburbs, massive highways, meager public transit, and so on. A lot of people can’t afford to live in walkable, bikeable, or transit-accessible places even if they want to. Changing that is going to require a long-term change in government priorities, not just a cultural shift.

Below are the coefficients for my model. The data is here, and the code to generate the models and graph is here.

                Coef.     s.e.
(Intercept)    111.55     2.09
unemp           -1.57     0.27
gasprice        -0.08     0.01
gasprice_lag12  -0.03     0.01
date             0.01     0.00
stocks           0.58     0.23
housing          0.10     0.01
monthAugust     17.52     1.01
monthDecember   -9.21     1.02
monthFebruary  -31.83     1.03
monthJanuary   -22.90     1.02
monthJuly       17.84     1.02
monthJune       11.31     1.03
monthMarch      -0.09     1.03
monthMay        12.08     1.02
monthNovember  -10.46     1.01
monthOctober     5.82     1.01
monthSeptember  -2.73     1.01
---
n = 234, k = 18
residual sd = 3.16, R-Squared = 0.99


* That’s important, since you could otherwise argue that the housing variable in my model–which has seen an unprecedented drop in recent years–is actually proxying a cultural change. I doubt that for other reasons, though. If housing is removed from the model, it underpredicts VMT during the runup of the bubble, just as Silver’s model does. That suggests that there is some real wealth effect of house prices on driving.

## What is output?

April 6th, 2011  |  Published in Statistical Graphics, Statistics

I’m going to do a little series on manufacturing, because after doing my last post I got a little sucked into the various data sources that are available. Today’s installment comes with a special attention conservation notice, however: this post will be extremely boring. I’ll get back to my substantive arguments about manufacturing in future posts, and put up some details about trends in productivity in specific sectors, some data that contextualizes the U.S. internationally, and a specific comparison with China. But first, I need to make a detour into definitions and methods, just so that I have it for my own reference. What follows is an attempt to answer a question I’ve often wanted answered but never seen written up in one place: what, exactly, do published measures of real economic growth actually mean?

The two key concepts in my previous post are manufacturing employment and manufacturing output. The first concept is pretty simple–the main difficulty is to define what counts as a manufacturing job, but there are fairly well-accepted definitions that researchers use. In the International Standard Industrial Classification (ISIC), which is used in many cross-national datasets, manufacturing is defined as:

the physical or chemical transformation of materials or components into new products, whether the work is performed by power-driven machines or by hand, whether it is done in a factory or in the worker’s home, and whether the products are sold at wholesale or retail. Included are assembly of component parts of manufactured products and recycling of waste materials.

There is some uncertainty about how to classify workers who are only indirectly involved in manufacturing, but in general it’s fairly clear which workers are involved in manufacturing according to this criterion.

The concept of “output”, however, is much fuzzier. It’s not so hard to figure out what the physical outputs of manufacturing are–what’s difficult is to compare them, particularly over time. My last post was gesturing at some concept of physical product: the idea was that we produce more things than we did a few decades ago, but that we do so with far fewer people. However, there is no simple way to compare present and past products of the manufacturing process, because the things themselves are qualitatively different. If it took a certain number of person-hours to make a black and white TV in the 1950s, and it takes a certain number of person-hours to make an iPhone in 2011, what does that tell us about manufacturing productivity?

There are multiple sources of data on manufacturing output available. My last post used the Federal Reserve’s Industrial Production data. The Fed says that this series “measures the real output of the manufacturing, mining, and electric and gas utilities industries”. They further explain that this measure is based on “two main types of source data: (1) output measured in physical units and (2) data on inputs to the production process, from which output is inferred.” Another U.S. government source is the Bureau of Economic Analysis data on value added by industry, which “is equal to an industry’s gross output (sales or receipts and other operating income, commodity taxes, and inventory change) minus its intermediate inputs (consumption of goods and services purchased from other industries or imported).” For international comparisons, the OECD provides a set of numbers based on what they call “indices of industrial production”–which, for the United States, are the same as the Federal Reserve output numbers. And the United Nations presents data for value-added by industry, which covers more countries than the OECD and is supposed to be cross-nationally comparable, but does not quite match up with the BEA numbers.

The first question to ask is: how comparable are all these different measures? Only the Fed/OECD numbers refer to actual physical output; the BEA/UN data appears to be based only on the money value of final output. Here is a comparison of the different measures, for the years in which they are all available (1970-2009). The numbers have all been put on the same scale: percent of the value in the year 2007.

The red line shows the relationship between the BEA value added numbers and the Fed output numbers, while the blue line shows the comparison between the UN value-added data and the Fed output data. The diagonal black line shows where the lines would fall if these two measures were perfectly comparable. While the overall correlation is fairly strong, there are clear discrepancies. In the pre-1990 data, the BEA data shows manufacturing output being much lower than the Fed’s data, while the UN series shows somewhat higher levels of output. The other puzzling result is in the very recent data: according to value-added, manufacturing output has remained steady in the last few years, but according to the Fed output measure it has declined dramatically. It’s hard to know what to make of this, but it does suggest that the Great Recession has created some issues for the models used to create these data series.
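Putting each series on the common scale used in that comparison is straightforward. This sketch assumes annual observations in plain lists; the function name is mine, not from any statistical agency's code:

```python
def rebase(series, base_year, years):
    """Re-express a series as percent of its value in base_year."""
    base = series[years.index(base_year)]
    return [100.0 * v / base for v in series]

# Hypothetical output index in arbitrary units.
years = [2005, 2006, 2007, 2008, 2009]
output = [95.0, 100.0, 104.0, 101.0, 90.0]
indexed = rebase(output, 2007, years)  # 2007 becomes exactly 100.0
```

Rebasing removes differences in units and levels between the Fed, BEA, OECD, and UN series, so that only differences in their trends remain visible.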

What I would generally say about these findings is that these different data sources are sufficiently comparable to be used interchangeably in making the points I want to make about long-term trends in manufacturing, but they are nevertheless different enough that one shouldn’t ascribe unwarranted precision to them. However, the fact that all the data are similar doesn’t address the larger question: how can we trust any of these numbers? Specifically, how do government statistical agencies deal with the problem of comparing qualitatively different outputs over time?

Contemporary National Accounts data tracks changes in GDP using something called a “chained Fisher price index”. Statistics Canada has a good explanation of the method. There are two different problems that this method attempts to solve. The first is the problem of combining all the different outputs of an economy at a single point in time, and the second is to track changes from one time period to another. In both instances, it is necessary to distinguish between the quantity of goods produced, and the prices of those goods. Over time, the nominal GDP–that is, the total money value of everything the economy produces–will grow for two reasons. There is a “price effect” due to inflation, where the same goods just cost more, and a “volume effect” due to what StatCan summarizes as “the change in quantities, quality and composition of the aggregate” of goods produced.

StatCan describes the goal of GDP growth measures as follows: “the total change in quantities can only be calculated by adding the changes in quantities in the economy.” Thus the goal is something approaching a measure of how much physical stuff is being produced. But they go on to say that:

creating such a summation is problematic in that it is not possible to add quantities with physically different units, such as cars and telephones, even two different models of cars. This means that the quantities have to be re-evaluated using a common unit. In a currency-based economy, the simplest solution is to express quantities in monetary terms: once evaluated, that is, multiplied by their prices, quantities can be easily aggregated.

This is an important thing to keep in mind about output growth statistics, such as the manufacturing output numbers I just discussed. Ultimately, they are all measuring things in terms of their price. That is, they are not doing what one might intuitively want, which is to compare the actual amount of physical stuff produced at one point with the amount produced at a later point, without reference to money. This latter type of comparison is simply not possible, or at least it is not done by statistical agencies. (As an aside, this is a recognition of one of Marx’s basic insights about the capitalist economy: it is only when commodities are exchanged on the market, through the medium of money, that it becomes possible to render qualitatively different objects commensurable with one another.)

In practice, growth in output is measured using two pieces of information. The first is the total amount of a given product that is sold in a given period. Total amount, in this context, does not refer to a physical quantity (it would be preferable to use physical quantities, but this data is not usually available), but to the total money value of goods sold. The second piece of information is the price of a product at a given time point, which can be compared to the price in a previous period. The “volume effect”–that is, the actual increase in output–is then defined as the change in total amount sold, “deflated” to account for changes in price. So, for example, say there are $1 billion worth of shoes sold in period 1, and $1.5 billion worth of shoes sold in period 2. Meanwhile, the price of a pair of shoes rises from $50 to $60 between periods 1 and 2. The “nominal” change in shoe production is 50%–that is, sales have increased from $1 billion to $1.5 billion. But the real change in the volume of shoes sold is defined as:

$\frac{\frac{\$50}{\$60} \times \$1.5\ \text{billion}}{\$1\ \text{billion}} = 1.25$

So after correcting for the price increase, the actual increase in the amount of shoes produced is 25 percent. Although the example is a tremendous simplification, it is in essence how growth in output is measured by national statistical agencies.
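The worked example translates directly into code. These are the same illustrative shoe figures from the text, not real data:

```python
def real_growth(sales_1, sales_2, price_1, price_2):
    """Volume change: nominal sales growth deflated by the price change."""
    deflated_sales_2 = sales_2 * (price_1 / price_2)
    return deflated_sales_2 / sales_1

# Shoe example: sales rise from $1.0bn to $1.5bn while the
# price of a pair rises from $50 to $60.
growth = real_growth(1.0e9, 1.5e9, 50.0, 60.0)  # 1.25, i.e. a 25% real increase
```

Everything interesting hides in the price ratio: the statistical agencies' hedonic and quality adjustments all enter through how `price_1` and `price_2` are constructed.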

## Elster on the Social Sciences

October 20th, 2009  |  Published in Social Science, Sociology, Statistics

The present crisis looks as though it may bring about a long-delayed moment of reckoning in the field of economics. Macro-economics has been plunged into turmoil now that many of its leading practitioners stand exposed as dogmatists blithely clinging to absurd pre-Keynesian notions about the impossibility of economic stimulus and the inherent rationality of markets, who have nothing at all to say about the roots of the current turmoil. Micro-economics, meanwhile, has seen Freakonomics run its course, as long-standing criticisms of the obsession with “clean identification” over meaningful questions spill over into a new row over climate-change denialism.

Joining the pile-on, Jon Elster has an article in the electronic journal Capitalism and Society on the “excessive ambitions” of the social sciences. Focusing on economics–but referring to related fields–he criticizes three main lines of inquiry: rational choice theory, behavioral economics, and statistical inference.

Although I agree with most of the article’s arguments, much of it seemed rather under-argued. At various points, Elster’s argument seems to be: “I don’t need to provide an example of this; isn’t it obvious?” And with respect to his claim that “much work in economics and political science is devoid of empirical, aesthetic, or mathematical interest, which means that it has no value at all”, I’m inclined to agree. But it’s hard for me to say that Elster is contributing a whole lot to the discussion. I’m also a bit skeptical of the claim that behavioral economics has “predictive but not prescriptive implications”, given the efforts of people like Cass Sunstein to implement “libertarian paternalist” policies based on an understanding of some of the irrationalities studied in behavioral research.

But the part of the essay closest to my own interests was on data analysis. Here Elster is wading into the well-travelled terrain of complaining about poorly reasoned statistical analysis. He himself admits to being inexpert in these matters, and so relies on others, especially David Freedman. But he still sees fit to proclaim that we are awash in work that is both methodologically suspect and insufficiently engaged with its empirical substance.

The criticisms raised are all familiar. The specter of “data snooping . . . curve-fitting . . . arbitrariness in the measurement of . . . variables”, and so on, all fit under the rubric of what Freedman called “data driven model selection”. And indeed these things are all problems. But much of Elster’s discussion suffers from his lack of familiarity with the debates. He refers repeatedly to the problem of statistical significance testing–both the confusion of statistical and substantive significance, and the arbitrariness of the traditional 5% threshold for detecting effects. While I wouldn’t deny that these abuses persist, I think that years of relentless polemics on this issue from people like Deirdre McCloskey and Jacob Cohen have had an impact, and practice has begun to shift in a more productive direction.

Elster never really moves beyond these technical details to grapple with the larger philosophical issues that arise in applied statistics. For example, all of the problems with statistical significance arise from an over-reliance on the null hypothesis testing model of inference–even though as Andrew Gelman says, the true value of a parameter is never zero in any real social science situation. Simply by moving in the direction of estimating the magnitude of effects and their confidence intervals, we can avoid many of these problems.
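The shift from null hypothesis testing to estimation is easy to make concrete: instead of reporting whether p < 0.05, report the estimated effect and its interval. A minimal sketch with invented data (the normal-approximation interval here is the textbook version, nothing fancier):

```python
import math
import random

random.seed(1)
# Invented sample of 200 measurements of some effect.
sample = [random.gauss(0.3, 1.0) for _ in range(200)]

n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se = sd / math.sqrt(n)

# Approximate 95% confidence interval for the effect size:
# the magnitude and its uncertainty, not a yes/no verdict on zero.
ci = (mean - 1.96 * se, mean + 1.96 * se)
```

The interval carries strictly more information than the significance verdict: the verdict can be read off from whether the interval covers zero, but not vice versa.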

And although Freedman makes a number of very important criticisms of standard practice, the article that Elster relies upon leans very heavily on the weakness of the causal claims made about regression models. As a superior model, Freedman invokes John Snow’s analysis of cholera in the 1850s, which used simple methods but relied upon identifying a natural experiment in which different houses received their water from different sources. In this respect, the article is redolent of the time it was published (1991), when the obsession with clean identification and natural experiments was still gaining steam, and valid causal inference seemed like the most important goal of social science.

Yet we now see the limitations of that research agenda. It’s rare and fortuitous to find a situation like Snow’s cholera study, in which a vitally important question is illuminated by a clean natural experiment. All too often, the search for identification leads researchers to study obscure topics of little general relevance, thereby gaining internal validity (verifiable causality in a given data set) at the expense of external validity (applicability to broader social situations). This is what has led to the stagnation of Freakonomics-style research. What we have to accept, I think, is that it is often impossible to find an analytical strategy which is both free of strong assumptions about causality and applicable beyond a narrow and artificial situation. Causal inference, that is, is a noble but often futile pursuit. In its place, what we must often do instead is causal interpretation, in which essentially descriptive tools (such as regression) are interpreted causally based on prior knowledge, logical argument and empirical tests that persuasively refute alternative explanations.**

This is, I think, consistent with the role Elster proposes for data analysis, in the closing of his essay: an enterprise which “turns on substantive causal knowledge of the field in question together with the imagination to concoct testable implications that can establish ‘novel facts’”. And Elster gives some useful practical suggestions for improving results, such as partitioning data sets, fitting models on only one half, and not looking at the other half of the cases until a model is decided upon. But as with many rants against statistical malpractice, it seems to me that the real sociological issue is being sidestepped, which is that the institutional structure of social science strongly incentivizes malpractice. To put it another way, the purpose of academic social science is not, in general, to produce valid inference about the world; it is to produce publications. As long as that is the case, it seems unlikely that bad practices can be definitively stamped out.
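Elster's split-sample suggestion is simple to operationalize. A minimal sketch (the implementation details are my own, not Elster's): shuffle the cases once, fit and select models on one half, and leave the other half untouched until a final model has been chosen.

```python
import random

def split_sample(cases, seed=0):
    """Partition cases into an exploration half and a confirmation half.

    All model selection happens on the exploration half; the
    confirmation half is examined only once, to test the chosen model.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

explore, confirm = split_sample(range(100))
```

Fixing the seed makes the partition reproducible, so a skeptical reader can verify that the confirmation half really was held out.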

**Addendum: Fabio Rojas says what I wanted to say, rather more concisely. He notes that “identification is a luxury when you have an abundance of data and a pretty clear idea about what causal effects you care about.” Causal inference where possible, causal interpretation where necessary, ought to be the guiding principle. Via the Social Science Statistics blog, there is also a very interesting paper by Angus Deaton on the problems of causal inference. Of particular note is the difficulty of testing the assumptions behind instrumental variables methods, and the often-elided distinction between an instrument that is external to the process under investigation (that is, not caused by the system being studied) and one that is truly exogenous (that is, uncorrelated with the error term in the regression of the outcome on the predictor of interest).

## The data seem so much less real once you ask the same person the same question twice

October 12th, 2009  |  Published in Data, Social Science, Statistics

I identify with Jeremy Freese to an unhealthy degree. When the other options are to a) have a life; or b) do something that advances his career, he chooses to concoct a home-brewed match between GSS respondents in 2006 and their 2008 re-interviews. I would totally do this. I still might do this.

And then he drops the brutal insight that provides my title. Context.

UPDATE: And then Kieran Healy drops this:

The real distinction between qualitative and quantitative is not widely appreciated. People think it has something to do with counting versus not counting, but this is a mistake. If the interpretive work necessary to make sense of things is immediately obvious to everyone, it’s qualitative data. If the interpretative work you need to do is immediately obvious only to experts, it’s quantitative data.

## Quantity, Quality, Social Science

September 17th, 2009  |  Published in Social Science, Statistics

Henry Farrell expresses the duality of social scientific thought by invoking a passage from one of my favorite books, Calvino’s Invisible Cities. The comments spin out the eternal quantitative vs. qualitative research debate, in both more and less interesting permutations.

Historically and philosophically, the whole qual-quant divide is an important object of social science, since it is itself a consequence of the same process of modernity and capitalist development that produces social science itself. It is only when society and its institutions appear at a scale too large for the human mind to grasp all at once that we require abstractions–particularly statistical and mathematical ones–to simplify and describe our social world to ourselves.

Within academia, however, there is a seemingly inescapable sense that qualitative and quantitative epistemologies are locked in some kind of zero-sum competition. These days people like to talk about “mixed methods”, but I agree with some commenters in the above thread that this too often amounts to doing a quantitative study and then using qualitative material (from interviews or ethnography or whatever) as mere examples or window dressing.

It seems to me that a lot of this is driven by a misapprehension about what either approach is really good for.  The problem is that we expect quantitative and qualitative approaches to do the same kind of thing; that is, to collect data and use them to test well-defined hypotheses. I find that quantitative approaches are generally quite useful for taking well-defined concepts, and reasonably precise operationalizations of those concepts, and testing the interrelations between them. If your question is “do high tax rates inhibit economic growth”, and you have acceptable definitions and data for the subject and object of that hypothesis, then you can make useful–though never definitive–inferences using quantitative methods.

Qualitative methods are less often (though sometimes) suited to this kind of thing, because they are by nature rooted in the idiosyncrasies of specific cases and hence are difficult to generalize. What qualitative work is really good for, I think, is generating concepts. Quantitative analysis presupposes a huge conceptual apparatus: from the way ideas are operationalized, to the way survey questions are written, to the way variables are defined, to the way models are parameterized. Some of these presuppositions can be adjusted in the course of an analysis, but others are deeply encoded in the information we use. If you want to know whether the categories of a “race” variable are appropriate, the best strategy is probably a qualitative one, which will examine how racial categories are experienced by people, and how they operate in everyday life. Likewise, new hypotheses can arise from “thick description” which would not be apparent from consulting large tables of numbers.

This, however, brings up an issue that will probably be uncomfortable for a lot of qualitative social scientists, particularly those who are concerned with defending the “scientific” credentials of their work. Namely, can we draw a clear boundary between qualitative social science, journalism, and even fiction, with regard to their utility for driving the concept-formation process? Social science typically differentiates itself from mere journalism by its greater rigour; yet in my reading, the kind of rigour which is most important to qualitative work will be its interpretive rigour, rather than its precision in research design and data-gathering. Whether one is starting with ethnographic field notes or with The Wire, the point is to draw out and develop concepts and hypotheses in a sufficiently precise way that they can be tested with larger-scale (which is to say, generally quantitative) empirical data.

To put things this way seems to slide into a kind of cultural studies, except that the latter tends to set itself up as oppositional, rather than complementary, to quantitative empirical work. We would do far better, I think, to recognize that data analysis without qualitative conceptual interpretation is sterile and stagnant, while qualitative analysis without large-scale empiricism will tend to be speculative and inconclusive.

## The ontology of statistics

September 4th, 2009  |  Published in Social Science, Statistics

Bayesian statistics is sometimes differentiated from its frequentist alternative with the claim that frequentists have a kind of platonist ontology, which treats the parameters they seek to estimate as being fixed by nature; Bayesians, in contrast, are said to hold a stochastic ontology in which there is variability “all the way down”, as it were. This distinction implies that frequentist measurements of uncertainty refer solely to epistemological uncertainty:  if we estimate that a certain variable has a mean of 50 and a standard error of two, we are saying only that we do not have enough information to specify the mean more precisely. In contrast, the Bayesian perspective (according to the view just elucidated) would hold that a measure of uncertainty includes not only epistemological but also ontological uncertainty: even with a sample size approaching infinity, the mean of the variable in question is the realization of some probability distribution and not a fixed quantity, and therefore can never be specified without uncertainty.

As regards the frequentist-Bayesian distinction, the above distinction is misleading and unhelpful.  Andrew Gelman is, by any sensible account, one of the leading exponents and practitioners of Bayesian statistics, and yet he says here that “I’m a Bayesian and I think parameters are fixed by nature. But I don’t know them, so I model them using random variables.” Compare this to the comment of another Bayesian, Bill Jefferys: “I’ve always regarded the main difference between Bayesian and classical statistics to be the fact that Bayesians treat the state of nature (e.g., the value of a parameter) as a random variable, whereas the classical way of looking at it is that it’s a fixed but unknown number, and that putting a probability distribution on it doesn’t make sense.”

For Gelman, the choice of Bayesian methods is not primarily motivated by ontological commitments, but is rather a kind of pragmatism: he adopts techniques such as shrinkage estimators, prior distributions, etc. because they give good predictions about the state of the world in cases where frequentist methods fail or cannot be applied. This, I suspect, corresponds to the inclinations and motivations of many applied researchers, who as often as not will be uninterested in the ontology implied by their methods, so long as the techniques give reasonable answers.

Moreover, if it is possible to be a Bayesian with a Platonist ontology, it is equally possible to wander into a stochastic view of the world without reaching beyond the well-accepted “classical” methods. Consider, for example, the logistic regression, which is by now a part of routine introductory statistical instruction in every field of social science. A logistic regression model does not directly predict a binary outcome y, which can be 0 or 1. Rather, it predicts the probability of such an outcome, conditional on the predictor variables. There are two ways to think about such models. One of them, the so-called “latent variable” interpretation, posits that there is some unobservable continuous variable Z, and that the outcome y is 0 if this Z variable is below a certain threshold, and 1 otherwise. If one holds to this interpretation, it is perhaps possible to hold to a Platonist ontology, by stipulating that the value of Z is “fixed by nature”. However, this fixed parameter is at the same time unobservable, leading to the unsatisfying conclusion that the propensity of the event y occurring for a given subject is at once fixed and unknowable.
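The two interpretations can be written out side by side. In this sketch (plain Python, with a hypothetical linear predictor chosen only for illustration), the latent-variable reading thresholds an unobserved Z, while the stochastic reading treats the predicted probability itself as the quantity of interest; the two are observationally equivalent:

```python
import math
import random

random.seed(2)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

eta = 0.4  # hypothetical linear predictor x'b for a single case

# Reading 1 (latent variable): an unobserved Z = eta + noise fixes the
# outcome deterministically; y = 1 exactly when Z crosses zero.
def y_from_latent(eta):
    u = random.random()
    noise = math.log(u / (1.0 - u))  # standard logistic noise, via inverse CDF
    return 1 if eta + noise > 0 else 0

# Reading 2 (probability as the quantity of interest): the model assigns
# P(y = 1) = logistic(eta) directly, and y is a weighted coin flip.
def y_from_probability(eta):
    return 1 if random.random() < logistic(eta) else 0

# Over repeated draws, both readings produce y = 1 at the rate logistic(eta);
# no amount of data distinguishes them.
n = 100000
rate_latent = sum(y_from_latent(eta) for _ in range(n)) / n
rate_prob = sum(y_from_probability(eta) for _ in range(n)) / n
```

That the two generative stories are empirically indistinguishable is exactly why the choice between them is ontological rather than statistical.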

In the latent variable interpretation,  the predicted probabilities generated by a logistic regression are simply emanations from the “true” quantity of interest, the unobserved value of Z. An alternative interpretation is that the predicted probabilities are themselves the quantities of interest. Ontologically, this means that rather than having an exact value for Z, each case is associated with a certain probability that for that case, y=1. Of course, in the actual world we observe, each case in our dataset is either 1 or 0. But this second interpretation of the model implies that if we “ran the tape of history over again”, to paraphrase Stephen Jay Gould, the values of y for each individual case might be different; only the overall distribution of probabilities is assumed to be constant.

Thus the distinction between the Platonist and stochastic ontologies in statistics turns out to be quite orthogonal to the distinction between frequentist and Bayesian. And it is an important distinction to be aware of, because it has real practical implications for applied researchers.  It will affect, for example, the way in which we assess how well a model fits the data.

In the case of logistic regression, the Platonist view would imply that the best model possible would predict every case correctly: that is, it would yield a predicted probability of more than 0.5 when y=1, and less than 0.5 when y=0. On the stochastic view, however, that degree of predictive accuracy is a priori held to be impossible, and achieving it indicates overfitting of the model. The best one can really aim for, on this view, is a model which gets the probabilities right–so that for 10 cases with predicted probabilities of 0.1, there should be one case where y=1 and nine where y=0.
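The gap between the two standards is easy to simulate. In the sketch below, the probabilities themselves are the data-generating process (drawn uniformly, purely for illustration), so even the true model tops out well below perfect classification accuracy while being essentially perfectly calibrated:

```python
import random

random.seed(3)

# A world in which the probabilities ARE the data-generating process:
# each case carries a true p, and y is genuinely random given p.
cases = []
for _ in range(50000):
    p = random.random()  # true probability; uniform only for illustration
    y = 1 if random.random() < p else 0
    cases.append((p, y))

# Even the TRUE model cannot classify every case correctly; with uniform
# p its accuracy converges to E[max(p, 1 - p)] = 0.75, not 1.
correct = sum(1 for p, y in cases if (p > 0.5) == (y == 1))
accuracy = correct / len(cases)

# But it is well calibrated: among cases with p near 0.1, about
# one case in ten has y = 1.
near_tenth = [y for p, y in cases if 0.05 <= p < 0.15]
calibration = sum(near_tenth) / len(near_tenth)
```

On the stochastic view, a fitted model that classified these cases much better than 75% of the time would be a red flag, not an achievement.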

This conundrum arises even for Ordinary Least Squares regression, where the outcome variable is continuous and the model predicts it directly. It has long been traditional to assess OLS model fit using R-squared, the proportion of variance explained by the model. Many people unthinkingly assume that because the theoretical upper bound of the R-squared statistic is 1, the maximum possible value in any particular empirical situation is also 1. But this assumption once again rests on an implicit Platonist ontology. It assumes that sigma, the residual standard error of a regression, reflects only omitted variables rather than inherent variability in the outcome in question. But as Gary King observed a long time ago, if some portion of sigma is due to intrinsic, ontological variability, then the maximum value of R-squared is some unknown value less than 1.* In this case, once again, high values of R-squared may be indicators of overfitting rather than signs of a well-constructed model.
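King's point can be illustrated directly. Give the outcome an intrinsic noise component, and even the true model, with its coefficients known exactly and nothing omitted, cannot reach an R-squared of 1 (the parameter values here are invented for the sketch):

```python
import random

random.seed(4)
n = 100000
beta = 1.0
sigma = 1.0  # intrinsic, irreducible variability in the outcome

xs = [random.gauss(0, 1) for _ in range(n)]
ys = [beta * x + random.gauss(0, sigma) for x in xs]

# Score the TRUE model: the coefficient is known exactly, nothing omitted.
mean_y = sum(ys) / n
ss_tot = sum((y - mean_y) ** 2 for y in ys)
ss_res = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))

# With var(beta * x) = sigma^2 = 1, the ceiling is 0.5, not 1: any fitted
# model scoring much above it would necessarily be fitting noise.
r_squared = 1.0 - ss_res / ss_tot
```

Here a researcher who coaxed an R-squared of 0.9 out of a sample from this process would have a worse model than one who stopped at 0.5.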

Statistics, even in its grubbiest, most applied forms, is philosophical; we ignore that aspect of quantitative practice at our peril. I am put in mind of Keynes’ remark about economic common sense and theoretical doctrine, which I will not repeat here as it is already ubiquitous.

*In practice, the residual variability may be truly ontological in the sense that it is rooted in the probabilistic behavior of the physical world at the level of quantum mechanics, or it may be that all variation could in principle be accounted for, but that residual variation is irreducible in practice because of the extremely large number of very minor causes that contribute to the outcome. In either case, the consequence for the applied researcher is the same.

http://www.stat.columbia.edu/~cook/movabletype/archives/2007/12/intractable_is.html#more