Social Science

Making things, marking time

January 27th, 2010  |  Published in Data, Political Economy, R, Work

Today Matt Yglesias revisits a favorite topic of mine, the distinction between U.S. manufacturing employment and manufacturing production. It has become increasingly common to hear liberals complain about the "decline" in American manufacturing, and lament that America doesn't "make things" anymore.

Harold Meyerson had a typical riff on this recently:

Reviving American manufacturing may be an economic and strategic necessity, without which our trade deficit will continue to climb, our credit-based economy will produce and consume even more debt, and our already-rickety ladders of economic mobility, up which generations of immigrants have climbed, may splinter altogether.

. . .

The epochal shift that's overtaken the American economy over the past 30 years  . . .  finance, which has compelled manufacturers to move offshore in search of higher profit margins . . .  retailers, who have compelled manufacturers to move offshore in search of lower prices for consumers and higher profits for themselves

. . .

Creating the better paid, less debt-ridden work force that would emerge from a shift to an economy with more manufacturing and a higher rate of unionization would reduce the huge revenue streams flowing to the Bentonvilles (Wal-Mart's home town) and the banks . . . . The campaign contributions from the financial sector to Democrats and Republicans alike now dwarf those from manufacturing -- a major reason why our government's adherence to free-trade orthodoxy in what is otherwise a mercantilist world is likely to persist.

. . .

[Sen. Sherrod] Brown . . . acknowledges that as manufacturing employs a steadily smaller share of the American work force, "younger people probably don't think about it as much" as their elders . . . . Politically, American manufacturing is in a race against time: As manufacturing becomes more alien to a growing number of Americans, its support may dwindle, even as the social, economic, and strategic need to bolster it becomes more acute. That makes the push for a national industrial policy -- to become again a nation that makes things instead of debt, to build again our house upon a rock -- even more urgent.

I don't dispute that manufacturing has become "more alien" to the bulk of American working people. But I question Meyerson's explanation for why this has happened, and I wonder whether we should really be so horrified by it. The evidence suggests that the decline in manufacturing employment in this country has been driven not primarily by offshoring (as Meyerson would have it), but by a dramatic increase in productivity. Yglesias provides one graphical illustration of this; here is my home-brewed alternative, going back to World War II:

Manufacturing output and employment, 1939-2009

This picture leaves some unanswered questions, to be sure. First, one would want to know what kind of manufacturing has grown in the U.S.; my cursory examination of the data suggests that output is still oriented more heavily toward consumer goods than toward defense and aerospace production, despite what one might expect. Second, it's possible that the globally integrated system of production is "hiding" labor in other parts of the supply chain, in China and other countries with low labor costs.

But I don't think the general story of rapidly increasing productivity can be easily ignored. To really reverse the decline in manufacturing employment, we would need something like a ban on labor-saving technologies, in order to return the U.S. economy to the low-productivity equilibrium of forty or fifty years ago. Of course, that would also require either reducing American wages to Chinese levels or imposing a level of autarky in trade policy beyond what any left-protectionist advocates.

Needless to say, I think this modest proposal is totally undesirable, and I raise it only to suggest the folly of "rebuilding manufacturing" as a slogan for the left. As Yglesias observes in the linked post, manufacturing now seems to be going through a transition like the one that agriculture experienced in the last century: farming went from being the major activity of most people to being a niche of the economy that employs very few people. Yet of course food hasn't ceased to be one of the fundamental necessities of human life, and we produce more of it than ever.

And yet I understand the real problem that motivates the pro-manufacturing instinct among liberals. The decline in manufacturing has coincided with a massive increase in income inequality and a decline in the prospects for low-skill workers. Moreover, the decline of manufacturing has gone hand in hand with the decline of organized labor, and it is unclear whether traditional workplace-based union organizing can ever really succeed in a post-industrial economy. But the nostalgia for a manufacturing-centered economy is an attempt to universalize a very specific period in the history of capitalism, one which is unlikely to recur.

The obsession with manufacturing jobs is, I think, a symptom of a larger weakness of liberal thought: the preoccupation with a certain kind of full-employment Keynesianism, predicated on the assumption that a good society is one in which everyone is engaged in full-time waged employment. But this sells short the real potential of higher productivity: less work for all. As Keynes himself observed:

For the moment the very rapidity of these changes is hurting us and bringing difficult problems to solve. Those countries are suffering relatively which are not in the vanguard of progress. We are being afflicted with a new disease of which some readers may not yet have heard the name, but of which they will hear a great deal in the years to come--namely, technological unemployment. This means unemployment due to our discovery of means of economising the use of labour outrunning the pace at which we can find new uses for labour.

But this is only a temporary phase of maladjustment. All this means in the long run that mankind is solving its economic problem. I would predict that the standard of life in progressive countries one hundred years hence will be between four and eight times as high as it is to-day. There would be nothing surprising in this even in the light of our present knowledge. It would not be foolish to contemplate the possibility of a far greater progress still.

. . .

Thus for the first time since his creation man will be faced with his real, his permanent problem--how to use his freedom from pressing economic cares, how to occupy the leisure, which science and compound interest will have won for him, to live wisely and agreeably and well.

Productivity has continued to increase, just as Keynes predicted. Yet the long weekend of permanent leisure never arrives. This--and not deindustrialization--is the cruel joke played on the working class. The answer is not to force people into deadening make-work jobs, but rather to acknowledge our tremendous social wealth and ensure that those who do not have access to paid work still have access to at least the basic necessities of life--through something like a guaranteed minimum income.


Geeky addendum: I thought the plot I made for this post came out rather nicely, and it took some figuring out, so below is the R code required to reproduce it. It queries the data sources (a couple of Federal Reserve sites) directly, so no saving of files is required, and it should automatically use the most recent available data.

# Manufacturing employment: FRED series MANEMP, read directly from the St. Louis Fed
manemp <- read.table("http://research.stlouisfed.org/fred2/data/MANEMP.txt",
   skip=19, header=TRUE)
names(manemp) <- tolower(names(manemp))
manemp$date <- as.Date(manemp$date, format="%Y-%m-%d")

# Manufacturing output from the Fed's G.17 release, queried up to the current date
curdate <- format(as.Date(substr(as.character(Sys.time()),1,10)),"%m/%d/%Y")

outputurl <- url(paste(
   'http://www.federalreserve.gov/datadownload/Output.aspx?rel=G17&series=063c8e96205b9dd107f74061a32d9dd9&lastObs=&from=01/01/1939&to=',
   curdate,
   '&filetype=csv&label=omit&layout=seriescolumn', sep=''))

manout <- read.csv(outputurl,
   as.is=TRUE, skip=1, col.names=c("date","value"))
manout$date <- as.Date(paste(manout$date,"01",sep="-"), format="%Y-%m-%d")

# Employment series (left-hand axis, blue)
par(mar=c(2,2,2,2))
plot(manemp$date[manemp$date >= "1939-01-01"],
   manemp$value[manemp$date >= "1939-01-01"],
   type="l", col="blue", lwd=2,
   xlab="", ylab="", axes=FALSE, xaxs="i")
axis(side=1,
   at=as.Date(paste(seq(1940,2015,10),"01","01",sep="-")),
   labels=seq(1940,2015,10))
text(as.Date("1955-01-01"), 17500,
   "Manufacturing employment (thousands)", col="blue")
axis(side=2, col="blue")

# Output series overlaid on the same plot (right-hand axis, red)
par(new=TRUE)
plot(manout$date, manout$value,
   type="l", col="red", axes=FALSE, xlab="", ylab="", lwd=2, xaxs="i")
text(as.Date("1975-01-01"), 20,
   "Manufacturing output (% of output in 2002)", col="red")
axis(side=4, col="red")

Measuring globalization

January 25th, 2010  |  Published in Data, Social Science, Statistical Graphics

Via the Monkey Cage, an interesting and comprehensive new database, the "KOF Index of Globalization". I'm generally a bit leery of attempts to boil down complex configurations of political economy into a pat "index", but this one is reasonably straightforward, measuring both "economic globalization" (economic flows and trade restrictions) and "political globalization" (participation in international institutions and diplomatic relations). The example graph at the Monkey Cage is interesting, but I immediately thought it would be better represented like this:

Globalization in post-Communist countries, 1991-2007

This could be cleaned up to deal with the overlapping names, and additional information might be useful (such as the average globalization score of all countries in each year, and the maximum and minimum scores), but I think this is pretty informative. You can see the overall political and economic integration of these countries into the capitalist world, for example. There's also the increasing distance between the main cluster of countries on the one hand, and the insular autocracies of Belarus and Uzbekistan on the other.
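For anyone who wants to tinker with something similar, here is a rough base-R sketch of the kind of plot described above. It assumes you have already downloaded the KOF data and reshaped it into a long data frame called kof with columns country, year, and index (the overall globalization score); those names are my own placeholders, not anything the KOF site provides.

# post-Communist countries to highlight; adjust to taste
post_comm <- c("Russia", "Poland", "Ukraine", "Belarus", "Uzbekistan", "Estonia")
all91 <- subset(kof, year >= 1991)
d <- subset(all91, country %in% post_comm)

# empty plotting region spanning the post-1991 data
plot(range(all91$year), range(all91$index), type="n",
   xlab="", ylab="KOF globalization index")

# yearly mean and min/max envelope across all countries, for context
env <- aggregate(index ~ year, data=all91,
   FUN=function(x) c(min=min(x), mean=mean(x), max=max(x)))
lines(env$year, env$index[, "mean"], lty=2)
lines(env$year, env$index[, "min"], lty=3)
lines(env$year, env$index[, "max"], lty=3)

# one trajectory per post-Communist country, labeled at its last observation
for (cc in post_comm) {
   ci <- d[d$country == cc, ]
   ci <- ci[order(ci$year), ]
   lines(ci$year, ci$index)
   text(max(ci$year), ci$index[which.max(ci$year)], cc, pos=4, cex=0.7)
}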

Detroit Facts

November 26th, 2009  |  Published in Cities, Social Science

Detroit facts are like the opposite of Chuck Norris facts. Each one portrays the city of Detroit as being unimaginably and implausibly screwed up and economically depressed. And unlike Chuck Norris facts, Detroit facts are true.

My favorite Detroit fact used to be this: what's the average price of a house in Detroit? I would ask people this, and almost no one got it right. When I first started asking a couple of years ago, the answer was about $10,000. I think it's less now--the median was reported as $7,500 earlier this year.

Now, however, I have a new favorite Detroit fact. In New York City, you can buy this for $600,000:

Greenwich Village studio apartment

It's a very nice little studio apartment in Greenwich Village. On the other hand, for only $583,000 in Detroit, you could have bought this:

Pontiac Silverdome

That's right, it's the Silverdome. Needless to say, this does not augur well for the future of Detroit. Nor does this:

This plots bubble-era gains in house prices against post-crash declines in different cities. Detroit's housing prices didn't really go up during the bubble, but they've come down with the crash anyway, which suggests that it's the underlying weakness of the local economy that's bringing down prices. I think urbanists need to be thinking a lot harder about what we can do about places like this--bringing them back to their former glory seems impossible, but simply abandoning the people who live there would be immoral. We need a strategy for, quite frankly, gradually letting these places shrink. See also Ed Glaeser on the case for letting Buffalo die.

Elster on the Social Sciences

October 20th, 2009  |  Published in Social Science, Sociology, Statistics

The present crisis looks as though it may bring about a long-delayed moment of reckoning in the field of economics. Macroeconomics has been plunged into turmoil now that many of its leading practitioners stand exposed as dogmatists who cling blithely to absurd pre-Keynesian notions about the impossibility of economic stimulus and the inherent rationality of markets, and who have nothing at all to say about the roots of the current crisis. Microeconomics, meanwhile, has seen Freakonomics run its course, as long-standing criticisms of the obsession with "clean identification" over meaningful questions spill over into a new row over climate-change denialism.

Joining the pile-on, Jon Elster has an article in the electronic journal Capitalism and Society on the "excessive ambitions" of the social sciences. Focusing on economics--but referring to related fields--he criticizes three main lines of inquiry: rational choice theory, behavioral economics, and statistical inference.

Although I agree with most of the article's arguments, much of it seems rather under-argued. At various points, Elster's argument seems to be: "I don't need to provide an example of this; isn't it obvious?" And with respect to his claim that "much work in economics and political science is devoid of empirical, aesthetic, or mathematical interest, which means that it has no value at all", I'm inclined to agree. But it's hard for me to say that Elster is contributing a whole lot to the discussion. I'm also a bit skeptical of the claim that behavioral economics has "predictive but not prescriptive implications", given the efforts of people like Cass Sunstein to implement "libertarian paternalist" policies based on an understanding of some of the irrationalities studied in behavioral research.

But the part of the essay closest to my own interests was the discussion of data analysis. Here Elster is wading into the well-travelled terrain of complaining about poorly reasoned statistical analysis. He himself admits to being inexpert in these matters, and so relies on others, especially David Freedman. But he still sees fit to proclaim that we are awash in work that is both methodologically suspect and insufficiently engaged with its empirical substance.

The criticisms raised are all familiar. The specter of "data snooping . . . curve-fitting . . . arbitrariness in the measurement of . . . variables", and the rest, fits under the rubric of what Freedman called "data-driven model selection". And indeed these things are all problems. But much of Elster's discussion suffers from his lack of familiarity with the debates. He refers repeatedly to the problem of statistical significance testing--both the confusion of statistical and substantive significance, and the arbitrariness of the traditional 5% threshold for detecting effects. While I wouldn't deny that these abuses persist, I think that years of relentless polemics on this issue from people like Deirdre McCloskey and Jacob Cohen have had an impact, and practice has begun to shift in a more productive direction.

Elster never really moves beyond these technical details to grapple with the larger philosophical issues that arise in applied statistics. For example, all of the problems with statistical significance arise from an over-reliance on the null hypothesis testing model of inference--even though, as Andrew Gelman says, the true value of a parameter is never zero in any real social science situation. Simply by moving in the direction of estimating the magnitude of effects and their confidence intervals, we can avoid many of these problems.
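To make the contrast concrete, here is a trivial simulated example (mine, not Elster's or Gelman's). The interesting output is the estimated magnitude of the effect and the interval around it, not the verdict on a null hypothesis that was never literally true to begin with.

set.seed(1)
n <- 500
x <- rnorm(n)
y <- 0.15*x + rnorm(n)      # a real but modest effect; the true value is not exactly zero

fit <- lm(y ~ x)
coef(summary(fit))["x", ]   # estimate, standard error, t statistic, p value
confint(fit, "x")           # the magnitude, with a range of plausible values around it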

And although Freedman makes a number of very important criticisms of standard practice, the article that Elster draws upon leans very heavily on the weakness of the causal claims made about regression models. As a superior model, Freedman invokes John Snow's analysis of cholera in the 1850s, which used simple methods but relied upon identifying a natural experiment in which different houses received their water from different sources. In this respect, the article is redolent of the time it was published (1991), when the obsession with clean identification and natural experiments was still gaining steam, and valid causal inference seemed like the most important goal of social science.

Yet we now see the limitations of that research agenda. It's rare and fortuitous to find a situation like Snow's cholera study, in which a vitally important question is illuminated by a clean natural experiment. All too often, the search for identification leads researchers to study obscure topics of little general relevance, thereby gaining internal validity (verifiable causality in a given data set) at the expense of external validity (applicability to broader social situations). This is what has led to the stagnation of Freakonomics-style research. What we have to accept, I think, is that it is often impossible to find an analytical strategy which is both free of strong assumptions about causality and applicable beyond a narrow and artificial situation. Causal inference, that is, is a noble but often futile pursuit. In its place, what we must often do instead is causal interpretation, in which essentially descriptive tools (such as regression) are interpreted causally based on prior knowledge, logical argument, and empirical tests that persuasively refute alternative explanations.**

This is, I think, consistent with the role Elster proposes for data analysis in the closing of his essay: an enterprise which "turns on substantive causal knowledge of the field in question together with the imagination to concoct testable implications that can establish 'novel facts'". And Elster gives some useful practical suggestions for improving results, such as partitioning data sets, fitting models on only one half, and not looking at the other half of the cases until a model is decided upon. But as with many rants against statistical malpractice, it seems to me that the real sociological issue is being sidestepped, which is that the institutional structure of social science strongly incentivizes malpractice. To put it another way, the purpose of academic social science is not, in general, to produce valid inference about the world; it is to produce publications. As long as that is the case, it seems unlikely that bad practices can be definitively stamped out.
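Elster's split-sample suggestion is easy enough to put into practice. Here is a minimal sketch; mydata, y, x1, and x2 are placeholders for whatever data set and specification you happen to be working with.

set.seed(2)
half <- sample(nrow(mydata), floor(nrow(mydata)/2))
explore <- mydata[half, ]    # do all the model search and tinkering here
holdout <- mydata[-half, ]   # do not look at this half until the model is fixed

fit <- lm(y ~ x1 + x2, data=explore)

# one look, after the specification is settled: out-of-sample error on the holdout
pred <- predict(fit, newdata=holdout)
mean((holdout$y - pred)^2)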

**Addendum: Fabio Rojas says what I wanted to say, rather more concisely. He notes that "identification is a luxury when you have an abundance of data and a pretty clear idea about what causal effects you care about." Causal inference where possible, causal interpretation where necessary, ought to be the guiding principle. Via the Social Science Statistics blog, there is also a very interesting paper by Angus Deaton on the problems of causal inference. Of particular note is the difficulty of testing the assumptions behind instrumental variables methods, and the often-elided distinction between an instrument that is external to the process under investigation (that is, not caused by the system being studied) and one that is truly exogenous (that is, uncorrelated with the error term in the regression of the outcome on the predictor of interest).
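A small simulation (my own illustration, not Deaton's) may help fix the distinction. Below, the instrument z is "external" in the sense that it is assigned at random and not caused by anything in the system; but because it also affects the outcome directly, it ends up correlated with the error term in the structural equation, and the instrumental-variables estimate is badly biased.

set.seed(42)
n <- 10000
z <- rbinom(n, 1, 0.5)            # external: randomly assigned, not caused by the system
x <- 0.5*z + rnorm(n)             # the instrument does shift the treatment of interest
y <- 1.0*x + 0.8*z + rnorm(n)     # but z also affects y directly, so it is not exogenous

# simple Wald/IV estimate of the effect of x on y
cov(z, y) / cov(z, x)             # roughly 2.6, far from the true effect of 1.0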

The data seem so much less real once you ask the same person the same question twice

October 12th, 2009  |  Published in Data, Social Science, Statistics

I identify with Jeremy Freese to an unhealthy degree. When the other options are to (a) have a life or (b) do something that advances his career, he chooses instead to concoct a home-brewed match between GSS respondents in 2006 and their 2008 re-interviews. I would totally do this. I still might do this.

And then he drops the brutal insight that provides my title. Context.

UPDATE: And then Kieran Healy drops this:

The real distinction between qualitative and quantitative is not widely appreciated. People think it has something to do with counting versus not counting, but this is a mistake. If the interpretive work necessary to make sense of things is immediately obvious to everyone, it’s qualitative data. If the interpretative work you need to do is immediately obvious only to experts, it’s quantitative data.

Trans-Europe Express

September 21st, 2009  |  Published in Art and Literature, Work

Compare and contrast:

"Work is where they find their real fulfilment--running an investment bank , designing an airport, bringing on stream a new family of antibiotics. If their work is satisfying people don't need leisure in the old-fashioned sense. No one ever asks what Newton or Darwin did to relax, or how Bach spent his weekends. At Eden-Olympia work is the ultimate play, and play the ultimate work."  --J.G. Ballard, Super-Cannes

"Old premise: work sucks, and after decades of toil, one has “earned the right” to get paid to do nothing. New premise: work is self-defined, self-led and empowering. Small-scale and global-reach entrepreneurship is a reality and this will make work a joy rather than a painful necessity." -Pascal-Emmanuel Gobry, The American Scene

The libertarian right assures us that the preceding is a description of utopia.

This message intellectually sponsored by the Work Less Party.

Quantity, Quality, Social Science

September 17th, 2009  |  Published in Social Science, Statistics

Henry Farrell expresses the duality of social scientific thought by invoking a passage from one of my favorite books, Calvino's Invisible Cities. The comments spin out the eternal quantitative vs. qualitative research debate, in both more and less interesting permutations.

Historically and philosophically, the whole qual-quant divide is an important object of social science, since it is itself a consequence of the same process of modernity and capitalist development that produces social science. It is only when society and its institutions appear at a scale too large for the human mind to grasp all at once that we require abstractions--particularly statistical and mathematical ones--to simplify and describe our social world to ourselves.

Within academia, however, there is a seemingly inescapable sense that qualitative and quantitative epistemologies are locked in some kind of zero-sum competition. These days people like to talk about "mixed methods", but I agree with some commenters in the above thread that this too often amounts to doing a quantitative study and then using qualitative material (from interviews or ethnography or whatever) as illustration or window dressing.

It seems to me that a lot of this is driven by a misapprehension about what either approach is really good for.  The problem is that we expect quantitative and qualitative approaches to do the same kind of thing; that is, to collect data and use them to test well-defined hypotheses. I find that quantitative approaches are generally quite useful for taking well-defined concepts, and reasonably precise operationalizations of those concepts, and testing the interrelations between them. If your question is "do high tax rates inhibit economic growth", and you have acceptable definitions and data for the subject and object of that hypothesis, then you can make useful--though never definitive--inferences using quantitative methods.

Qualitative methods are less often (though sometimes) suited to this kind of thing, because they are by nature rooted in the idiosyncrasies of specific cases and hence are difficult to generalize. What qualitative work is really good for, I think, is generating concepts. Quantitative analysis presupposes a huge conceptual apparatus: from the way ideas are operationalized, to the way survey questions are written, to the way variables are defined, to the way models are parameterized. Some of these presuppositions can be adjusted in the course of an analysis, but others are deeply encoded in the information we use. If you want to know whether the categories of a "race" variable are appropriate, the best strategy is probably a qualitative one, which will examine how racial categories are experienced by people, and how they operate in everyday life. Likewise, new hypotheses can arise from "thick description" which would not be apparent from consulting large tables of numbers.

This, however, brings up an issue that will probably be uncomfortable for a lot of qualitative social scientists, particularly those who are concerned with defending the "scientific" credentials of their work. Namely, can we draw a clear boundary between qualitative social science, journalism, and even fiction, with regard to their utility for driving the concept-formation process? Social science typically differentiates itself from mere journalism by its greater rigour; yet in my reading, the kind of rigour which is most important to qualitative work is its interpretive rigour, rather than its precision in research design and data-gathering. Whether one is starting with ethnographic field notes or with The Wire, the point is to draw out and develop concepts and hypotheses in a sufficiently precise way that they can be tested with larger-scale (which is to say, generally quantitative) empirical data.

To put things this way seems to slide into a kind of cultural studies, except that the latter tends to set itself up as oppositional, rather than complementary, to quantitative empirical work. We would do far better, I think, to recognize that data analysis without qualitative conceptual interpretation is sterile and stagnant, while qualitative analysis without large-scale empiricism will tend to be speculative and inconclusive.

The Game Beyond the Game

September 10th, 2009  |  Published in Art and Literature, Cities, Politics, Social Science, Sociology

The new issue of City and Community has an article by Peter Dreier and John Atlas about a show that captivates many an urban sociologist, The Wire. Their piece extends comments they made last year in Dissent, in a symposium about the show. In both pieces, they repeat the common accusation that the show is nihilistic, because it presents urban problems but doesn't show any solutions to them. To bolster the point, they dredge up a quotation from an interview, in which Simon proclaims that meaningful change is impossible "within the current political structure".

As a corrective to what they see as The Wire's shortcomings, Dreier and Atlas catalogue some of the real community activists who have struggled against injustice in Baltimore, and won some small victories. And these are indeed inspiring and courageous people, who have managed to win some real improvements in people's lives. But by bringing them up and presenting them as the solution to all the problems The Wire portrays, I think Dreier and Atlas miss the point of what David Simon and Ed Burns are doing with the show.

It's misleading to say that The Wire is nihilistic. It's true that the problems it portrays appear, within the context of the narrative, to be insoluble. And it may even seem, initially, as though the show is sympathetic to a conservative position: the poor will always be with us, government intervention always makes things worse, so we might as well just give up and try to make things better in our own small, individualist way. But this would be a profound misreading, because the show suggests, not that there are no solutions, but something far more complex. We come to understand, as the seasons unfold, that each of the dysfunctional institutions we see is embedded in a larger system that goes far beyond the scale of Baltimore. There is, as Stringer Bell puts it in season 3, "a game beyond the game". We therefore have to conclude, not that there are no solutions, but that there may be no solutions at the scale of a single city.

The police find themselves hamstrung by their need to deal with national agencies like the FBI, which has been caught up in the mania of the "war on terror". The dockworkers find their way of life destroyed by automation and the transformation of the global shipping industry. The mayor is at the mercy of Maryland state politics because he needs funding. The local newspaper struggles, and fails, to adjust to a world of profit-driven news and competition from new media. Even the drug dealers are at the mercy of their out-of-town "connect".

None of this implies that Baltimore's doom is inevitable. Neither imperialism, nor neoliberalism, nor Republican domination of state politics, nor the tabloidization of all journalism is inevitable. If they seem that way on the show, it is because of the careful and clever way in which the story is framed: these larger-scale institutions, the ones where the real agency lies, are always kept off screen and held beyond the reach of the characters. Thus the world the characters inhabit appears to them to be one where nothing can be changed. That doesn't mean that the wider world of the show, the one we viewers can sense, is actually so tragic.

But it is true that none of these problems can be solved in a single city, and most of them require a long-term and fairly radical project of social transformation. This may present difficulties for liberals who would prefer that social problems have incremental, non-threatening solutions. But by presenting small-scale local activism as an adequate response, Dreier and Atlas do a disservice both to the problems they address and to the activists themselves.

Perhaps, however, their real political objective is somewhat different from simply promoting the importance of urban collective action. The giveaway comes at the end of the City and Community version of their essay:

Perhaps, a year or two from now, Simon or another writer will propose a new series to TV networks about the inner workings of the White House and an idealistic young president, a former community organizer, who uses his bully pulpit to mobilize the American people around their better instincts.

This president would challenge the influence of big business and its political allies, to build a movement, a New Deal for the 21st century, to revitalize an economy brought to its knees by Wall Street greed, address the nation's health care and environmental problems, provide adequate funding for inner-city schools, reduce poverty and homelessness, and strengthen the power of unions and community groups.

A show like that would certainly be a nice bit of wish-fulfilment for liberals who like to imagine a "great man" riding in and fulfilling all their fantasies. But it's unclear what this has to do with our world, in which an ambitious young politician used his charisma and the wishful thinking of his base to ride to power, and then proceeded to cater to the needs of bankers and insurance companies while sinking America ever deeper into an intractable war in Afghanistan. Faced with that reality, the world of The Wire doesn't look so nihilistic or unrealistic after all.

Never been in a (language) riot

September 7th, 2009  |  Published in Art and Literature, Social Science

I just got through a summer-long reading of David Foster Wallace's Infinite Jest, which has consequently invaded all my waking thoughts. Among the conceits of the book is that a character, one Avril Incandenza, is fanatical about proper grammatical usage to the point of helping to incite the "M.I.T. language riots" at some point in the early 21st century.

As an undergraduate, I studied linguistics, so I was unbelievably tickled by Avril's character. There are even references to Montague grammar, the logical formalisms and lambda-calculus of which I remember well, and whose descendants took up an unhealthy amount of my collegiate time.

But the funniest thing about Avril's character is how exactly contrary she is to everything I know about really existing American academic linguistics. This, after all, is a woman who does things like  replacing commas with semicolons on public signage and correcting "they" to "he or she" in her son's speech. Yet the one thing that has stuck with me from my linguistic education is the idea that these kinds of rules are totally meaningless and stupid.

We used to talk about prescriptive and descriptive linguistics. (Wallace was no doubt aware of this, as he had Avril be a member of the "prescriptive grammarians of Massachusetts".) Prescriptive grammar meant telling people how they were supposed to use language, like your elementary school teacher telling you not to say "ain't" or warning you against ending sentences with prepositions. Descriptive grammar, by contrast, was what real scientific linguists did. Its premise was that whatever people actually said was the real language, and it was our job to document that. All of the prescriptive rules were just superstitions or attempts by privileged social strata to make their way of speaking seem more "correct" than that of the less advantaged.

Now that I've slid over into a new career as a social scientist, I find that I'm all the more committed to this descriptivist dogma, and I newly appreciate its sociological sophistication. All too many social scientists, who are otherwise eager to acknowledge the role of social construction and power relations in making our social world, nevertheless accept the reality and the usefulness of grammatical rules--whereas even the most apolitical of the linguists I have known would dismiss such rules in an instant as irrational prescriptivism.

But it turns out that what I see as the only sensible way of understanding language is still very much a minority view. And this always surprises me. It's not that I'm unaccustomed to holding unpopular views; I am, after all, a socialist. But somehow the language issue seems like it should be more common-sense, less divisive. And then I read something like this, from an otherwise excellent Infinite Jest-related blog:

My argument is that as long as we agree that there are standards of grammar and spelling that we should aspire to (and most of us do agree), deviations will be seen as ignorance and possibly reflect poorly on the intelligence and abilities of the writer and therefore should be corrected. Since when is pointing out people's mistakes the same as telling them you think they are second-class human beings?

Well, as regards the parenthetical assumption: I do not agree! And I find it slightly appalling that others do agree. It's not even that, in practice, I disagree with this author's advice. I can understand advocating prescriptive grammar in the same way that one would advocate, say, wearing a tie to a job interview: it may not make sense, it may not have anything to do with anything, but it's what people expect and sometimes it's best to just go with the flow and accede to the demands of the social structure.  The "will be seen as" in the sentence above suggests that kind of argument. But I get the sense that this is not how prescriptive grammarians feel, even smart and educated ones. They think that obeying pointless grammar rules really is somehow indicative of one's intelligence or self-discipline or whatever.

What a waste. Not only does prescriptive grammar reinforce class hierarchies, it cuts educated and affluent people off from the richness, dynamism, and power of everyday American language.  Even if there weren't all the other objections I've already adduced, there'd be this: in traditional upper-class white American English, there is no word for wack.

The ontology of statistics

September 4th, 2009  |  Published in Social Science, Statistics

Bayesian statistics is sometimes differentiated from its frequentist alternative with the claim that frequentists have a kind of Platonist ontology, which treats the parameters they seek to estimate as being fixed by nature; Bayesians, in contrast, are said to hold a stochastic ontology in which there is variability "all the way down", as it were. This distinction implies that frequentist measurements of uncertainty refer solely to epistemological uncertainty: if we estimate that a certain variable has a mean of 50 and a standard error of two, we are saying only that we do not have enough information to specify the mean more precisely. In contrast, the Bayesian perspective (according to the view just elucidated) would hold that a measure of uncertainty includes not only epistemological but also ontological uncertainty: even with a sample size approaching infinity, the mean of the variable in question is the realization of some probability distribution and not a fixed quantity, and therefore can never be specified without uncertainty.

As a characterization of the frequentist-Bayesian divide, however, this distinction is misleading and unhelpful. Andrew Gelman is, by any sensible account, one of the leading exponents and practitioners of Bayesian statistics, and yet he says here that "I'm a Bayesian and I think parameters are fixed by nature. But I don't know them, so I model them using random variables." Compare this to the comment of another Bayesian, Bill Jefferys: "I've always regarded the main difference between Bayesian and classical statistics to be the fact that Bayesians treat the state of nature (e.g., the value of a parameter) as a random variable, whereas the classical way of looking at it is that it's a fixed but unknown number, and that putting a probability distribution on it doesn't make sense."

For Gelman, the choice of Bayesian methods is not primarily motivated by ontological commitments, but is rather a kind of pragmatism: he adopts techniques such as shrinkage estimators, prior distributions, etc. because they give good predictions about the state of the world in cases where frequentist methods fail or cannot be applied. This, I suspect, corresponds to the inclinations and motivations of many applied researchers, who as often as not will be uninterested in the ontology implied by their methods, so long as the techniques give reasonable answers.

Moreover, if it is possible to be a Bayesian with a Platonist ontology, it is equally possible to wander into a stochastic view of the world without reaching beyond the well-accepted "classical" methods. Consider, for example, logistic regression, which is by now a part of routine introductory statistical instruction in every field of social science. A logistic regression model does not directly predict a binary outcome y, which can be 0 or 1. Rather, it predicts the probability of such an outcome, conditional on the predictor variables. There are two ways to think about such models. One of them, the so-called "latent variable" interpretation, posits that there is some unobservable continuous variable Z, and that the outcome y is 0 if this Z variable is below a certain threshold, and 1 otherwise. If one holds to this interpretation, it is perhaps possible to hold to a Platonist ontology, by stipulating that the value of Z is "fixed by nature". However, this fixed parameter is at the same time unobservable, leading to the unsatisfying conclusion that the propensity of event y occurring for a given subject is at once fixed and unknowable.

In the latent variable interpretation,  the predicted probabilities generated by a logistic regression are simply emanations from the "true" quantity of interest, the unobserved value of Z. An alternative interpretation is that the predicted probabilities are themselves the quantities of interest. Ontologically, this means that rather than having an exact value for Z, each case is associated with a certain probability that for that case, y=1. Of course, in the actual world we observe, each case in our dataset is either 1 or 0. But this second interpretation of the model implies that if we "ran the tape of history over again", to paraphrase Stephen Jay Gould, the values of y for each individual case might be different; only the overall distribution of probabilities is assumed to be constant.

Thus the distinction between the Platonist and stochastic ontologies in statistics turns out to be quite orthogonal to the distinction between frequentist and Bayesian. And it is an important distinction to be aware of, because it has real practical implications for applied researchers.  It will affect, for example, the way in which we assess how well a model fits the data.

In the case of logistic regression, the Platonist view would imply that the best model possible would predict every case correctly: that is, it would yield a predicted probability of more than 0.5 when y=1, and less than 0.5 when y=0. On the stochastic view, however, that degree of predictive accuracy is a priori held to be impossible, and achieving it indicates overfitting of the model. The best one can really aim for, on this view, is a model which gets the probabilities right--so that for 10 cases with predicted probabilities of 0.1, there should be one case where y=1 and nine where y=0.
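A quick simulation (mine, purely for illustration) makes the contrast concrete: even when we fit the true logistic model to data generated from it, classification with a 0.5 cutoff is imperfect by construction, while the predicted probabilities are well calibrated.

set.seed(3)
n <- 10000
x <- rnorm(n)
p <- plogis(-1 + 1.5*x)           # the true probabilities
y <- rbinom(n, 1, p)

fit <- glm(y ~ x, family=binomial)
phat <- fitted(fit)

mean((phat > 0.5) == y)           # classification accuracy: well short of 1, and rightly so

# calibration: within each decile of predicted probability, the observed share
# of 1s should track the average prediction
decile <- cut(phat, quantile(phat, 0:10/10), include.lowest=TRUE)
cbind(predicted=tapply(phat, decile, mean), observed=tapply(y, decile, mean))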

This conundrum arises even for Ordinary Least Squares regression, even though in that case the outcome variable is continuous and the model predicts it directly. It has long been traditional to assess OLS model fit using R-squared, the proportion of variance explained by the model. Many people unthinkingly assume that because the theoretical upper bound of the R-squared statistic is 1, the maximum possible value in any particular empirical situation is also 1. But this assumption once again rests on an implicit Platonist ontology. It assumes that sigma, the residual standard error of a regression, reflects only omitted variables rather than inherent variability in the outcome in question. But as Gary King observed a long time ago, if some portion of sigma is due to intrinsic, ontological variability, then the maximum value of R-squared is some unknown value less than 1.* In this case, once again, high values of R-squared may be indicators of overfitting rather than signs of a well-constructed model.
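The point is easy to demonstrate by simulation: below, the data-generating process is known exactly and the fitted model is the true one, yet the R-squared tops out around 0.5, because half of the variance in the outcome is irreducible noise.

set.seed(4)
n <- 10000
x <- rnorm(n)
y <- 2*x + rnorm(n, sd=2)         # signal variance 4, irreducible noise variance 4

summary(lm(y ~ x))$r.squared      # about 0.5; no model, however good, can expect to do better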

Statistics, even in its grubbiest, most applied forms, is philosophical; we ignore that aspect of quantitative practice at our peril. I am put in mind of Keynes' remark about economic common sense and theoretical doctrine, which I will not repeat here as it is already ubiquitous.

*In practice, the residual variability may be truly ontological in the sense that it is rooted in the probabilistic behavior of the physical world at the level of quantum mechanics, or it may be that all variation can be accounted for in principle, but that residual variation is irreducible in practice, because of the extremely large number of very minor causes that contribute to the outcome. In either case, the consequence for the applied researcher is the same.

http://www.stat.columbia.edu/~cook/movabletype/archives/2007/12/intractable_is.html#more