The ontology of statistics
September 4th, 2009 | Published in Social Science, Statistics
Bayesian statistics is sometimes differentiated from its frequentist alternative with the claim that frequentists have a kind of Platonist ontology, which treats the parameters they seek to estimate as being fixed by nature; Bayesians, in contrast, are said to hold a stochastic ontology in which there is variability "all the way down", as it were. This distinction implies that frequentist measurements of uncertainty refer solely to epistemological uncertainty: if we estimate that a certain variable has a mean of 50 and a standard error of two, we are saying only that we do not have enough information to specify the mean more precisely. In contrast, the Bayesian perspective (according to the view just elucidated) would hold that a measure of uncertainty includes not only epistemological but also ontological uncertainty: even with a sample size approaching infinity, the mean of the variable in question is the realization of some probability distribution and not a fixed quantity, and therefore can never be specified without uncertainty.
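To make the "epistemological uncertainty" half of this concrete, here is a small simulation sketch (the true mean and standard deviation below are invented for illustration): on the fixed-parameter reading, the standard error of an estimated mean shrinks toward zero as the sample grows, because the only uncertainty is our limited information about a quantity that is itself fixed.

```python
# A minimal sketch (my own illustration, not from the argument above): under the
# "fixed parameter" reading, the standard error of the sample mean shrinks toward
# zero as the sample grows, because the only uncertainty is our limited
# information about a fixed quantity. The true mean and sd are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_sd = 50.0, 20.0  # hypothetical "fixed by nature" values

for n in (100, 10_000, 1_000_000):
    sample = rng.normal(true_mean, true_sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    print(f"n={n:>9}: estimated mean={sample.mean():6.2f}, standard error={se:.4f}")
```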
As a characterization of the frequentist-Bayesian divide, the above distinction is misleading and unhelpful. Andrew Gelman is, by any sensible account, one of the leading exponents and practitioners of Bayesian statistics, and yet he says here that "I'm a Bayesian and I think parameters are fixed by nature. But I don't know them, so I model them using random variables." Compare this to the comment of another Bayesian, Bill Jefferys: "I've always regarded the main difference between Bayesian and classical statistics to be the fact that Bayesians treat the state of nature (e.g., the value of a parameter) as a random variable, whereas the classical way of looking at it is that it's a fixed but unknown number, and that putting a probability distribution on it doesn't make sense."
For Gelman, the choice of Bayesian methods is not primarily motivated by ontological commitments, but is rather a kind of pragmatism: he adopts techniques such as shrinkage estimators, prior distributions, etc. because they give good predictions about the state of the world in cases where frequentist methods fail or cannot be applied. This, I suspect, corresponds to the inclinations and motivations of many applied researchers, who as often as not will be uninterested in the ontology implied by their methods, so long as the techniques give reasonable answers.
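As a rough illustration of that pragmatic appeal (the grouped data and variance components here are invented, not anything Gelman has in mind specifically), a simple shrinkage estimator that pulls noisy group means toward the grand mean typically lands closer to the true group means than the raw averages do:

```python
# A hedged sketch of why shrinkage appeals to pragmatists: with only a few noisy
# observations per group, pulling each group mean toward the grand mean usually
# recovers the true group means better than the raw averages. All numbers here
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_groups, n_per_group = 50, 5
true_group_means = rng.normal(0.0, 1.0, size=n_groups)        # between-group sd = 1
data = true_group_means[:, None] + rng.normal(0.0, 2.0, size=(n_groups, n_per_group))

raw_means = data.mean(axis=1)
grand_mean = raw_means.mean()

# Shrinkage weight from the (assumed known) variance components:
# variance of a raw group mean = 2^2 / 5, between-group variance = 1^2.
within_var = 2.0**2 / n_per_group
between_var = 1.0**2
w = between_var / (between_var + within_var)
shrunk_means = grand_mean + w * (raw_means - grand_mean)

# Compare errors against the true group means.
print("raw    MSE:", np.mean((raw_means - true_group_means) ** 2))
print("shrunk MSE:", np.mean((shrunk_means - true_group_means) ** 2))
```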
Moreover, if it is possible to be a Bayesian with a Platonist ontology, it is equally possible to wander into a stochastic view of the world without reaching beyond the well-accepted "classical" methods. Consider, for example, logistic regression, which is by now a part of routine introductory statistical instruction in every field of social science. A logistic regression model does not directly predict a binary outcome y, which can be 0 or 1. Rather, it predicts the probability of such an outcome, conditional on the predictor variables. There are two ways to think about such models. One of them, the so-called "latent variable" interpretation, posits that there is some unobservable continuous variable Z, and that the outcome y is 0 if this Z variable is below a certain threshold, and 1 otherwise. If one adopts this interpretation, it is perhaps possible to hold to a Platonist ontology, by stipulating that the value of Z is "fixed by nature". However, this fixed parameter is at the same time unobservable, leading to the unsatisfying conclusion that the propensity of event y occurring for a given subject is at once fixed and unknowable.
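A small simulation may help fix the latent-variable reading (the coefficients are arbitrary choices for illustration): if y is 1 exactly when an unobserved Z, equal to a linear predictor plus standard-logistic noise, crosses zero, then the proportion of 1s reproduces the familiar logistic formula for P(y=1|x).

```python
# A sketch of the latent-variable reading of logistic regression (illustrative
# coefficients, not from the post): y is 1 exactly when an unobserved
# Z = a + b*x plus standard-logistic noise crosses zero, which reproduces
# P(y = 1 | x) = 1 / (1 + exp(-(a + b*x))).
import numpy as np

rng = np.random.default_rng(2)
a, b = -1.0, 2.0          # hypothetical "fixed by nature" coefficients
x = 0.8                   # a single predictor value
n = 200_000

noise = rng.logistic(loc=0.0, scale=1.0, size=n)  # standard logistic errors
z = a + b * x + noise                             # unobservable latent variable
y = (z > 0).astype(int)                           # observed binary outcome

implied_prob = 1.0 / (1.0 + np.exp(-(a + b * x)))
print("simulated P(y=1):", y.mean())
print("logistic model  :", implied_prob)
```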
In the latent variable interpretation, the predicted probabilities generated by a logistic regression are simply emanations from the "true" quantity of interest, the unobserved value of Z. An alternative interpretation is that the predicted probabilities are themselves the quantities of interest. Ontologically, this means that rather than having an exact value for Z, each case is associated with a certain probability that y=1. Of course, in the actual world we observe, each case in our dataset is either 1 or 0. But this second interpretation of the model implies that if we "ran the tape of history over again", to paraphrase Stephen Jay Gould, the values of y for each individual case might be different; only the overall distribution of probabilities is assumed to be constant.
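A toy simulation of this "replaying the tape" idea (the probabilities below are made up) shows individual outcomes flipping from one replay to the next while the overall distribution barely moves:

```python
# A small illustration of the second reading, under assumed probabilities: each
# case has a fixed probability that y = 1, but "replaying the tape" redraws the
# individual outcomes, so single cases flip while the overall distribution of
# outcomes stays essentially the same.
import numpy as np

rng = np.random.default_rng(3)
probs = rng.uniform(0.05, 0.95, size=1000)   # each case's probability that y = 1

replay_1 = rng.binomial(1, probs)
replay_2 = rng.binomial(1, probs)

print("cases whose outcome differs between replays:", (replay_1 != replay_2).sum())
print("mean outcome, replay 1:", replay_1.mean())
print("mean outcome, replay 2:", replay_2.mean())
```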
Thus the distinction between the Platonist and stochastic ontologies in statistics turns out to be quite orthogonal to the distinction between frequentist and Bayesian. And it is an important distinction to be aware of, because it has real practical implications for applied researchers. It will affect, for example, the way in which we assess how well a model fits the data.
In the case of logistic regression, the Platonist view would imply that the best model possible would predict every case correctly: that is, it would yield a predicted probability of more than 0.5 when y=1, and less than 0.5 when y=0. On the stochastic view, however, that degree of predictive accuracy is a priori held to be impossible, and achieving it indicates overfitting of the model. The best one can really aim for, on this view, is a model which gets the probabilities right--so that for 10 cases with predicted probabilities of 0.1, there should be one case where y=1 and nine where y=0.
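The following sketch makes the contrast concrete under a stylized assumption, namely that outcomes really are drawn from the predicted probabilities themselves: even the best possible model misclassifies a sizeable fraction of cases, yet it is well calibrated, with roughly one in ten of the cases assigned probability 0.1 turning out to have y=1.

```python
# A hedged sketch of the "get the probabilities right" criterion: with outcomes
# drawn from the true probabilities themselves (a stylized assumption), even the
# perfect model misclassifies many cases, yet it is well calibrated: roughly one
# in ten of the cases given probability near 0.1 actually has y = 1.
import numpy as np

rng = np.random.default_rng(4)
true_probs = rng.uniform(0.0, 1.0, size=100_000)
y = rng.binomial(1, true_probs)               # the world "draws" the outcomes

predicted = true_probs                        # the best possible predictions
accuracy = ((predicted > 0.5) == (y == 1)).mean()
print("classification accuracy of the true model:", round(accuracy, 3))

# Calibration check: among cases with predicted probability near 0.1,
# the observed frequency of y = 1 should also be near 0.1.
band = (predicted > 0.05) & (predicted < 0.15)
print("observed rate of y=1 in the 0.05-0.15 band:", round(y[band].mean(), 3))
```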
This conundrum arises even for Ordinary Least Squares regression, though in that case the outcome variable is continuous and the model predicts it directly. It has long been traditional to assess OLS model fit using R-squared, the proportion of variance explained by the model. Many people unthinkingly assume that because the theoretical upper bound of the R-squared statistic is 1, the maximum possible value in any particular empirical situation is also 1. But this assumption once again rests on an implicit Platonist ontology. It assumes that sigma, the residual standard error of a regression, reflects only omitted variables rather than inherent variability in the outcome in question. But as Gary King observed a long time ago, if some portion of sigma is due to intrinsic, ontological variability, then the maximum value of R-squared is some unknown value less than 1.* In this case, once again, high values of R-squared may be indicators of overfitting rather than signs of a well-constructed model.
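Here is a quick sketch of that point with invented numbers: when the outcome carries irreducible noise on top of a linear signal, the R-squared of the true data-generating model settles well below 1, and only a model stuffed with junk predictors appears to do better.

```python
# A sketch of the point attributed to King, under assumed numbers: if the outcome
# has intrinsic noise (sd = 2 here) on top of a linear signal, the R-squared of
# the *true* model is bounded near 9 / (9 + 4), about 0.69, and only a model
# padded with pure-noise predictors appears to exceed that bound.
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(0.0, 2.0, size=n)    # signal variance 9, noise variance 4

def r_squared(y, fitted):
    return 1.0 - np.var(y - fitted) / np.var(y)

# R-squared of the data-generating model itself.
print("true model   :", round(r_squared(y, 3.0 * x), 3))

# "Overfit" model: intercept, the real predictor, and 100 columns of pure noise.
X = np.column_stack([np.ones(n), x, rng.normal(size=(n, 100))])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("overfit model:", round(r_squared(y, X @ beta), 3))
```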
Statistics, even in its grubbiest, most applied forms, is philosophical; we ignore that aspect of quantitative practice at our peril. I am put in mind of Keynes' remark about economic common sense and theoretical doctrine, which I will not repeat here as it is already ubiquitous.
*In practice, the residual variability may be truly ontological in the sense that it is rooted in the probabilistic behavior of the physical world at the level of quantum mechanics, or it may be that all variation can be accounted for in principle, but that residual variation is irreducible in practice, because of the extremely large number of very minor causes that contribute to the outcome. In either case, the consequence for the applied researcher is the same.