I came across a tweet last week by an academic at a conference (I can’t remember who). They were indignant that presenters were using the word ‘predict’ to describe a correlation. My first reaction was to sigh. Prediction has no causal connotation. When you go to the fair and some huckster offers to guess your age for a price, they are making a prediction based on your physical appearance. This prediction does not require a belief that your physical appearance caused your age. Such a belief is absurd. Yet prediction is still the right word.
That was my first reaction. My second was to reflect on my own use of ‘predict’ in reporting research results. While I believe ‘to predict’ requires no causal beliefs, it does imply a certain level of accuracy. I have used the word to describe a correlation of .20. On reflection this seems wrong. Not because I am implying causation. I am not. But because I am implying a level of accuracy in predicting y from x. The implication is that by knowing a person’s value on x I can make a good guess at their value on y. But a correlation of .20, or even .40, would be an atrociously inaccurate basis for such a prediction. Reporting such weak results using the word ‘predict’ leads the public, who rightly read ‘predict’ as ‘predict accurately’, to vastly overestimate the significance of the finding.
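To put a rough number on "atrociously inaccurate", here is a minimal sketch. It assumes the standard result for a simple linear regression: the typical (root-mean-square) prediction error is the standard deviation of y multiplied by sqrt(1 - r^2). The function name `error_reduction` is my own, used only for illustration.

```python
import math

def error_reduction(r):
    """Fraction by which the best linear prediction of y from x shrinks the
    typical (root-mean-square) error, compared with always guessing the mean of y.

    Assumes a simple linear regression, where the residual standard deviation
    is sd_y * sqrt(1 - r^2).
    """
    return 1.0 - math.sqrt(1.0 - r ** 2)

for r in (0.20, 0.40):
    print(f"r = {r:.2f}: variance explained = {r ** 2:.0%}, "
          f"error reduction = {error_reduction(r):.1%}")

# r = 0.20: variance explained = 4%, error reduction = 2.0%
# r = 0.40: variance explained = 16%, error reduction = 8.3%
```

Under these assumptions, knowing x when the correlation is .20 trims the typical error of a guess about y by about 2 percent. Even at .40 the improvement is only about 8 percent.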
The social sciences are well known for being terrible at prediction.
Predictive accuracy is often woeful, even on the data that the researchers used to build the statistical model (see here for example). The social sciences often seem unaware of the importance of predictive accuracy on unseen data, let alone test it. Yet that is really the only metric of predictive accuracy that matters.
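As an illustration of what testing on unseen data can look like, here is a minimal sketch using simulated data. Everything in it is an assumption made for the example: the population correlation of .20, the sample size, the 50/50 split, and the helper names `simulate`, `fit_line`, and `rmse`. The model is fitted on one half of the data and scored on the other half, against the baseline of always guessing the training mean.

```python
import math
import random

def simulate(n, r, rng):
    """Draw n (x, y) pairs whose population correlation is r (hypothetical data)."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    return pairs

def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(pairs, predict):
    """Root-mean-square error of predict(x) against the observed y values."""
    return math.sqrt(sum((y - predict(x)) ** 2 for x, y in pairs) / len(pairs))

rng = random.Random(1)                      # fixed seed so the run is repeatable
data = simulate(2000, 0.20, rng)
train, test = data[:1000], data[1000:]      # the model never sees `test` while fitting

intercept, slope = fit_line(train)
train_mean_y = sum(y for _, y in train) / len(train)

model_rmse = rmse(test, lambda x: intercept + slope * x)
baseline_rmse = rmse(test, lambda x: train_mean_y)

print(f"held-out RMSE, regression on x:  {model_rmse:.3f}")
print(f"held-out RMSE, guessing the mean: {baseline_rmse:.3f}")
```

The comparison that matters is the held-out error of the model against the held-out error of the naive guess. With a correlation this weak, the two numbers should land very close together.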
In a fantastic paper, Shmueli argues that the social sciences have neglected the prediction component of science in favor of a complete focus on explanation. Mostly this is because of the mistaken belief that explanation is synonymous with prediction.
And here lies the problem. The social sciences are scathing toward anyone who uses the word ‘prediction’ outside of RCT research. But this fit of pique is misdirected. The willing skeptic at the fair may say “I have $10 with your name on it if you can guess my age to within a year”. So too should we call out authors on their use of “predict” when their models are scarcely better than chance.