A while back I wrote an R script to scrape all speeches, interviews, and talks by Australian Prime Ministers from the government’s official PM Transcripts site. I have been using these data in a project to test aspects of Moral Foundations Theory. Recently I came across a great blog post on text mining from Julia Silge describing a technique for identifying unique word usage by group. This approach lets us look at how a group uses particular words while accounting for the fact that some groups might talk more than others.
This got me thinking: could I use this approach to gain insight into how the conservative/centre-right (Liberal) and progressive/centre-left (Labor) leaders (from the 1940s to 2015) talk about migrants? This might provide some insight into how leaders think about migrants or, more likely, what they think their base thinks about migrants. I am at the beginning of this project, but what I have found so far is interesting. Below is a graph of the 10 words most uniquely used by Prime Ministers of each party near the words ‘migrant’, ‘immigrant’, or ‘refugee’. Put simply, the words each party uniquely associates with the word migrant. All analysis was done on stemmed words, so ‘educ’ captures the words education, educate, and educated. I will leave the interpretation of this plot up to you.
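The core of the approach can be sketched roughly as follows. My actual pipeline was an R script; this is a hypothetical Python sketch with a toy two-sentence corpus, a deliberately crude stemmer standing in for a real one (e.g. Porter), and simple add-one-smoothed log odds standing in for the weighting used in Silge’s post — an illustration of the idea, not the original analysis.

```python
import re
from collections import Counter
from math import log

# Words whose surrounding context we want to study.
TARGETS = {"migrant", "migrants", "immigrant", "immigrants",
           "refugee", "refugees"}

def crude_stem(word):
    """Very crude suffix stripping; a real analysis would use a proper
    stemmer, which is what maps 'education' to 'educ'."""
    for suffix in ("ation", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def context_stems(text, window=5):
    """Stemmed words appearing within `window` tokens of a target word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    near = []
    for i, tok in enumerate(tokens):
        if tok in TARGETS:
            near.extend(tokens[max(0, i - window):i])
            near.extend(tokens[i + 1:i + 1 + window])
    return [crude_stem(t) for t in near if t not in TARGETS]

def log_odds(counts_a, counts_b):
    """Smoothed log-odds ratio of word use in group A versus group B.
    Positive scores mark words more characteristic of group A."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    return {
        w: log(((counts_a[w] + 1) / (total_a + len(vocab))) /
               ((counts_b[w] + 1) / (total_b + len(vocab))))
        for w in vocab
    }

# Hypothetical one-sentence 'transcripts' for each party.
labor = Counter(context_stems(
    "we must welcome every migrant and fund their education"))
liberal = Counter(context_stems(
    "the migrant must respect our border and obey the law"))
scores = log_odds(labor, liberal)
top_labor = sorted(scores, key=scores.get, reverse=True)
```

Sorting `scores` in each direction and taking the top 10 per party gives the words plotted below; a party that simply talks more does not dominate, because each group’s counts are normalised by its own total.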
Correlation is not causation. So comes the inevitable refrain in response to anyone who presents a correlational study as evidence in a debate. There is good reason for this. People have long extrapolated from correlation to causation. Bad science, and often bad policy, follows. But a healthy respect for what we can claim about causality has given way to abject fear of any language that even hints at causality.
There is no danger in being overly cautious, I hear you say.
But there have been unintended consequences of this barring of causal language. First, few social scientists now understand much about causality, mistakenly thinking it is simply whatever comes from an RCT. Second, theory has become sloppy. Why waste time constructing a detailed theory of why x leads to y when a reviewer will make you tear it up?
Evidence that something has gone wrong
The biggest evidence I see that something is amiss is how reviewers and writers now interact. It is not uncommon for a reviewer to demand a writer remove all causal language from their manuscript. I have seen this include purging the word ‘effect’ from a manuscript entirely; even named theories are not immune (the Big-Fish-Little-Pond effect becomes the Big-Fish-Little-Pond association). But authors should advance causal theories in introductions!
Reviewers also display a lack of understanding about causation when they claim only an RCT can provide evidence of causality. RCTs neither provide definitive evidence of causation nor are they the only way of providing evidence of causality.
Writers also make mistakes. Authors of papers I have reviewed refuse to explain how x leads to y because they did not run an RCT. One wonders, if they think this way, why they bothered to do the study at all. And if they are so scared of advancing a causal explanation, why use a regression model that so strongly suggests that x leads to y?
Setting the record straight
In Hunting Causes and Using Them, Nancy Cartwright emphasizes that causality is not a single thing (her book Evidence-Based Policy is also worth reading). So heterogeneous are the things we call causality that we might do better to abandon the term entirely. We likely need to match method and evidence to the type of causality we are chasing.
In The Book of Why, Judea Pearl argues that science would become worse, not better, if we were to believe that only RCTs have the power to provide evidence of causation. In such a world, how would we know that smoking causes cancer?
The danger in the current social science landscape comes from a belief that causation is a dichotomy. If you did an RCT, you can advance causal claims. If you didn’t, you can’t. But causality is not a dichotomy. RCTs often can’t provide evidence of causation, and sometimes provide poor evidence. RCTs are critical, but we need to be both more conservative (RCTs provide some evidence, sometimes) and more liberal in allowing other designs (regression discontinuity, instrumental variables, Granger causality) to provide evidence of causality.
What to do about it
Treat causality as a spectrum, where researchers can marshal evidence that pushes the needle toward a causal interpretation or away from it.
View no single piece of evidence as incontrovertible evidence of causality.
Write about clear and simple causal mechanisms in introductions and literature reviews.
In the method section, social scientists should include a section on the degree to which their results can provide evidence of causation, and perhaps the type of causation they have in mind. This should include a discussion of design, context, strength of theory, and methodology. In other words, researchers should have to make a case for their specific research rather than relying on general social science tropes.
As Cartwright suggests, we should replace general terms like cause with more specific terms like repel, pull, excite, or suppress that give a better idea of what is being claimed.
Repression, according to Freud, is a common phenomenon. Note that repression is a noun here. People don’t repress. Rather, repression is the name of a state that seems to happen all on its own. This is the point Michael Billig makes in his book Learn to Write Badly: How to Succeed in the Social Sciences. Billig points out that Freud’s theory turns the verb ‘to repress’ into the noun ‘repression’. Freud does this to make his theory sound scientific. But in doing so we lose a heap of important theoretical information. Who does the repressing? How does it happen? What processes result in it?
My colleague in my writing circle calls this thingification. But in keeping with the theme of this post I suppose I should talk about when researchers thingify an abstract process.
I see this in my field a lot. “Growth mindset is associated with persistence.” This sounds like a sufficiently scientific expression that we let it pass unexamined. We then expect the scientist to collect survey data on growth mindset and persistence and then test their relationship to determine the strength and direction of the association. But the statement “growth mindset is associated with persistence” leaves so much unsaid.
A better statement would be: children who believe their ability is fixed, and thus cannot change, are unlikely to persist in overcoming obstacles to their learning. Notice how I replaced the weak term ‘associated’ with richer causal language? See how this language specifies a process for the association? The child is now an actor in the sentence who believes things and acts accordingly.
In the original statement, it is the variables doing things to each other. But as John Goldthorpe states, “variables don’t do things, people do”. Notice, further, that this sentence invites more specification. We can now ask:
How did the child come to believe this?
Do they believe only their ability is fixed, or everyone’s?
What do such children make of a school system that demands they practice, practice, practice?
Social scientists thingify to sound more scientific. But in doing so we have created a myriad of under-specified theories and a science about people that is almost entirely devoid of people.