How I Review: 2020 Edition

In 2020 I have decided to refine my reviews. The impetus is that I now have greater clarity about what my role as a reviewer should be, or, more correctly, what it is not.

My wife once had an internship at a publishing company. Her job was to go through the bin of unsolicited submissions and be ruthless. The company could only publish a set number of books a year, and they solicited most of the books they published. Thus her role was to reject almost all submissions. I think many reviewers think they have this job too. Many reviewers also believe they are the defender of the purity of science. This is a role I used to play. I believed that my field was a disaster and only I could fix it by standing in the way of as many articles as I could. My aim was to expunge the various sins I saw my field committing. What hubris!

I no longer believe that is my role. Ultimately, I think the role of a reviewer is a) to detect fatal flaws (a flaw that no amount of revision would fix); b) to identify any fundamental issue that should prevent publication of any kind (e.g., plagiarism); and c) to determine whether the article would look out-of-place among other articles in the field.

Ultimately, the role of a reviewer is to catch malfeasance and monsters.

The role of determining whether an article is important, impactful, or paradigm shifting belongs to readers.

With this refined sense of what a reviewer should be, I have aimed to introduce the following to my own reviews:

  1. My review distribution will become increasingly bimodal, focused on either outright rejection or acceptance/minor conditional acceptance.
  2. When I reject, my reviews are short. I outline what I think the fatal flaw is and nothing more. If an article is unsalvageable, advising on how certain paragraphs should be phrased or how APA styling should have been handled is a waste of time and confusing to the authors. The language here should be clear. There is no “I think the authors should consider…” or “Have the authors thought of…”. I am also clear in the first sentence that I do not think the article should be accepted and that I do not believe a revision could resolve the fundamental flaws I see in the paper.
  3. If I give a recommendation of conditional acceptance, I am careful to distinguish between the few things I believe are the conditions of acceptance and areas I think might improve the article. I am clear that the latter are suggestions and that the authors are free to ignore them. I then try to phrase these points as questions rather than commands.
  4. If an author refuses to adjust their article in relation to something I think they should adjust, and they give reasons that are not preposterous, I let it go. You have likely received a review from me if you have read “I don’t agree with the authors’ position on this issue, but my job is not to make authors write the paper how I want it written. I suggest the paper move forward to publication.”

Shorth is better than length

Dr Seuss wisely stated “shorth is better than length”. And it seems academia is slowly getting the message. Brief reports are here and will likely play an increasing role in Educational and Developmental Psychology. Having spent a while working with people in public health, I have been infected with their obsession with the brief. As Dr Seuss also said:

A writer who breeds more words than he needs creates a chore for the reader who reads.

With this advice in mind I have tended to target brief reports and encouraged others to do so. There is an issue I am seeing though. Unlike in public health, Ed and Dev reviewers are not really sure how to review brief reports. Today, for example, a post-doc at my institute was hauled over the coals by three reviewers, all of whom said she had not done a thorough review of the literature. The problem? The format she submitted to only allows for six references. My general experience is that reviewers are bringing across their expectations from long form articles and seem unwilling, unable, or unsure of how to adapt their reviews to the brief report format.

Some of the problem might lie with editors. Maybe they don’t communicate expectations about brief reports to reviewers clearly enough. Maybe some of it is the fault of publishers, who don’t do a sufficient job of signposting that an article is a brief report. Some of it is also likely teething problems as the Ed and Dev community comes to terms with the brief report format. Whatever the reason, I think we need to address this if we are to embrace the format. And I think we should embrace it. Generally my writing gets better and my ideas clearer when I am forced to whittle them down to the bare minimum.

So what should we do? In the long term I think there needs to be a rethink about the way different article types are flagged to reviewers by publishing systems, and editors likely need to get better at: a) signalling to potential reviewers that a paper is a brief report and what that means for a given journal; b) providing authors clear directions on how to address reviewers who have requested changes that break with a brief report format; and c) providing reviewers with feedback. In the short term, when you review you should pay attention to the article’s submission type and find out what its implications are (e.g., is there a limit on the number of references allowed). As an author, I think it does not hurt to alert a reviewer to the fact you have written a brief report by using language like “In this brief report…” rather than “In this paper…”.

Egalitarian School Systems Breed Better Student Performance

Cross-posted from the international education blog

Imagine you are a policymaker tasked with overhauling your country’s education system. You are faced with a bewildering list of competing claims, all advanced with absolute certainty by proponents. Should you listen to the economists who want to expand school choice? What about the trendy calls to replicate the ‘Finnish miracle’? Should you try to produce a near carbon copy of the Finnish system? What about a return to old-fashioned tracking like the Germans and the Dutch? Empirical data exists, but most of the best evidence concerns local interventions that are unlikely to tell you much about how a particular policy will affect the whole education system.

This is why high quality, multi-cycle, and multi-nation studies like PISA are so valuable. They allow policymakers to compare countries with different policy environments on a common metric: literacy, numeracy, and science achievement. Policymakers can also compare results across time to see how the introduction of some policy has benefited or harmed the performance of a country’s students.

Policy Question: Does Achievement Stratification Help or Harm?

Using PISA we asked: do countries that stratify their schools by academic achievement do better in PISA tests than countries that do not? Stratification refers to lots of different policies such as school tracking, private schooling, selective schools, magnet schools, school choice policies, and school catchment area policies in countries that are highly geographically segregated by income and wealth. All these policies result in smarter children being schooled together and separately from less smart children. You can read the full text of our article here.

There are many good reasons for doing this. Perhaps smarter children can only flourish when educated among their peers. Teachers may be able to target their teaching to the level of the students. School choice may provide parents with latitude to select a school that has the right fit for their child. But stratification may also have negative effects.

Poorer-performing students may benefit from the help of smarter children. And smarter children may gain a richer understanding of a topic by teaching it to other students. In addition, a now extensive body of research shows that children educated in selective schools have poorer motivation, worse self-beliefs, and lower academic interest than similarly able peers educated in comprehensive schools.

What We Found

With these competing views in mind, we collated data from five cycles of PISA to look at the relationship between the amount of stratification in a country’s education system and its performance. We also looked at whether increases or decreases in stratification over time are related to improvements or declines in average academic performance. The figure below tells the story. Countries where stratification increased experienced declines in average performance.


Our research suggests that policies that increase achievement stratification are associated with declines in academic performance. Should policymakers change school systems based on our research? On its own: no. Yet the results are illuminating and, taken as part of a larger body of research, suggest policymakers should aim to create more egalitarian school systems.

The Track Change Jerk

I was a track change jerk last week. Someone did something minor that I didn’t like. So I showed my distaste via the comment function in Word. I know better. 

Like most people, when working on a collaboration with a big team I eagerly await people’s comments on my comments. And while most of you won’t admit it, I too get a thrill out of seeing some change I made via track changes accepted by the lead author.

This means collaborative track changes are not low stakes. And yet we treat them like they are. I add comments on papers that are as dismissive as they are uninformative (“awkward sentence”, “this makes no sense”). I change whole sentences or paragraphs without once explaining why I thought what they had was wrong. I treat the comments section as if it is a conversation between me and the person without acknowledging that these comments will be visible to the whole team.

This is not a blog post aimed at self-flagellation. It is more a call for discussion. Do we need track change etiquette? And if we do what should it be? A few thoughts:

  1. Acknowledge that track changes in big teams are public documents and that it doesn’t hurt to be nice.
  2. Acknowledge that you are not a professional proofreader (yes, that means you). So if you change something, add a comment explaining why. A great colleague this week pointed out a split infinitive via comments but also acknowledged that he was not sure whether split infinitives matter anymore.
  3. Point out the superb bits. The academic mentality is so optimised around criticism that we find it really hard to acknowledge good work. Recognising good work is as much a critical skill as recognising bad work.
  4. If you have something controversial or sensitive to say, do it in person, via Skype, or—if you really really have to—via email. Don’t do it in track changes or comments.
  5. Before changing something ask “is this just my personal preference?”.

My commitment is to be a better track change colleague from here on out.

The P Word

I came across a tweet last week by an academic at a conference (I can’t remember who). They were indignant that presenters were using the word ‘predict’ to describe correlations. My first reaction was to sigh. Prediction has no causal connotation. When you go to the fair and some huckster offers to guess your age for a price, they are making a prediction based on your physical appearance. This prediction does not require a belief that your physical appearance caused your age. Such a belief is absurd. Yet prediction is still the right word.

This was my first reaction. My second was to reflect on my own use of prediction in reporting research results. While I believe ‘to predict’ requires no causal beliefs, it implies a certain level of accuracy. I have used the word predict to describe a correlation of .20. On reflection this seems wrong. Not because I am implying causation. I am not. But because I am implying a level of accuracy in predicting y from x. The implication is that by knowing a person’s value on x I can make a good guess at their value of y. But a correlation of .20, or even .40, as the basis of such a prediction would be atrociously inaccurate. Reporting such weak results using the word predict leads the public, who rightly read ‘predict’ as ‘predict accurately’, to vastly overstate the significance of the finding.
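To see just how little a correlation of .20 buys you as a predictor, here is a small simulation in Python (toy data, not from any real study). With standardized x and y, the best linear prediction of y is r times x, and the prediction errors have a standard deviation of about the square root of 1 − r² relative to the spread of y itself.

```python
import math
import random

# Toy simulation: how accurate is a prediction of y from x when they
# correlate at r = .20 or r = .40? (Simulated data, not from any study.)
random.seed(1)
n = 50_000

def sd(values):
    """Standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

for r in (0.20, 0.40):
    x = [random.gauss(0, 1) for _ in range(n)]
    # Construct y so that corr(x, y) is approximately r.
    y = [r * xi + math.sqrt(1 - r * r) * random.gauss(0, 1) for xi in x]
    # The best linear predictor of y from x is r * x (standardized variables).
    errors = [yi - r * xi for xi, yi in zip(x, y)]
    print(f"r = {r:.2f}: prediction error sd / sd of y = {sd(errors) / sd(y):.2f}")
```

For r = .20 the prediction errors are about 98% as large as simply guessing the mean of y for everyone; even r = .40 leaves them about 92% as large.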

The social sciences are well known for being terrible at prediction.

Predictive accuracy is often woeful, even on the data the researchers used to build the statistical model (see here for example). The social sciences often seem not to know about the importance of, let alone test, a model’s predictive accuracy on unseen data, which is really the only metric of predictive accuracy that matters.
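The unseen-data habit is simple enough to sketch in a few lines of Python (toy simulated data; the variable names are mine, not from any cited study): fit a model on one half of the data and report its accuracy on the half it has never seen.

```python
import random

# Toy sketch of out-of-sample evaluation: fit a line on a training half,
# then score it on a held-out half it never saw. (Simulated data.)
random.seed(7)
data = []
for _ in range(400):
    x = random.uniform(0, 10)
    data.append((x, 2.0 * x + random.gauss(0, 3)))  # true slope 2, noisy y
train, test = data[:200], data[200:]

# Ordinary least squares slope and intercept from the training half only.
mx = sum(x for x, _ in train) / len(train)
my = sum(y for _, y in train) / len(train)
slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
intercept = my - slope * mx

def mse(pairs):
    """Mean squared error of the fitted line on a set of (x, y) pairs."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs) / len(pairs)

print(f"training MSE: {mse(train):.2f}")
print(f"held-out MSE: {mse(test):.2f}")  # the number that actually matters
```

Nothing here is specific to linear regression; the discipline is splitting the data before fitting and reporting the held-out number.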

In his fantastic paper, Shmueli argues that the social sciences have neglected the prediction component of science in favor of a complete focus on explanation. Mostly this is because of the mistaken belief that explanation is synonymous with prediction. 

And here lies the problem. The social sciences are scathing toward anyone who uses the word prediction outside of RCT research. But this fit of pique is misdirected. The willing skeptic at the fair may say “I have $10 with your name on it if you can guess my age to within a year”. So too we should call authors on their use of “predict” when their models are scarcely better than chance.

Do Progressive and Conservative Leaders Talk About Migrants Differently?

A while back I wrote an R script to scrape all speeches, interviews, and talks by all Australian Prime Ministers from the government’s official PM Transcripts site. I have been using these data on a project to test aspects of Moral Foundations theory. Recently I came across a great blog post on text mining by Julia Silge describing a technique to identify unique word usage by group. This approach allows us to look at how a group uses particular words while accounting for the fact that some groups might talk more than others.

This got me thinking: could I use this approach to gain insight into how the conservative/centre right (Liberal) and progressive/centre left (Labor) leaders (from the 1940s to 2015) talk about migrants? This might provide some insight into how leaders think about migrants or, more likely, what they think their base thinks about migrants. I am at the beginning of this project but what I have found so far is interesting. Below is a graph of the 10 most uniquely used words that a Prime Minister of a given party uses near the words migrant, immigrant, or refugee. Put simply, the words each party uniquely associates with the word migrant. All analysis was done on words that had been stemmed. Hence ‘educ’ captures the words education, educate, and educated. I will leave the interpretation of this plot up to you.
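For readers curious about the mechanics, one common way to measure distinctive word usage by group, the log odds ratio with an informative Dirichlet prior (Monroe, Colaresi & Quinn), can be sketched as below. This is a hedged Python sketch, not the original R analysis, and the word counts are toy stand-ins, not the PM Transcripts data.

```python
import math
from collections import Counter

# Toy word counts per group (stand-ins, NOT the PM Transcripts data).
counts = {
    "labor":   Counter({"educ": 40, "job": 60, "border": 5, "visa": 10}),
    "liberal": Counter({"educ": 10, "job": 30, "border": 50, "visa": 25}),
}

# Corpus-wide counts double as the informative Dirichlet prior, so the
# measure accounts for groups that simply talk more than others.
total = Counter()
for c in counts.values():
    total.update(c)
alpha0 = sum(total.values())  # total prior weight

def weighted_log_odds(group):
    """Z-scored log odds of each word in `group` versus the other groups,
    using corpus-wide counts as pseudo-counts (Monroe, Colaresi & Quinn)."""
    n_group = sum(counts[group].values())
    n_rest = alpha0 - n_group
    scores = {}
    for word, alpha in total.items():  # prior for a word = its corpus count
        y = counts[group][word]
        y_rest = alpha - y
        log_odds_group = math.log((y + alpha) / (n_group + alpha0 - y - alpha))
        log_odds_rest = math.log((y_rest + alpha) / (n_rest + alpha0 - y_rest - alpha))
        delta = log_odds_group - log_odds_rest
        variance = 1 / (y + alpha) + 1 / (y_rest + alpha)
        scores[word] = delta / math.sqrt(variance)  # z-score of the log odds
    return scores

for word, z in sorted(weighted_log_odds("liberal").items(), key=lambda kv: -kv[1]):
    print(f"{word:>8} {z:+.2f}")
```

Positive z-scores mark words a group uses distinctively often; negative scores mark words it distinctively avoids relative to the other group.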

Use Causal Language: Ignore the Haters

Correlation is not causation. So comes the inevitable refrain in response to anyone who presents a correlational study as evidence in a debate. There is good reason for this. People have long extrapolated from correlation to causation. Bad science and often bad policy follows. But a healthy respect for what we can claim about causality has given way to abject fear of any language that even hints at causality.

There is no danger in being overly cautious I hear you say. 

But there have been unintended consequences of barring causal language. First, few social scientists now understand much about causality, mistakenly thinking it is simply that which comes from an RCT. Second, theory has become sloppy. Why waste time constructing a detailed theory of why x leads to y when a reviewer will make you tear it up?

Evidence that something has gone wrong

The biggest evidence I see that something is amiss is how reviewers and writers now interact. It is not uncommon for a reviewer to demand a writer remove all causal language from their manuscript. I have seen this include purging the word ‘effect’ from a manuscript entirely; even named theories are not immune (the Big-Fish-Little-Pond effect becomes the Big-Fish-Little-Pond association). But authors should advance causal theories in introductions!

Reviewers also display a lack of understanding about causation when they claim only an RCT can provide evidence of causality. RCTs neither provide definitive evidence of causation nor are they the only way of providing evidence of causality.

Writers also make mistakes. Writers of papers I have reviewed refuse to explain how x leads to y because they didn’t do an RCT. One wonders, if they think this way, why they bothered to do the study at all. And if they are so scared of advancing a causal explanation why use a regression model that so strongly suggests that x leads to y?

Setting the record straight

In Hunting causes and using them, Nancy Cartwright emphasizes that causality is not a single thing (it is also worth reading her book Evidence Based Policy). So heterogeneous are the things we call causality that we might be better off abandoning the term entirely. We likely need to match method and evidence to the type of causality we are chasing.

Judea Pearl in The book of why claims science would become worse not better if we were to believe that only RCTs have the power to provide evidence of causation. In such a world how would we know that smoking causes cancer? 

The danger in the current social science landscape comes from a belief that causation is a dichotomy. If you did an RCT, you can advance causal claims. If you didn’t, you can’t. But causality is not a dichotomy. RCTs often can’t provide evidence of causation and sometimes provide poor evidence. RCTs are critical, but we need both to be more conservative—RCTs provide some evidence sometimes—and to be more liberal in allowing other designs (regression discontinuity, instrumental variables, Granger causality) to provide evidence of causality.

What to do about it

  1. Treat causality as a spectrum where researchers can marshal evidence that pushes the needle toward or away from a causal interpretation.
  2. View no single piece of evidence as incontrovertible evidence of causality.
  3. Write about clear and simple causal mechanisms in introductions and literature reviews.
  4. In the method section, social science papers should have a section on the degree to which the results can provide evidence of causation, and perhaps also the type of causation the authors have in mind. This should include a discussion of design, context, strength of theory, and methodology. In other words, researchers should have to make a case for their specific research rather than relying on general social science tropes.
  5. As Cartwright suggests, we should replace general terms like cause with more specific terms like repel, pull, excite, or suppress that give a better idea of what is being claimed.