How I Review: 2020 Edition

In 2020 I have decided to try to refine my reviews. The impetus is that I think I have greater clarity about what my role should be, or, more correctly, what my role is not.

My wife once had an internship at a publishing company. Her job was to go through the bin of unsolicited submissions and be ruthless. The company could only publish a set number of books a year, and they solicited most of the books they published. Thus her role was to reject almost all submissions. I think many reviewers believe they have this job too. Many reviewers also believe they are the defenders of the purity of science. This is a role I used to play. I believed that my field was a disaster and only I could fix it by standing in the way of as many articles as I could. My aim was to expunge the various sins I saw my field committing. What hubris!

I no longer believe that is my role. Ultimately, I think the role of a reviewer is a) to detect fatal flaws (a flaw that no amount of revision would fix); b) to identify any fundamental issue that should prevent publication of any kind (e.g., plagiarism); and c) to determine whether the article would look out of place among other articles in the field.

Ultimately, the role of a reviewer is to catch malfeasance and monsters.

The role of judging an article's position as important, impactful, or paradigm shifting belongs to readers.

With this refined sense of what a reviewer should be, I have aimed to introduce the following to my own reviews:

  1. My review distribution will become increasingly bimodal, focused on either outright rejection or acceptance/minor conditional acceptance.
  2. When I reject, my reviews are short. I outline what I think the fatal flaw was and nothing more. If an article is unsalvageable, advising on how certain paragraphs should be phrased or how APA styling should have been handled is a waste of time and confusing to the authors. The language here should be clear. There is no “I think the authors should consider…” or “Have the authors thought of…”. I am also clear in the first sentence that I do not think the article should be accepted and that I do not believe a revision could resolve the fundamental flaws I see in the paper.
  3. If I give a recommendation of conditional acceptance, I am careful to distinguish between the few things I believe are the conditions of acceptance and the areas I think might improve the article. I am clear that the latter are suggestions and the authors are free to ignore them. I then try to phrase these points as questions rather than commands.
  4. If an author refuses to adjust their article in relation to something I think they should adjust, and they give reasons that are not preposterous, I let it go. You have likely received a review from me if you read “I don’t agree with the authors’ position on this issue, but my job is not to make authors write the paper how I want it written. I suggest the paper move forward to publication.”

Shorth is better than length

Dr Seuss wisely stated “shorth is better than length”. And it seems academia is slowly getting the message. Brief reports are here and will likely play an increasing role in Educational and Developmental Psychology. Having spent a while working with people in public health, I have been infected with their obsession with the brief. As Dr Seuss also said:

A writer who breeds more words than he needs creates a chore for the reader who reads.

With this advice in mind I have tended to target brief reports and encouraged others to do so. There is an issue I am seeing though. Unlike in public health, Ed and Dev reviewers are not really sure how to review brief reports. Today, for example, a post-doc at my institute was hauled over the coals by three reviewers, all of whom said she had not done a thorough review of the literature. The problem? The format she submitted to only allows for six references. My general experience is that reviewers are bringing across their expectations from long-form articles and seem unwilling, unable, or unsure of how to adapt their reviews to the brief report format.

Some of the problem might lie with the editors. Maybe they don’t communicate expectations about brief reports to reviewers clearly enough. Maybe some of it is the fault of publishers, who don’t do a sufficient job signposting that an article is a brief report. Some of it is also likely teething problems as the Ed and Dev community starts to come to terms with the brief report format. Whatever the reason, I think we need to address this if we are to embrace this format. And I think we should embrace the format. Generally my writing gets better and my ideas clearer when I am forced to whittle them down to the bare minimum.

So what should we do? In the long term I think there needs to be a rethink about the way different article types are flagged to reviewers by publishing systems, and editors likely need to get better at: a) signalling to potential reviewers that a paper is a brief report and what that means for a given journal; b) providing authors clear directions on how to address reviewers who have requested changes that break with a brief report format; and c) providing reviewers with feedback. In the short term, when you review you should pay attention to the article’s submission type and find out what its implications are (e.g., is there a limit on the number of references allowed?). As an author, I think it does not hurt to alert a reviewer to the fact you have written a brief report by using language like “In this brief report…” rather than “In this paper…”.

Egalitarian School Systems Breed Better Student Performance

Cross-posted from the international education blog

Imagine you are a policymaker tasked with overhauling your country’s education system. You are faced with a bewildering list of competing claims, all advanced with absolute certainty by proponents. Should you listen to the economists who want to expand school choice? What about the trendy calls to replicate the ‘Finnish miracle’? Should you try to produce a near carbon copy of the Finnish system? What about a return to old-fashioned tracking like the Germans and the Dutch? Empirical data exists, but most of the best stuff is on local interventions that are unlikely to tell you much about how a particular policy will affect the whole education system.

This is why high quality, multi-cycle, and multi-nation studies like PISA are so valuable. They allow policymakers to compare countries with different policy environments on a common metric: literacy, numeracy, and science achievement. Policymakers can also compare results across time to see how the introduction of some policy has benefited or harmed the performance of a country’s students.

Policy Question: Does Achievement Stratification Help or Harm?

Using PISA we asked: do countries that stratify their schools by academic achievement do better in PISA tests than countries that do not? Stratification refers to a range of policies such as school tracking, private schooling, selective schools, magnet schools, school choice policies, and school catchment area policies in countries that are highly geographically segregated by income and wealth. All these policies result in smarter children being schooled together, separately from less smart children. You can read the full text of our article here.

There are many good reasons for doing this. Perhaps smarter children can only flourish when educated among their peers. Teachers may be able to target their teaching to the level of the students. School choice may provide parents with latitude to select a school that has the right fit for their child. But stratification may also have negative effects.

Poorer performing students may benefit from the help of smarter children. And smarter children may gain a richer understanding of a topic by teaching it to other students. In addition, a now extensive body of research shows that children educated in selective schools have poorer motivation, worse self-beliefs, and lower academic interest than similarly able peers educated in comprehensive schools.

What We Found

With these competing views in mind, we collated data from five cycles of PISA to look at the relationship between the amount of stratification in a country’s education system and its performance. We also looked at whether increases or decreases in stratification over time are related to improvements or declines in average academic performance. The figure below tells the story. Countries where stratification increased experienced declines in average performance.


Our research argues that policies that increase achievement stratification are associated with declines in academic performance. Should policy makers change school systems based on our research? On its own: no. Yet the results are illuminating and, taken as part of a larger body of research, suggest policymakers should aim to create more egalitarian school systems.

The Track Change Jerk

I was a track change jerk last week. Someone did something minor that I didn’t like. So I showed my distaste via the comment function in Word. I know better. 

Like most people, when working on a collaboration with a big team I eagerly await people’s comments on my comments. And while most of you won’t admit it, I too get a thrill out of seeing some change I made via track changes accepted by the lead author.

This means collaborative track changes are not low stakes. And yet we treat them like they are. I add comments on papers that are as dismissive as they are uninformative (“awkward sentence”, “this makes no sense”). I change whole sentences or paragraphs without once explaining why I thought what they had was wrong. I treat the comments section as if it is a conversation between me and the person without acknowledging that these comments will be visible to the whole team.

This is not a blog post aimed at self-flagellation. It is more a call for discussion. Do we need track change etiquette? And if we do what should it be? A few thoughts:

  1. Acknowledge that track changes in big teams are public documents and that it doesn’t hurt to be nice.
  2. Acknowledge that you are not a professional proofreader (yes, that means you). So if you change something, add a comment explaining why. This week a great colleague pointed out a split infinitive via comments but also acknowledged that he was not sure whether split infinitives matter anymore.
  3. Point out the superb bits. The academic mentality is so optimised around criticism that we find it really hard to acknowledge good work. Recognising good work is as much a critical skill as recognising bad work.
  4. If you have something controversial or sensitive to say, do it in person, via Skype, or—if you really really have to—via email. Don’t do it in track changes or comments.
  5. Before changing something ask “is this just my personal preference?”.

My commitment is to be a better track change colleague from here out.

The P Word

I came across a tweet last week by an academic at a conference (I can’t remember who). They were indignant that presenters were using the word ‘predict’ to describe a correlation. My first reaction was to sigh. Prediction has no causal connotation. When you go to the fair and some huckster offers to guess your age for a price, they are making a prediction based on your physical appearance. This prediction does not require a belief that your physical appearance caused your age. Such a belief is absurd. Yet prediction is still the right word.

This was my first reaction. My second was to reflect on my use of prediction in reporting research results. While I believe ‘to predict’ requires no causal beliefs, it implies a certain level of accuracy. I have used the word predict to describe a correlation of .20. On reflection this seems wrong. Not because I am implying causation. I am not. But because I am implying a level of accuracy in predicting y from x. The implication is that by knowing a person’s value on x I can make a good guess at their value on y. But a correlation of .20 or even .40 as the basis of such a prediction would be atrociously inaccurate. Reporting such weak results using the word predict leads the public, who rightly read ‘predict’ as ‘predict accurately’, to vastly overstate the significance of the finding.
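To make this concrete, here is a quick simulation (an illustrative sketch with invented data, not analysis from any paper mentioned here) of how little a correlation of .20 buys you when predicting one standardised variable from another:

```python
import numpy as np

rng = np.random.default_rng(42)
n, r = 100_000, 0.20

# Simulate standardised x and y with a true correlation of r
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

# Prediction error using the best linear guess (known true slope)
# vs. the error from simply guessing the mean of y every time
rmse_model = np.sqrt(np.mean((y - r * x) ** 2))
rmse_mean = np.sqrt(np.mean((y - y.mean()) ** 2))

print(round(rmse_model / rmse_mean, 2))  # 0.98 — barely better than guessing the mean
```

With r = .20 the model’s error is about 98% of the error you would make by ignoring x entirely (the general relationship is RMSE ratio = √(1 − r²)); even r = .40 only gets you to about 92%.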

The social sciences are well known for being terrible at prediction.

Predictive accuracy is often woeful, even on the data the researchers used to build the statistical model (see here for example). The social sciences often seem not to know about the importance of, let alone test, the predictive accuracy of a model on unseen data, which is really the only metric of predictive accuracy that matters.
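The held-out check is simple to do. Below is a minimal sketch using simulated data with a deliberately weak signal (everything here is invented for illustration): fit on one half, score on the other.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy data with a weak true signal, split into train and test halves
n = 2_000
x = rng.standard_normal(n)
y = 0.2 * x + rng.standard_normal(n)
x_train, x_test = x[:1000], x[1000:]
y_train, y_test = y[:1000], y[1000:]

# Fit a one-predictor linear model on the training half only
slope, intercept = np.polyfit(x_train, y_train, 1)
pred = intercept + slope * x_test

# Out-of-sample R^2: the share of test-set variance the model explains
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_out = 1 - ss_res / ss_tot
print(round(r2_out, 2))
```

For a weak signal like this the out-of-sample R² lands in the low single digits as a percentage: the model explains almost none of the variance in data it has not seen.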

In his fantastic paper, Shmueli argues that the social sciences have neglected the prediction component of science in favor of an exclusive focus on explanation, mostly because of the mistaken belief that explanation is synonymous with prediction.

And here lies the problem. The social sciences are scathing of anyone who uses the word prediction outside of RCT research. But this fit of pique is misdirected. The willing skeptic at the fair may say “I have $10 with your name on it, if you can guess my age to within a year”. So too we should call out authors on their use of “predict” when their models are scarcely better than chance.

Do Progressive and Conservative Leaders Talk About Migrants Differently?

A while back I wrote an R Script to scrape all speeches, interviews, and talks by all Australian Prime Ministers from the government’s official PM Transcripts site. I have been using these data on a project to test aspects of Moral Foundations theory. Recently I came across a great blog post on text mining from Julia Silge on a technique to identify unique word usage by group. This approach allows us to look at how a group uses particular words while accounting for the fact that some groups might talk more than others.

This got me thinking: could I use this approach to gain insight into how the conservative/centre right (Liberal) and progressive/centre left (Labor) leaders (from the 1940s to 2015) talk about migrants? This might provide some insight into how leaders think about migrants or, more likely, what they think their base thinks about migrants. I am at the beginning of this project, but what I have found so far is interesting. Below is a graph of the 10 most uniquely used words that a Prime Minister of a given party uses near the word migrant, immigrant, or refugee. Put simply, the words each party uniquely associates with the word migrant. All analysis was done on words that had been stemmed. Hence ‘educ’ captures the words education, educate, and educated. I will leave the interpretation of this plot up to you.
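For the curious, the core idea can be sketched in a few lines. This is a simplified, unweighted log-odds comparison (Silge’s post uses a more sophisticated weighted log-odds method), and the word counts below are invented stand-ins, not the actual PM transcript data:

```python
import math
from collections import Counter

def log_odds_ratio(counts_a: Counter, counts_b: Counter, alpha: float = 1.0):
    """Smoothed log odds of each word being used by group A vs. group B.

    alpha is a pseudo-count prior so words absent from one group
    don't produce infinite ratios.
    """
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    v = len(vocab)
    scores = {}
    for w in vocab:
        p_a = (counts_a[w] + alpha) / (n_a + alpha * v)
        p_b = (counts_b[w] + alpha) / (n_b + alpha * v)
        scores[w] = math.log(p_a / p_b)
    # Positive scores: words group A uses disproportionately often
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy stemmed-word counts near "migrant" (invented for illustration)
liberal = Counter({"border": 9, "skill": 6, "econom": 4})
labor = Counter({"humanitarian": 8, "famili": 5, "econom": 4})
print(log_odds_ratio(liberal, labor)[:3])
```

The sorted scores give each party’s most distinctive vocabulary while the smoothing keeps rare words from dominating, which is the same intuition behind the weighted version.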

Use Causal Language: Ignore the Haters

Correlation is not causation. So comes the inevitable refrain in response to anyone who presents a correlational study as evidence in a debate. There is good reason for this. People have long extrapolated from correlation to causation. Bad science and often bad policy follows. But a healthy respect for what we can claim about causality has given way to abject fear of any language that even hints at causality.

There is no danger in being overly cautious, I hear you say.

But there have been unintended consequences of barring causal language. First, few social scientists now understand much about causality, mistakenly thinking it is simply whatever comes out of an RCT. Second, theory has become sloppy. Why waste time constructing a detailed theory of why x leads to y when a reviewer will make you tear it up?

Evidence that something has gone wrong

The clearest evidence I see that something is amiss is how reviewers and writers now interact. It is not uncommon for a reviewer to demand a writer remove all causal language from their manuscript. I have seen this include purging the word ‘effect’ from a manuscript entirely; even named theories are not immune (the Big-Fish-Little-Pond effect becomes the Big-Fish-Little-Pond association). But authors should advance causal theories in introductions!

Reviewers also display a lack of understanding about causation when they claim only an RCT can provide evidence of causality. RCTs neither provide definitive evidence of causation nor are they the only way of providing evidence of causality.

Writers also make mistakes. Writers of papers I have reviewed refuse to explain how x leads to y because they didn’t do an RCT. One wonders, if they think this way, why they bothered to do the study at all. And if they are so scared of advancing a causal explanation why use a regression model that so strongly suggests that x leads to y?

Setting the record straight

In Hunting Causes and Using Them, Nancy Cartwright emphasizes that causality is not a single thing (it is also worth reading her book Evidence-Based Policy). So heterogeneous are the things we call causality that we might do better to abandon the term entirely. We likely need to match method and evidence to the type of causality we are chasing.

Judea Pearl in The Book of Why claims science would become worse, not better, if we were to believe that only RCTs have the power to provide evidence of causation. In such a world, how would we know that smoking causes cancer?

The danger in the current social science landscape comes from a belief that causation is a dichotomy. If you did an RCT, you can advance causal claims. If you didn’t, you can’t. But causality is not a dichotomy. RCTs often can’t provide evidence of causation and sometimes provide poor evidence. RCTs are critical, but we need both to be more conservative—RCTs provide some evidence sometimes—and to be more liberal in allowing other designs (regression discontinuity, instrumental variables, Granger causality) to provide evidence of causality.

What to do about it

  1. Treat causality as a spectrum where researchers can marshal evidence that pushes the needle toward a causal interpretation or away from it.
  2. View no single piece of evidence as incontrovertible evidence of causality.
  3. Write about clear and simple causal mechanisms in introductions and literature reviews.
  4. In the method section, social science papers should include a discussion of the degree to which the results can provide evidence of causation, and perhaps also the type of causation the authors have in mind. This should cover design, context, strength of theory, and methodology. In other words, researchers should have to make a case for their specific research rather than relying on general social science tropes.
  5. As Cartwright suggests, we should replace general terms like cause with more specific terms like repel, pull, excite, or suppress that give a better idea of what is being claimed.

Thingification: Bad Writing Leads to Bad Theory

Repression, according to Freud, is a common phenomenon. Note repression is a noun here. People don’t repress. Rather repression is the name of a state that seems to happen all on its own. This is the point that Michael Billig makes in his book Learn to Write Badly: How to Succeed in the Social Sciences. Billig points out that Freud’s theory turns the verb to repress into the noun repression. Freud does this to make his theory sound scientific. But in doing so we lose a heap of important theoretical information. Who does the repression? How does it happen? What processes result in it?

My colleague in my writing circle calls this thingification. But in keeping with the theme of this post I suppose I should talk about when researchers thingify an abstract process.

I see this in my field a lot. “Growth mindset is associated with persistence”. This sounds like a sufficiently scientific expression that we let it pass unexamined. We then expect the scientist to collect survey data on growth mindset and persistence and test their relationship to determine the strength and direction of the association. But the statement “growth mindset is associated with persistence” leaves so much unsaid.

A better statement would be: children who believe their ability is fixed, and thus cannot change, are unlikely to persist in overcoming obstacles to their learning. Notice how I replaced the weak term ‘associated’ with richer causal language? See how this language specifies a process for the association? The child is now an actor in the sentence who believes things and acts accordingly.

In the original statement, it is the variables doing things to each other. But as John Goldthorpe states “variables don’t do things, people do”. Notice further that this sentence invites further specification? We can now ask:

  • How did the child come to believe this? 
  • Do they believe only their own ability is fixed, or everyone’s? 
  • What do such children make of a school system that demands they practice, practice, practice?

Social scientists thingify to sound more scientific. But in doing so we have created a myriad of under-specified theories and a science about people that is almost entirely devoid of people.

Person Centered Analysis: Where are all the People?

I hate academic conferences. What seems like a chance for free travel to an exotic location turns out to be an endless bore in a stuffy room. For an introvert, the need to be constantly ‘on’ when talking to students, peers, and that big name you are desperate to collaborate with is tiring. The point being, I am not usually in the best of moods when at conferences. Which is probably why I found a particular presentation so irksome.

Why so much Person-centered Seems so Hollow to Me

The presenter at this hot and stuffy conference gets up and smugly states that previous, crappy, social science has used a variable-centered approach to research. He, however, would use a person-centered approach. The motivation was, I confess, solid.

Person-centered analysis starts with the assumption that within any large group of people there are likely smaller distinct groups (within a school there are jocks, goths, nerds, etc.). Too much research treats humans as a bunch of mini clones who are driven by the same processes and differ only in degree. I can get behind this sentiment.

I was surprised, then, not to hear mention of a single person for the rest of the presentation. No explanation was given of how people in the different groups think, believe, feel, or act differently from each other. Nor was there discussion of whether people chose to be members of their group or whether they were forced into it. Did they jump or were they pushed? Instead, the entire presentation focused on various configurations of variables. This was not, to me at least, person-centered.

This reflects a disturbing trend in person-centered research: the almost total absence of people.

The overall impression I get from most person-centered analysis is that people believe human diversity has been ill-treated by regression-like approaches. But many researchers assume that by applying things like cluster analysis they will magically fix this problem. In my experience, researchers don’t seem to put a lot of thought into how these approaches better represent real people or what the results are really saying about them. Researchers tend not to describe a prototypical human from each of their groups. Researchers also apply little imagination to what drives people in different groups.

Greater attention to this could truly transform the social sciences. A truly person-centered ontology and epistemology could serve disadvantaged groups better. Researchers could better acknowledge that the experience of, say, an Indigenous girl is qualitatively different from that of a South East Asian Australian boy. But to do this, person-centeredness needs to be about more than methods. And it needs to be motivated less by an appeal to what it isn’t (e.g., “Unlike previous research we use person-centered approaches” is not a convincing rationale).

Give me person-centered person-centered analysis

A move in the right direction would be to consider what Rob Brockman and I have recently called the four S’s of person-centered analysis:

  1. Specificity. Once you have your groups, can you describe what makes these groups distinct? By this I don’t mean a profile graph of the variables used to create the groups. I mean a deeper insight into what these groups of people are like. What do their members do? How do they think? What do they want?
  2. Selectivity. How do people end up in these groups? By what process does a person end up in group A and not group B? Were people born into different groups? Did some person, institution, or cultural practice push them into their group? Or is their group membership their choice?
  3. Sensitivity. Do these same groups occur in different samples? If not, why not? Do differences in groupings across—for example—countries illuminate how people’s context shapes grouping, or do differences just reflect unreliable research findings?
  4. Superiority. The beauty of cluster analysis is that it will always return the number of groups you asked for. And like a Rorschach test, it is easy to make something out of whatever the computer gives you. Researchers should attempt to show that their groups tell us something we did not already know. And researchers need to show us that groups really differ from each other qualitatively rather than merely quantitatively.
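The point in Superiority about cluster analysis always handing back the number of groups you asked for is easy to demonstrate. Below is a bare-bones k-means sketch (a minimal illustrative implementation, not the method from any particular paper) run on pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k, iters=50):
    """Bare-bones k-means: it always returns k groups,
    whether or not the data contain any real structure."""
    centres = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centre, then recompute centres
        dists = np.linalg.norm(points[:, None] - centres[None], axis=2)
        labels = dists.argmin(axis=1)
        centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
    return labels

# Structureless data: 300 points of uniform noise, no genuine subgroups
noise = rng.uniform(size=(300, 2))
labels = kmeans(noise, k=3)
print(sorted(set(labels.tolist())))  # three "groups" found in pure noise
```

The algorithm dutifully partitions random noise into three tidy clusters, which is exactly why the burden is on the researcher to show the groups are more than a Rorschach blot.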

STEM Gender Gaps in Motivation, Interest, and Self-belief are Huge Right?

We recently had a meta-analysis on STEM gender differences in motivation, interest, and self-belief published in Educational Psychology Review. We could not be more thrilled. And a big thank you to my former PhD student Brooke for all her work on this. The results are in the paper poster download below. But first, some context for why there is a download in the first place.

I have been thinking about using Kudos for new papers, and this seemed like a good paper to give it a try. I spent longer than I would like setting up a design brief for this. But now it is done, I have an InDesign template I can use for all new papers, as well as themes for ggplot and a standard color palette. My design choices were:

  1. Use of three colors only; all blues. I think this is elegant but is also advantageous for me as I am color blind.
  2. For plots, I have modified the Economist white theme from ggthemes, so from here on out all my plots will be consistent.
  3. I used a combination of serif and sans-serif fonts that work nicely together. I chose Avenir Book and EB Garamond. I am not super happy with these, but I don’t like the idea of paying $400 for the fonts I really want. I may swap out EB Garamond for Nanum Myeong for a crisper feel. Not sure yet.

Anyway, you can see the result here:

Comments welcome, particularly on fonts, general look, and plot theme, as I will want to roll these out for other papers. I still need a lot of work on distilling the message of my papers down to 100 or so sticky words. And my InDesign skills are weak (though I think I am getting better with my R-to-Illustrator workflow).