Unsuccessful replications are beginnings not ends – Part I

Recently, considerable controversy has been brewing around the so-called “replication movement” in psychology. This controversy reached new heights this past week in response to Johnson, Cheung, & Donnellan’s (2014) “failed” replications of Schnall, Benton, & Harvey’s (2008) finding that cleanliness priming affects moral judgment. Exchanges have spiraled out of control, with unprofessional and overly personal comments being made. For example, an original author accused replicators of engaging in “replication bullying,” and a “status quo supporter” called (young) replicators “assholes” and “shameless little bullies.”

In this post, I want to bring the conversation back to substantive scientific issues regarding the crucial importance of direct replications, and I will argue that direct replications should be viewed as constructive rather than destructive. But first, a quick clarification regarding the peripheral issue of the term “replication bullying.”

The National Center Against Bullying defines bullying as follows: “Bullying is when someone with more power repeatedly and intentionally causes hurt or harm to another person who feels helpless to respond.”

According to this definition, publishing failed replications of original research findings clearly does not come close to meeting the criteria for bullying. Replicators have no intention of harming the original researcher(s); rather, they intend to add new evidence regarding the robustness of a published finding. This is a normal part of science and is actually the most important feature of the scientific method, because it ensures that an empirical literature is self-correcting and cumulative. Of course, the original authors may claim that their reputation might be harmed by the publication of fair and high-quality replication studies that do not corroborate their original findings. However, this is an unavoidable reality of engaging in scientific endeavors. Science involves highly complex and technically challenging activities. When a new empirical finding is added to the pool of existing ideas, there will always be a risk that competent independent researchers may not be able to corroborate it.

That being said, science entails carefully calibrating beliefs about how our world works to the totality of the evidence available for a given claim. At one pole is high confidence in a belief, when strong evidence is continually found to support the claim. At the other pole is strong doubt, when only weak evidence is repeatedly found. Between these two poles lies a graded continuum, where one may have low to moderate confidence in a belief until more high-quality evidence is produced.

For example, in the Schnall et al. situation, Johnson et al. have reported two unsuccessful direct replications, one for each of the two studies originally reported by Schnall et al. However, two *successful* direct replications of Schnall et al.’s Study 1 have also been reported by completely independent researchers. These “successful” direct replications, however, were both severely under-powered to detect the original effect size. Notwithstanding this limitation, these studies should nonetheless be considered in carefully calibrating one’s belief regarding the claim that cleanliness priming can reduce the severity of moral judgments. Furthermore, future research would need to be executed to understand these discrepant results. Finally, even in the absence of the successful direct replications, Johnson et al.’s two high-quality direct replications do not indicate that the idea that cleanliness priming reduces the severity of moral judgments is perpetually wrong. The idea might indeed have some truth to it under a different set of operationalizations and/or in different contexts. The challenge is to identify those operationalizations and contexts whereby the phenomenon yields replicable results. Unsuccessful replications are beginnings, not ends.
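To make concrete what it means for a study to be under-powered to detect a given effect size, here is a minimal sketch using a normal approximation to a two-sided, two-group comparison. The function name `approx_power`, the effect size of 0.5, and the sample sizes are all hypothetical illustrations; they are not the values from Schnall et al., Johnson et al., or any other study mentioned in this post.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation, so it slightly overstates power
    for very small samples compared with an exact t-test calculation.
    d is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value, e.g. about 1.96
    shift = d * sqrt(n_per_group / 2)      # expected test statistic if d is real
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical inputs: an assumed effect of d = 0.5 at three sample sizes.
for n in (20, 64, 200):
    print(f"n per group = {n:3d}: power ~ {approx_power(0.5, n):.2f}")
```

Under these assumed numbers, the smallest sample would detect a true effect well under half the time, which is why a single result from a small study, successful or not, should move one’s confidence only modestly.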

In the second part of this post, I will demonstrate more concretely how unsuccessful replications are beginnings by organizing all of the replication information for Schnall et al.’s (2008) studies using an approach being developed at CurateScience.org.


  1. How many studies in behavioral economics can’t be replicated? Dan Ariely once showed that reading the Ten Commandments prior to taking a test decreased cheating. A commenter on his blog (before he deleted and prevented comments) noted that Catholic priests are notorious child molesters. DA came up with a bizarre way to explain away the counterexample – maybe they’d be even worse if they didn’t read the Ten Commandments. Do you really think this result would replicate? Why did Danny Kahneman pick this field to accuse of being the poster child of lack of integrity in psychology? Prior to Danny’s infamous letter, did anyone else single out this field as especially unreliable? Is the bizarre hypothesis that “cleanliness priming increases morality” really a prototypical (a.k.a. representative) example of a priming effect?

    1. Thanks for your comment. Danny Kahneman’s open letter to “behavioral priming” researchers to clean up their act regarding the reliability of their findings was motivated by the fact that his prominent book “Thinking, Fast and Slow” is based heavily on such findings.
