scientific beliefs

Unsuccessful replications are beginnings not ends – Part II

In Part I, I argued that unsuccessful replications should more constructively be seen as scientific beginnings rather than ends. As promised, in Part II I will demonstrate this more concretely by organizing all of the available replication information for Schnall et al.’s (2008) studies using an approach being developed at Curate Science. Curate Science aims to accelerate the growth of cumulative knowledge by organizing information about replication results and allowing the community of scientists to comment constructively on the careful interpretation of those results. Links to available data, syntax files, and experimental materials will also be organized. The web platform aims to be a one-stop shop to locate, add, and modify such information, and to facilitate constructive discussions and new scholarship on published research findings. (These are the kinds of heated debates, like those currently happening regarding Schnall et al.’s studies, that make science so exciting; well, minus the ad hominem attacks!)

Below is a screenshot of the organized replication results for the Schnall et al. (2008) studies, including links to available data files, a forest plot of the effect size confidence intervals, and an aggregated list of relevant blog posts and tweets.


As can be seen, there are actually 4 direct replications in addition to Johnson et al.’s (2014) special issue direct replications. As mentioned in Part I, two “successful” direct replications have been reported for Schnall et al.’s Study 1. However, as can readily be seen, both studies were under-powered (approximately 60% power) to detect the original effect size of d = -.60, and both effect size CIs include zero. Consequently, it would be inappropriate to characterize these studies as “successful” (the reported p-values below .05 came from one-tailed tests). That being said, these studies should not be ignored, given that they contribute additional evidence that should count toward one’s overall evaluation of the evidence for the claim that cleanliness priming influences moral judgments.
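To make the power and confidence-interval reasoning concrete, here is a minimal Python sketch. It uses a normal approximation for a two-tailed, two-group comparison and the usual large-sample standard error for Cohen's d. The sample sizes and the observed effect size are hypothetical placeholders chosen for illustration; they are not the values from the replication studies discussed here, and the function names are my own.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group test to detect effect size d.

    Uses the normal approximation to the t-test, so it is slightly
    optimistic for small samples.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = abs(d) * sqrt(n_per_group / 2)  # approximate noncentrality
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate large-sample confidence interval for Cohen's d."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


# Hypothetical example: 27 participants per group, original effect d = -.60.
power = approx_power(-0.60, 27)

# Hypothetical observed replication effect of d = -.40 with the same group sizes.
ci_low, ci_high = d_confidence_interval(-0.40, 27, 27)

print(f"approximate power: {power:.2f}")
print(f"95% CI for d: [{ci_low:.2f}, {ci_high:.2f}]")
```

Under these assumed numbers, power is roughly 60% and the interval spans zero, which is the pattern described above: a nominally “significant” one-tailed result can coexist with a two-sided interval that does not exclude a null effect.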

Unsuccessful replications should also be viewed as beginnings given that virtually all replicators make their data publicly available for verification and re-analysis (one of Curate Science’s focuses). Hence, any interested researcher can download the data, re-analyze it from a different theoretical perspective, and potentially gain new insights into the discrepant results. Data availability also plays an important role in interpreting replication results, especially when the results have not been peer-reviewed. That is, one should place more weight on replication results whose conclusions can be verified via re-analysis than on replication results that do not have available data.

Organizing replication results in this way makes it clear that virtually all of the replication efforts have targeted Schnall et al.’s Study 1. Only one direct replication is so far available for Schnall et al.’s Study 2. Though this replication study used a much larger sample and was pre-registered (hence more weight should be given to its results), the final verdict has not yet been reached. Our confidence in Study 2’s original results should decrease to some extent (assuming the replication results can be reproduced from the raw data); however, more evidence would be needed to decrease our confidence further.
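One common way to formalize the idea of giving more weight to a larger study is fixed-effect, inverse-variance pooling, in which each study's effect size is weighted by the inverse of its sampling variance. The sketch below is only an illustration of that general technique, not the method used to produce the results organized here, and the effect sizes and sample sizes in it are hypothetical.

```python
from math import sqrt


def d_variance(d, n1, n2):
    """Approximate large-sample sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def pooled_effect(studies):
    """Fixed-effect (inverse-variance) pooled estimate.

    studies: list of (d, n1, n2) tuples.
    Returns (pooled_d, standard_error).
    """
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    total_weight = sum(weights)
    pooled = sum(w * d for w, (d, _, _) in zip(weights, studies)) / total_weight
    return pooled, sqrt(1 / total_weight)


# Hypothetical numbers: a small original study and a much larger replication.
studies = [
    (-0.60, 20, 20),    # small study, large effect
    (-0.05, 100, 100),  # large study, near-zero effect
]
pooled_d, pooled_se = pooled_effect(studies)
print(f"pooled d = {pooled_d:.2f} (SE = {pooled_se:.2f})")
```

With these assumed inputs, the pooled estimate lands much closer to the larger study's effect than to the smaller one's, yet it does not simply discard the smaller study. This is the sense in which confidence shifts by degrees rather than flipping from “true” to “false.”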

And even in the event of subsequent negative results from high-powered direct replications (for either of Schnall et al.’s studies), it would still be possible that cleanliness priming influences moral judgments when studied with more accurate instruments or more advanced designs (e.g., highly-repeated within-person designs). Curate Science aims to facilitate constructive discussions and theoretical debates of these kinds to accelerate the growth of cumulative knowledge in psychology/neuroscience (and beyond). Unsuccessful replications are beginnings, not ends.

Unsuccessful replications are beginnings not ends – Part I

Recently, there has been a lot of controversy brewing around the so-called “replication movement” in psychology. This controversy reached new heights this past week in response to Johnson, Cheung, & Donnellan’s (2014) “failed” replications of Schnall, Benton, & Harvey’s (2008) finding that cleanliness priming affects moral judgment. Exchanges have spiraled out of control, with unprofessional and overly personal comments being made. For example, an original author accused replicators of engaging in “replication bullying,” and a “status quo supporter” called (young) replicators “assholes” and “shameless little bullies.”

In this post, I want to bring the conversation back to substantive scientific issues regarding the crucial importance of direct replications, and I will argue that direct replications should be viewed as constructive rather than destructive. But first, a quick clarification regarding the peripheral issue of the term “replication bullying.”

The National Center Against Bullying defines bullying as follows: “Bullying is when someone with more power repeatedly and intentionally causes hurt or harm to another person who feel[s] helpless to respond.”

According to this definition, it is very clear that publishing failed replications of original research findings does not come close to meeting the criteria for bullying. Replicators do not intend to harm the original researcher(s); rather, they intend to add new evidence regarding the robustness of a published finding. This is a normal part of science and is arguably the most important feature of the scientific method, the one that ensures an empirical literature is self-correcting and cumulative. Of course, the original authors may claim that their reputation might be harmed by the publication of fair and high-quality replication studies that do not corroborate their original findings. However, this is an unavoidable reality of engaging in scientific endeavors. Science involves highly complex and technically challenging activities. When a new empirical finding is added to the pool of existing ideas, there will always be a risk that competent independent researchers will not be able to corroborate the original findings.

That being said, science entails the careful calibration of beliefs about how our world works. Scientific beliefs are carefully calibrated to the totality of the evidence available for a certain claim. At one pole is high confidence in a belief, when strong evidence is continually found to support a certain claim; at the other is strong doubt in a belief, when only weak evidence is repeatedly found. Between these two poles lies a graded continuum, where one may have low to moderate confidence in a belief until more high-quality evidence is produced.

For example, in the Schnall et al. situation, Johnson et al. have reported two unsuccessful direct replications, one for each of the two studies originally reported by Schnall et al. However, two *successful* direct replications of Schnall et al.’s Study 1 have also been reported by completely independent researchers. These “successful” direct replications, however, were both severely under-powered to detect the original effect size. Notwithstanding this limitation, these studies should nonetheless be considered in carefully calibrating one’s belief regarding the claim that cleanliness priming can reduce the severity of moral judgments. Furthermore, future research would need to be carried out to understand these discrepant results. Finally, even in the absence of the successful direct replications, Johnson et al.’s two high-quality direct replications do not indicate that the idea that cleanliness priming reduces the severity of moral judgments is definitively wrong. The idea might indeed have some truth to it under a different set of operationalizations and/or in different contexts. The challenge is to identify those operationalizations and contexts in which the phenomenon yields replicable results. Unsuccessful replications are beginnings, not ends.

In the second part of this post, I will demonstrate more concretely how unsuccessful replications are beginnings by organizing all of the replication information for Schnall et al.’s (2008) studies using an approach being developed at Curate Science.