The Language of Science: A Primer

Scientific knowledge accrues via the falsification of theory-derived hypotheses.

Falsification is the process of proving a hypothesis wrong.

Falsifiability is the extent to which a hypothesis, or approach to research, can be falsified.

Falsifiability requires sufficient transparency (i.e., full disclosure, open materials, open data, and pre-registration).

Falsifiability is an essential, non-optional aspect of the scientific approach (or else one is a historian rather than a scientist).

Falsification is achieved via a meticulously executed series of direct replications.

Direct replications can be arranged in a taxonomy with respect to falsifiability (10 shades of falsifiability).

Replication is the activity of carrying out a direct replication.

Replicability is the extent to which a particular effect/hypothesis, or area of research, is replicable.

An effect is said to be replicable if an effect of similar magnitude can be consistently observed, under specified (boundary) conditions, across independent samples and researchers.

Scientific findings must be demonstrably replicable under conditions specified, but not necessarily replicable at will.1




1. For example, the bending-of-light findings from the 1919 solar eclipse, which eventually led to the falsification of Newton’s theory of gravitation in favor of Einstein’s General Theory of Relativity, are not replicable at will, but are (and continue to be) demonstrably replicable for solar eclipses satisfying the conditions originally specified.

New replication policy at flagship social psychology journal will not be effective

The Journal of Personality and Social Psychology (JPSP) — considered social psychology’s flagship journal — recently announced its new replication policy, which officially states:

Although not a central part of its mission, the Journal of Personality and Social Psychology values replications and encourages submissions that attempt to replicate important findings previously published in social and personality psychology. Major criteria for publication of replication papers include:

    • the theoretical importance of the finding being replicated
    • the statistical power of the replication study or studies
    • the extent to which the methodology, procedure, and materials match those of the original study
    • the number and power of previous replications of the same finding
    • Novelty of theoretical or empirical contribution is not a major criterion, although evidence of moderators of a finding would be a positive factor.

Preference will be given to submissions by researchers other than the authors of the original finding, that present direct rather than conceptual replications, and that include attempts to replicate more than one study of a multi-study original publication. However, papers that do not meet these criteria will be considered as well.

Given my “pre-cognitive abilities”1, we actually submitted a replication paper to JPSP about 2 weeks *prior* to their announcement, reporting the results of two unsuccessful high-powered replication attempts of Correll’s (2008, Exp 2) 1/f noise racial bias effect. Exactly one day after the new replication policy was announced, we received this rejection letter:

Your paper stands high on several of [our replication policy] criteria. You worked with the author of the original paper to duplicate materials and procedures as closely as possible, and pre-registered your data collection and analysis plans. Your studies are adequately powered. However, I have concluded that because the impact of the original Correll article has been minimal, an article aimed at replicating his findings does not have the magnitude of conceptual impact that we are looking for in the new replication section. Thus, I will decline to publish this manuscript in JPSP. To assess the impact of the Correll (2008) paper, since it is 6 years old, I turned to citation data. It has been cited 22 times (according to Web of Science) but the vast majority are journals such as Human Movement Science, Ecological Psychology, or Physics Reports, far outside our field. I have not looked at all of the citing articles, of course, but the typical citation of Correll’s work appears to be as an in-passing example of the application of dynamical systems logic. There are only two citations within social psychology. One is Correll’s 2011 JESP follow-up (which itself has been cited only twice, again by journals far outside our field). The second is an Annual Review of Psychology article on gender development (in which again Correll’s 2008 paper is cited in passing as an example of dynamical approaches). I have to conclude that Correll’s paper has had zero substantive impact in social psychology, attracting attention almost exclusively from researchers (mostly outside our field) who cite it as an example application of a specific conceptual and analytic approach. Such citations have little or nothing to do with the substance of the finding that you failed to replicate – the impact of task instructions on the PSD slope. 
In sum, my decision on your replication manuscript is not based on any deficiencies in your work, but on the virtually complete lack of impact of the original finding within our field.

I responded to the decision letter with the following email:

Thanks for your quick response regarding our replication manuscript (PSP-A-2014-0114). Of course it is not the outcome we had hoped for, however, we respect your decision. That being said, I would like to point out what seems to be a major discrepancy between the official policy for publication of replication papers (theoretical importance of the finding, quality of replication methods, & pre-existing replications of the finding) *and* the primary basis for rejecting our replication paper, which was that the original article had insufficient actual impact in terms of citation count. These two things are distinct and if you will be rejecting papers on the latter criteria, then your official policy should be revised to reflect this fact.

Furthermore, if you do revise your official policy in this way — whereby a major criterion for publishing replication papers is “actual impact” of original article in terms of citation count — this would mean that you could avoid publishing replication papers — no matter how high-quality — for about 85% of published articles in JPSP given the skewed distribution of article citation count whereby the vast majority of articles have minimal actual impact (Seglen, 1992). This kind of strategy would of course be a highly ineffective editorial policy if the goal is to increase the credibility and cumulative nature of empirical findings in JPSP.

To which the editor responded by saying that Correll’s (2008, Exp 2) finding was deemed “important” for methodological reasons and reiterated that Correll’s research has had “little to no impact within our field.” More importantly, he did not address my two main concerns that their “new replication policy is (1) not well specified and (2) will not be effective in increasing the credibility of empirical findings in JPSP.”2

I responded by saying that they need — at the very least — to revise their official policy to state that they will *only* publish high-quality replication papers of theoretically important findings that have had an *actual* impact in terms of citation count. This of course means that they can avoid publishing replication papers of all recently published JPSP papers *and* the vast majority of JPSP papers that are rarely or never cited, which is simply absurd. Another curious aspect (alluded to by Lorne Campbell) is this: Can an empirical finding actually have an impact on a field if it hasn’t been independently corroborated?
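The citation-skew point above (Seglen, 1992) can be illustrated with a small simulation. The sketch below models citation counts with a log-normal distribution; the distribution choice and its parameters are hypothetical, chosen only to show the shape of the argument, not to reproduce Seglen’s actual data or the 85% figure.

```python
import random
import statistics

random.seed(1)

# Hypothetical model: citation counts drawn from a right-skewed
# log-normal distribution (parameters arbitrary, for illustration only).
citations = [int(random.lognormvariate(mu=1.5, sigma=1.2)) for _ in range(10_000)]

mean_c = statistics.mean(citations)
median_c = statistics.median(citations)
below_mean = sum(c < mean_c for c in citations) / len(citations)

# Under a right-skewed distribution, the mean exceeds the median, and
# a clear majority of "articles" sit below the mean citation count.
print(f"mean = {mean_c:.1f}, median = {median_c}")
print(f"fraction of articles cited less than the mean: {below_mean:.0%}")
```

The qualitative point survives any reasonable parameter choice: when the distribution is heavily right-skewed, most articles have far fewer citations than the average suggests, so an “actual impact” threshold would exclude the vast majority of published papers from replication.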


1. Just kidding: I unfortunately do not actually have pre-cognitive abilities, though it would be great if I did.
2. This is in contrast to replication policies at more reputable journals — such as Psychological Science, Journal of Experimental Social Psychology, Psychonomic Bulletin & Review, and Journal of Research in Personality — that publish high-quality replication papers of *any* findings originally published in their journal. For examples, see here and here.

Unsuccessful replications are beginnings, not ends – Part I

Recently, there has been much controversy brewing around the so-called “replication movement” in psychology. This controversy reached new heights this past week in response to Johnson, Cheung, & Donnellan’s (2014) “failed” replications of Schnall, Benton, & Harvey’s (2008) finding that cleanliness priming influences moral judgment. Exchanges have spiraled out of control, with unprofessional and overly personal comments: for example, an original author accusing replicators of engaging in “replication bullying,” and a “status quo supporter” calling (young) replicators “assholes” and “shameless little bullies.”

In this post, I want to try to bring the conversation back to substantive scientific issues regarding the crucial importance of direct replications, and I will argue that direct replications should be viewed as constructive rather than destructive. But first, a quick clarification regarding the peripheral issue of the term “replication bullying.”

The National Center Against Bullying defines bullying as follows: “Bullying is when someone with more power repeatedly and intentionally causes hurt or harm to another person who feel helpless to respond.”

According to this definition, it is very clear that publishing failed replications of original research findings does not come close to meeting the criteria for bullying. Replicators have no intention to harm the original researcher(s), but rather have the intention to add new evidence regarding the robustness of a published finding. This is a normal part of science and is actually the most important feature of the scientific method, which ensures an empirical literature is self-correcting and cumulative. Of course the original authors may claim that their reputation might be harmed by the publication of fair and high-quality replication studies that do not corroborate their original findings. However, this is an unavoidable reality of engaging in scientific endeavors. Science involves highly complex and technically challenging activities. When a new empirical finding is added to the pool of existing ideas, there will always be a risk that competent independent researchers may not be able to corroborate the original findings.

That being said, science entails the careful calibration of beliefs about how our world works: beliefs are calibrated to the totality of the evidence available for a given claim. At one pole lies high confidence in a belief, when strong evidence is repeatedly found to support a claim; at the other lies strong doubt, when only weak evidence is repeatedly found. Between these two poles exists a graded continuum where one may hold low to moderate confidence in a belief until more high-quality evidence is produced.

For example, in the Schnall et al. situation, Johnson et al. have reported two unsuccessful direct replications, one for each of the two studies originally reported by Schnall et al. However, two *successful* direct replications of Schnall et al.’s Study 1 have also been reported by completely independent researchers. These “successful” direct replications, however, were both severely under-powered to detect the original effect size. Notwithstanding this limitation, these studies should nonetheless be considered in carefully calibrating one’s belief in the claim that cleanliness priming can reduce the severity of moral judgments. Furthermore, future research will be needed to understand these discrepant results. Finally, even in the absence of the successful direct replications, Johnson et al.’s two high-quality direct replications do not indicate that the idea that cleanliness priming reduces the severity of moral judgments is definitively wrong. The idea might indeed have some truth to it under a different set of operationalizations and/or in different contexts. The challenge is to identify the operationalizations and contexts in which the phenomenon yields replicable results. Unsuccessful replications are beginnings, not ends.
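To make concrete what “severely under-powered” means here, the sketch below approximates the power of a two-sided, two-sample test via the normal approximation to the t distribution. The effect size (d = 0.6) and sample sizes are hypothetical round numbers for illustration, not the actual Schnall et al. or Johnson et al. figures.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sample_power(d: float, n_per_group: int) -> float:
    """Approximate power of a two-sided, alpha = .05, two-sample test
    for a true standardized effect of size d (normal approximation)."""
    z_crit = 1.96                              # critical z, two-sided .05
    ncp = d * math.sqrt(n_per_group / 2.0)     # noncentrality parameter
    return 1.0 - normal_cdf(z_crit - ncp)

# A small sample has roughly a coin-flip chance of detecting a medium
# effect; a large sample makes detection nearly certain.
print(f"n = 20 per group:  power = {two_sample_power(0.6, 20):.2f}")
print(f"n = 100 per group: power = {two_sample_power(0.6, 100):.2f}")
```

With around 20 participants per group, even a true medium-sized effect would fail to reach significance about half the time, which is why small “successful” replications carry limited evidential weight when calibrating beliefs.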

In the second part of this post, I will more concretely demonstrate how unsuccessful replications are beginnings by organizing all of the replication information for the Schnall et al.’s (2008) studies using an approach being developed at