Confusion regarding scientific theory as a contributor to the replicability crisis?

[DISCLAIMER: The ideas and statements in this blog post are in no way intended to insult or disrespect my fellow psychologists.]

In this post, I will discuss psychology’s replicability crisis from a new angle. I want to consider the possibility that confusion regarding what scientific theory is and how theory is developed may have contributed to the replicability crisis in psychology.

Scientific theories are internally consistent sets of principles that are put forward to explain various empirical phenomena. Theories compete in the scientific marketplace by being evaluated according to the following five criteria (Popper, 1959; Quine & Ullian, 1978):

1. parsimony: simpler theories involving the fewest entities are preferred to more complicated theories
2. explanatory power: theories that can explain many empirical phenomena are preferred to theories that can only explain a few phenomena
3. predictive power: a useful theory makes new empirical predictions above and beyond extant theories
4. falsifiability: a theory must yield falsifiable predictions
5. accuracy: the degree to which a theory’s empirical predictions match experimental results

It is important to point out explicitly, however, that all of these considerations rest on a more basic requirement: before a theory can be put forward, there must first exist demonstrably repeatable empirical phenomena that need explaining! By “demonstrably repeatable,” I mean that an empirical phenomenon “can be regularly reproduced by anyone who carries out the appropriate experiment in the way prescribed” (Popper, 1959, p. 23). Put simply, scientific theories aim to explain repeatable empirical phenomena; without repeatable empirical phenomena, there is nothing to explain, and hence no theory can be developed.

The idea, then, is that confusion regarding these points may have contributed to the current replicability crisis. To support my point, I will briefly review some examples from the beleaguered “social priming” literature. [DISCLAIMER: I contend that my argument likely also holds in other areas of experimental psychology; I’ve chosen this literature out of convenience, and my intention is not to pick on these specific researchers.]

For example, in a piece entitled “The Alleged Crisis and the Illusion of Exact Replication”, Stroebe and Strack (2014) state that:

“Although reproducibility of scientific findings is one of science’s defining features, the ultimate issue is the extent to which a theory has undergone strict tests and has been supported by empirical findings” (p. 60).

Stroebe and Strack seem to be saying that the most important issue (i.e., the “ultimate issue”) in evaluating a scientific theory is whether it has been supported by empirical findings (the accuracy criterion, #5 above), while at the same time downplaying the reproducibility of findings, which they themselves acknowledge as “one of science’s defining features”. This kind of position, however, doesn’t seem to fit with the considerations above, whereby reproducible empirical phenomena are required before a scientific theory can even be put forward, let alone be evaluated vis-à-vis other theories.

In another example, Cesario (2014), when discussing which features of the original methodology need to be duplicated for a replication attempt to be informative, states:

“We know this only because we have relevant theories that tell us that these features should matter.” (p. 42) “Theories inform us as to which variables are important and which are unimportant (i.e., which variables can be modified from one research study to the next without consequence).” (p. 45)

Cesario seems to be saying that we can use a scientific theory to tell us which methodological features of an original study need to be duplicated to reliably observe an empirical phenomenon. Such a position would seem to be putting the cart before the horse, however, given that without demonstrably repeatable empirical phenomena to explain, no theory can be developed in the first place.[1]

A final example comes from an article by Dijksterhuis (2014, “Welcome back theory!”), who summarizes Cesario’s (2014) paper by saying:

“Cesario draws the conclusion that although behavioral priming researchers could show more methodological rigor, the relative infancy of the theory is the main reason the field faces a problem.” (p. 74)

Dijksterhuis seems to be saying that the field of behavioral priming currently has problems with non-replications because of insufficiently developed theory. This position is again difficult to reconcile with the standard conceptualization of scientific theory. With all due respect, such a position would be akin to saying that ESP researchers have yet to document replicable ESP findings because theories of ESP are insufficiently developed!

But how could this happen?

I contend that such confusion regarding scientific theory has emerged due (at least in part) to the relatively weak methods used in modal research (LeBel & Peters, 2011). This includes the improper use of null hypothesis significance testing (i.e., treating p < .05 as indicating a “reliable” finding) and an overemphasis on conceptual rather than direct replications. Conceptual replications involve immediately following up an observed effect with a study that uses a different methodology, which renders any negative result completely ambiguous: was the different result due to the different methodology, or due to the falsity of the tested hypothesis? This practice effectively shields any positive empirical finding from falsification (see here for a great blog post precisely on this point; see also Greenwald et al., 1986). Granted, once the reproducibility of a particular effect has been independently confirmed (using the original methodology), it is of course important to subsequently test whether the effect generalizes to other methods (i.e., other operationalizations of the IV and DV). However, we simply cannot skip the first step. This broadly fits with Rozin’s (2001) position that psychologists need to place much more emphasis on first reliably describing empirical phenomena before setting out to test hypotheses about those phenomena.

[1] That being said, Cesario should be lauded for his public stance that behavioral priming researchers need to directly replicate their own findings (using the same methodology) before publishing them.



Cesario, J. (2014). Priming, replication, and the hardest science. Perspectives on Psychological Science, 9, 40–48.

Dijksterhuis, A. (2014). Welcome back theory! Perspectives on Psychological Science, 9(1), 72–75.

Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research progress? Psychological Review, 93(2), 216.

LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15, 371–379.

Popper, K. R. (1959). The logic of scientific discovery. New York, NY: Basic Books.

Quine, W. V. O., & Ullian, J. S. (1978). The web of belief (2nd ed.). New York, NY: Random House.

Rozin, P. (2001). Social psychology and science: Some lessons from Solomon Asch. Personality and Social Psychology Review, 5(1), 2–14.

Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9, 59–71.



  1. You’re not really engaging with Cesario’s argument: you simply conclude it goes counter to what you have posited as the basic principle of science. For what it’s worth, I think Popper would have agreed with Cesario. Observation and experimentation unguided by theory are aimless. You have to start with some sort of understanding of what you’re dealing with, and then refine or revise it in the research process.

    1. Great comment, Maarten! Of course, a researcher has to start with some sort of understanding of the empirical phenomenon, but it doesn’t *necessarily* have to be guided by a scientific theory. Certainly if *actual* scientific theories do exist in an area of research, then empirical research in that area definitely *should* be guided by those theories. The point, however, is that in the early stages of a scientific field, researchers need to start somewhere, and they will definitely not start by being guided by scientific theory if no reproducible empirical phenomena have even been documented in that field.

      Take, for example, the Stroop (1935) effect. Were J. Ridley Stroop’s original studies guided by actual scientific theory? Of course not! His main goal was to learn more about the empirical phenomenon of interference by seeing whether individuals name color words printed in conflicting colors more slowly than words printed in non-conflicting colors. According to his empirical observations (i.e., his data/evidence), the answer to the question seemed to be “yes”. The Stroop effect of course turned out to be a reproducible empirical phenomenon, and researchers subsequently put forward theories to attempt to explain it (e.g., speed of processing theory, selective attention theory).

      1. I agree that the research process can start with a phenomenon, but if there’s no agreement on whether there is a phenomenon or on what it is, theory will inevitably come into play. This is the point that Cesario was making: theory tells you how to do the experiment, which variables are important and which not. Disagreement about the facts is tied up with disagreement about theory, and settling the facts involves coming to agreement about the theory.

  2. Fascinating. Good science often starts with observation of a phenomenon (e.g., Newton noticing, but not inventing, gravity). Why not follow Newton’s example and Rozin’s advice and notice all the things people actually do, rather than focusing on things that were invented by (and are primarily of interest to) psychologists? Perhaps the lesson is that if scientists are inventing phenomena, then it seems silly to devise theories about those inventions. We likely do not need theories to explain things that don’t actually happen in a reproducible way.

  3. A major problem is that funding agencies, such as NIH, seldom, if ever, fund replication studies. In addition, a confirmatory study is not given the same value for academic publication or promotion.

  4. “I contend that such confusion regarding scientific theory has emerged due (at least in part) to the relatively weak methods used in modal research (LeBel & Peters, 2011). This includes the improper use of null hypothesis significance testing (i.e., p<.05 indicates a “reliable” finding) and an over-emphasis on conceptual rather than direct replications.”

    Is the following a possibly interesting experiment trying to solve the things you describe in this post?

    1) Small groups of, let’s say, 5 researchers, all working on the same theory/topic/construct, each perform a pilot/exploratory study and, at some point, make it clear to themselves and to the other members of the group that they want to have their work rigorously tested.

    2) These 5 studies will then all be preregistered and prospectively replicated in round-robin fashion.

    3) You would thereby end up with 5 studies (which could often be seen as “conceptual” replications of one another, depending on how far you want to go in considering something a “conceptual” replication), each of which will have been “directly” replicated 4 times (+ 1 version by the original researcher, for a total of 5).

    4) All results will be published in a single paper, no matter the outcome: for instance, “Ego-depletion: Round 1”. This paper would then include 5 different “conceptual” studies (probably varying in how “conceptual” they are; e.g., see your “falsifiability is not optional” paper), all of which will have been “directly” replicated.

    5) All members of the team of 5 researchers would then come up with their own follow-up study, possibly (partly) related to the results of the “first round”. The process repeats itself as long as it is deemed fruitful.

    Additional thoughts related to this format which might be interesting regarding recent discussions and events in psychological science:

    1) Possibly think about how this format could influence the discussions about “creativity”, “science being messy”, and the acceptance of “null results”.

    Researchers using this format could each come up with their own ideas for each “round” (creativity), there would be a clear demarcation between pilot/exploratory studies and confirmatory testing (“science is messy”), and the format could also help with publishing possible null results and “doing something” with them when drawing inferences and conclusions (acceptance of “null results”).

    2) Possibly think about how this format could influence the discussion about how there may be too much information (i.e., Simonsohn’s “let’s publish fewer papers”).

    Let’s say it is reasonable for researchers to run 5 studies a year (or every 2 years?), given time and resources (50–100 participants per study per individual researcher). That would mean that a group of researchers using this format could publish a single paper every 1 or 2 years (“let’s publish fewer papers”), but this paper would be highly informative, given that each study would be relatively highly powered (5 × 50–100 participants = 250–500 participants per study) and the paper would contain both “conceptual” and “direct” replications.

    3) Possibly think about how this format could influence the discussion about “expertise” and “reverse p-hacking” (deliberately wanting to find a null result) in replications.

    Perhaps every member of these small groups would be inclined to a) “put forward” the “best” experiment they want to test rigorously using this format, and b) carry out the replication part of the format (i.e., the replications of the other members’ studies) with great care and effort, because they would be incentivized to do so: “optimally” gathered information from this format (e.g., both significant and non-significant findings) would directly help them come up with study proposals for the next round (e.g., see your “falsifiability is not optional” paper).

    4) Possibly think about how this format could influence the discussion about how “a single study almost never provides definitive evidence for or against an effect”, and about the problems with interpreting single p-values. See also Fisher (1926, p. 83): “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”

    5) Possibly think about how this format could influence the discussion about the problematic grant culture in academia. Small groups of collaborating researchers could write grant proposals together, and funding agencies would give their money to multiple researchers who each contribute their own ideas. Both would help make psychological science less competitive and more collaborative.

    6) The overall process of this format would entail a clear distinction between post hoc theorizing and theory testing (cf. Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012) and “rounds” of theory building, testing, and reformulation (cf. Wallander, 1992), and it could be viewed as a systematic manner of data collection (cf. Chow, 2002).

    7) Finally, it might also be interesting to note that this format could yield interesting meta-scientific information as well. For instance, perhaps the findings of a later “round” turn out to be more replicable because of more accurate knowledge about a specific theory or phenomenon. Or perhaps it will show that the typical, devastating course of research into psychological phenomena and theories described by Meehl (1978) is cut off sooner, or follows a different path.

    1. “Is the following a possibly interesting experiment trying to solve the things you describe in this post?”

      I guess not… (?)
