
The Language of Science: A Primer

Scientific knowledge accrues via the falsification of theory-derived hypotheses.

Falsification is the process of proving a hypothesis wrong.

Falsifiability is the extent to which a hypothesis, or approach to research, can be falsified.

Falsifiability requires sufficient transparency (i.e., full disclosure, open materials, open data, and pre-registration).

Falsifiability is an essential, non-optional aspect of the scientific approach (or else one is a historian rather than a scientist).

Falsification is achieved via a meticulously executed series of direct replications.

A taxonomy of direct replications can be constructed with respect to falsifiability (10 shades of falsifiability).

Replication is the activity of carrying out a direct replication.

Replicability is the extent to which a particular effect/hypothesis, or area of research, is replicable.

An effect is said to be replicable if an effect of similar magnitude can be consistently observed, under specified (boundary) conditions, across independent samples and researchers.

Scientific findings must be demonstrably replicable under conditions specified, but not necessarily replicable at will.1
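
To make the replicability definition above more concrete, here is a minimal simulation sketch (my own hypothetical illustration, assuming a simple two-group mean-difference design and an arbitrary effect size):

```python
# Minimal illustration (hypothetical numbers): simulate several independent
# samples of a two-group experiment and check whether an effect of similar
# magnitude (Cohen's d) is observed in each one.
import numpy as np

rng = np.random.default_rng(0)

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * treatment.var(ddof=1) +
                  (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
    return (treatment.mean() - control.mean()) / np.sqrt(pooled_var)

true_d = 0.4        # hypothetical population effect size
n_per_group = 100   # per-group sample size given by the specified (boundary) conditions

estimates = []
for lab in range(5):  # five independent samples ("labs")
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    estimates.append(cohens_d(treatment, control))

print([round(d, 2) for d in estimates])
# A replicable effect shows up as estimates clustering around a similar
# magnitude across the independent samples rather than scattering around zero.
```

In real research the independent samples come from different researchers and labs rather than from a random number generator, of course; the sketch only illustrates what “an effect of similar magnitude across independent samples” means operationally.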


Footnotes

1. For example, the bending-of-light findings from the 1919 solar eclipse, which eventually led to the falsification of Newton’s theory of gravitation in favor of Einstein’s General Theory of Relativity, are not replicable at will, but are (and continue to be) demonstrably replicable for solar eclipses satisfying the originally specified conditions.


Insufficiently open science — not theory — obstructs empirical progress!

I stumbled upon Greenwald et al.’s (1986) “Under what conditions does theory obstruct research progress?” article the other day and decided to re-read it. I found it fascinating to revisit in the context of current controversies about p-hacking and replication difficulties! Very prescient indeed.

In the article, Greenwald et al. argued that theory obstructs research progress when:
1. testing theory is the central goal of research, and
2. the researcher has more faith in the correctness of the theory than in the suitability of the procedures used to test the theory.

Though I agree with their main argument (& indeed we’ve made a very similar argument here), I don’t think it’s completely correct (or it’s at least incomplete given what we now know about modal research practices).

I want to put forward the possibility that it is insufficiently open research practices — rather than theory-confirming practices — that obstruct empirical progress! Testing theory has always involved the (precarious) goal of producing experimental results that confirm novel, theory-derived empirical predictions. Such endeavors almost always involve repeated tweaking and refinement of procedures and calibration of instruments.

As long as researchers are sufficiently open about the methods used to execute their experimental tests, however, such theory-confirming practices *can* lead to empirical progress. This is because being open means other researchers can gauge more objectively all of the methodological tweaking that was required to get the theory-confirming result, and also because being open means using stronger methods and better thought-out experimental designs to begin with. Consequently, being more open means theory-derived empirical predictions are more open to disconfirmation (given that disconfirmation requires strong methods), which actually substantially *accelerates* research progress! Don’t take my word for it; here’s what Richard Feynman had to say on the subject:

“We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.” (Richard Feynman)


Two quotes from Greenwald et al.’s article that inspired this post!

“The theory-testing approach runs smoothly enough when theoretically predicted results are obtained. However, when predictions are not confirmed, the researcher faces a predicament that can be called the disconfirmation dilemma (Greenwald & Ronis, 1981). This dilemma is resolved by the researcher’s choosing between proceeding (a) as if the theory being tested is incorrect (e.g., by reporting the disconfirming results), or (b) as if the theory is still likely to be correct. The researcher who preserves faith in the theory’s correctness will persevere at testing the theory — perhaps by conducting additional data analyses, by collecting more data, or by revising procedures and then collecting more data.” (p. 219).

“A theory-confirming researcher perseveres by modifying procedures until prediction-supporting results are obtained. Particularly if several false starts have occurred, the resulting confirmation may well depend on conditions introduced while modifying procedures in response to initial disconfirmations. However, because no systematic empirical comparison of the evolved (confirming) procedures with earlier (disconfirming) ones has been attempted, the researcher is unlikely to detect the confirmation’s dependence on the evolved details of procedure. Although the conclusions from such research need to be qualified by reference to the tried-and-abandoned procedures, those conclusions are often stated only in the more general terms of the guiding theory. Such conclusions constitute avoidable overgeneralizations.” (p. 220)

Confusion regarding scientific theory as contributor to replicability crisis?

[DISCLAIMER: Ideas and statements made in this blog post are in no way intended to insult or disrespect my fellow psychologists.]

In this post, I will discuss psychology’s replicability crisis from a new angle. I want to consider the possibility that confusion regarding what scientific theory is and how theory is developed may have contributed to the replicability crisis in psychology.

Scientific theories are internally consistent sets of principles that are put forward to explain various empirical phenomena. Theories compete in the scientific marketplace by being evaluated according to the following five criteria (Popper, 1959; Quine & Ullian, 1978):

1. parsimony: simpler theories involving the fewest entities are preferred to more complicated theories
2. explanatory power: theories that can explain many empirical phenomena are preferred to theories that can only explain a few phenomena
3. predictive power: a useful theory makes new empirical predictions above and beyond extant theories
4. falsifiability: a theory must yield falsifiable predictions
5. accuracy: degree to which a theory’s empirical predictions match experimental results

It is important to explicitly point out, however, that underlying all of these considerations is the fact that before a theory can be put forward, demonstrably repeatable empirical phenomena that need explaining must exist in the first place! “Demonstrably repeatable” is understood to mean that an empirical phenomenon “can be regularly reproduced by anyone who carries out the appropriate experiment in the way prescribed” (Popper, 1959, p. 23). Put simply, scientific theories aim to explain repeatable empirical phenomena; without repeatable empirical phenomena, there is nothing to explain and hence no theories can be developed.

The idea then is that confusion regarding these points may have contributed to the current replicability crisis. To support my point, I will briefly review some examples from the beleaguered “social priming” literature. [DISCLAIMER: I contend my argument likely also holds in other areas of experimental psychology; I’ve chosen this literature out of convenience, and my intention is not to pick on these specific researchers.]

For example, in a piece entitled “The Alleged Crisis and the Illusion of Exact Replication”, Stroebe and Strack (2014) state that:

“Although reproducibility of scientific findings is one of science’s defining features, the ultimate issue is the extent to which a theory has undergone strict tests and has been supported by empirical findings” (p. 60).

Stroebe and Strack seem to be saying that the most important issue (i.e., the “ultimate issue”) in evaluating scientific theory is whether the theory has been supported by empirical findings (accuracy criterion #5 from above), while at the same time downplaying the reproducibility of findings as merely “one of science’s defining features”. This kind of position, however, doesn’t seem to fit with the considerations above, whereby reproducible empirical phenomena are required before a scientific theory can even be put forward, let alone be evaluated vis-à-vis other theories.

In another example, Cesario (2014) — in the context of discussing what features of the original methodology need to be duplicated for a replication attempt to be informative — states:

“We know this only because we have relevant theories that tell us that these features should matter.” (p. 42) “Theories inform us as to which variables are important and which are unimportant (i.e., which variables can be modified from one research study to the next without consequence).” (p. 45)

Cesario seems to be saying that we can use a scientific theory to tell us which methodological features of an original study need to be duplicated to reliably observe an empirical phenomenon. Such a position would seem to be putting the cart before the horse, however, given that without demonstrably repeatable empirical phenomena to explain, no theory can be developed in the first place.1

A final example comes from an article by Dijksterhuis (2014, “Welcome back theory!”), who summarizes Cesario’s (2014) paper by saying:

“Cesario draws the conclusion that although behavioral priming researchers could show more methodological rigor, the relative infancy of the theory is the main reason the field faces a problem.” (p. 74)

Dijksterhuis seems to be saying that the field of behavioral priming currently has problems with non-replications because of insufficiently developed theory. This position is again difficult to reconcile with the standard conceptualization of scientific theory. With all due respect, such a position would be akin to saying that ESP researchers have yet to document replicable ESP findings because theories of ESP are insufficiently developed!

But how could this happen?

I contend that such confusion regarding scientific theory has emerged due (at least in part) to the relatively weak methods used in modal research (LeBel & Peters, 2011). This includes the improper use of null hypothesis significance testing (i.e., treating p<.05 as indicating a “reliable” finding) and an over-emphasis on conceptual rather than direct replications. Conceptual replications involve immediately following up an observed effect with a study using a different methodology, hence rendering any negative results completely ambiguous (i.e., was the different result due to the different methodology or to the falsity of the tested hypothesis?). This practice effectively shields any positive empirical findings from falsification (see here for a great blog post precisely on this point; see also Greenwald et al., 1986). Granted, once the reproducibility of a particular effect has been independently confirmed (using the original methodology), it is of course important to subsequently test whether the effect generalizes to other methods (i.e., other operationalizations of the IV and DV). However, we simply cannot skip the first step. This broadly fits with Rozin’s (2001) position that psychologists need to place much more emphasis on first reliably describing empirical phenomena before setting out to actually test hypotheses about those phenomena.
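
To illustrate why treating a single p<.05 result as a “reliable” finding is so problematic, here is a minimal simulation sketch (my own hypothetical illustration; the effect size and sample size are assumed numbers, not taken from any of the papers cited here):

```python
# Illustrative simulation (assumed numbers): how often does an exact direct
# replication of a "significant" original study also reach p < .05?
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def two_group_p(true_d, n_per_group):
    """Run one two-group study and return the t-test p-value."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    return stats.ttest_ind(treatment, control).pvalue

true_d, n = 0.3, 30          # assumed small true effect, small samples
n_sims = 10_000

originals_sig = replications_sig = 0
for _ in range(n_sims):
    if two_group_p(true_d, n) < .05:         # the "original" study came out significant
        originals_sig += 1
        if two_group_p(true_d, n) < .05:     # an exact direct replication of it
            replications_sig += 1

print(f"Replication success rate given a significant original: "
      f"{replications_sig / originals_sig:.2f}")
# With roughly 20% power, even a true effect that once reached p < .05
# replicates at p < .05 only a minority of the time, so a single significant
# result is far from a demonstrably repeatable phenomenon.
```

(The exact replication rate depends entirely on the assumed effect size and sample size; the point is simply that a single statistically significant result says very little about whether a phenomenon is demonstrably repeatable.)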


Footnotes

1. That being said, Cesario should be lauded for his public stance that behavioral priming researchers need to directly replicate their own findings (using the same methodology) before publishing them.


References

Cesario, J. (2014). Priming, replication, and the hardest science. Perspectives on Psychological Science, 9, 40–48.

Dijksterhuis, A. (2014). Welcome back theory! Perspectives on Psychological Science, 9, 72–75.

Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research progress? Psychological Review, 93, 216.

LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15, 371–379.

Popper, K. R. (1959). The logic of scientific discovery. New York, NY: Basic Books.

Quine, W. V. O., & Ullian, J. S. (1978). The web of belief (2nd ed.). New York, NY: Random House.

Rozin, P. (2001). Social psychology and science: Some lessons from Solomon Asch. Personality and Social Psychology Review, 5, 2–14.

Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9, 59–71.

“Replicating down” can lead to “replicating up”

A few days ago, Rolf Zwaan wrote an interesting post about “replicating down” vs. “replicating up”, which he conceptualized as decreasing vs. increasing our confidence in an effect reported in an original paper. I love this distinction and definitely agree that we need to see a lot more “replicating up” replication efforts and that editors of prominent journals should publish the results of such efforts.

In this blog post, I’m going to tell a story that embodies a different kind of “replicating up” and contend that “replicating down” can lead to “replicating up” in highly constructive ways for our science.

Here is the story.

We executed two high-powered, pre-registered direct replication attempts of an effect, bending over backwards (à la Feynman) in collaboration with the original authors to duplicate as closely as possible all methodological details of the original study. However, we couldn’t get the effect. So we submitted our results to the journal that originally published the findings (trying for the Pottery Barn Rule), but the submission was rejected for not making a sufficiently substantial theoretical contribution. The editor argued that for publication we needed to provide the conditions under which the effect *does* occur.1

In a weird twist of events, one of the reviewers — who was one of the original authors on the paper — reported in their review that they had since “discovered” a moderator variable for the effect in question. The action editor suggested we “combine forces” and consider re-submitting to the journal. Indeed, a few days later I received an email from “Reviewer #1” offering that we combine forces and submit a combined paper with our null replication results and their moderator evidence. I graciously declined the offer, instead asking for the methodological details so that I could attempt to independently replicate their new “moderator effect”. Suddenly, the researcher’s tone changed: they communicated that they hadn’t “yet pinned down the effect”, but would email me the details as soon as they had them. That email never came.

Fast forward six months. Out of the blue, an independent team emailed me indicating that they had also failed to replicate the original results in question, in an even higher-powered design. In yet another weird twist of events, however, their replication results spoke directly to the moderator question at hand, seriously calling into question the so-called “moderator effect” explanation of our failed replication results. I emailed the original author to ask whether there were any developments regarding their new “moderator effect”, given this new evidence calling their explanation of our failed replication attempts into question.

They replied saying that since we had last communicated, they had realized that the operationalization of their target manipulation was overly noisy and that they had since “substantially improved” it to make it more precise.

That was music to my ears.

And this is what I mean by saying that “replicating down” can lead to “replicating up”! Our “replicating down” eventually led to a “replicating up” situation by getting the original researchers to improve their methodology in studying their phenomenon of interest.

Take-home message: We definitely need more “replicating up” situations, but “replicating down” can lead to “replicating up”, and this is very healthy for our science!

Last thing: my story fits very well with the Feynman-inspired name of my blog, “Prove Yourself Wrong”: by proving yourself — or others — wrong, scientific progress is achieved!

“We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.” (Richard Feynman)

Footnotes

1. This is ludicrous; it would be as if the Journal of Electroanalytical Chemistry — where Pons and Fleischmann (1989) published their now-discredited cold fusion findings — had demanded that independent replicators provide the “conditions” under which cold fusion *can* be observed!