Author: eplebel81

About eplebel81

I am an experimental psychologist also interested in meta-science. For more details see

LEBEL: Curate Science – 2017 Year in Review and Upcoming Plans for 2018

The Replication Network

Curate Science is an online platform to track, organize, and interpret replications of published findings in the social sciences, with a current focus on the psychology literature.
We had a very productive year in 2017. Here are some highlights of our accomplishments:
– With N=1,008 replications, we became (to our knowledge) the world’s largest database of curated replications in the social sciences, covering all replications from the Reproducibility Project: Psychology, Many Labs 1 and 3, the Social Psychology special issue, and Registered Replication Reports 1 through 6.
– Several major new features, the most important being a new searchable (and sortable) table of curated replications. One can search by topic, effect, keyword, or method, and can sort by sample size and effect size (for both original and replication studies), among many other fields. Curated study characteristics include links to PDFs, open/public data, open/public materials…

View original post 348 more words

Even With Nuance, Social Psychology Faces its Most Major Crisis in History

"The truth is something that burns..." (JBP)

An interesting piece was recently posted arguing that nuance is required to accurately understand the extent to which (social) psychology is currently facing a crisis of confidence. It lists 31 articles on important issues that need to be considered to accurately evaluate the current crisis and generate nuanced solutions to our problems.

I wholeheartedly agree that nuance is required here. As the saying goes, the truth is in the details, and there are many, many details that all need to be considered extremely carefully. That said, even after nuanced consideration of the issues raised in these 31 articles, a careful, near-decade-long evaluation of the replicability of social psychology’s published findings, based on thousands of replication studies, paints a grim picture: Even with nuance, social psychology currently faces its most major crisis in history.1

Of course, many nuanced details go into informing such a bold claim.2 I could be wrong, and I genuinely hope to be proven wrong if I indeed am. But for now, see this sneak preview of the tip of the iceberg of the details that inform such a claim.

Kahneman worried about a train wreck looming in social psychology. He was correct in worrying, and unfortunately I believe the train wreck has manifested. Indeed, the train wreck continues to inflict pain on the social psychology community on a near-weekly basis as an alarmingly growing number of cherished findings fail to hold up to closer scrutiny.

But is the field of social psychology rotten to the core, as some have suggested?

"Rotten to the core"

In the spirit of thinking about things in a nuanced fashion, I strongly disagree with the rotten-apple metaphor because some findings in social psychology do in fact replicate (eg1, eg2, eg3; though note that replicable effects aren’t necessarily also valid/generalizable; eg). So it’s false that everything is rotten. Rather, social psychology’s most pressing problem is its low replicability rate. Current meta-meta-analytic estimates, based on the careful evaluation of thousands of replications, suggest social psychology’s replicability rate is most likely somewhere between 15% and 25%.3 Hence, a better metaphor is that of a raging forest fire, which has the potential to become much worse, but also the potential to be tempered and eventually controlled to reasonable levels.


Consequently, it’s not time to sit back, relax, and become complacent based on distorted perceptions of the state of social psychology just because such perceptions make us feel better. As emphasized in Ledgerwood’s piece, we need to stay focused on how to get better right now, starting today. This can be done, for example, by ensuring one’s research is sufficiently falsifiable: (1) executing and reporting one’s research in a sufficiently transparent manner and (2) prioritizing replicability by thinking in more nuanced ways about the systematic use of different kinds of direct replications (see our 10 shades of falsifiability replication taxonomy).

Many, if not most, social psychologists have embraced open-science reforms in some capacity, which is very promising and inspiring. That said, many elite social psychologists4 continue to make proto- or pseudo-scientific arguments against the fundamental scientific principles of transparency, replicability, and falsifiability. This is embarrassing to the majority of social psychologists who do understand the gravity of the situation. Such proto- and pseudo-scientific sophism must stop immediately. It’s our job to engage in dialogue with such elite social psychologists, who continue to hold powerful editorial positions and exert influence on graduate students. Our collective reputation is at stake. The longer this goes on, the sooner (government) funders will consider pulling the plug on our funding and, more importantly, the further the public’s trust in social psychology will be eroded.




1. For other crises, see Elms, 1975; Greenwald, 1975; Lykken, 1968.
2. All of these details will eventually be revealed in an upcoming project about the personal deception within the broken academic system.
3. Based on replicability rates of social psych effects from ML1, ML3, SP:Special Issue, RRRs, RP:P, and non-large-scale replication efforts (see here for the working meta-meta).
4. (Who will remain nameless here, but will be named in my upcoming project.)

The Language of Science: A Primer

Scientific knowledge accrues via the falsification of theory-derived hypotheses.

Falsification is the process of proving a hypothesis wrong.

Falsifiability is the extent to which a hypothesis, or approach to research, can be falsified.

Falsifiability requires sufficient transparency (i.e., full disclosure, open materials, open data, and pre-registration).

Falsifiability is an essential, non-optional aspect of the scientific approach (or else one is a historian rather than a scientist).

Falsification is achieved via a meticulously executed series of direct replications.

Taxonomy of direct replications with respect to falsifiability (10 shades of falsifiability).

Replication is the activity of carrying out a direct replication.

Replicability is the extent to which a particular effect/hypothesis, or area of research, is replicable.

An effect is said to be replicable if an effect of similar magnitude can be consistently observed, under specified (boundary) conditions, across independent samples and researchers.

Scientific findings must be demonstrably replicable under conditions specified, but not necessarily replicable at will.1




1. For example, the light-bending findings from the 1919 solar eclipse, which eventually led to the falsification of Newton’s theory of gravitation in favor of Einstein’s General Theory of Relativity, are not replicable at will, but are (and continue to be) demonstrably replicable for solar eclipses satisfying the conditions originally specified.

Need for a New Code of Ethics for Professional Researchers in the Era of Hyper-Competitive, High-Stakes Academic Culture

Etienne P. LeBel & Anne Scheel

[Version 2.4; We thank Nick Brown for valuable feedback on a previous version of this blog post.]

Imagine your child is diagnosed with cancer. You have the choice between two drugs: One was developed and tested in a series of registered studies1, the other in non-registered studies. Which one do you choose? You would probably feel that the answer is a no-brainer — you want the drug whose efficacy was based on evidence least influenced by bias.

The extremely high stakes of pharmaceutical research, in the form of billion-dollar revenues generated from FDA-approved drugs, led the World Medical Association (WMA) in 2008 to institute mandatory study registration for all clinical trials reporting evidence on drug efficacy. This was preceded by the International Committee of Medical Journal Editors’ (ICMJE) decision in 2005 that non-registered clinical trials would no longer be considered for publication. The logic is that the risk posed by researcher biases in the analysis and reporting of study results, including bias in reporting inconclusive or negative studies, is so high that non-registered studies simply cannot and should not be trusted.

The modern era of hyper-competitive, high-output academic research culture has also created extremely high stakes for individual researchers, in the form of personal rewards such as prestigious jobs, promotions, book deals, outside financial interests, social status, and media attention. Consequently, there are no intellectually honest and defensible reasons against applying this same requirement to all published research involving human subjects. The person who prefers the cancer drug from registered studies cannot simultaneously dismiss the requirement of study registration for their own psychology studies. It follows that all human-subjects research not publicly registered should not even be considered for publication in any scientific journal (psychology or otherwise).

Indeed, the latest revision of the Declaration of Helsinki ethical principles, from 2013, dictates precisely such a requirement:

  • 35. Every research study involving human subjects must be registered in a publicly accessible database before recruitment of the first subject.
  • 36. Researchers, authors, sponsors, editors and publishers all have ethical obligations with regard to the publication and dissemination of the results of research. Researchers have a duty to make publicly available the results of their research on human subjects and are accountable for the completeness and accuracy of their reports. All parties should adhere to accepted guidelines for ethical reporting. Negative and inconclusive as well as positive results must be published or otherwise made publicly available. Sources of funding, institutional affiliations and conflicts of interest must be declared in the publication. Reports of research not in accordance with the principles of this Declaration should not be accepted for publication.

Given that study registration is not yet mandatory in psychology, professional psychologist researchers are not yet complying with these new ethical principles.2 Due to the high-stakes personal rewards of the current academic research culture, however, we strongly believe it is time for all professional psychologist researchers to abide by such new ethical principles requiring mandatory study registration, in addition to minimal reporting standards, open materials/data, and hypothesis pre-registration.

Anything short of this, given the environment in which researchers operate, fails to adhere to fundamental scientific principles: That is, reporting and testing hypotheses with sufficient transparency and thus falsifiability to maximize the likelihood that we as a research community can conclude a hypothesis is wrong, if it is in fact wrong (which can be easily achieved given new technologies3):

  • Without study registration at a centralized public registry, it is impossible, for us as researchers, to account for the selective file-drawering of “failed” or inconclusive studies.
  • Without a pre-registered method protocol (specified prior to data collection), it is near-impossible for us to account for the multitude of ways researchers may have (un)intentionally exploited analytic and design flexibility to achieve a publishable result.
  • Without minimal reporting standards (e.g., the 21-word solution, the BASIC 4 Psychological Science reporting standard), we cannot properly evaluate the strength of the reported evidence.
  • Without open materials, we cannot properly scrutinize the experimental design, nor can we conduct diagnostic independent replicability tests.
  • Without open data, we cannot verify the analytic reproducibility or the analytic robustness of the reported results, which need to be independently confirmed before investing precious research resources conducting expensive independent replications.

Being a scientist is a special and precious privilege. It is not an irrevocable right. As credentialed professionals, public intellectuals, and mentors, we have an inordinate amount of influence on citizens, the media and journalists, industry research and corporations, government agencies, NGOs, and other researchers both within and outside our respective fields. But with such importance and respect comes great responsibility.

Consequently, it follows that insufficiently transparent, and hence insufficiently falsifiable, research should be considered professionally unethical for the following reasons:

  • When the public funds research, taxpayers provide money in good faith that the funded projects will advance knowledge and help address societal problems. Non-falsifiable research wastes public funds which could otherwise be spent on social services and programs that reduce suffering and save lives.
  • Non-falsifiable research also wastes additional public funds spent misguidedly trying to replicate and build upon such research.
  • Non-falsifiable research also leads to costly and ineffective practical implementation attempts, which can have grave consequences on real-world practical, legal, and political decisions.
  • Non-falsifiable research wastes the time of volunteering human subjects and in some cases unjustly puts their well-being at risk.
  • Non-falsifiable research erodes the public’s trust in scientists, will lead to further research funding cuts, and stifles society’s evolution toward evidence-based policy-making.

We propose that all professional psychologists abide by the new 2013 Declaration of Helsinki ethical principles, which are consistent with current, lower-bar, country-based professional-society codes of ethics, including those of the APA, CPA, DGPs, and VSNU, and with the European Code of Conduct for Research Integrity (as has been previously argued here). This is gravely needed for us to finally be accountable to the public: accountable for ensuring that all published research actually follows fundamental scientific principles, with the degree of transparency and falsifiability required for scientific progress (building upon existing softer, voluntary initiatives such as the Commitment to Research Transparency and the TOP guidelines).

Such a new ethical code of conduct would explicitly stipulate the following standards for all published scientific research4:

  • Public registration of all studies at a field-relevant centralized registry, which includes a pre-registered method protocol document clearly describing rationale of study, study sample and design, and planned data analytic approaches (e.g., IRB ethics approval documents).
  • Compliance with fundamental reporting standards relevant to the reported research (e.g., BASIC 4; the CONSORT standard for experimental studies; the STROBE standard for observational/correlational studies).
  • Open materials: Public online archiving of all relevant procedural details, materials, and measures, unless proprietary exclusions apply, to allow for proper scrutiny of experimental design and independent replicability tests.
  • Open data: Public online archiving of all relevant data, raw or transformed data, unless proprietary or confidentiality exclusions apply, to allow for verification of analytic reproducibility and analytic robustness of reported results.

Compliance with this new code of ethics could be implemented by having each stakeholder in a researcher’s ecosystem (i.e., journals, professional societies, funding agencies, university employment contracts) require that individual researchers explicitly consent to following the code. This is akin to the Hippocratic Oath for medical professionals, guided by the more general Hippocratic Oath proposed for all scientists (see also here). Upon taking such an oath, violation of the new ethical standards should be considered unethical and investigated as researcher misconduct by the appropriate stakeholder(s).

We urgently need to have a serious discussion within the psychological research community about the minimum scientific standards that must be met to be an ethical researcher in this modern era of high-stakes, hyper-competitive, high-output academic research culture. This discussion should incite calls to action to ensure that all stakeholders vigilantly enforce compliance with this new code of ethics. Otherwise, the reputation of all professional psychologists will continue to be tarnished, extensive research waste and direct and indirect harm to society will continue, and the public’s trust in science will be further eroded.




1. “Registered studies” as in studies registered in public centralized study registries prior to data collection, such as
2. We must emphasize, however, that a growing minority of psychologists have made admirable efforts to pre-register and provide open materials/open data for some or all of their studies.
3. E.g. technologies to safely store and share data and materials, preregister studies, establish a reproducible workflow, conduct multi-lab collaborations, verify the accuracy of one’s own and others’ reported results, and make manuscripts publicly available for pre-publication peer feedback.
4. These standards should not be misconstrued as guaranteeing scientific knowledge, but rather as minimal standards that need to be in place to allow the possibility of achieving valid and generalizable knowledge about how our world works.

Emptying my implicit social cognition file-drawer from graduate school (2005-2008)

At the PsychMethods Facebook discussion group, Uli Schimmack et al. have recently been discussing the lack of merit of the implicit self-esteem (ISE) construct. I chimed in with a brief note concurring with Uli, stating that during my first three years of graduate school I amassed over 20 “failed studies” involving implicit self-esteem (building upon Dijksterhuis’s (2004) seminal ISE paper).1 This led to a tremendous waste of time and research resources, substantially derailing my main line of research before I abandoned it altogether a few years later.

Uli asked me whether I’d ever published, or at least archived, these failed studies. I replied in the negative, because this wasn’t done in pre-2010 days. I did mention, however, that I would be publicly releasing more details of these failed studies in a book I’m currently writing about social psychology’s unraveling in the context of the broken academic system.

As a sneak preview, I’ve decided to empty my entire file-drawer for all implicit social cognition studies2 I executed during my first three years of graduate school (2005-2008), which includes the 20+ failed studies on implicit self-esteem specifically:

I became so frustrated with my “lack of success” that I created this table in the Spring of 2008 to more carefully document my failures. I also printed out a hard copy of the table and would show it to professors and visiting external speakers. In an exasperated tone, I would ask them: What the hell am I doing wrong?


1. I wouldn’t go as far as Uli in declaring that “implicit self-esteem is DEAD; R.I.P Implicit Self-Esteem (2000-2015).” I would, however, strongly caution any researcher, particularly early-career researchers, against investing research resources on this topic.

2. Sample sizes for the studies ranged from N=80 to N=140, following the traditional heuristic of N≈20 per cell for between-subjects designs (sometimes re-sampling an additional N=20 to N=40 in the case of statistically marginal effects).
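As a rough illustration of why that heuristic was problematic, here is a short sketch (mine, not from the original post) computing the power of a two-sided, two-sample t-test with n = 20 per cell, using the noncentral t-distribution; the assumed effect size of d = 0.5 is a conventional "medium" benchmark, not a value from these studies:

```python
# Sketch: achieved power of a two-sample t-test with the traditional
# n = 20 per cell, for an assumed "medium" effect (Cohen's d = 0.5).
import numpy as np
from scipy import stats

def two_sample_power(d, n_per_cell, alpha=0.05):
    """Power of a two-sided independent-samples t-test with equal cell sizes."""
    df = 2 * n_per_cell - 2
    nc = d * np.sqrt(n_per_cell / 2)          # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # P(reject) = P(T' > t_crit) + P(T' < -t_crit) under the noncentral t
    return (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)

print(round(two_sample_power(0.5, 20), 2))    # roughly 0.34
```

In other words, under these assumptions a 2 × 20 design detects a true medium effect only about a third of the time, which is consistent with amassing a large file-drawer of "failed" studies.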

We Need Federally Funded Daisy Chains


One of the most provocative requests in the reproducibility crisis was Daniel Kahneman’s call for psychological scientists to collaborate on a “daisy chain” of research replication. He admonished proponents of priming research to step up and work together to replicate the classic priming studies that had, up to that point, been called into question.

What happened? Nothing. Total crickets. There were no grand collaborations among the strongest and most capable labs to reproduce each other’s work. Why not? With 20/20 hindsight, it is clear that the incentive structure in psychological science militated against the daisy-chain idea.

The scientific system in 2012 (and the one still in place) rewarded people who were the first to discover a new, counterintuitive feature of human nature, preferably using an experimental method. Since we did not practice direct replications, the veracity of our findings wasn’t really the point. The point was to be the…

View original post 800 more words

LEBEL: Introducing “CurateScience.Org”

The Replication Network

It is my pleasure to introduce Curate Science to The Replication Network. Curate Science is a web application that aims to facilitate and incentivize the curation and verification of empirical results in the social sciences (with an initial focus on psychology). Science is the most successful approach to generating cumulative knowledge about how our world works. This success stems from a key activity, independent verification, which maximizes the likelihood of detecting errors, hence maximizing the reliability and validity of empirical results. The current academic incentive structure, however, does not reward verification, so verification rarely occurs, and when it does, it is highly difficult and inefficient. Curate Science aims to help change this by facilitating the verification of empirical results (pre- and post-publication) in terms of (1) the replicability of findings in independent samples and (2) the reproducibility of results from the underlying raw data.
The platform facilitates replicability by enabling users…

View original post 451 more words

High-powered direct replications of social psychology findings (for in press paper; out-of-date)

***IMPORTANT NOTE***: This list was compiled on October 13, 2015 solely for an in-press paper at JPSP, to be referenced as additional replications of social psych findings **beyond** large-scale replication efforts such as RP:P, the Social Psych special issue, ML1, and ML3, and was not meant to be disseminated widely. Hence, this list is completely out of date. For a more systematic effort to track replications in psychology, see Curate Science.

The table below lists successful (n=3) and unsuccessful (n=111) high-powered direct replications of social psychology findings (known to us on October 13, 2015). For simplicity, only replications with statistical power >= 80% to detect an effect size as large as (or larger than) the original finding are included (citation counts according to Google Scholar, retrieved October 2015). This list was tabulated as additional evidence for the broader position that the current incentive structure in social psychology is not conducive to generating cumulative knowledge, in light of several meta-scientific investigations revealing low replicability rates of social psychology findings (e.g., Reproducibility Project: 76% replication failure rate for social psychology studies; Social Psych special issue: 70% failure rate; Many Labs 3: 88% failure rate).

– Pashler et al. (2009); Cesario et al. (2007, Study 2); Doyen et al. (2012, Studies 1-2)

– Harris et al. (2013, Studies 1-2)

– Acker (2008); Calvillo & Penaloza; Lassiter et al. (2009); Newell et al. (2009); Rey et al. (2009); Thorsteinson & Withrow; Nieuwenstein & van Rijn (2012, Studies 1-2)

– Rotteveel et al. (2015, Studies 1-2)

– Cesario & Corker (2010); Astrologo et al. (2014)

– Johnson et al. (2015); Zhong et al. (2010, Study 2)

– warmth promotes interpersonal warmth: Lynott et al. (2014, Studies 1-3)

– Tate (2009); Grenier et al. (2012); Rohrer et al. (2015, Studies 1-3)

– Eder et al. (2001); Shanks et al. (2013, Studies 4-6 and 8)

– facial-preferences effect: Harris (2011)

– Earp et al. (2014, Studies 1-3); Gamez et al. (2011, Studies 2-3); Fayard et al. (2009, Study 1)

– Wagenmakers et al. (2011); Galak et al. (2012, Studies 1-4 and 6-7); Ritchie et al. (2012, Studies 1-3); Galak et al. (2012, Study 5)

– Brandt (2013, Studies 1-3)

– Steele et al. (2015, Studies 1-4)

– Ranehill et al. (2015); Koch & Broughal

– Johnson et al. (2014a, Studies 1-2); Lee et al. (2013); Johnson et al. (2014b)

– pro-sociality of high SES effect: Korndorfer et al. (2015, Studies 1-8); Morling et al. (2014)

– distance priming: Pashler et al. (2012, Studies 1-2); Johnson & Cesario (2012, Studies 1-2); Sykes et al. (2012)

– on approach/avoidance: Steele (2013); Steele (2014)

– warmth embodiment effect: Donnellan et al. (2015, Studies 1-9); Ferrell et al. (2014); McDonald et al. (2015)

– Banas et al. (2013); Blech (2014); Hesslinger et al. (2015)

– on voting: Harris & Mickes

– model of AMP: Tobin & LeBel (Studies 1-2)

– Pashler et al. (2013, Studies 1-3)

– of 1/f noise on WIT: Madurski & LeBel (2015, Studies 1-2)

– of secrets: LeBel & Wilbur (2014, Studies 1-2); Perfecto, Moon, & Nelson (2012)

– is money effect: Connors et al. (in press, Studies 1-2)

– McCarthy (2014, Study 1); McCarthy (2014, Study 1)

– McDonald et al. (2014, Studies 1-2)

– McCullough & Hone (2015)

– embodiment effect: LeBel & Campbell (2013, Studies 1-2)
Recommendations for peer review in current (strained?) climate

In this post, I will discuss challenges that arise when peer-reviewing submitted articles in the current tense climate. This climate stems from the growing recognition that we need to report our methods and results more openly and fully, avoiding questionable research practices and hence questionable conclusions. The post is inspired by a recent piece wherein two authors felt unfairly accused of “nefarious practices,” and also by some of my own recent experiences peer-reviewing articles.

The goal of peer review — for empirical articles at least — is to carefully evaluate research to make sure that conclusions drawn from evidence are valid (i.e., correct). This involves evaluating many different aspects of the reported research, including whether correct statistical analyses were carried out, whether appropriate experimental designs were used, and whether any confounds were unintentionally introduced, to name a few.

Another concern, which has recently received a lot more attention, is to assess the extent to which flexibility in design and/or analyses may have contributed to the reported results (Simmons et al., 2011; Gelman & Loken, 2013). That is, if a set of data is analyzed in many different ways and such analytic multiplicity isn’t appropriately accounted for, incorrect conclusions can be drawn from the evidence due to an inflated false-positive error rate (e.g., incorrectly concluding an IV had a causal effect on a DV when in fact the data are entirely consistent with what one would expect due to sampling error assuming the null is true).
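A minimal simulation makes this inflation concrete (my own illustrative sketch; the specific analysis variants are hypothetical examples of flexibility, not taken from any reviewed paper). On pure-noise data, a single planned test rejects at about the nominal 5% rate, but trying several defensible-looking analyses and keeping any p < .05 rejects noticeably more often:

```python
# Illustrative simulation: on null data, cherry-picking among several
# analysis variants inflates the false-positive rate above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n = 2000, 40
plain_hits = flexible_hits = 0

for _ in range(n_sims):
    a, b = rng.normal(size=n), rng.normal(size=n)            # null is true
    variants = [
        stats.ttest_ind(a, b).pvalue,                        # planned analysis
        stats.ttest_ind(a[:-10], b[:-10]).pvalue,            # "stopping" at n = 30
        stats.ttest_ind(a[abs(a) < 2], b[abs(b) < 2]).pvalue,  # post-hoc outlier exclusion
        stats.mannwhitneyu(a, b).pvalue,                     # switching tests post hoc
    ]
    plain_hits += variants[0] < 0.05                         # honest error rate
    flexible_hits += min(variants) < 0.05                    # report "best" analysis

print(f"planned: {plain_hits / n_sims:.3f}, flexible: {flexible_hits / n_sims:.3f}")
```

Note that each variant on its own is a perfectly ordinary analysis; it is the undisclosed choice among them, after seeing the data, that inflates the error rate — which is exactly why reviewers must be able to rule such flexibility out.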

Hence, a crucial task when reviewing an empirical article is to rule out that flexibility in analyses (&/or design, e.g., data collection termination rule) can account for the reported results, and hence avoid the possibility that invalid conclusions have been made. From my perspective, however, it is really important that as reviewers we do this very carefully so that authors (whose work is being reviewed) do not feel accused of intentional p-hacking or researcher misconduct.

Here’s an example to demonstrate my point. During peer review of an article on goal-directed bias in memory judgments (at Consciousness & Cognition), O’Connor & Mill felt unfairly accused of “unconventional and nefarious practices” in analyzing their data (see here for details). We don’t have all of the details, but it looks like one of the reviewers was concerned about how exclusions were made by the authors with regard to (1) an overly low sensitivity index (d’) and (2) native-language requirements. This reviewer went on to say that “the authors must accept the consequences of data that might disagree with their hypotheses”. The reviewer was completely justified in being concerned that flexibility in the exclusion criteria that could have been used might have led to invalid conclusions regarding the target phenomenon (i.e., how goals can bias memory processes). However, in my opinion, the language used to express this concern was inappropriate because it insinuated that such flexibility may have been intentionally exploited.

Another example comes from a recent paper I reviewed that reported evidence that “response effort” may moderate the impact of cleanliness priming on moral judgments (under review at Frontiers). On the surface, the evidence seemed very strong, but upon closer inspection I realized that there seemed to be quite a bit of flexibility with respect to (1) how “response effort” was operationalized across the 4 reported studies and (2) exclusion criteria used for excluding participants who exhibited “insufficient effort responding”. Concerned that such flexibility may have contributed to an inflated false positive error rate (and hence invalid conclusions), I carefully delineated these concerns and concluded my review by stating:

“In sum, the main problem is that based on the methods and results presented in the current manuscript, we cannot rule out the possibility that unintentional confirmation bias inadvertently (1) biased the operationalization of “response effort” and (2) biased the chosen exclusion criteria, which in combination represents a potential alternative explanation for the current pattern of results.”

It is important to notice that I intentionally framed my concern in terms of the fact that flexibility in analyses may have unintentionally biased the results. This is extremely crucial because most authors are probably not aware that flexibility in analyses/methods may have unduly influenced their reported results. Hence, of course they will become defensive if you insinuate that they have intentionally exploited such flexibility, when they in fact have not intentionally done so. This would be akin to insinuating that researchers intentionally confounded their experimental manipulation! The point here is that flexibility in analyses/design — just like experimental confounds — need to be ruled out, and this is necessary for valid inference regardless of whether these problems were intentionally or unintentionally introduced.


Here are three recommendations for raising such concerns without putting authors on the defensive:

1. Always frame your concerns about flexibility in analyses/design (or any other concern) using language that focuses on the ideas rather than the authors.
2. Give the benefit of the doubt to authors and always assume that flexibility in analyses/design may have unintentionally influenced the reported results.
3. Use a standard reviewer statement that has been specifically designed to help with such matters. The statement (developed by Uri Simonsohn, Joe Simmons, Leif Nelson, Don Moore, and me) can be used by any reviewer to request disclosure of additional methodological details, which helps assess the extent to which flexibility in analyses/design may have contributed to the reported results. Using this standard statement is another way to avoid having the authors feel as though you are insinuating they have intentionally done something questionable.

“I request that the authors add a statement to the paper confirming whether, for all experiments, they have reported all measures, conditions, data exclusions, and how they determined their sample sizes. The authors should, of course, add any additional text to ensure the statement is accurate. This is the standard reviewer disclosure request endorsed by the Center for Open Science [see]. I include it in every review.”

Insufficiently open science — not theory — obstructs empirical progress!

I stumbled upon Greenwald et al.’s (1986) “Under what conditions does theory obstruct research progress” article the other day and decided to re-read it. It was fascinating to revisit in the context of current controversies about p-hacking and replication difficulties. Very prescient indeed.

In the article, Greenwald et al. argued that theory obstructs research progress when:
1. testing theory is the central goal of research, and
2. the researcher has more faith in the correctness of the theory than in the suitability of the procedures used to test the theory.

Though I agree with their main argument (and indeed we’ve made a very similar argument here), I don’t think it’s completely correct, or it is at least incomplete given what we now know about modal research practices.

I want to put forward the possibility that it is insufficiently open research practices, rather than theory-confirming practices per se, that obstruct empirical progress! Testing theory has always involved the (precarious) goal of producing experimental results that confirm novel theory-derived empirical predictions. Such endeavors almost always involve repeated tweaking and refinement of procedures and calibration of instruments. As long as researchers are sufficiently open about the methods used to execute their experimental tests, however, such theory-confirming practices *can* lead to empirical progress. This is because openness lets other researchers gauge more objectively all of the methodological tweaks that were required to get the theory-confirming result, and also because being open encourages stronger methods and better thought-out experimental designs in the first place. Consequently, being more open means theory-derived empirical predictions are more open to disconfirmation (given that disconfirmation requires strong methods), which substantially *accelerates* research progress! Don’t take my word for it; here’s what Richard Feynman had to say on the subject:

“We are trying to prove ourselves wrong as quickly as possible, because only in that way can we find progress.” (Richard Feynman)


Two quotes from Greenwald et al.’s article that inspired this post!

“The theory-testing approach runs smoothly enough when theoretically predicted results are obtained. However, when predictions are not confirmed, the researcher faces a predicament that can be called the disconfirmation dilemma (Greenwald & Ronis, 1981). This dilemma is resolved by the researcher’s choosing between proceeding (a) as if the theory being tested is incorrect (e.g., by reporting the disconfirming results), or (b) as if the theory is still likely to be correct. The researcher who preserves faith in the theory’s correctness will persevere at testing the theory — perhaps by conducting additional data analyses, by collecting more data, or by revising procedures and then collecting more data.” (p. 219).

“A theory-confirming researcher perseveres by modifying procedures until prediction-supporting results are obtained. Particularly if several false starts have occurred, the resulting confirmation may well depend on conditions introduced while modifying procedures in response to initial disconfirmations. However, because no systematic empirical comparison of the evolved (confirming) procedures with earlier (disconfirming) ones has been attempted, the researcher is unlikely to detect the confirmation’s dependence on the evolved details of procedure. Although the conclusions from such research need to be qualified by reference to the tried-and-abandoned procedures, those conclusions are often stated only in the more general terms of the guiding theory. Such conclusions constitute avoidable overgeneralizations.” (p. 220)