At this past SPSP conference, Uri Simonsohn gave a talk on new ways of thinking about statistical power. From this perspective, you first determine how large a sample size you can afford for a particular project. You then determine the minimum effect size that can be reliably detected (i.e., with 95% power) at that sample size (e.g., d_min = .73 can be reliably detected with n=50/cell). I believe this approach is a much more productive way of thinking about power for several reasons, one being that it substantially enhances the interpretation of null results. For instance, you can conclude (assuming the integrity of the methods and measurement instruments) that the effect you’re studying is unlikely to be as large as the minimum effect size reliably detectable at your sample size (or else you would have detected it). That said, it is still possible the effect exists but is much smaller in magnitude, in which case a much larger sample size would be required to reliably detect it.
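As an aside (not part of the talk itself), this kind of minimum-detectable-effect calculation can be done with any standard power routine. The sketch below uses statsmodels’ two-sample t-test power solver and assumes alpha = .05, two-tailed, and equal cell sizes; the specific tooling is my choice, not Simonsohn’s.

```python
# Minimal sketch: minimum Cohen's d detectable with 95% power at n = 50 per cell,
# assuming a two-sample t-test, alpha = .05, two-tailed, equal cell sizes.
from statsmodels.stats.power import TTestIndPower

d_min = TTestIndPower().solve_power(nobs1=50, alpha=0.05, power=0.95,
                                    ratio=1.0, alternative='two-sided')
print(round(d_min, 2))  # ~0.73, matching the n=50/cell example above
```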
In this post, I use the core ideas from this new approach to come up with a simpler and more intuitive way of gauging publication bias for extant empirical studies.
The idea is simple. If a study reports an observed effect size smaller than the minimum effect size reliably detectable at the sample size used, the study likely suffers from publication bias and should be interpreted with caution. The further the observed effect size falls below the minimally detectable effect size, the larger the bias. Let’s look at some concrete examples.
Zhong & Liljenquist’s (2006) Study 1 on the “Macbeth effect” found d=.53 using n=30/cell. At this sample size, however, only effect sizes of d=.95 or larger are reliably detectable with 95% power. By contrast, Tversky & Kahneman’s (1981) framing-effect study found d=1.13 using n=153/cell. At that sample size, effect sizes as small as d=.41 are reliably detectable. See the table below for other examples:
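To make the comparison concrete, here is a short sketch applying the same logic to the two studies just discussed (effect sizes and cell sizes taken from the text above; the two-sample t-test power model, alpha = .05, two-tailed, and equal cells are assumptions of the sketch, not claims about the original studies’ analyses):

```python
# Sketch: flag studies whose observed d falls below the minimum effect size
# reliably detectable (95% power) at their sample size.
from statsmodels.stats.power import TTestIndPower

def d_min_95(n_per_cell, alpha=0.05, power=0.95):
    """Minimum Cohen's d detectable with the given power at n per cell."""
    return TTestIndPower().solve_power(nobs1=n_per_cell, alpha=alpha,
                                       power=power, ratio=1.0,
                                       alternative='two-sided')

studies = [
    # (label, observed d, n per cell) -- values from the examples above
    ("Zhong & Liljenquist (2006), Study 1", 0.53, 30),
    ("Tversky & Kahneman (1981), framing", 1.13, 153),
]

for label, d_obs, n in studies:
    d_min = d_min_95(n)
    flag = "observed d < d_min: interpret with caution" if d_obs < d_min else "ok"
    print(f"{label}: d_obs={d_obs:.2f}, d_min={d_min:.2f} -> {flag}")
```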
The new bias index can be calculated as follows:
(Note that we’d also want to calculate a 95% C.I. around the bias estimate, given that bias estimates should be more precise for larger Ns, all else being equal.)
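The exact formula for the index is not reproduced above, so the sketch below is only one possible instantiation of the described logic (bias grows as the observed d falls further below d_min). The difference-based definition, the hypothetical bias_index_with_ci helper, and the normal-approximation standard error for d are all assumptions of the illustration, not the post’s actual formula.

```python
# Illustrative only: one possible bias index of the kind described above,
# with a 95% CI propagated from an approximate standard error of d.
import math
from statsmodels.stats.power import TTestIndPower

def d_min_95(n_per_cell, alpha=0.05, power=0.95):
    return TTestIndPower().solve_power(nobs1=n_per_cell, alpha=alpha,
                                       power=power, ratio=1.0,
                                       alternative='two-sided')

def bias_index_with_ci(d_obs, n_per_cell):
    """Hypothetical index: d_min - d_obs, with a 95% CI derived from the
    usual large-sample SE of Cohen's d (equal cell sizes assumed)."""
    d_min = d_min_95(n_per_cell)
    n1 = n2 = n_per_cell
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d_obs**2 / (2 * (n1 + n2)))
    lo, hi = d_obs - 1.96 * se_d, d_obs + 1.96 * se_d
    return d_min - d_obs, (d_min - hi, d_min - lo)

# Example with the Macbeth-effect numbers cited above (d = .53, n = 30/cell):
print(bias_index_with_ci(0.53, 30))
```

Note how the CI for the index tightens as N grows (the SE of d shrinks), consistent with the parenthetical point above.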
To shed more light on the value of this simpler publication-bias index, in the near future I will calculate it for studies for which replicability information exists and empirically test whether the index predicts a lower likelihood of replication.
One problem I have with this, or any other, simple view of evaluating the outcomes of a measurement system (an empirical study) is that its purpose is reduced to increasing precision by increasing sample size. There are also the Accuracy and Resolution of measurement, and increasing sample size is just one of many things you can do to increase Precision.
Moreover, this is all ergodic logic: increasing the number of human participants is treated as equivalent to adding random events in order to force a random variable to reveal more of the values it can take on.
As the concluding remarks of this excellent paper by Friston point out:
“In this treatment, we have assumed biological systems are ergodic. Clearly, this is a simplification, in that real systems are only locally ergodic. The implication here is that self-organized systems cannot endure indefinitely and are only ergodic over a particular (somatic) timescale, which raises the question of evolutionary timescales: is evolution itself the slow and delicate unwinding of a trajectory through a vast state space—as the universe settles on its global random attractor?”
http://rsif.royalsocietypublishing.org/content/10/86/20130475.full
To advance the field in the scientific sense, I’m afraid we need more complexity and less ergodic theory.