 Originally Posted by Savy
Does that not more lead to the point that your [results] (because you are soft sciences) just aren't all that meaningful to begin with? That's why differing results aren't that uncommon, because the original point wasn't all that true to begin with.
Differing results don't arise because people don't know the distribution they're dealing with (most things in psychological science are normally distributed, because they reflect a lot of contributing variables that tend to average out into a Gaussian; in some cases the distributions are skewed or exponential, but it's fairly easy to identify when those situations arise). The kinds of fat-tailed distributions Taleb refers to mostly occur in fields like economics afaik.
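Quick toy illustration of that averaging-out point (just a sketch in Python with made-up numbers, nothing from any real dataset): build a 'trait' score out of lots of small, independent, non-Gaussian contributions and the totals come out looking normal.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Each simulated person's score is the sum of 30 small, independent,
# non-Gaussian (uniform) contributions
contributions = rng.uniform(-1.0, 1.0, size=(50_000, 30))
scores = contributions.sum(axis=1)

# Skew and excess kurtosis of the summed scores both come out near zero,
# i.e. the totals are close to Gaussian even though each part isn't
print(stats.skew(scores), stats.kurtosis(scores))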
Differing results occur for a variety of reasons, the two main ones being that people don't have enough statistical power (i.e., a large enough sample) for their result to replicate reliably, and that until recently we have accepted evidence that was not all that compelling. P-values are a poor metric for the strength of the evidence because they imply a false positive rate that is vastly understated: saying 'this looks like a real effect because it's statistically significant at p = 0.05' intuitively seems to mean a false positive rate of 0.05, but such a p-value actually corresponds to a false positive rate closer to 25%.
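You can see roughly where that ~25% figure comes from with a quick simulation (a sketch only; the 50/50 split between real and null effects, the 1 SD effect size and the n = 16 per group are assumptions I've picked to give reasonable power, not anything specific to a particular field):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, effect = 100_000, 16, 1.0

# Half the simulated experiments have no real effect...
a0 = rng.normal(0.0, 1.0, (n_sims, n_per_group))
b0 = rng.normal(0.0, 1.0, (n_sims, n_per_group))
p_null = stats.ttest_ind(a0, b0, axis=1).pvalue

# ...and half have a genuine 1 SD effect (a 50/50 prior, itself an assumption)
a1 = rng.normal(0.0, 1.0, (n_sims, n_per_group))
b1 = rng.normal(effect, 1.0, (n_sims, n_per_group))
p_real = stats.ttest_ind(a1, b1, axis=1).pvalue

# Of the results that land just under p = 0.05, what fraction came from a true null?
band_null = (p_null > 0.045) & (p_null <= 0.05)
band_real = (p_real > 0.045) & (p_real <= 0.05)
print(band_null.sum() / (band_null.sum() + band_real.sum()))  # around 0.25-0.30, not 0.05

In other words, among experiments that just scrape under 0.05, roughly a quarter are flukes, even with decent power and a generous prior on the effect being real.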
There are other reasons that relate to the use of p-values and the binary nature of the decision process in null hypothesis significance testing, such as p-hacking, optional stopping, publication bias, and inflation of effect sizes, which would likely be eliminated if the pressure were to publish good, solid science rather than just to publish.
The neurosciences have a problem because running experiments is costly. Scanning a single person in an fMRI can cost about £400, so they tend to run (too) small-sample studies that tend not to replicate reliably.
Social psychology has a problem because researchers are often looking for small effects that arguably aren't all that important and that are difficult to interpret due to statistical noise. The effects can also be sensitive to small changes in method, which suggests they aren't all that robust.
My field of human movement tends to produce large effects and it's cheap to run subjects, so I don't run into too many problems with stats. If you get an effect size of 1.5 standard deviations in a sample of n = 20, it's going to replicate close to 100% of the time. Movements are pretty stereotyped, and it doesn't matter if I measure you, an undergrad, or the queen, I'm going to get pretty consistent data from everyone.
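For what it's worth, a one-line power calculation bears that out (assuming a within-subject/one-sample design, a two-sided test at alpha = 0.05, and treating the 1.5 SD effect as the true effect size):

from statsmodels.stats.power import TTestPower

# Probability that an exact replication with n = 20 comes out significant again,
# if the true effect really is 1.5 SD (paired/one-sample design assumed)
power = TTestPower().power(effect_size=1.5, nobs=20, alpha=0.05)
print(power)  # essentially 1.0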