The Problem of False Positives
When we fail to think like Bayesians, false positives are a problem not just for mammograms but for all of science. In the introduction to this book, I noted the work of the medical researcher John P. A. Ioannidis. In 2005, Ioannidis published an influential paper, “Why Most Published Research Findings Are False,”40 in which he cited a variety of statistical and theoretical arguments to claim that (as his title implies) the majority of hypotheses deemed to be true in journals in medicine and most other academic and scientific professions are, in fact, false.
Ioannidis’s hypothesis, as we mentioned, looks to be one of the true ones; Bayer Laboratories found that they could not replicate about two-thirds of the positive findings claimed in medical journals when they attempted the experiments themselves.41 Another way to check the veracity of a research finding is to see whether it makes accurate predictions in the real world—and as we have seen throughout this book, it very often does not. The failure rate for predictions made in entire fields ranging from seismology to political science appears to be extremely high.
“In the last twenty years, with the exponential growth in the availability of information, genomics, and other technologies, we can measure millions and millions of potentially interesting variables,” Ioannidis told me. “The expectation is that we can use that information to make predictions work for us. I’m not saying that we haven’t made any progress. Taking into account that there are a couple of million papers, it would be a shame if there wasn’t. But there are obviously not a couple of million discoveries. Most are not really contributing much to generating knowledge.”
This is why our predictions may be more prone to failure in the era of Big Data. As there is an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate. For instance, the U.S. government now publishes data on about 45,000 economic statistics. If you want to test for a relationship between every possible pair of these statistics (is there a causal relationship between the bank prime loan rate and the unemployment rate in Alabama?), that gives you literally one billion hypotheses to test.*
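The one-billion figure can be checked with a short sketch. It assumes that "pairs" means unordered pairs of distinct statistics, which is one reasonable reading of the passage; the function name is illustrative only.

```python
from math import comb


def count_pairwise_hypotheses(n_statistics: int) -> int:
    """Count the unordered pairs of distinct statistics (n choose 2)."""
    return comb(n_statistics, 2)


# About 45,000 published economic statistics, as stated in the text.
n_pairs = count_pairwise_hypotheses(45_000)
print(f"{n_pairs:,} candidate pairwise relationships")
```

Under that reading the count comes to a little over one billion. If direction matters (does A drive B, or B drive A?), the number doubles.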
But the number of meaningful relationships in the data—those that speak to causality rather than correlation and testify to how the world really works—is orders of magnitude smaller. Nor is it likely to be increasing at nearly so fast a rate as the information itself; there isn’t any more truth in the world than there was before the Internet or the printing press. Most of the data is just noise, as most of the universe is filled with empty space.
Meanwhile, as we know from Bayes’s theorem, when the underlying incidence of something in a population is low (breast cancer in young women; truth in the sea of data), false positives can dominate the results if we are not careful. Figure 8-6 represents this graphically. In the figure, 80 percent of true scientific hypotheses are correctly deemed to be true, and about 90 percent of false hypotheses are correctly rejected. And yet, because true findings are so rare, about two-thirds of the findings deemed to be true are actually false!
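The arithmetic behind that claim can be sketched with Bayes's theorem. The passage gives the two accuracy rates but not the share of hypotheses that are true to begin with, so the sketch below back-solves for the share that would make two-thirds of positive findings false. That implied share is an inference from the stated numbers, not a value taken from figure 8-6, and the function names are illustrative.

```python
def false_share_of_positives(prior: float, power: float, specificity: float) -> float:
    """Fraction of hypotheses deemed true that are actually false.

    prior       -- share of tested hypotheses that are really true (assumed)
    power       -- share of true hypotheses correctly deemed true
    specificity -- share of false hypotheses correctly rejected
    """
    true_positives = prior * power
    false_positives = (1 - prior) * (1 - specificity)
    return false_positives / (true_positives + false_positives)


def prior_for_false_share(target: float, power: float, specificity: float) -> float:
    """Solve the formula above for the prior that yields a given false share."""
    alpha = 1 - specificity  # false-positive rate among false hypotheses
    return alpha * (1 - target) / (target * power + alpha * (1 - target))


# Rates stated in the text: 80 percent and about 90 percent.
implied_prior = prior_for_false_share(2 / 3, power=0.80, specificity=0.90)
print(f"implied share of true hypotheses: {implied_prior:.3f}")
print(f"false share of positive findings: "
      f"{false_share_of_positives(implied_prior, 0.80, 0.90):.3f}")
```

With these rates, the two-thirds figure corresponds to roughly one true hypothesis in every seventeen tested. The tests themselves are fairly accurate; it is the rarity of truth that lets false positives dominate.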
Unfortunately, as Ioannidis figured out, the state of published research in most fields that conduct statistical testing is probably very much like what you see in figure 8-6.* Why is the error rate so high? To some extent, this entire book represents an answer to that question. There are many reasons for it—some having to do with our psychological biases, some having to do with common methodological errors, and some having to do with misaligned incentives. Close to the root of the problem, however, is a flawed type of statistical thinking that these researchers are applying.