Thursday, October 24, 2013

Why a lot of published scientific research could be wrong

We've talked about this before, but this infographic presents it pretty cleverly.

----

First, I would love to see a source for his numbers on false positives and false negatives.

He also oversimplifies the way experiments work.  Take the Higgs boson, for example: the experiments were run continuously until statistical significance was seen, which means the hypothesis was tested hundreds and hundreds of times.  The false negative/positive issue is resolved by virtue of the experimental setup.

And on top of that, published work is not set on a pedestal just because it is published.  Once it is published, there is generally rigorous work to review and replicate the findings.
So in short, he is wrong, and where he is right it doesn't matter, because that is part of the process.

-----

Those numbers were just hypothetical, based on a probability distribution, to make his point. But sadly, few scientists even consider Type I and Type II errors; they just project confidence that their findings are bulletproof. In biology, when people try to reproduce experiments, they fail about half the time, according to Science magazine.
Yeah, I think they are referring to distinct hypotheses, so a series of experiments relating to the same hypothesis would probably count as one data point only. There is also a risk associated with repeated "testing until significance," because the more measurements you make, the higher the chance you'll get some random signal and mistakenly use it to prove your hypothesis.
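(A minimal sketch of that risk, with made-up batch sizes and the usual alpha of 0.05: even when there is no real effect at all, re-running a t-test after every new batch of data and stopping at the first "significant" result inflates the false-positive rate well beyond 5%.)

```python
# Minimal sketch, all numbers made up: simulate "testing until
# significance" on pure noise (the null hypothesis is true).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_runs, max_peeks, batch = 0.05, 2000, 20, 10

false_positives = 0
for _ in range(n_runs):
    data = []
    for _ in range(max_peeks):
        data.extend(rng.normal(0.0, 1.0, batch))   # true mean really is 0
        _, p = stats.ttest_1samp(data, 0.0)        # test after every batch
        if p < alpha:                              # stop at first "success"
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_runs:.2f}")
# Prints roughly 0.2-0.3, i.e. several times the nominal 5% error rate.
```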

Actually, you would be surprised how little rigor goes into the review process; I am sure S and A could comment. If the author is a big name with political clout: free pass. No one has the time or resources to verify and replicate work; it is the honor system. If another group wants to build on the research and doesn't get the same results, they may publish that, but they may not, to avoid controversy and the embarrassment of others not buying it.

But consider the incentive structure, human nature, and the higher stakes of published results: it's a bad combo. Also, journals want to publish big, sexy results to make more money, so there is a conflict of interest (COI).
----
re: "mistakenly use it to prove your hypoth" I think you don't understand what I'm describing.  The measurements means more averaging as I'm describing it.  And more averaging means LESS chance of random signal causing an error.  It is just a method for eliminating "noise" whatever that noise may be.  For the higgs boson it was a very specific energy level measured from the decay.  Every time they took another measurement, if the higgs existed the energy signature would be reinforced and other sources of energy negated, or the opposite if it did not exist.  And with some fancy math they get a confidence level based on number of trials and blah blah.  So not all experiments are susceptible in the same way.
Re: COI and big names, yeah.  Reinhart and Rogoff is an example where more weight was given to big names.  But it is also an example of the fact that people do look at this stuff, though you could argue the damage was done by the time we figured out their conclusions weren't great.

In any case, I'm not arguing that false positives don't exist in academic journals.  I am arguing that this particular guy's math "proving" it is false at worst or misleading at best.
----
I see, thanks M. In that case, the Higgs boson experiment was doing its own internal QA and replicating the experiment many times (something very lacking in biology and business, where many experiments are underpowered). So if the variance is small and there is a strong signal from most of the trials, then that is a very convincing result.
At work we were struggling with a related issue. We usually don't approve changes to the website unless they pass a lot of statistical and sanity checks. But due to probability, we know that we are likely rejecting some features that were truly positive (but we couldn't detect it, by chance or by poor design), and we are approving some features that are in fact neutral or even negative. That is the scary possibility.

Sometimes our approval criterion is "do no harm": if all the key metrics look to be within noise and we can't reject the null hypothesis, then it's OK. But there is a slim chance that we are approving stuff that is actually very harmful. And if the company is approving hundreds of features a year, it's likely some of them are harmful but mislabeled as positive or neutral. Though for mature businesses, where you are squeezing out basis points on the margin, the risk is probably not terrible.

I guess there is a balancing act between scientific rigor and business expediency/strategy. But at least in e-commerce, an error isn't likely going to cost lives (unless it's Apple Maps, LOL). But for consumables, finance, or public works, it could be really bad. And they say we don't need regulation? :)
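(A back-of-the-envelope version of that worry, with every rate invented: under a "do no harm" rule, a truly harmful feature that happens to look "within noise" gets approved, and at hundreds of launches a year a few slip through.)

```python
# Back-of-the-envelope sketch; every number here is an assumption.
n_features = 500    # features evaluated per year
p_harmful  = 0.10   # fraction that are truly harmful
power      = 0.80   # chance the checks actually catch a harmful feature

harmful_total    = n_features * p_harmful        # 50 harmful features
harmful_launched = harmful_total * (1 - power)   # Type II errors: missed harm

print(f"truly harmful features per year:  {harmful_total:.0f}")
print(f"harmful features launched anyway: {harmful_launched:.0f}")  # ~10
```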
----
Yeah, it's too bad that knowledge of this sort of thing seems only to reinforce the worst carpetbagging instincts: get in early, publish as much as you can, whether with integrity or not, and then do everything in your power to ensure that no one who comes after you could possibly do it as it's supposed to be done in the first place.

Maybe it's a cultural thing: perpetuating a hierarchical Ponzi scheme requires the smartest of people to buy into the stupidest of ideas (like blinders, narrowness, and petty infighting)...

PS: Technology will (ultimately) solve all these problems, as democratized access to information will lead to market forces being brought to bear on the most indefensible of regressive attitudes and mentalities (most of which are sustained on the basis of petty economics).
----
We were talking about this, and here were our general follow-up thoughts:

- The scientific method and the peer-review process are hundreds of years old and were developed in times when methods were not so empirical and specialized, and there wasn't so much money at stake for discoveries (for people like Euler and Carnot, I think reputation, knowledge, and prestige might have trumped the incentive to cheat, be negligent, and cut corners)

- Obviously we live in a new age now, and as C said, the democratizing effects of technology could mitigate the problem (e.g., a grad student found Rogoff's errors when big-time editorial staff and top peers didn't), but there are limits to that in hard science because of the cost and specialization of certain experiments (though if you need a certain highly skilled postdoc to do a test in a particular nuanced way to get a positive result, your result is probably not that robust)

- At least for biology, we thought one could develop an independent, confidential auditing lab within the NIH to verify all published results. Journals and authors would pay a fee to support it, and an article that passes the audit would get a certification that gives it more clout. The author's lab would have to send all the materials over to the NIH (or let them use its equipment), with instructions, and the expert NIH staff would have to reproduce the result within reasonable variance (they'd sign NDAs and non-competes, so the author has no fear of being scooped). If they can't reproduce it, then the paper can still be published at the journal's discretion, but without certification.
----
I would think an algorithm could be developed (or maybe already has been?) that can blindly take in the statistical data from an experiment and verify the conclusions.  Rogoff was a case of bad math, not bad data, right?  That removes the requirement to reproduce experiments and leaves you with only the first two cases of lies, damned lies, and statistics.
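(A hypothetical sketch of what such an algorithm could do at its simplest: take the test statistic and degrees of freedom a paper reports, recompute the p-value, and flag papers where the reported p doesn't match. The function name and the numbers below are made up for illustration.)

```python
# Hypothetical sketch: recompute a reported p-value from the reported
# test statistic and flag inconsistencies. Names/numbers are made up.
from scipy import stats

def check_reported_t_test(t_stat, df, reported_p, tol=0.01):
    """Recompute a two-sided t-test p-value and compare to the paper's."""
    recomputed_p = 2 * stats.t.sf(abs(t_stat), df)
    return recomputed_p, abs(recomputed_p - reported_p) <= tol

# Suppose a paper claims t(28) = 2.05 with p = 0.01 -- does it check out?
p, ok = check_reported_t_test(t_stat=2.05, df=28, reported_p=0.01)
print(f"recomputed p = {p:.3f}, consistent with reported value: {ok}")
# recomputed p is ~0.050, so the reported p = 0.01 would get flagged.
```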
The NIH thing is interesting, but there is still money involved, which is always problematic.  The NIH is not free from politics, since that is its major source of funding.  And there are many, many public institutions that already don't share the data from publicly funded experiments.  The democratization of technology doesn't help when all the research is behind a paywall.
----
My comments are enclosed (in-line)...  Also, in case anyone didn't see the full article:  http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble

> I would think an algorithm could be developed (or maybe already has been?) that can blindly take in the statistical data from an experiment and verify the conclusions.  Rogoff was a case of bad math, not bad data, right?  That removes the requirement to reproduce experiments and leaves you with only the first two cases of lies, damned lies, and statistics.

It's a good idea; one would think some of that would be built into the tools the scientists and researchers are using, but clearly there is progress to be made...
> The NIH thing is interesting, but there is still money involved, which is always problematic.  The NIH is not free from politics, since that is its major source of funding.
True... but federal institutions tend to be bulwarks of trust (certainly to a greater extent than a tobacco company or the Koch brothers)...
> And there are many, many public institutions that already don't share the data from publicly funded experiments.

This is also true... but changing.  Even institutions which are known for being benefactors of the public (e.g., U.C. Berkeley) are getting more formal about sharing research and ensuring access... so the change is bound to ripple through to other public institutions too...
> The democratization of technology doesn't help when all the research is behind a paywall.

Yeah, that is slow to change, but perhaps the most hopeful front.  The economics of information is such that librarians and libraries are being confronted by these questions, so the clock is already ticking: if our engineering library at U.C. Berkeley is getting rid of its books for the value the physical space has to the college, then it's probably a matter of time until someone rationalizes our giving research to private journals so that they can rip off the campus with subscription fees that (over time) do not seem to be worth more than the promise of a new student.
-----
For Rogoff, I think it was a variety of issues, but some of it was Excel formulas pointing to the wrong cells, as well as "improper" suppression of some data points. So that is mostly human error and poor judgment, which is hard for an algorithm to rectify.

An algorithm would be cool (like how the IRS algorithms randomly scan people's returns and flag errors/possible issues), but then the scientific data would have to be formatted and structured according to some standard so the algorithm could read it properly. And since each article measures different things and employs different significance tests, it could be tricky. Hell, they can't even get healthcare.gov right using 50 contractors (maybe that is their problem!).
There are some free or lower-cost internet journals out there... hopefully they can give the heavies a run for their money some day. But due to the prestige factor, no big names want to "slum it" with online journals. I agree, though, that those journals are ripping off institutions to give them "novel research" that is at least 30% useless and another 30% incorrect.

Jokes aside, we piss and moan about how bad public institutions are (and they are of course not free of corruption either... see the MMS), but as C said, they are our last resort against a totally for-profit world. We need to strengthen public institutions so that they offer a legit alternative to the for-profits; then, with customer choice, the for-profits will be compelled to clean up and stop shafting us so much. I guess that's why the health-industrial complex fought so hard against single payer, which is BY FAR the best feasible health system in the Western world, warts and all. But with all the dysfunction in Washington, furloughs, and a reduction in public-worker compensation/respect, it's less likely that our best and brightest will want to go into public service and stay long enough to make an impact.
