----
First, I would love to see a source for where he gets his numbers on false positives and false negatives.
He also oversimplifies the way experiments work. For example, with the Higgs boson, the experiments were run continuously until statistical significance was seen. That means the hypothesis was tested hundreds and hundreds of times. The false negative/positive issue is resolved by virtue of the experimental setup.
And on top of that, published work is not set on a pedestal just because it is published. Once it is published, there is generally rigorous work to review and replicate the findings.
So in short, he is wrong, and when he is right it doesn't matter because that is part of the process.
----
Those #s were just hypothetical, based on a probability distribution, to make his point. But sadly, few scientists even consider type 1 and type 2 errors; they just project confidence that their findings are bulletproof. In bio, when people try to reproduce experiments, they can't about half the time, according to Science mag.
Considering the incentive structure, human nature, and the higher stakes of published results, it's a bad combo. Also, the journals want to publish big, sexy results to make more money, so there is a COI.
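For what it's worth, here is the kind of back-of-the-envelope arithmetic behind numbers like his. Every rate below is illustrative, not taken from the article:

    # Illustrative numbers only: not the article's actual figures.
    hypotheses = 1000     # distinct hypotheses tested across a field
    prior_true = 0.10     # fraction of them that are actually true
    alpha = 0.05          # type 1 (false positive) rate per test
    power = 0.80          # 1 - type 2 (false negative) rate per test

    true_hyps = hypotheses * prior_true          # 100 real effects
    false_hyps = hypotheses - true_hyps          # 900 non-effects

    true_positives = true_hyps * power           # 80 correctly detected
    false_positives = false_hyps * alpha         # 45 spurious "discoveries"

    positives = true_positives + false_positives
    print(f"share of positive findings that are wrong: {false_positives / positives:.0%}")
    # roughly 36%, even though every individual test was run "correctly"

So even with honest tests, a journal full of positive results can carry a surprisingly high error rate; that was his point as I read it.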
Yeah, I think they are referring to distinct hypotheses, so a series of experiments relating to the same hypothesis would probably count as only one data point. There is also a risk associated with repeated "testing until significance", because the more measurements you make, the higher the chance you'll get some random signal and mistakenly use it to prove your hypothesis.
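To put a rough number on that repeated-testing risk: this is the generic "peek at the data until it looks significant" problem, not a claim about how any particular experiment is analyzed, and the batch sizes and thresholds below are arbitrary:

    # Simulate "check for significance after every batch, stop as soon as it
    # crosses the line" when there is NO real effect. The nominal 5% error
    # rate gets inflated.
    import math
    import random
    import statistics

    def peeking_trial(batches=20, batch_size=10):
        data = []
        for _ in range(batches):
            data += [random.gauss(0, 1) for _ in range(batch_size)]
            mean = statistics.mean(data)
            se = statistics.stdev(data) / math.sqrt(len(data))
            if abs(mean / se) > 1.96:   # roughly p < 0.05, two-sided
                return True             # declared "significant" on pure noise
        return False

    runs = 2000
    hits = sum(peeking_trial() for _ in range(runs))
    print(f"false 'discoveries' under the null: {hits / runs:.0%}")  # well above 5%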
Actually, you would be surprised how little rigor goes into the review process; I am sure S and A could comment. If the author is a big name with political clout, free pass. No one has the time or resources to verify and replicate work; it is the honor system. If another group wants to build on the research and doesn't get the same results, they may publish that, but they may not, to avoid controversy and the embarrassment of others not buying it.
----
re: "mistakenly use it to prove your hypoth" I think you don't
understand what I'm describing. The measurements means more averaging
as I'm describing it. And more averaging means LESS chance of random
signal causing an error. It is just a method for eliminating "noise"
whatever that noise may be. For the higgs boson it was a very specific
energy level measured from the decay. Every time they took another
measurement, if the higgs existed the energy signature would be
reinforced and other sources of energy negated, or the opposite if it
did not exist. And with some fancy math they get a confidence level
based on number of trials and blah blah. So not all experiments are
susceptible in the same way.
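A toy version of the averaging point (made-up numbers, nothing to do with the actual ATLAS/CMS analysis): with a small but real signal, the standard error of the average shrinks as measurements accumulate, so the signal stands out more and more instead of getting buried:

    import math
    import random
    import statistics

    true_signal = 0.2   # small real effect
    noise_sd = 1.0      # much larger noise per measurement

    for n in (10, 100, 1000, 10000):
        sample = [random.gauss(true_signal, noise_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        se = noise_sd / math.sqrt(n)   # standard error shrinks as 1/sqrt(n)
        print(f"n={n:>6}  average={mean:+.3f}  std error={se:.3f}  sigmas={mean / se:.1f}")

The key difference from the peeking sketch above is that the question (one specific energy level) is fixed in advance and every measurement goes into the same average, rather than the noise deciding when to stop.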
Re: COI and big names. Yeah. Reinhart and Rogoff is an example where more weight was given to big names. But it is also an example of the fact that people do look at this stuff. Though you could argue the damage was done by the time we figured out their conclusions weren't great.
In any case, I'm not arguing that false positives don't exist in academic journals. I am arguing that this particular guy's math "proving" it is false at worst and misleading at best.
----
I see, thanks M. In that case, the Higgs boson experiment was doing its own internal QA and replicating the measurement many times (something very lacking in bio and business, where many experiments are underpowered). So if the variance is small and there is a strong signal across most of the trials, then that is a very convincing result.
At work we were struggling with a related issue. We usually don't approve changes to the website unless they pass a lot of statistical and sanity checks. But just by probability, we know we are likely rejecting some features that were truly positive (but we couldn't detect it, by chance or by poor design), and approving some features that are in fact neutral or even negative. That is the scary possibility. Sometimes our approval criterion is "do no harm": if all the key metrics look to be within noise and we can't reject the null hypothesis, then it's OK. But there is a slim chance that we are approving stuff that is actually very harmful. And if the company is approving hundreds of features a year, it's likely some of them are harmful but mislabeled as positive or neutral. Though for mature businesses, where you are squeezing out basis points on the margin, the risk is probably not terrible. I guess there is a balancing act between scientific rigor and business expediency/strategy. At least in e-commerce, an error isn't likely going to cost lives (unless it's Apple Maps LOL). But for consumables, finance, or public works, it could be really bad. And they say we don't need regulation? :)
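To put rough numbers on that worry (every rate here is invented, just to show the expected-value math):

    # Expected number of genuinely harmful features that slip through a
    # "do no harm" check in a year. All rates are made up for illustration.
    candidates_per_year = 400   # features run through the experiment pipeline
    p_truly_harmful = 0.05      # fraction of candidates that are actually harmful
    p_missed = 0.20             # chance the harm looks "within noise"
                                # (a type 2 error, e.g. an underpowered test)

    expected_bad_launches = candidates_per_year * p_truly_harmful * p_missed
    print(f"expected harmful features shipped per year: {expected_bad_launches:.0f}")  # ~4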
----
Yeah, it's too bad that knowledge of this sort of thing seems only to reinforce the worst carpetbagging instincts: get in early, publish as much as you can whether it has integrity or not, and then do everything in your power to ensure that no one who comes after you could possibly do it as it's supposed to be done in the first place.
PS:
Technology will (ultimately) solve all these problems, as democratized access to information brings market forces to bear on the most indefensible of regressive attitudes and mentalities (most of which are sustained on the basis of petty economics).
Maybe it's a cultural thing: perpetuating a hierarchical Ponzi scheme requires the smartest of people to buy into the stupidest of ideas (blinders, narrowness, and petty infighting)...
----
We were talking about this, and here are our general follow-up thoughts:
- The sci. method and the peer-review process are hundreds of years old and were developed in times when methods were not so empirical and specialized, and when there wasn't such $$ at stake for discoveries (for people like Euler and Carnot, I think reputation, knowledge, and prestige might have trumped the incentive to cheat, be negligent, or cut corners)
- Obviously we live in a new age now, and as C said, the democratizing effects of technology could mitigate the problem (e.g. a grad student found Rogoff's errors when big-time editorial staff and top peers didn't), but there are limits to that in hard science because of the cost and specialization of certain experiments (though if you need a certain highly skilled postdoc to run a test in a particular nuanced way to get a positive result, your result is probably not that robust)
- At least for bio, our idea was to develop an independent, confidential auditing lab within the NIH to verify all published results. Journals and authors would pay a fee to support it, and an article that passes the audit would get a certification that gives it more clout. The author's lab would have to send all the materials over to the NIH (or let NIH staff use their equipment), with instructions, and the expert NIH staff would have to reproduce the result within reasonable variance (they'd sign NDAs and non-competes so the author has no fear of being scooped). If they can't reproduce it, the paper can still be published at the journal's discretion, but without certification.
----
I would think an algorithm could be developed (or maybe one already has?) that can blindly take in the statistical data from an experiment and verify the conclusions. Rogoff was a case of bad math, not bad data, right? That removes the requirement to reproduce experiments and leaves you with only the first two cases of lies, damned lies, and statistics.
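For the narrow case where a paper reports its summary statistics, tools in this spirit do exist (statcheck, which rechecks reported p-values in psychology papers, is one). A minimal sketch of the idea, using a hypothetical reported result and scipy:

    # Recompute a reported two-sample test from the summary stats a paper gives,
    # and flag it if the recomputed p-value doesn't match the claimed one.
    from scipy import stats

    def check_reported_ttest(mean1, sd1, n1, mean2, sd2, n2, reported_p, tol=0.01):
        t, p = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2,
                                           equal_var=False)
        return p, abs(p - reported_p) <= tol

    # Hypothetical claim: "treatment beat control, p = 0.03"
    p, consistent = check_reported_ttest(10.4, 2.1, 30, 9.1, 2.3, 30, reported_p=0.03)
    print(f"recomputed p = {p:.3f}, consistent with the reported value: {consistent}")

Of course that only catches arithmetic that disagrees with itself; it says nothing about whether the underlying data or design were any good.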
The NIH thing is interesting, but there is still money involved, which is always problematic. The NIH is not free from politics, since that is its major source of funding. And there are already many, many public institutions that don't share the data from experiments funded with public dollars. The democratization of technology doesn't help when all the research is behind a paywall.
----
My comments are enclosed (in-line)... Also, in case anyone didn't see the full article: http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
An algo would be cool (like how the IRS algos randomly scan over people's returns and flag errors/possible issues), but then the scientific data would have to be formatted and structured according to some standards so the algo could read it properly. And since each article measures different things and employs different significance tests, it could be tricky. Hell, they can't even get healthcare.gov right using 50 contractors (maybe that is their problem!).
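On the formatting point, the standard wouldn't have to be elaborate; even one machine-readable record per reported test would give a checker something to parse. A made-up example (field names invented):

    # Hypothetical machine-readable record for one reported result.
    reported_result = {
        "claim_id": "fig2_panel_b",   # where the claim appears in the paper
        "test": "welch_t",            # which significance test was used
        "group_sizes": [30, 30],
        "means": [10.4, 9.1],
        "sds": [2.1, 2.3],
        "reported_p": 0.03,
    }

    # A checker would validate the record, then recompute the test from it.
    required = {"test", "group_sizes", "means", "sds", "reported_p"}
    assert required <= reported_result.keys()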
Jokes aside, we piss and moan about how bad public institutions are (and they are of course not free of corruption either... see the MMS), but as C said, they are our last resort against a totally for-profit world. We need to strengthen the public institutions so that they offer a legit alternative to the for-profits, and then customer choice will compel the for-profits to clean up and stop shafting us so much. I guess that's why the health-industrial complex fought so hard against single payer, which is BY FAR the best feasible health system in the Western world, warts and all. But with all the dysfunction in Washington, furloughs, and the erosion of public-worker compensation and respect, it is less likely that our best and brightest will want to go into public service and stay long enough to make an impact.
I would think an algorithm could be developed (or maybe one already has?) that can blindly take in the statistical data from an experiment and verify the conclusions. Rogoff was a case of bad math, not bad data, right? That removes the requirement to reproduce experiments and leaves you with only the first two cases of lies, damned lies, and statistics.
It's a good idea; one would think some of that would be built into the tools the scientists and researchers are using, but clearly there is progress to be made...
The NIH thing is interesting, but there is still money involved, which is always problematic. The NIH is not free from politics, since that is its major source of funding.
True... but federal institutions tend to be
bulwarks of trust (certainly to a greater extent than a tobacco company
or The Koch Brothers)...
And there are already many, many public institutions that don't share the data from experiments funded with public dollars.
This is also true... but it's changing. Even institutions that are known for being benefactors of the public (e.g. U.C. Berkeley) are getting more formal about sharing research and ensuring access... so the change is bound to ripple through to other public institutions too...
The democratization of technology doesn't help when all the research is behind a paywall.
Yeah, that is slow to change, but it's perhaps the most hopeful front. The economics of information are such that librarians and libraries are already being confronted by these questions, so the clock is ticking: if our engineering library at U.C. Berkeley is getting rid of its books because of the value the physical space has to the college, then it's probably a matter of time until someone rationalizes our giving research to private journals so that they can rip off the campus with subscription fees that (over time) do not seem to be worth more than the promise of a new student.
----
For Rogoff, I think it was a variety of issues, but some of it was Excel formulas pointing to the wrong cells, as well as "improper" suppression of some data points. So that is mostly human error and poor judgment, which is hard for an algo to rectify.
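A toy version of the wrong-cells problem (invented numbers, not the actual Reinhart-Rogoff spreadsheet): the headline average moves depending on which rows the formula covers, and deciding which rows belong in it is exactly the judgment call an algo can't make for you:

    # Average over all rows vs. a formula that stops two rows short.
    growth_by_country = [3.1, 2.4, -0.1, 1.8, 2.9, 0.4, 2.2]   # made-up values

    full_average = sum(growth_by_country) / len(growth_by_country)
    truncated_average = sum(growth_by_country[:5]) / 5

    print(f"all rows: {full_average:.2f}   wrong range: {truncated_average:.2f}")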
There are some free or lower-cost internet journals out there... hopefully they can give the heavies a run for their money someday. But due to the prestige factor, no big names want to "slum it" with online journals. And I agree that the big journals are ripping off institutions to give them "novel research" that is at least 30% useless and another 30% incorrect.