The jury is in!  The weight of science has been thrown behind yet another study, and now we can all rest safely at night.  We can all look our dentists right in the eye (or should it be mouth?) when they ask whether we've flossed and say, “Nope, science says we don’t need to!”

But it is worth asking just why everyone is so sure that this is the right conclusion.  Haven’t scientific studies been wrong before?  Wasn’t there science behind the recommendation to floss?  Just how long will it be until a new study overturns the old one?  And is it really true that flossing has no benefit (outside the profits for the manufacturers)?

Let me take a stab at addressing those first questions now, deferring the question of flossing’s benefits until later.

Perhaps the best place to start is by discussing a thought-provoking article entitled Scientific Regress from the May 2016 edition of First Things.  In that piece, William A. Wilson, a software engineer, gives a nice summary and analysis of the state of modern science.  Not the state of its discoveries or knowledge base but the state of what it knows about itself and how it knows what it knows to be true.  In other words, Wilson gives a meta-analysis of the state of the scientific method and its corresponding epistemology.

Of course, the notion central to any scientific enterprise is the idea that what happens in the here-and-now is applicable to the there-and-then.  Without that basic premise, science would be nothing more than a set of anecdotes starting with “I swear that I witnessed…”.  It is vital that scientific claims are verifiable.  Repeating an experiment, which should be done often, should give rise to the same results and the same conclusions.  After all, that is the underlying mechanism by which scientific discoveries become technological breakthroughs.  Think what would happen if the original experiment that established the proof-of-concept of the solid-state transistor had been a one-off.  Goodbye cellphones, household computers, the internet, inexpensive televisions and radios, and hosts of other modern-day goodies.

And yet, the picture that Wilson paints about modern scientific explorations shows a system that is seriously flawed.  He cites the efforts of the Open Science Collaboration (OSC) that tried to replicate 100 published psychology experiments taken from three of the most prestigious journals of the field.  (I’ll have occasion to revisit the notion of prestigious a bit later).

According to the article, OSC found that, in 65 cases, they could not replicate the positive results reported in these so-called scientific studies.  In addition, they found that the bulk of the remaining 35 cases were marginal, in that their positive results were not nearly as statistically significant as first claimed.

And while other disciplines are not plagued with irreproducibility of this magnitude, there are still many cases where the results of given experiments can’t be corroborated by other groups.

So what is behind this lack of reproducibility? According to Wilson, the answer lies in one of two areas.

First is the possibility that there is a set of confounding variables in the experiment – conditions that need to be controlled but are not recognized as such.  For example, if temperature were important in a study of perceptual psychology but was never imagined to be so, then the study authors might not report the temperature, and their experiment could never be exactly reproduced.  This explanation would be one-part blessing and one-part curse, as the presence of such an effect would reveal layers of reality unknown up to this point but would make it hard ever to replicate an experiment.  Of course, this kind of thing happens, but for it to occur with a frequency high enough to explain the OSC results strains credulity.

The second, and more likely, reason is that the original conclusions of the study are simply wrong.  There are three possibilities here: what I will call statistical false alarms, group think (what Bacon & Bacon call Authority and the Idols of the Theater), and downright fakery.

The most reassuring one is the statistical false alarm.  There is a nice Bayesian argument, which Wilson attributes to John Ioannidis, that many scientific studies must be wrong.  The argument goes something like this.  Suppose you have a gem detector with an accuracy of 95%, meaning that if it is hovering over a gem it will alert the user most of the time and, conversely, if it is hovering over a piece of glass it will stay quiet most of the time.  Armed with this good detector, you then journey out to a field filled with pieces of glass, with an occasional gem dropped into the mix.  Since you don’t know which is which, and since the population of gems within the greater population of useless baubles is very small, the probability that any given alert is a false alarm can be very high.  Details on this kind of argument can be found in an earlier column on Bayesian analysis.  Now treat the gem or useless bauble as a positive scientific discovery or a null result, respectively, and the usual machinery of statistical inference at 95% confidence as the gem detector, and you are left to conclude that the odds of a new discovery reported in a study bearing up under scrutiny are rather low.
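To make the arithmetic concrete, here is a minimal sketch of that argument in Python.  The numbers (1 real gem per 100 candidates and a 5% false-alarm rate) are my own illustrative assumptions, not figures from Wilson or Ioannidis; the point is only that the rarity of real gems swamps even a 95%-accurate detector.

```python
# A rough sketch of the gem-detector argument with illustrative numbers.

def posterior_probability(prevalence, sensitivity, false_positive_rate):
    """Probability that a positive alert is a real gem, via Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Suppose only 1 in 100 candidates is a real gem, the detector catches 95%
# of real gems, and it false-alarms on 5% of the glass.
print(posterior_probability(prevalence=0.01, sensitivity=0.95,
                            false_positive_rate=0.05))
# Prints roughly 0.16 -- most positive alerts are still glass,
# despite the detector's 95% accuracy.
```

Run with those assumed numbers, the chance that a positive alert is a real gem comes out to roughly one in six, which is the flavor of result behind the claim that many published positive findings will not hold up.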

But isn’t this how science feels its way forward?  Isn’t this how we progress?  The answer to these queries is actually a guarded yes.  But our tolerance for being wrong shouldn’t blind us to wrong-doing.  Some of that wrong-doing is subtle and often due to the perverse incentives that we, as a society, have heaped on the scientific enterprise.

Consider the possibility of group think in research.  Positive results receive huge press releases, the promise of fame and fortune, and a huge boost to intellectual pride.  Articles based on ‘breakthroughs’ aim for large impact factors and all but guarantee tenured positions and spots on the speaking circuit.  Negative results, often far more important in the scheme of things, are judged not worth reporting.  This despite the fact that, as Edison famously said about ‘failure’:

Negative results are just what I want.  They’re just as valuable to me as positive results.  I can never find the thing that does the job best until I find ones that don’t.

Far worse than this institutionalized self-delusion is the case of outright fakery.  As the recent spate of scandals indicates, it is all too common – and not just in the soft sciences.  Complaints of plagiarism are now routine in the physical sciences (especially from India and China).  Reports abound of authors gaming the peer-review system by creating aliases for themselves that give them the ability to vet their own articles.

Far worse than any of these offenses is the downright cheating that is becoming all too common.  There have been amazing examples of the wool being pulled over the collective eyes of peer review.  One of my favorites is narrated in the book Plastic Fantastic, by Eugenie Samuel Reich, which recounts the flummery of one Jan Hendrik Schön, a German physicist who became a famous figure in condensed matter physics on the strength of his breakthrough results.  He was given two prestigious awards and published in the prestigious journals Science and Nature (there is that adjective again) before the house of cards came tumbling down, when it was determined that he had hoodwinked ‘smart’ people for years with amazing but made-up results.  Papers were retracted from many prestigious journals, proving that prestige is more perception than anything else.  Schön’s story is hardly an isolated case, and some simple searches turn up lots of examples of academic foul play.  There have always been cheats in science, but the current system encourages it to a degree never seen before – after all, there is big money and power in science.

So what to do when a new study comes out?  For the most part, take it with a grain of salt, especially if it is predominantly populated by statistical analysis using hypothesis testing, with no clear physical mechanism to explain the results.  As for flossing – well, I will continue to do it (even though I would love to stop).  The reason is that I have actually experimented with flossing and not flossing and have found that my mouth feels better, I breathe better, and my teeth give me fewer complaints (pain, cavities, etc.) to bring to the dentist.  I suspect that the reason my results are in conflict with this study is that all of humankind can’t be summarized by a ‘representative sample’, and that something about me, whether it is nature or nurture, puts me into one of those small populations that Bayesian analysis warns us about.