Double-blind study

“a double-blind study,” photograph by Casey Holford

On October 1 the science section of the New York Times ran two articles next to each other. One described a recent study concluding that young children at play behave much like scientists, which suggests that scientific inquiry is driven by human instinct. The other described the alarming extent to which that same instinct muddies scientific inquiry along the way.

Recently the scientific community has dealt with controversies cascading across many areas of research. Most of them relate to a phenomenon known as publication bias. Put simply, publication bias occurs when research journals prioritize studies with thought-provoking—and at the very least statistically significant—results. This makes sense; it’s hard to get excited about studies that don’t show anything conclusive. We crave good stories, stunning breakthroughs, and world-changing discoveries. Such desire has driven scientific (and artistic) innovation throughout history.

The dark underbelly of this lust for meaning, however, is something called “significance chasing.” Researchers know their chances of being published – and of advancing their professional status – hinge on producing statistically significant results. They have a huge incentive to hunt for anomalies in their data and read into them – raising the risk that patterns due to nothing but chance are reported as real effects. An article in the journal Psychological Science illustrates this point eerily well. As the authors point out,

It is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields ‘statistical significance,’ and then to report only what ‘worked’… This exploratory behavior is not the by-product of malicious intent, but rather the result of two factors: (a) ambiguity in how best to make these decisions and (b) the researcher’s desire to find a statistically significant result.

To compound the problem, many researchers do not openly share their full data sets or calculation methods, and have few incentives to challenge one another’s findings. The Psychological Science article hammers the former point home with a simulated experiment that “shows” listening to a Beatles song makes you younger. That’s hooey, of course, but the authors’ point is that without stricter guidelines around how data sets are reported, nearly any relationship can be presented as statistically significant.
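The “exploratory behavior” the authors describe is easy to reproduce. Below is a toy simulation – my own sketch, not from the Psychological Science article – of a single researcher degree of freedom: measuring three outcomes and reporting whichever one “worked.” Even though no real effect exists anywhere in the simulated data, the flexible analysis clears the p < 0.05 bar far more often than the nominal 5%.

```python
import math
import random

def p_value_two_sample(a, b):
    """Two-sided p-value for a difference in means, using a normal
    approximation (reasonable at the sample sizes used here)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    # P(|Z| > z) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_studies=2000, n=100, n_outcomes=3, seed=1):
    """Run many null 'studies' (no true group difference) and compare
    an honest analysis against a significance-chasing one."""
    rng = random.Random(seed)
    naive_hits = flexible_hits = 0
    for _ in range(n_studies):
        # Three independent "outcome measures"; groups are identical.
        pvals = []
        for _ in range(n_outcomes):
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            pvals.append(p_value_two_sample(a, b))
        if pvals[0] < 0.05:
            naive_hits += 1      # honest: one pre-specified outcome
        if min(pvals) < 0.05:
            flexible_hits += 1   # chasing: report whichever "worked"
    return naive_hits / n_studies, flexible_hits / n_studies

naive, flexible = simulate()
print(f"one pre-specified outcome: {naive:.1%} false positives")
print(f"best of three outcomes:    {flexible:.1%} false positives")
```

With three independent outcomes the chance that at least one test comes up “significant” by luck alone is roughly 1 − 0.95³ ≈ 14%, nearly triple the advertised error rate – and real studies typically have many more degrees of freedom than this (covariates, exclusion rules, optional stopping).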

How big of a problem is this? In the medical community it has raised frightening questions about cancer studies that had been the basis for new treatments. It has caused an increase in the number of retractions issued in high-profile scientific journals – and a blog devoted to tracking them. And lest you think this concern is limited to the “hard” sciences, think again – it has already raised discussions of implications in humanitarian aid and in the more mainstream business community (the latter summing things up nicely with a headline, “Why You Can’t Trust Any of the Research You Read”).

The idea that the scientific method is easily mucked up opens up a whole host of mind-bending questions. (What if there’s a publication bias toward studies about publication bias?  Eeek…). It forces us to stop and think about the fledgling world of arts research – a world that has desperately wanted to find good, hard scientific evidence of impact for a long time. Randomized controlled trials, double-blind studies and other sophisticated research methods seemed like a holy grail, promising that if we could cleverly adapt them to meet our needs, we would have indisputable evidence of the importance of the arts, and good, hard data to guide how we direct our resources. In light of these controversies, should we question our desire to be better researchers?

No – but we should learn from others’ mistakes, and take a hard look at institutional issues common across our fields. Many of the problems the scientific community is experiencing aren’t about the tools scientists have at their disposal, but about the cultures in which those tools are used. A few months ago the editors of two high-profile medical journals, Drs. Ferric Fang and Arturo Casadevall, put out a call for “structural reforms” to combat the “hypercompetitive” and “insecure” working environment they believe to be at the heart of the issue. The structural flaws they identify include inadequate resources, a “leaky pipeline” of emerging talent, agenda-driven funding and administrative bloat.

Sound familiar?

The long-term implications for all research communities will unfold over time. Many of Fang and Casadevall’s recommendations are similar to those made within our own field: directing more funding toward salary support to increase job stability, streamlining grant application and reporting processes, and examining the strengths and weaknesses of peer grant review. A number of other ideas have been floated that may change established research practices. Creating a “journal of good questions” that decides which studies to publish before their results are known would reward researchers for their curiosity and the strength of their proposed methodology. Limiting the “degrees of freedom” researchers have in gathering additional data if their original data set does not yield anything “interesting” would curb significance chasing and, in theory, create a culture more tolerant of inconclusive results.

Regardless of which, if any, of these ideas stick, we need to acknowledge two things: a) our research is in all likelihood just as prone to these problems as the “hard sciences,” if not more so, and b) the “best practices” we have been trying to emulate are not “fixed practices.” It’s often said that what arts researchers seek to measure is too squishy to fit into the traditional scientific process. If more and more people are realizing the process has a squish of its own – well then, maybe we don’t need to play “catch up” so much as try new things.

We may even come up with ideas useful to the more “established” fields we have been trying to emulate. The authors of the study in the first (less depressing) New York Times article concluded the preschoolers they observed behaved like scientists because they “form[ed] hypotheses, [ran] experiments, calculat[ed] probabilities and decipher[ed] causal relationships about the world.” I suspect that a group of arts researchers, observing the same group of children, would have interpreted those same behaviors as artistic. Human instinct drives scientific inquiry and artistic inquiry, and muddies both. Artists, one could argue, are a little more used to the mud.

  • I think it’s interesting that this was published the same week that many are celebrating the remarkable performance of Nate Silver’s and others’ quantitative political science models in predicting the outcome of Tuesday’s election. (The timing is a coincidence; Talia’s been working on this post for a few weeks.) These are complex issues, and I agree with Talia that, just as it’s a mistake to trust blindly in published research (a position that we’ve taken for a long time at Createquity), it’s equally a mistake to squeeze the concerns above into a narrative about how you can’t trust research at all. It’s helpful to remember that we in the arts often draw conclusions about the broader world from a survey here or there of hundreds of participants that may not even use a random sample. Silver’s database, by contrast, contains thousands of such surveys, most of which are reasonably comparable to each other on at least several dimensions. To me, this is all reinforcement that the context of research matters just as much as the findings, and that a skillful and honest treatment of uncertainty is the key to increasing knowledge.