What exactly is a “failed experiment”?

In non-scientific life, an experiment is when we try something out to see what happens. A failure, for such experiments, is when we don’t get the outcomes we want. Legalisation of soft drugs, for example, would probably be reported in the press as a “bold experiment”. This doesn’t mean that it would actually be an experiment in any sort of formal sense. It just means that we don’t know for sure what will happen when we do it.

In a scientific sense, experiments are more precisely defined. In an experiment we are trying to distinguish between competing explanations of the world. We may expect one particular explanation to be correct, but it isn’t necessarily a failure if we turn out to be wrong. Either way, we may acquire knowledge. There is still such a thing as a failed experiment though. Since the goal of an experiment is to distinguish between hypotheses, we may get to the end of the experiment and still not have managed to do this. The only knowledge we gain is that the experiment wasn’t good enough.

Let’s take, for example, the search for the Higgs boson. In a formal sense, the overall search is not an experiment. If we believe that the Higgs exists, as the Standard Model predicts, then the search for the Higgs is a search for confirmation, rather than an attempt to falsify any particular hypothesis. We could search for a long time and still not have disproven the existence of our shy God Particle. However, each individual experiment within the program is set up to be able to state confidently that the Higgs does not appear at particular energy levels. So CERN is generating new knowledge whenever they learn with confidence that a particular version of the Higgs isn’t there.

Contrast this with those unlawful hooligans and now almost certainly non-existent particles, the faster-than-light neutrinos. If it turns out that the measurements are untrustworthy due to a cable connection, this is a failed experiment. We have learnt nothing except to take better care with cables. If it turns out to be something more subtle, we may advance our experimental methods so that future findings can be more certain and less error-prone.

Concern about failed experiments entered my life during a recent grant application. I was designing an experiment to test whether Fault Tree Analysis (FTA) produced trustworthy results. When it came to discussing the size of the experiment and the types of controls, I realised that my experiment had three possible outcomes. The expected outcome was that FTA would prove not to be trustworthy. This would justify my reasons for wanting to conduct the experiment, since many people place faith in FTA. A negative result would not find the untrustworthiness that I was looking for. This would still be knowledge, even though it is a negative outcome. I would have given FTA a solid workout, and confirmed the faith that is placed in it. The third outcome is that my results would be within the margin of error, or confounded by factors in the experiment design – i.e., I could not draw firm conclusions either way. This would be a true failure, the chance of which could be reduced by increasing the size and cost of the experiment.
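One way to picture the three outcomes is to compare a confidence interval for the measured effect against the smallest effect we would care about. The sketch below is only an illustration, not the design from the grant application: the function name, the threshold and the interval bounds are all hypothetical, and it assumes the effect is measured so that larger values mean less trustworthy results.

```python
def classify_outcome(ci_low, ci_high, smallest_effect_of_interest):
    """Classify an experiment by where its confidence interval falls.

    All three arguments are hypothetical quantities chosen for illustration:
    the interval is for the measured effect, and the threshold is the
    smallest effect that would matter in practice.
    """
    if ci_low >= smallest_effect_of_interest:
        # The whole interval sits at or above the threshold.
        return "effect found"
    if ci_high < smallest_effect_of_interest:
        # The whole interval sits below the threshold.
        return "effect ruled out"
    # The interval straddles the threshold: we cannot tell either way.
    return "inconclusive"
```

A larger experiment usually narrows the interval, which is why spending more reduces the chance of landing in the inconclusive case.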

Not all experiments worry about margin of error. In medical drug trials, for instance, failure to show effectiveness at a chosen significance level is treated the same as not showing effectiveness at all. This does not confirm the null hypothesis, that the drug has no effect, but the research community is far more concerned with finding definitely effective drugs than with certain knowledge about ineffective ones. This does not mean that medical trials cannot fail – it is still possible to discover flaws which invalidate the knowledge gained. For example, if the dropout rates of experimental and control groups differ beyond what the experiment allows for, the original statistical design is broken and the experiment has failed.

As a general rule, there is a trade-off between experiment cost and the chance of failure from known causes. A bigger experiment has better power to detect differences between groups and samples, and can put in place more controls to avoid confounding factors. One big experiment which succeeds provides much more certain knowledge than many small experiments which are unable to distinguish between different possibilities. It is often possible to ask in advance whether an experiment has the power to find the effect it is looking for. On the other hand, there are always sources of failure which can ruin even well-designed experiments. A poorly connected cable on a particle detector is a very expensive and embarrassing way not to learn anything.
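To make the question of power concrete, here is a rough sketch of a power calculation for comparing the means of two equal-sized groups. It assumes a two-sided test and uses the normal approximation, so it slightly understates the sample size an exact t-test calculation would give. The function names and default values are my own choices for illustration, and the effect size is expressed in standard deviations.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def power_two_groups(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size is the true difference between groups in standard deviations.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the statistic lands in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(effect_size, power=0.8, alpha=0.05):
    """Approximate group size needed to reach the requested power.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    for n in (20, 63, 200):
        print(n, round(power_two_groups(0.5, n), 2))
    print(n_per_group_for_power(0.5))
```

Under these assumptions, an experiment looking for a medium-sized effect of half a standard deviation with only 20 subjects per group is more likely than not to end up inconclusive, while roughly three times as many subjects brings the chance of detecting the effect to about 80%.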