How should we evaluate improvement interventions?
Whenever a new way of diagnosing or treating people in healthcare is developed, rigorous evaluation is a key stage in the process. Everyone accepts that new drugs should be subject to careful assessment of their effectiveness, acceptability and cost. As Karolina Kuberska argued in an earlier blog in this series, evaluation is also critical in determining whether novel approaches to improving care really do live up to their promise. Well-meaning improvement efforts may not always work as intended; occasionally they can even do harm. It’s important to consider, for example, whether improvement interventions address the priorities and needs of NHS patients, carers and staff, whether they work better or worse in different circumstances, and whether they are a good use of finite healthcare resources. But exactly what does that mean for how we go about evaluating improvement?
One important issue is whether the methods used to evaluate new diagnostic or therapeutic interventions—a new drug, for example, or a new machine—are transferable to the field of improvement.
Evaluating improvement interventions: of snails and evangelists
The approach to evaluating new medicines that is usually seen as best is the randomised controlled trial, or RCT. Trials provide a means of comparing the impact of an intervention with what would have happened in its absence. They are viewed as the gold standard for good reasons: they go a long way to address potential biases in methods, and allow new interventions to be compared with what’s already being done. But they are also complicated, costly and time-consuming.
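As a purely illustrative aside, the sketch below (in Python) shows the bare bones of the comparison a trial rests on: randomise people to two arms, then compare a key outcome between them. Everything in it is hypothetical; the event rates, sample sizes and outcome are invented for the example, not taken from any real trial.

```python
# Minimal, illustrative sketch of the two-arm comparison an RCT analysis rests on.
# All data are simulated; the arms, event rates and outcome are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulate a binary outcome (say, readmission within 30 days) for two randomised arms.
control = rng.binomial(1, 0.20, size=500)       # assumed 20% event rate under usual care
intervention = rng.binomial(1, 0.16, size=500)  # assumed 16% event rate with the new intervention

# Absolute risk difference between arms, plus a simple chi-squared test of association.
risk_diff = intervention.mean() - control.mean()
table = np.array([
    [intervention.sum(), len(intervention) - intervention.sum()],
    [control.sum(), len(control) - control.sum()],
])
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(f"Risk difference: {risk_diff:+.3f} (intervention minus control), p = {p_value:.3f}")
```

Even this toy example shows why randomisation matters: because the two arms differ only by chance at the outset, a difference in outcomes can reasonably be attributed to the intervention rather than to the people who happened to receive it.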
Whether RCTs, and similar experimental approaches, are also right for evaluating improvement efforts is a question that has divided the improvement studies community for some time. Frank Davidoff, and later Robert Burke and Kaveh Shojania, characterise this division as a difference between ‘evangelists’ and ‘snails’ (an odd pairing, first used to describe similarly trenchant views about how best to evaluate screening interventions in the 1970s). Snails would answer ‘yes’: improvement interventions are not so very different from other healthcare interventions, they are competing for resources in the same system, and so we should hold them to the same standards of evidence before widespread adoption. Evangelists would say ‘no’: improvement is different; its potential to cause harm is lower; and improvement interventions are better understood and evaluated through alternative methods.
It’s easy to see merit in both sides of the argument. As the snails would note, the history of healthcare improvement is littered with examples of promising interventions that, when subjected to the most rigorous form of evaluation through RCTs or similar study designs, turn out to have made little difference to the outcomes that matter.
the more complex … and the more sensitive to context, the trickier it becomes to do an RCT
On the other hand, as evangelists would reply, improvement interventions are self-evidently very different beasts from new pharmaceutical treatments, or even many new diagnostic innovations. Improvement interventions are complex interventions, typically involving many components, different people, lots of moving parts. They are context-dependent: they will work better in some places than others. By design, they often involve iteration, with the people involved improving and adapting the intervention as they go along. These features of improvement efforts arguably make them ill-suited to evaluation through RCTs, which tend to work best with static, well defined interventions that are little affected by context (who’s doing them, how and where). The more complex the improvement intervention, and the more sensitive to context, the trickier it becomes to do an RCT—and very quickly an RCT approach can start to become unwieldy.
Beyond RCTs
More than this, though, RCTs don’t tell you everything. A result from an RCT will typically tell you whether, on average across the populations and contexts covered, an intervention is better, worse, or roughly equivalent to an alternative for some key outcomes of interest. What it doesn’t tell you is why and how an intervention worked, or didn’t. That may not matter so much for a medicine, but for improvement interventions—where things like the people involved can make such a crucial difference—it is vital.
process evaluations provide important additional knowledge—and are an important part of any evaluation
For complex interventions of this kind, process evaluations provide important additional knowledge—and are an important part of any evaluation, not just trial-based studies. The knowledge generated by process evaluations can be particularly important for improvement work, providing deep insights into mechanisms of action, contextual influences, how the programme played out in practice rather than in design, and how interventions were delivered and experienced by those implementing and receiving them. Process evaluation can also generate important learning about promising interventions that don’t quite make the grade when it comes to demonstrating their effectiveness against the exacting statistical standards of proof demanded by an RCT, but which don’t deserve to be discarded. Take, for example, the Patient Reporting and Action for a Safe Environment intervention (see box).
What about locally led improvement initiatives?
Complex, mixed-methods approaches of this kind, combining experimental designs, process evaluation and health economics, are now common, and are increasingly seen as the best way to evaluate complex interventions before they are rolled out nationally. While this is welcome, exhortations to do evaluation, and do it well, can be daunting. Much improvement work is led locally, by small teams with tight resources and limited access to evaluation expertise. Thorough process evaluation, let alone a fully fledged RCT, may be well beyond the capacity of such teams. But the improvement work they do can make an important contribution to the quality of care provided in the health service—and as such, it deserves evaluation, so that other people in the system can learn from it, replicate it, and adapt it elsewhere.
There is plenty of scope to produce useful knowledge from well-executed studies that don’t follow the RCT model, drawing on approaches such as before-and-after studies and interrupted time-series analyses. There’s also scope to adapt the principles of process evaluation to small-scale improvement work. But there is limited agreement on what ‘good enough’ looks like in these situations: for example, what should always be included, and what is nice-to-have but not essential. Clifford Ko’s PhD research, currently in progress at THIS Institute, seeks to address this problem, with a view to developing guidance that supports rigorous, useful evaluation for improvers in all settings.
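For readers curious about what an interrupted time-series analysis involves in practice, the sketch below shows a minimal segmented regression: a pre-existing trend, an immediate level change when the intervention is introduced, and a change in trend afterwards. It is not drawn from Clifford Ko’s research or any THIS Institute guidance; the data, variable names and effect sizes are invented, and a real analysis would also need to handle issues such as autocorrelation, seasonality and sample size.

```python
# Illustrative sketch of an interrupted time-series (segmented regression) analysis.
# All data are simulated; variable names and effect sizes are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

months = np.arange(24)                            # 12 months before and 12 after the change
post = (months >= 12).astype(int)                 # indicator for the post-intervention period
time_since = np.where(post == 1, months - 11, 0)  # time elapsed since the intervention began

# Simulated monthly outcome (e.g. incidents per 1,000 bed days) with a drop after month 12.
outcome = 50 + 0.2 * months - 5 * post - 0.5 * time_since + rng.normal(0, 1.5, size=24)

df = pd.DataFrame({"outcome": outcome, "time": months,
                   "post": post, "time_since": time_since})

# Segmented regression: baseline trend, immediate level change, and change in trend.
model = smf.ols("outcome ~ time + post + time_since", data=df).fit()
print(model.summary().tables[1])
```

The coefficient on post estimates the immediate jump (or drop) when the intervention starts, and the coefficient on time_since estimates how the underlying trend changes afterwards; together they give a more informative picture than a simple before-and-after comparison of averages.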
In seeking to evaluate improvement interventions, then, it is vital not to allow the perfect to be the enemy of the good. Large multi-centre trials, with accompanying process and economic evaluations, have an important place in health services research—and in improvement research, especially when seeking to scale up successful approaches to broader settings. But ensuring that the effort put into improvement at a local level is matched by proportionate efforts to evaluate it is also a crucial way of identifying what works, learning from experience, and ensuring that time invested in improvement bears fruit for patients, carers and staff.