
Improving healthcare service assessment using generative AI

Background

To make sure patients receive the best care, we need to understand healthcare quality. This can be measured using statistics, which help identify patterns but often lack deeper context. Qualitative data like interviews and written feedback can provide valuable insights into patient experiences and perspectives, adding context that makes statistical findings more meaningful. However, analysing qualitative data is slow, requires a lot of effort, and is hard to scale up.

Recently, artificial intelligence (AI) tools have shown promise in analysing qualitative data, potentially giving us new ways to understand healthcare quality. We want to use AI tools to look at how healthcare services are evaluated, focusing on a programme called i-THRIVE, which aims to improve child and adolescent mental health services (CAMHS).

Approach

We will be looking at how generative AI tools, specifically large language models (LLMs), could be used to help evaluate healthcare services. These tools can interpret the meaning and context of written text, making them valuable for analysing large amounts of qualitative data. We want to find out whether LLMs can speed up this analysis while still delivering accurate and meaningful insights, making qualitative data easier to analyse and more valuable.

Our study will focus on data from a national evaluation of the i-THRIVE programme, which includes interviews, notes, and policy documents that were previously analysed by experts using a detailed assessment tool.

  1. First, we will review available LLMs.
    • We will choose one that protects patient data privacy, prevents unauthorised access or manipulation, consistently generates accurate insights, and minimises hallucinations (incorrect or misleading results that AI models generate).
  2. Next, we will re-analyse the i-THRIVE data using this LLM.
    • We will test different analysis strategies, such as fine-tuning the LLM (further training it on text specific to the task we ask it to perform) and refining the way questions (or “prompts”) are asked to get the best results; a sketch of this prompt refinement appears after this list. We will also try techniques such as combining the LLM with external sources to make it more accurate.
    • In this project we want the LLM output to include relevant interview quotes, so we will use a technique called retrieval-augmented generation (RAG), where the output combines text produced by the LLM with references to (or quotes from) existing documents that are relevant to the task at hand (see the second sketch after this list). This keeps the LLM response grounded in the existing documents and reduces the risk of hallucinations.
  3. Throughout the project, we will focus on reducing LLM mistakes, avoiding bias, and preventing it from generating misleading results.
  4. Finally, we will create generative AI tools that produce detailed feedback reports for CAMHS sites.
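
To illustrate the prompt-refinement idea in step 2, here is a minimal Python sketch comparing a bare prompt with a refined one that fixes the model's role, the coding frame, and the output format. The `call_llm` function and the coding categories are illustrative placeholders, not the actual i-THRIVE assessment tool or our chosen model.

```python
# A minimal sketch of prompt refinement for qualitative coding.
# `call_llm` is a placeholder: swap in whichever model the review in
# step 1 selects. The categories below are illustrative, not the
# actual i-THRIVE assessment tool.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to the chosen LLM."""
    raise NotImplementedError("Connect the selected model here.")

EXCERPT = "Staff listened to us, but the wait for a first appointment was long."

# Variant A: a bare instruction.
PROMPT_A = f"Summarise this interview excerpt:\n{EXCERPT}"

# Variant B: a refined prompt that fixes the role, the coding frame,
# and the output format, and asks for supporting quotes.
PROMPT_B = (
    "You are assisting a qualitative evaluation of CAMHS services.\n"
    "Code the excerpt against these illustrative categories: "
    "access, shared decision-making, staff attitudes.\n"
    "For each category that applies, give a one-sentence finding and "
    "quote the exact words from the excerpt that support it.\n"
    f"Excerpt:\n{EXCERPT}"
)

for name, prompt in [("bare", PROMPT_A), ("refined", PROMPT_B)]:
    print(f"--- {name} prompt ---\n{prompt}\n")
    # output = call_llm(prompt)  # compare outputs against expert coding
```

In practice, each variant's output would be compared against the existing expert analysis to judge which wording produces the most faithful coding.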
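
The next sketch illustrates the retrieval-augmented generation step, assuming a simple TF-IDF retriever from scikit-learn for the retrieval stage (a production system might use dense embeddings instead). The documents and query are invented examples, not i-THRIVE data.

```python
# A minimal sketch of retrieval-augmented generation (RAG).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented placeholder documents, standing in for interview
# transcripts, notes, and policy documents.
documents = [
    "Interview 1: The referral process felt confusing to parents.",
    "Interview 2: Clinicians involved young people in care decisions.",
    "Policy note: Sites should publish clear referral pathways.",
]

query = "How do families experience the referral process?"

# Retrieve: rank documents by similarity to the query and keep the top two.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])
scores = cosine_similarity(query_vector, doc_vectors)[0]
retrieved = [documents[i] for i in scores.argsort()[::-1][:2]]

# Augment and generate: the prompt tells the model to rely only on the
# retrieved excerpts and to quote them, which keeps the response
# grounded in the source documents and reduces the risk of hallucination.
prompt = (
    "Answer the question using ONLY the excerpts below, quoting the "
    "exact passages you rely on.\n\n"
    + "\n".join(f"[{i + 1}] {text}" for i, text in enumerate(retrieved))
    + f"\n\nQuestion: {query}"
)
print(prompt)
# answer = call_llm(prompt)  # generation step, using the chosen model
```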

Additional co-investigators

  • Prof Peter Fonagy, University College London
  • Prof Goran Nenadic, University of Manchester

Funding and ethics

This project is currently funded by THIS Institute.
