Dr Anoop D. Shah

Area of study
Data and AI
Fellowship level
Year awarded
Host university
Institute of Health Informatics
University College London
Anoop is a clinical research fellow and consultant in clinical pharmacology and general medicine working at University College London Hospitals.

Improving diagnosis recording for better patient care: case study in heart failure using natural language processing (project complete)


The ability to record diagnoses in a detailed, accurate way is essential for both clinical care and healthcare research. But current electronic health record (EHR) systems store much of this information in free text.

Free-text EHRs can be analysed using natural language processing (NLP) to extract information for research purposes. But for the purpose of supporting safe clinical decision making, features of patients’ diagnoses need to be organised in an agreed way, according to an information model.

Standardised diagnosis information models can help to ensure consistent care throughout the NHS and reduce the need for duplicate data collection for audit or research.


This project aims to generate an evidence base for generalisable improvements in diagnosis recording in the NHS by applying natural language processing methods, with a focus on heart failure as a clinical example.

Patient journeys will be constructed from initial symptoms to detailed diagnosis, and proposed information models will be evaluated on how well they accommodate information that currently exists only in free text.

Then, the study will develop information models and recommendations for systems. The proposed models will be evaluated in pilot implementations such as a local heart failure clinic.

The overall learning from the process will be disseminated through academic publications and via clinical academic networks, aiming to engage specialist societies to develop models for recording diagnoses in their domains.

Anoop gave a lightning talk about his research project at our 2020 annual event, THIS Space.

Research articles

Shah, A. D. et al. (2019) Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death. Journal of Biomedical Semantics. 

Shah, A. D. et al. (2019) Recording problems and diagnoses in clinical care: developing guidance for healthcare professionals and system designers. BMJ Health & Care Informatics.

Shah, A. D. et al. (2021) Data gaps in electronic health record (EHR) systems: An audit of problem list completeness during the COVID-19 pandemic. International Journal of Medical Informatics.

Shah, A. D. et al. (2022) EPH34 Long COVID Symptoms and Diagnosis in Primary Care: A Cohort Study Using the THIN Database Including Unstructured Text. Value in Health.

Shah, A. D. et al. (2023) Translating and evaluating historic phenotyping algorithms using SNOMED CT. Journal of the American Medical Informatics Association. 

Shah, A. D. et al. (2023) Long Covid symptoms and diagnosis in primary care: a cohort study using structured and unstructured data in The Health Improvement Network primary care database. medRxiv.
