Defining clinical subtypes of adult asthma using electronic health records: analysis of a large UK primary care database with external validation

Elsie MF Horne* (Corresponding Author), Susannah Mclean, Mohammad A Alsallakh, Gwyneth A Davies, David Price, Aziz Sheikh, Athanasios Tsanas

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Asthma is one of the commonest chronic conditions in the world. Subtypes of asthma have been defined, typically from clinical datasets on small, well-characterised subpopulations of asthma patients. We sought to define asthma subtypes from large longitudinal primary care electronic health records (EHRs) using cluster analysis.
In this retrospective cohort study, we extracted asthma subpopulations from the Optimum Patient Care Research Database (OPCRD) to robustly train and test algorithms, and externally validated findings in the Secure Anonymised Information Linkage (SAIL) Databank. In both databases, we identified adults with an asthma diagnosis code recorded in the three years prior to an index date. Train and test datasets were selected from OPCRD using an index date of Jan 1, 2016. Two internal validation datasets were selected from OPCRD using index dates of Jan 1, 2017 and 2018. Three external validation datasets were selected from SAIL using index dates of Jan 1, 2016, 2017 and 2018. Each dataset comprised 50,000 randomly selected non-overlapping patients. Subtypes were defined by applying multiple correspondence analysis and k-means cluster analysis to the train dataset, and were validated in the internal and external validation datasets.
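The two-step pipeline named above (multiple correspondence analysis followed by k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the number of retained components, and the function names `one_hot`, `mca_row_coordinates`, and `kmeans` are all assumptions made for illustration, and the only dependency is NumPy.

```python
import numpy as np

def one_hot(categories):
    """Indicator matrix with one column per (variable, level) pair."""
    cols = []
    for j in range(categories.shape[1]):
        levels = np.unique(categories[:, j])
        cols.append((categories[:, j][:, None] == levels[None, :]).astype(float))
    return np.hstack(cols)

def mca_row_coordinates(indicator, n_components):
    """MCA as correspondence analysis of the indicator matrix, via SVD."""
    P = indicator / indicator.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardised residuals
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates: D_r^{-1/2} U Sigma, truncated to n_components
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; an empty cluster keeps its previous centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centroids

# Toy stand-in for categorical EHR features (hypothetical: 300 patients,
# 5 variables with 3 levels each); real features would come from the EHR.
toy = np.random.default_rng(42).integers(0, 3, size=(300, 5))
Z = one_hot(toy)
coords = mca_row_coordinates(Z, n_components=2)
labels, centroids = kmeans(coords, k=6)
```

Here `k=6` simply mirrors the number of subtypes reported in this abstract; how the number of clusters and components was actually chosen is not stated here.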

We defined six asthma subtypes with clear clinical interpretability: low inhaled corticosteroid (ICS) use and low healthcare utilisation (30% of patients); low-to-medium ICS use (36%); low-to-medium ICS use and comorbidities (12%); varied ICS use and comorbid chronic obstructive pulmonary disease (4%); high ICS use (10%); and very high ICS use (7%). The subtypes were replicated with high accuracy in internal (91-92%) and external (84-86%) datasets.
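The abstract does not say how replication accuracy was computed. One possible approach, sketched here purely as an assumption, is to compare two labellings of the same patients (for example, labels carried over from the train dataset and labels from re-clustering a validation dataset) after finding the relabelling that maximises agreement, since cluster numbers are arbitrary. The function name `best_match_accuracy` and the toy labels are hypothetical.

```python
from itertools import permutations

def best_match_accuracy(reference, candidate, k):
    """Highest fraction of agreement over all relabellings of `candidate`.

    Brute force over k! permutations, which is fine for k = 6 (720 cases).
    """
    best = 0.0
    for perm in permutations(range(k)):
        agree = sum(1 for r, c in zip(reference, candidate) if r == perm[c])
        best = max(best, agree / len(reference))
    return best

# Toy example: the candidate uses different cluster numbers and
# disagrees with the reference on the last patient only.
reference = [0, 0, 1, 1, 2, 2]
candidate = [2, 2, 0, 0, 1, 0]
accuracy = best_match_accuracy(reference, candidate, k=3)
```

For larger numbers of clusters, an assignment solver such as the Hungarian algorithm would replace the brute-force search.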
Asthma subtypes derived and validated in large independent EHR databases were primarily defined by level of ICS use, level of healthcare use, and presence of comorbidities. This has important clinical implications for defining asthma subtypes, facilitating patient stratification, and developing more personalised monitoring and treatment strategies.
Original language: English
Article number: 104942
Number of pages: 11
Journal: International Journal of Medical Informatics
Early online date: 16 Dec 2022
Publication status: Published - 1 Feb 2023

Bibliographical note

EMFH was supported by a Medical Research Council PhD Studentship (eHERC/Farr). This work is carried out with the support of the Asthma UK Centre for Applied Research [AUKAC-2012-01] and Health Data Research UK which receives its funding from HDR UK Ltd (HDR-5012) funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and the Wellcome Trust. The funders had no role in the study and the decision to submit this work to be considered for publication. This Project is based in part/wholly on Data from the Optimum Patient Care Research Database obtained under licence from Optimum Patient Care Limited and its execution is approved by recognised experts affiliated to the Respiratory Effectiveness Group. However, the interpretation and conclusion contained in this report are those of the author/s alone.
This study makes use of anonymised data held in the Secure Anonymised Information Linkage (SAIL) Databank. We would like to acknowledge all the data providers who make anonymised data available for research. SAIL is not responsible for the interpretation of these data.


Keywords

  • Asthma
  • Electronic health records
  • Cluster Analysis

