The cardiovascular phenotype of Chronic Obstructive Pulmonary Disease (COPD): Applying machine learning to the prediction of cardiovascular comorbidities

Vasilis Nikolaou* (Corresponding Author), Sebastiano Massaro, Wolfgang Garn, Masoud Fakhimi, Lampros Stergioulas, David Price

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



Background: Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous group of lung conditions that are challenging to diagnose and treat. Because comorbidities often complicate diagnosis and treatment further, characterizing patients with COPD and cardiovascular comorbidities may allow early intervention and improve disease management and care.

Methods: We analysed a 4-year observational cohort of 6883 UK patients who were ultimately diagnosed with COPD and at least one cardiovascular comorbidity. The cohort was extracted from the UK Royal College of General Practitioners Research and Surveillance Centre (RCGP RSC) database. The COPD phenotypes were identified prior to diagnosis and their reproducibility was assessed following COPD diagnosis. We then developed four classifiers for predicting cardiovascular comorbidities.

Results: Three subtypes of the COPD cardiovascular phenotype were identified prior to diagnosis. Phenotype A was characterised by a higher prevalence of severe COPD, emphysema, and hypertension. Phenotype B was characterised by a larger male majority, a lower prevalence of hypertension, and the highest prevalence of the other cardiovascular comorbidities and of diabetes. Finally, phenotype C was characterised by universal hypertension, a higher prevalence of mild COPD, and a low prevalence of COPD exacerbations. These phenotypes were reproduced after diagnosis with 92% accuracy. The random forest model was highly accurate at predicting hypertension while ruling out less prevalent comorbidities.

Conclusions: This study identified three subtypes of the COPD cardiovascular phenotype that may generalize to other populations. Among the four models tested, the random forest classifier was the most accurate at predicting cardiovascular comorbidities in COPD patients with the cardiovascular phenotype.
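The abstract reports that the phenotypes were reproduced after diagnosis with 92% accuracy, but it does not say how that figure was computed. As a rough illustration only, one common way to compare two cluster assignments is to try every relabelling of one against the other and keep the best agreement, since cluster IDs are arbitrary. The function name, the toy labels, and the matching procedure below are all assumptions made for this sketch and are not taken from the paper.

```python
from itertools import permutations


def reproducibility_accuracy(before, after, n_clusters):
    """Best-match agreement between two cluster assignments.

    Cluster IDs are arbitrary, so every relabelling of `after` is tried
    and the highest fraction of matching patients is returned.
    (Hypothetical helper; not the authors' published method.)
    """
    if len(before) != len(after) or not before:
        raise ValueError("assignments must be non-empty and of equal length")
    best = 0.0
    for perm in permutations(range(n_clusters)):
        # perm[j] is the 'before' label proposed for 'after' cluster j
        matches = sum(1 for b, a in zip(before, after) if b == perm[a])
        best = max(best, matches / len(before))
    return best


# Toy labels for ten made-up patients (0, 1, 2 stand in for A, B, C).
pre_diagnosis = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
post_diagnosis = [2, 2, 2, 0, 0, 1, 1, 1, 1, 1]

score = reproducibility_accuracy(pre_diagnosis, post_diagnosis, n_clusters=3)
print(f"agreement: {score:.0%}")  # prints "agreement: 90%"
```

Exhaustive relabelling is practical here because three clusters give only six permutations; with many clusters, an assignment solver would be the more usual choice.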
Original language: English
Article number: 106528
Number of pages: 8
Journal: Respiratory Medicine
Early online date: 7 Jul 2021
Publication status: Published - 30 Sept 2021

Bibliographical note

We would like to thank patients for allowing their data to be used for surveillance and research, General Practitioners who agreed to be part of the RCGP RSC and allowed us to extract and use health data for surveillance and research, Ms. Filipa Ferreira from RCGP, Mr. Julian Sherlock from the University of Surrey, Apollo Medical Systems for data extraction, collaborators with EMIS, TPP, In-Practice and Micro-Test CMR suppliers for facilitating data extraction, and colleagues at Public Health England.


Keywords

  • Cardiovascular subtypes
  • Machine learning
  • Cluster analysis
  • Random forest

