Creating a Next Generation Phenotype Library: the Health Data Research UK Phenotype Library

  • Daniel Thayer
  • Shahzad Mumtaz* (Corresponding Author)
  • Muhammad Elmessary
  • Ieuan Scanlon
  • Artur Zinnurov
  • Alex-Ioan Coldea
  • Jack Scanlon
  • Martin Chapman
  • Vasa Curcin
  • Ann John
  • Marcos DelPozo-Banos
  • Hannah Davies
  • Andreas Karwath
  • Georgios Gkoutos
  • Natalie Fitzpatrick
  • Jennifer K Quint
  • Susheel Varma
  • Chris Milner
  • Carla Oliveira
  • Helen Parkinson
  • Spiros Denaxas
  • Harry Hemingway
  • Emily Jefferson
*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Objective
To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms.

Materials and Methods
We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User experience analysis was used to inform the design, which we implemented as a web application featuring a novel metadata standard for defining phenotyping algorithms, access via Application Programming Interface (API), support for computable data flows, and version control. The application has creation and editing functionality, enabling researchers to submit phenotypes directly.

Results
We created and launched the Phenotype Library in October 2021. The platform currently hosts 1049 phenotype definitions defined against 40 health data sources and >200K terms across 16 medical ontologies. We present several case studies demonstrating its utility for supporting and enabling research: the library hosts curated phenotype collections for the BREATHE respiratory health research hub and the Adolescent Mental Health Data Platform, and it is supporting the development of an informatics tool to generate clinical evidence for clinical guideline development groups.

Discussion
This platform makes an impact by being open to all health data users and accepting all appropriate content, as well as implementing key features that have not been widely available, including managing structured metadata, access via an API, and support for computable phenotypes.

Conclusions
We have created the first openly available, programmatically accessible resource enabling the global health research community to store and manage phenotyping algorithms. Removing barriers to describing, sharing, and computing phenotypes will help unleash the potential benefit of health data for patients and the public.
Original language: English
Article number: ooae049
Number of pages: 11
Journal: Journal of the American Medical Informatics Association
Volume: 7
Issue number: 2
Early online date: 17 Jun 2024
Publication status: Published - 1 Jul 2024

Bibliographical note

Open Access via the OUP Agreement

Data Availability Statement

All phenotype definitions are accessible through the HDR UK Phenotype Library (https://phenotypes.healthdatagateway.org/), and the library's source code is available on GitHub (https://github.com/SwanseaUniversityMedical/concept-library/tree/master).

Funding

This work was supported by Health Data Research UK, which receives its funding from Health Data Research UK Ltd (NIWA1; G.V.G. and A.K.: HDRUK/CFC/01) funded by the UK Medical Research Council (MRC), Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation, and the Wellcome Trust. In addition, S.D. acknowledges that this study is part of the BigData@Heart programme that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (116074), which receives support from the European Union’s Horizon 2020 research and innovation programme (H2020) and the European Federation of Pharmaceutical Industries and Associations (EFPIA). M.C. and V.C. are supported by the Public Health and Multimorbidity Theme of the National Institute for Health Research’s Applied Research Collaboration (ARC) South London. G.V.G. and A.K. also acknowledge support from the NIHR Birmingham Experimental Cancer Medicine Centre (ECMC), the NIHR Birmingham Surgical Reconstruction Microbiology Research Centre (SRMRC), and the NIHR Birmingham Biomedical Research Centre, as well as Nanocommons H2020 (731032) and an MRC fellowship grant (MR/S003991/1). H.P. acknowledges support from European Molecular Biology Laboratory (EMBL-EBI core funds).

Funders                                               Funder number
Health Data Research UK                               HDRUK/CFC/01
Innovative Medicines Initiative 2 Joint Undertaking   116074
H2020 European Research Council                       731032
Medical Research Council                              MR/S003991/1

    UN SDGs

    This output contributes to the following UN Sustainable Development Goals (SDGs)

    1. SDG 3 - Good Health and Well-being

    Keywords

    • electronic health records
    • phenotyping
    • public health informatics
    • algorithms
    • application programming interface
    • medical informatics
