Desiderata for the development of next-generation electronic health record phenotype libraries

  • Shahzad Mumtaz
  • Luke V Rasmussen
  • Andreas Karwath
  • Georgios V Gkoutos
  • Chuang Gao
  • Dan Thayer
  • Jennifer A Pacheco
  • Helen Parkinson
  • Rachel L Richesson
  • Emily Jefferson
  • Spiros Denaxas
  • Vasa Curcin
  • Martin Chapman (Corresponding Author)

Research output: Contribution to journal › Article › peer-review

Abstract

BACKGROUND: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling.

METHODS: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices.

RESULTS: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing.

CONCLUSIONS: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.

Original language: English
Article number: giab059
Number of pages: 13
Journal: GigaScience
Volume: 10
Issue number: 9
Early online date: 11 Sept 2021
Publication status: Published - 11 Sept 2021

Bibliographical note

This work was supported by Health Data Research UK, which receives its funding from Health Data Research UK Ltd (NIWA1; G.V.G. and A.K.: HDRUK/CFC/01) funded by the UK Medical Research Council (MRC), Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation, and the Wellcome Trust. In addition, S.D. acknowledges that this study is part of the BigData@Heart programme that has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (116074), which receives support from the European Union’s Horizon 2020 research and innovation programme (H2020) and the European Federation of Pharmaceutical Industries and Associations (EFPIA). M.C. and V.C. are supported by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Guy’s and St Thomas’ National Health Service Foundation Trust and King’s College London, and the Public Health and Multimorbidity Theme of the National Institute for Health Research’s Applied Research Collaboration (ARC) South London. G.V.G. and A.K. also acknowledge support from the NIHR Birmingham Experimental Cancer Medicine Centre (ECMC), the NIHR Birmingham Surgical Reconstruction Microbiology Research Centre (SRMRC), and the NIHR Birmingham Biomedical Research Centre, as well as Nanocommons H2020 (731032) and an MRC fellowship grant (MR/S003991/1). H.P. acknowledges support from European Molecular Biology Laboratory (EMBL) core funds. L.V.R. and J.A.P. acknowledge support from the National Institute of General Medical Sciences (R01GM105688) and the National Human Genome Research Institute (U01HG011169). 
The opinions in this article are those of the authors and do not necessarily reflect the opinions of the funders.

Keywords

  • Electronic Health Records
  • Humans
  • Phenotype
  • Reproducibility of Results
  • phenotype library
  • EHR-based phenotyping
  • electronic health records
  • computable phenotype
