Using systematic data categorisation to quantify the types of data collected in clinical trials: the DataCat project

Evelyn Crowley; Shaun Treweek; Katie Banister; Suzanne Breeman; Lynda Constable; Seonaidh Cotton; Anne Duncan; Adel El Feky; Heidi Gardner; Kirsteen Goodman; Doris Lanz; Alison McDonald; Emma Ogburn; Kath Starr; Natasha Stevens; Marie Valente; Gordon Fernie

doi:10.1186/s13063-020-04388-x

Evelyn Crowley, Shaun Treweek^* (Corresponding Author), Katie Banister, Suzanne Breeman, Lynda Constable, Seonaidh Cotton, Anne Duncan, Adel El Feky, Heidi Gardner, Kirsteen Goodman, Doris Lanz, Alison McDonald, Emma Ogburn, Kath Starr, Natasha Stevens, Marie Valente, Gordon Fernie

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Data collection consumes a large proportion of clinical trial resources. Each data item requires time and effort for collection, processing and quality control procedures. In general, more data equals a heavier burden for trial staff and participants. It is also likely to increase costs. Knowing the types of data being collected, and in what proportion, will be helpful to ensure that limited trial resources and participant goodwill are used wisely.
Aim: The aim of this study is to categorise the types of data collected across a broad range of trials and assess what proportion of collected data each category represents.
Methods: We developed a standard operating procedure to categorise data into primary outcome, secondary outcome and 15 other categories. We categorised all variables collected on trial data collection forms from 18, mainly publicly funded, randomised superiority trials, including trials of an investigational medicinal product and complex interventions. Categorisation was done independently in pairs: one person having in-depth knowledge of the trial, the other independent of the trial. Disagreement was resolved through reference to the trial protocol and discussion, with the project team being consulted if necessary.
Key results: Primary outcome data accounted for 5.0% (median)/11.2% (mean) of all data items collected. Secondary outcomes accounted for 39.9% (median)/42.5% (mean) of all data items. Non-outcome data such as participant identifiers and demographic data represented 32.4% (median)/36.5% (mean) of all data items collected.
Conclusion: A small proportion of the data collected in our sample of 18 trials was related to the primary outcome. Secondary outcomes accounted for eight times as much data as the primary outcome. A substantial amount of data collection is not related to trial outcomes. Trialists should work to make sure that the data they collect are only those essential to support the health and treatment decisions of those whom the trial is designed to inform.
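The "eight times" comparison in the conclusion appears to correspond to the ratio of the reported medians (39.9% against 5.0%); the same ratio taken on the reported means is smaller. The short Python sketch below only redoes that arithmetic from the figures quoted in the abstract. The names `reported` and `ratio` are illustrative and do not come from the paper, and the abstract does not state which summary statistic the authors used for the comparison.

```python
# Shares of all data items, as reported in the abstract (percent).
# The dictionary layout and key names are illustrative assumptions.
reported = {
    "primary": {"median": 5.0, "mean": 11.2},
    "secondary": {"median": 39.9, "mean": 42.5},
    "non_outcome": {"median": 32.4, "mean": 36.5},
}


def ratio(numerator: str, denominator: str, stat: str) -> float:
    """Ratio of two categories' reported shares for one summary statistic."""
    return reported[numerator][stat] / reported[denominator][stat]


median_ratio = ratio("secondary", "primary", "median")
mean_ratio = ratio("secondary", "primary", "mean")

print(f"secondary / primary (median): {median_ratio:.2f}")
print(f"secondary / primary (mean):   {mean_ratio:.2f}")
```

On the medians the ratio is about 8, while on the means it is roughly 3.8, so the size of the gap depends on which summary is used.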

Original language	English
Article number	535
Pages (from-to)	535
Number of pages	10
Journal	Trials
Volume	21
Issue number	1
DOIs	https://doi.org/10.1186/s13063-020-04388-x
Publication status	Published - 16 Jun 2020

Bibliographical note

We would like to thank Joanne Palmer and all attendees of the 2015 workshop at the UK Trial Managers’ Network meeting. We thank all Chief Investigators of the trials in our sample for giving their permission to use their trial data collection forms in our analysis: Annie S Anderson (ActWELL), Doreen McClurg (AMBER), Charles Knowles (CONFIDeNT), Augusto Azuara-Blanco (EAGLE), Frank Sullivan (ECLS), Shakila Thangaratinam (EMPiRE), Kevin Cooper (HEALTH), Eugene Dempsey (HIP), Craig Ramsay (iQUAD), Ian Reid (KANECT), David Murray (KAT), Saruban Pasu (PIMS), Khalid S Khan (SALVO), Robert Pickard (SUSPEND), Anthony King (TAGS), Graham Devereux (TWICS), Adrian R Martineau (ViDiFlu), Charis Glazener (VUE). Similarly, we thank the funders of all the trials: Chief Scientist Office (CSO), Scottish Government Health Directorate; CSO, Scottish Government and Oncimmune Ltd; European Commission within the Seventh Framework Programme; National Institute for Health Research - Efficacy and Mechanism Evaluation (NIHR-EME); National Institute for Health Research - Health Technology Assessment (NIHR-HTA) programme; National Institute for Health Research - Programme Grants for Applied Research (NIHR-PGAR); the Scottish Government. The Health Services Research Unit, University of Aberdeen, receives core funding from the CSO of the Scottish Government Health Directorates.

Keywords

PROTOCOL
IMPACT

Access to Document

10.1186/s13063-020-04388-x
Licence: CC BY

Crowley_et_al_Trial_UsingSistematicData_VoR
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Final published version, 1.45 MB
Licence: CC BY

Cite this

Crowley, E., Treweek, S., Banister, K., Breeman, S., Constable, L., Cotton, S., Duncan, A., El Feky, A., Gardner, H., Goodman, K., Lanz, D., McDonald, A., Ogburn, E., Starr, K., Stevens, N., Valente, M., & Fernie, G. (2020). Using systematic data categorisation to quantify the types of data collected in clinical trials: the DataCat project. Trials, 21(1), Article 535. https://doi.org/10.1186/s13063-020-04388-x

Crowley, E, Treweek, S, Banister, K, Breeman, S, Constable, L, Cotton, S, Duncan, A, El Feky, A, Gardner, H, Goodman, K, Lanz, D, McDonald, A, Ogburn, E, Starr, K, Stevens, N, Valente, M & Fernie, G 2020, 'Using systematic data categorisation to quantify the types of data collected in clinical trials: the DataCat project', Trials, vol. 21, no. 1, 535, pp. 535. https://doi.org/10.1186/s13063-020-04388-x

@article{3a9c4d19a6f94f6e9172e7aca784cd17,

title = "Using systematic data categorisation to quantify the types of data collected in clinical trials: the DataCat project",

abstract = "Background: Data collection consumes a large proportion of clinical trial resources. Each data item requires time and effort for collection, processing and quality control procedures. In general, more data equals a heavier burden for trial staff and participants. It is also likely to increase costs. Knowing the types of data being collected, and in what proportion, will be helpful to ensure that limited trial resources and participant goodwill are used wisely. Aim: The aim of this study is to categorise the types of data collected across a broad range of trials and assess what proportion of collected data each category represents. Methods: We developed a standard operating procedure to categorise data into primary outcome, secondary outcome and 15 other categories. We categorised all variables collected on trial data collection forms from 18, mainly publicly funded, randomised superiority trials, including trials of an investigational medicinal product and complex interventions. Categorisation was done independently in pairs: one person having in-depth knowledge of the trial, the other independent of the trial. Disagreement was resolved through reference to the trial protocol and discussion, with the project team being consulted if necessary. Key results: Primary outcome data accounted for 5.0% (median)/11.2% (mean) of all data items collected. Secondary outcomes accounted for 39.9% (median)/42.5% (mean) of all data items. Non-outcome data such as participant identifiers and demographic data represented 32.4% (median)/36.5% (mean) of all data items collected. Conclusion: A small proportion of the data collected in our sample of 18 trials was related to the primary outcome. Secondary outcomes accounted for eight times the volume of data as the primary outcome. A substantial amount of data collection is not related to trial outcomes. Trialists should work to make sure that the data they collect are only those essential to support the health and treatment decisions of those whom the trial is designed to inform.",

keywords = "PROTOCOL, IMPACT",

author = "Evelyn Crowley and Shaun Treweek and Katie Banister and Suzanne Breeman and Lynda Constable and Seonaidh Cotton and Anne Duncan and {El Feky}, Adel and Heidi Gardner and Kirsteen Goodman and Doris Lanz and Alison McDonald and Emma Ogburn and Kath Starr and Natasha Stevens and Marie Valente and Gordon Fernie",

note = "We would like to thank Joanne Palmer and all attendees of the 2015 workshop at the UK Trial Managers{\textquoteright} Network meeting. We thank all Chief Investigators of the trials in our sample for giving their permission to use their trial data collection forms in our analysis: Annie S Anderson (ActWELL), Doreen McClurg (AMBER), Charles Knowles (CONFIDeNT), Augusto Azuara-Blanco (EAGLE), Frank Sullivan (ECLS), Shakila Thangaratinam (EMPiRE), Kevin Cooper (HEALTH), Eugene Dempsey (HIP), Craig Ramsay (iQUAD), Ian Reid (KANECT), David Murray (KAT), Saruban Pasu (PIMS), Khalid S Khan (SALVO), Robert Pickard (SUSPEND), Anthony King (TAGS), Graham Devereux (TWICS), Adrian R Martineau (ViDiFlu), Charis Glazener (VUE). Similarly, we thank the funders of all the trials: Chief Scientist Office (CSO), Scottish Government Health Directorate; CSO, Scottish Government and Oncimmune Ltd; European Commission within the Seventh Framework Programme; National Institute for Health Research - Efficacy and Mechanism Evaluation (NIHR-EME); National Institute for Health Research - Health Technology Assessment (NIHR-HTA) programme; National Institute for Health Research - Programme Grants for Applied Research (NIHR-PGAR); the Scottish Government. The Health Services Research Unit, University of Aberdeen, receives core funding from the CSO of the Scottish Government Health Directorates.",

year = "2020",

month = jun,

day = "16",

doi = "10.1186/s13063-020-04388-x",

language = "English",

volume = "21",

pages = "535",

journal = "Trials",

issn = "1745-6215",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - Using systematic data categorisation to quantify the types of data collected in clinical trials

T2 - the DataCat project

AU - Crowley, Evelyn

AU - Treweek, Shaun

AU - Banister, Katie

AU - Breeman, Suzanne

AU - Constable, Lynda

AU - Cotton, Seonaidh

AU - Duncan, Anne

AU - El Feky, Adel

AU - Gardner, Heidi

AU - Goodman, Kirsteen

AU - Lanz, Doris

AU - McDonald, Alison

AU - Ogburn, Emma

AU - Starr, Kath

AU - Stevens, Natasha

AU - Valente, Marie

AU - Fernie, Gordon

N1 - We would like to thank Joanne Palmer and all attendees of the 2015 workshop at the UK Trial Managers’ Network meeting. We thank all Chief Investigators of the trials in our sample for giving their permission to use their trial data collection forms in our analysis: Annie S Anderson (ActWELL), Doreen McClurg (AMBER), Charles Knowles (CONFIDeNT), Augusto Azuara-Blanco (EAGLE), Frank Sullivan (ECLS), Shakila Thangaratinam (EMPiRE), Kevin Cooper (HEALTH), Eugene Dempsey (HIP), Craig Ramsay (iQUAD), Ian Reid (KANECT), David Murray (KAT), Saruban Pasu (PIMS), Khalid S Khan (SALVO), Robert Pickard (SUSPEND), Anthony King (TAGS), Graham Devereux (TWICS), Adrian R Martineau (ViDiFlu), Charis Glazener (VUE). Similarly, we thank the funders of all the trials: Chief Scientist Office (CSO), Scottish Government Health Directorate; CSO, Scottish Government and Oncimmune Ltd; European Commission within the Seventh Framework Programme; National Institute for Health Research - Efficacy and Mechanism Evaluation (NIHR-EME); National Institute for Health Research - Health Technology Assessment (NIHR-HTA) programme; National Institute for Health Research - Programme Grants for Applied Research (NIHR-PGAR); the Scottish Government. The Health Services Research Unit, University of Aberdeen, receives core funding from the CSO of the Scottish Government Health Directorates.

PY - 2020/6/16

Y1 - 2020/6/16

AB - Background: Data collection consumes a large proportion of clinical trial resources. Each data item requires time and effort for collection, processing and quality control procedures. In general, more data equals a heavier burden for trial staff and participants. It is also likely to increase costs. Knowing the types of data being collected, and in what proportion, will be helpful to ensure that limited trial resources and participant goodwill are used wisely. Aim: The aim of this study is to categorise the types of data collected across a broad range of trials and assess what proportion of collected data each category represents. Methods: We developed a standard operating procedure to categorise data into primary outcome, secondary outcome and 15 other categories. We categorised all variables collected on trial data collection forms from 18, mainly publicly funded, randomised superiority trials, including trials of an investigational medicinal product and complex interventions. Categorisation was done independently in pairs: one person having in-depth knowledge of the trial, the other independent of the trial. Disagreement was resolved through reference to the trial protocol and discussion, with the project team being consulted if necessary. Key results: Primary outcome data accounted for 5.0% (median)/11.2% (mean) of all data items collected. Secondary outcomes accounted for 39.9% (median)/42.5% (mean) of all data items. Non-outcome data such as participant identifiers and demographic data represented 32.4% (median)/36.5% (mean) of all data items collected. Conclusion: A small proportion of the data collected in our sample of 18 trials was related to the primary outcome. Secondary outcomes accounted for eight times the volume of data as the primary outcome. A substantial amount of data collection is not related to trial outcomes. Trialists should work to make sure that the data they collect are only those essential to support the health and treatment decisions of those whom the trial is designed to inform.

KW - PROTOCOL

KW - IMPACT

UR - http://www.scopus.com/inward/record.url?scp=85086686457&partnerID=8YFLogxK

U2 - 10.1186/s13063-020-04388-x

DO - 10.1186/s13063-020-04388-x

M3 - Article

C2 - 32546192

SN - 1745-6215

VL - 21

SP - 535

JO - Trials

JF - Trials

IS - 1

M1 - 535

ER -
