Abstract
Tumour content plays a pivotal role in directing the bioinformatic analysis of molecular profiles such as copy number variation (CNV). In clinical application, tumour purity estimation (TPE) is achieved either through visual pathological review [conventional pathology (CP)] or the deconvolution of molecular data. While CP provides a direct measurement, it demonstrates modest reproducibility and lacks standardisation. Conversely, deconvolution methods offer an indirect assessment with uncertain accuracy, underscoring the necessity for innovative approaches. SoftCTM is an open-source, multiorgan deep-learning (DL) model for the detection of tumour and non-tumour cells in H&E-stained slides, developed within the Overlapped Cell on Tissue Dataset for Histopathology (OCELOT) Challenge 2023. Here, using three large multicentre colorectal cancer (CRC) cohorts (N = 1,097 patients) with digital pathology and multi-omic data, we compare the utility and accuracy of TPE with SoftCTM versus CP and bioinformatic deconvolution methods (RNA expression, DNA methylation) for downstream molecular analysis, including CNV profiling. SoftCTM showed technical repeatability when applied twice on the same slide (r = 1.0) and excellent correlations in paired H&E slides (r > 0.9). TPEs profiled by SoftCTM correlated highly with RNA expression (r = 0.59) and DNA methylation (r = 0.40), while TPEs by CP showed a lower correlation with RNA expression (r = 0.41) and DNA methylation (r = 0.29). We show that CP and deconvolution methods respectively underestimate and overestimate tumour content compared to SoftCTM, resulting in 6–13% differing CNV calls. In summary, TPE with SoftCTM enables reproducibility, automation, and standardisation at single-cell resolution.
SoftCTM estimates (M = 58.9%, SD ±16.3%) reconcile the overestimation by molecular data extrapolation (RNA expression: M = 79.2%, SD ±10.5%; DNA methylation: M = 62.7%, SD ±11.8%) and underestimation by CP (M = 35.9%, SD ±13.1%), providing a more reliable middle ground. A fully integrated computational pathology solution could therefore be used to improve downstream molecular analyses for research and clinics. © 2024 The Author(s). The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Original language | English |
---|---|
Pages (from-to) | 184-197 |
Number of pages | 14 |
Journal | The Journal of Pathology |
Volume | 265 |
Issue number | 2 |
Early online date | 22 Dec 2024 |
DOIs | |
Publication status | E-pub ahead of print - 22 Dec 2024 |
Bibliographical note
The authors thank Claire Butler and Michael Youdell for excellent management in S:CORT and the MRC Clinical Trials Unit which provided the clinical data from the FOCUS trial with permission from the FOCUS trial steering group. We would further like to thank Indica Labs for providing the HALO™ software. The results published or shown here are based in part upon data generated by the TCGA Research Network established by the National Cancer Institute and National Human Genome Research Institute. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov. We would especially like to thank all patients who consented to take part in S:CORT and TCGA. The views expressed are those of the author(s) and not necessarily those of the National Health Service, the National Institute for Health and Care Research, or the Department of Health.
Data Availability Statement
FOCUS raw expression data and molecular metadata are publicly available at GEO under reference GSE156915. The transcriptomic data from GRAMPIAN are publicly available at the following link: https://www.scort.org/sites/default/files/exports/scort_ws3_grampian_export_84m9fndk/ws3_grampian_expression_raw.zip. Sequencing data from whole S:CORT are publicly available in EGA (EGAS00001001521). Additional S:CORT data are available to all academic researchers on submission of a data request to the data access committee. For commercial agencies, the data will be made available through Cancer Research Horizons acting on behalf of the funders and consortium members. The TCGA datasets and images analysed in this study are openly and publicly available at https://portal.gdc.cancer.gov/.
Keywords
- pathology
- artificial intelligence
- colorectal cancer
- diagnostic molecular pathology
- personalised medicine