Differential abundance testing on single-cell data using k-nearest neighbor graphs

Emma Dann; Neil C Henderson; Sarah A Teichmann; Michael D Morgan; John C Marioni

doi:10.1038/s41587-021-01033-z

Research output: Contribution to journal › Article › peer-review

Abstract

Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance among experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. Using simulations and single-cell RNA sequencing (scRNA-seq) data, we show that Milo can identify perturbations that are obscured by discretizing cells into clusters, that it maintains false discovery rate control across batch effects and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the aging mouse thymus and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell-cell similarity structure, it might also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR .
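The abstract's central idea is to replace discrete clusters with partially overlapping k-nearest neighbor neighborhoods and then compare the per-sample cell counts in those neighborhoods. The minimal Python sketch below illustrates that idea. It is not the miloR implementation or API. The brute-force Euclidean kNN search, the hand-picked index cells, the toy data and the helper names (`knn_indices`, `make_neighborhoods`, `count_cells`) are all assumptions made for illustration, and the statistical test applied to the resulting counts is omitted.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating the neighborhood idea; NOT the miloR API.


def knn_indices(points, k):
    """For each cell, return the indices of its k nearest other cells.
    Brute-force Euclidean search; ties are broken by cell index."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def make_neighborhoods(points, k, index_cells):
    """A neighborhood is an index cell plus its k nearest neighbors.
    Neighborhoods of nearby index cells can share cells, i.e. they overlap."""
    nn = knn_indices(points, k)
    return {i: [i] + nn[i] for i in index_cells}


def count_cells(nhoods, sample_of_cell, samples):
    """Count how many cells from each sample fall in each neighborhood.
    Returns {index_cell: [count for each sample, in the order of `samples`]}."""
    counts = {}
    for idx, members in nhoods.items():
        tally = Counter(sample_of_cell[m] for m in members)
        counts[idx] = [tally.get(s, 0) for s in samples]
    return counts


# Toy data (invented for illustration): two well-separated groups of three cells.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
sample_of_cell = ["A", "A", "B", "B", "B", "A"]

nhoods = make_neighborhoods(points, k=2, index_cells=[0, 3])
counts = count_cells(nhoods, sample_of_cell, samples=["A", "B"])
print(counts)  # {0: [2, 1], 3: [1, 2]}
```

The resulting neighborhood-by-sample count table is what a differential abundance test would then compare between experimental conditions. For real analyses, use the miloR package linked in the abstract rather than this sketch.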

Original language	English
Pages (from-to)	245-253
Number of pages	9
Journal	Nature Biotechnology
Volume	40
Issue number	2
Early online date	30 Sept 2021
DOIs	https://doi.org/10.1038/s41587-021-01033-z
Publication status	Published - Feb 2022

Bibliographical note

Acknowledgements
We thank S. Ghazanfar for feedback on the method; N. Kumasaka for comments on the manuscript; C. Suo, V. Kedlian, R. Elmentaite, J. P. Pett, K. Tuong and B. Stewart for feedback on the software package; and D. Burkhardt, M. Luecken and W. Lewis for discussions on benchmarking. J.C.M. acknowledges core funding from the European Molecular Biology Laboratory and core funding from Cancer Research UK (C9545/A29580), which supports M.D.M. E.D. and S.A.T. acknowledge Wellcome Sanger core funding (WT206194). N.C.H. is supported by a Wellcome Trust Senior Research Fellowship in Clinical Science (ref. 219542/Z/19/Z), the Medical Research Council and a Chan Zuckerberg Initiative Seed Network Grant.

Keywords

Animals
Cluster Analysis
Gene Expression Profiling
Mice
Sequence Analysis, RNA
Single-Cell Analysis
Software

Cite this

@article{9616eb7760c7418398ab1ccddd3760ef,

title = "Differential abundance testing on single-cell data using k-nearest neighbor graphs",

abstract = "Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance among experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. Using simulations and single-cell RNA sequencing (scRNA-seq) data, we show that Milo can identify perturbations that are obscured by discretizing cells into clusters, that it maintains false discovery rate control across batch effects and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the aging mouse thymus and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell-cell similarity structure, it might also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR .",

keywords = "Animals, Cluster Analysis, Gene Expression Profiling, Mice, Sequence Analysis, RNA, Single-Cell Analysis, Software",

author = "Dann, Emma and Henderson, {Neil C} and Teichmann, {Sarah A} and Morgan, {Michael D} and Marioni, {John C}",

note = "Acknowledgements We thank S. Ghazanfar for feedback on the method; N. Kumasaka for comments on the manuscript; C. Suo, V. Kedlian, R. Elmentaite, J. P. Pett, K. Tuong and B. Stewart for feedback on the software package; and D. Burkhardt, M. Luecken and W. Lewis for discussions on benchmarking. J.C.M. acknowledges core funding from the European Molecular Biology Laboratory and core funding from Cancer Research UK (C9545/A29580), which supports M.D.M. E.D. and S.A.T. acknowledge Wellcome Sanger core funding (WT206194). N.C.H. is supported by a Wellcome Trust Senior Research Fellowship in Clinical Science (ref. 219542/Z/19/Z), the Medical Research Council and a Chan Zuckerberg Initiative Seed Network Grant.",

year = "2022",

month = feb,

doi = "10.1038/s41587-021-01033-z",

language = "English",

volume = "40",

pages = "245--253",

journal = "Nature Biotechnology",

issn = "1087-0156",

publisher = "Nature Publishing Group",

number = "2",

}

TY - JOUR

T1 - Differential abundance testing on single-cell data using k-nearest neighbor graphs

AU - Dann, Emma

AU - Henderson, Neil C

AU - Teichmann, Sarah A

AU - Morgan, Michael D

AU - Marioni, John C

N1 - Acknowledgements We thank S. Ghazanfar for feedback on the method; N. Kumasaka for comments on the manuscript; C. Suo, V. Kedlian, R. Elmentaite, J. P. Pett, K. Tuong and B. Stewart for feedback on the software package; and D. Burkhardt, M. Luecken and W. Lewis for discussions on benchmarking. J.C.M. acknowledges core funding from the European Molecular Biology Laboratory and core funding from Cancer Research UK (C9545/A29580), which supports M.D.M. E.D. and S.A.T. acknowledge Wellcome Sanger core funding (WT206194). N.C.H. is supported by a Wellcome Trust Senior Research Fellowship in Clinical Science (ref. 219542/Z/19/Z), the Medical Research Council and a Chan Zuckerberg Initiative Seed Network Grant.

PY - 2022/2

Y1 - 2022/2

N2 - Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance among experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. Using simulations and single-cell RNA sequencing (scRNA-seq) data, we show that Milo can identify perturbations that are obscured by discretizing cells into clusters, that it maintains false discovery rate control across batch effects and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the aging mouse thymus and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell-cell similarity structure, it might also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR .

AB - Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance among experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. Using simulations and single-cell RNA sequencing (scRNA-seq) data, we show that Milo can identify perturbations that are obscured by discretizing cells into clusters, that it maintains false discovery rate control across batch effects and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the aging mouse thymus and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell-cell similarity structure, it might also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR .

KW - Animals

KW - Cluster Analysis

KW - Gene Expression Profiling

KW - Mice

KW - Sequence Analysis, RNA

KW - Single-Cell Analysis

KW - Software

U2 - 10.1038/s41587-021-01033-z

DO - 10.1038/s41587-021-01033-z

M3 - Article

C2 - 34594043

SN - 1087-0156

VL - 40

SP - 245

EP - 253

JO - Nature Biotechnology

JF - Nature Biotechnology

IS - 2

ER -