Small sample sizes: A big data problem in high-dimensional data analysis

Frank Konietschke; Karima Schwab; Markus Pauly

doi:10.1177/0962280220970228

Corresponding author: Frank Konietschke

Applied Medicine

Research output: Contribution to journal › Article › peer-review

Abstract

In many experiments and especially in translational and preclinical research, sample sizes are (very) small. In addition, data designs are often high dimensional, i.e. more dependent than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (multiple contrast tests) in high-dimensional designs (repeated measures or multivariate) with small sample sizes. A randomization-based approach is developed to approximate the distribution of the maximum statistic. Extensive simulation studies confirm that the new method is particularly suitable for analyzing data sets with small sample sizes. A real data set illustrates the application of the methods.
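
The abstract describes a randomization-based approximation to the distribution of a maximum t-type statistic. The sketch below illustrates the general idea only; it is not the authors' procedure. It assumes a one-sample repeated-measures layout (rows are subjects, columns are the repeated measures), tests the null hypothesis that every column has mean zero, and uses plain sign-flipping of whole rows as the randomization scheme. The function names `max_abs_t` and `sign_flip_max_t_test`, the resample count, and the example data are all hypothetical.

```python
import math
import random
import statistics


def max_abs_t(rows):
    """Largest absolute one-sample t-statistic across the columns of `rows`."""
    n = len(rows)
    best = 0.0
    for column in zip(*rows):
        sd = statistics.stdev(column)
        if sd == 0.0:
            continue  # simplification: skip constant columns
        t = statistics.mean(column) / (sd / math.sqrt(n))
        best = max(best, abs(t))
    return best


def sign_flip_max_t_test(rows, n_resamples=2000, seed=0):
    """Randomization p-value for H0: every column has mean zero.

    Each subject's whole row is multiplied by one random sign, so the
    dependence between the columns is left intact in every resample.
    """
    rng = random.Random(seed)
    observed = max_abs_t(rows)
    exceed = 0
    for _ in range(n_resamples):
        flipped = []
        for row in rows:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * value for value in row])
        if max_abs_t(flipped) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (exceed + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical data: 6 subjects, 20 repeated measures, shift in column 0.
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(6)]
    for row in data:
        row[0] += 3.0
    stat, p_value = sign_flip_max_t_test(data)
    print(f"max |t| = {stat:.2f}, randomization p-value = {p_value:.4f}")
```

Because the maximum is recomputed over all columns in every resample, the resulting p-value accounts for multiplicity across the repeated measures without estimating their correlation matrix, which is the appeal of this kind of approach when there are more columns than subjects.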

Original language	English
Pages (from-to)	687–701
Number of pages	15
Journal	Statistical Methods in Medical Research
Volume	30
Issue number	3
Early online date	24 Nov 2020
DOIs	https://doi.org/10.1177/0962280220970228
Publication status	Published - 1 Mar 2021

Bibliographical note

Acknowledgements
The authors are grateful to the Editor, Associate Editor and three anonymous referees for their helpful suggestions, which greatly improved the manuscript.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by the German Science Foundation awards number DFG KO 4680/3-2 and PA 2409/3-2.

Keywords

Multiple contrast tests
max t-test
repeated measures
resampling
simultaneous confidence intervals

Access to Document

10.1177/0962280220970228 (Licence: CC BY)

Konietschke_etal_SMMR_Small_sample_size_VOR
This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
Final published version, 408 KB (Licence: CC BY)

Cite this

@article{f6983a36ac074790bb74ebdad5a3aa40,
  title = "Small sample sizes: A big data problem in high-dimensional data analysis",
  abstract = "In many experiments and especially in translational and preclinical research, sample sizes are (very) small. In addition, data designs are often high dimensional, i.e. more dependent than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (multiple contrast tests) in high-dimensional designs (repeated measures or multivariate) with small sample sizes. A randomization-based approach is developed to approximate the distribution of the maximum statistic. Extensive simulation studies confirm that the new method is particularly suitable for analyzing data sets with small sample sizes. A real data set illustrates the application of the methods.",
  keywords = "Multiple contrast tests, max t-test, repeated measures, resampling, simultaneous confidence intervals",
  author = "Frank Konietschke and Karima Schwab and Markus Pauly",
  note = "Acknowledgements The authors are grateful to the Editor, Associate Editor and three anonymous referees for their helpful suggestions, which greatly improved the manuscript. Funding The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by the German Science Foundation awards number DFG KO 4680/3-2 and PA 2409/3-2.",
  year = "2021",
  month = mar,
  day = "1",
  doi = "10.1177/0962280220970228",
  language = "English",
  volume = "30",
  pages = "687--701",
  journal = "Statistical Methods in Medical Research",
  issn = "0962-2802",
  publisher = "SAGE Publications Ltd",
  number = "3",
}

TY - JOUR

T1 - Small sample sizes

T2 - A big data problem in high-dimensional data analysis

AU - Konietschke, Frank

AU - Schwab, Karima

AU - Pauly, Markus

N1 - Acknowledgements The authors are grateful to the Editor, Associate Editor and three anonymous referees for their helpful suggestions, which greatly improved the manuscript. Funding The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by the German Science Foundation awards number DFG KO 4680/3-2 and PA 2409/3-2.

PY - 2021/3/1

Y1 - 2021/3/1

N2 - In many experiments and especially in translational and preclinical research, sample sizes are (very) small. In addition, data designs are often high dimensional, i.e. more dependent than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (multiple contrast tests) in high-dimensional designs (repeated measures or multivariate) with small sample sizes. A randomization-based approach is developed to approximate the distribution of the maximum statistic. Extensive simulation studies confirm that the new method is particularly suitable for analyzing data sets with small sample sizes. A real data set illustrates the application of the methods.

AB - In many experiments and especially in translational and preclinical research, sample sizes are (very) small. In addition, data designs are often high dimensional, i.e. more dependent than independent replications of the trial are observed. The present paper discusses the applicability of max t-test-type statistics (multiple contrast tests) in high-dimensional designs (repeated measures or multivariate) with small sample sizes. A randomization-based approach is developed to approximate the distribution of the maximum statistic. Extensive simulation studies confirm that the new method is particularly suitable for analyzing data sets with small sample sizes. A real data set illustrates the application of the methods.

KW - Multiple contrast tests

KW - max t-test

KW - repeated measures

KW - resampling

KW - simultaneous confidence intervals

UR - http://dx.doi.org/10.1177/0962280220970228

U2 - 10.1177/0962280220970228

DO - 10.1177/0962280220970228

M3 - Article

SN - 0962-2802

VL - 30

SP - 687

EP - 701

JO - Statistical Methods in Medical Research

JF - Statistical Methods in Medical Research

IS - 3

ER -
