TY - JOUR
T1 - Multivariate outlier detection applied to multiply imputed laboratory data
AU - Penny, Kay I.
AU - Jolliffe, Ian T.
N1 - Acknowledgements: The first author was supported by a Co-operative Award in Science and Engineering studentship funded by the Engineering and Physical Sciences Research Council and Knoll Pharmaceuticals (formerly Boots Pharmaceuticals), Nottingham.
PY - 1999/7/30
Y1 - 1999/7/30
AB - In clinical laboratory safety data, multivariate outlier detection methods may highlight a patient whose laboratory measurements do not follow the same pattern of relationships as the majority of patients, although their individual measurements are not found to be outlying when considered one at a time. Missing data problems are often dealt with by imputing a single value as an estimate of the missing value. The completed data set may then be analysed using traditional methods. A disadvantage of using single imputation is the underestimation of variability, with a corresponding distortion of power in hypothesis testing. Multiple imputation methods attempt to overcome this problem, and in this paper a study is described which considers the application of multivariate outlier detection methods to multiply imputed clinical laboratory safety data sets. Three different proportions of missing data are generated in laboratory data sets of dimensions 4, 7, 12 and 30, and a comparison of eight multiple imputation methods is carried out. Two outlier detection techniques, Mahalanobis distance and generalized principal component analysis, are applied to the multiply imputed data sets, and their performances are discussed. Measures are introduced for assessing the accuracy of the missing data results, depending on which method of analysis is used.
UR - http://www.scopus.com/inward/record.url?scp=0033618487&partnerID=8YFLogxK
U2 - 10.1002/(SICI)1097-0258(19990730)18:14<1879::AID-SIM225>3.0.CO;2-6
DO - 10.1002/(SICI)1097-0258(19990730)18:14<1879::AID-SIM225>3.0.CO;2-6
M3 - Article
C2 - 10407259
AN - SCOPUS:0033618487
SN - 0277-6715
VL - 18
SP - 1879
EP - 1895
JO - Statistics in Medicine
JF - Statistics in Medicine
IS - 14
ER -