Multiple imputation and analysis for high‐dimensional incomplete proteomics data
Authors:Xiaoyan Yin  Daniel Levy  Christine Willinger  Aram Adourian  Martin G. Larson
Affiliation:1. The Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA, U.S.A.;2. Department of Biostatistics, School of Public Health, Boston University, Boston, MA, U.S.A.;3. Department of Cardiology, Boston University, Boston, MA, U.S.A.;4. Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, Boston, MA, U.S.A.;5. BG Medicine Inc., Waltham, MA, U.S.A.;6. Department of Mathematics and Statistics, Boston University, Boston, MA, U.S.A.
Abstract:Multivariable analysis of proteomics data using standard statistical models is hindered by the presence of incomplete data. We faced this issue in a nested case–control study of 135 incident cases of myocardial infarction and 135 pair‐matched controls from the Framingham Heart Study Offspring cohort. Plasma protein markers (K = 861) were measured on the case–control pairs (N = 135), and the majority of proteins had missing expression values for a subset of samples. In the setting of many more variables than observations (K ≫ N), we explored and documented the feasibility of multiple imputation approaches along with subsequent analysis of the imputed data sets. Initially, we selected proteins with complete expression data (K = 261) and randomly masked some values as the basis of simulation to tune the imputation and analysis process. We randomly shuffled proteins into several bins, performed multiple imputation within each bin, and followed up with stepwise selection using conditional logistic regression within each bin. This process was repeated hundreds of times. We determined the optimal method of multiple imputation, number of proteins per bin, and number of random shuffles using several performance statistics. We then applied this method to 544 proteins with incomplete expression data (≤40% missing values), from which we identified a panel of seven proteins that were jointly associated with myocardial infarction. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
Keywords:multiple imputation  stepwise selection  high dimension  imputation quality
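
The masking and bin-shuffling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the masking fraction (0.1), the bin size (30), and the synthetic data are all assumptions made for illustration, and the imputation and conditional logistic regression steps are deliberately left as a placeholder comment.

```python
import numpy as np

def mask_at_random(X, frac, rng):
    """Return a copy of X with roughly `frac` of its entries set to NaN,
    plus the boolean mask marking the hidden entries (hypothetical helper)."""
    X_masked = X.astype(float).copy()
    mask = rng.random(X.shape) < frac
    X_masked[mask] = np.nan
    return X_masked, mask

def shuffle_into_bins(n_proteins, bin_size, rng):
    """Randomly permute protein column indices and split them into
    consecutive bins of at most `bin_size` (hypothetical helper)."""
    order = rng.permutation(n_proteins)
    return [order[i:i + bin_size] for i in range(0, n_proteins, bin_size)]

rng = np.random.default_rng(0)

# Synthetic stand-in for the complete-data subset: 135 pairs x 261 proteins.
X = rng.normal(size=(135, 261))

# Assumed masking fraction; the abstract does not state the value used.
X_masked, mask = mask_at_random(X, frac=0.1, rng=rng)

# Assumed bin size; the abstract reports that this was tuned.
bins = shuffle_into_bins(n_proteins=X.shape[1], bin_size=30, rng=rng)

for cols in bins:
    block = X_masked[:, cols]
    # Placeholder: multiply impute `block`, then run stepwise selection
    # within this bin. Neither step is implemented in this sketch.
```

Repeating the `shuffle_into_bins` call with fresh random draws gives the repeated random shuffles the abstract refers to; each protein lands in exactly one bin per shuffle.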