EMLasso: logistic lasso with missing data
Authors: N Sabbe, O Thas, J‐P Ottoy
Institution: 1. Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure Links 653a Ghent, Belgium; 2. Centre for Statistical and Survey Methodology, School of Mathematics and Applied Statistics, University of Wollongong, NSW 2522, Australia
Abstract: In clinical settings, missing data in the covariates occur frequently. For example, some markers are expensive or hard to measure. When this sort of data is used for model selection, the missingness is often resolved through a complete case analysis or a form of single imputation. An alternative sometimes comes in the form of leaving the most damaged covariates out. All these strategies jeopardise the goal of model selection. In earlier work, we have applied the logistic Lasso in combination with multiple imputation to obtain results in such settings, but we only provided heuristic arguments to advocate the method. In this paper, we propose an improved method that builds on firm statistical arguments and that is developed along the lines of the stochastic expectation–maximisation algorithm. We show that our method can be used to handle missing data in both categorical and continuous predictors, as well as in a nonpenalised regression. We demonstrate the method by applying it to data of 273 lung cancer patients. The objective is to select a model for the prediction of acute dysphagia, starting from a large set of potential predictors, including clinical and treatment covariates as well as a set of single‐nucleotide polymorphisms. Copyright © 2013 John Wiley & Sons, Ltd.
Keywords: missing data; model selection; Lasso; EM