EMLasso: logistic lasso with missing data

Authors:	N Sabbe, O Thas, J‐P Ottoy

Institution:	1. Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure Links 653a Ghent, Belgium; 2. Centre for Statistical and Survey Methodology, School of Mathematics and Applied Statistics, University of Wollongong, NSW 2522, Australia

Abstract:	In clinical settings, missing data in the covariates occur frequently. For example, some markers are expensive or hard to measure. When this sort of data is used for model selection, the missingness is often resolved through a complete case analysis or a form of single imputation. An alternative sometimes comes in the form of leaving the most damaged covariates out. All these strategies jeopardise the goal of model selection. In earlier work, we have applied the logistic Lasso in combination with multiple imputation to obtain results in such settings, but we only provided heuristic arguments to advocate the method. In this paper, we propose an improved method that builds on firm statistical arguments and that is developed along the lines of the stochastic expectation–maximisation algorithm. We show that our method can be used to handle missing data in both categorical and continuous predictors, as well as in a nonpenalised regression. We demonstrate the method by applying it to data of 273 lung cancer patients. The objective is to select a model for the prediction of acute dysphagia, starting from a large set of potential predictors, including clinical and treatment covariates as well as a set of single‐nucleotide polymorphisms. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords:	missing data; model selection; Lasso; EM
