Similar Literature
20 similar records found (search time: 15 ms)
1.
Wu H  Wu L 《Statistics in medicine》2002,21(5):753-771
Non-linear mixed-effects models are powerful tools for modelling HIV viral dynamics. In AIDS clinical trials, the viral load measurements for each subject are often sparse. In such cases, linearization procedures are usually used for inference. Under such linearization procedures, however, standard covariate selection methods based on the approximate likelihood, such as the likelihood ratio test, may not be reliable. In order to identify significant host factors for HIV dynamics, in this paper we consider two alternative approaches for covariate selection: one based on individual non-linear least squares estimates and the other based on individual empirical Bayes estimates. Our simulation study shows that, if the within-individual data are sparse and the between-individual variation is large, the two alternative covariate selection methods are more reliable than the likelihood ratio test, and the more powerful method based on individual empirical Bayes estimates is especially preferable. We also consider missing data in the covariates. The commonly used missing data methods may lead to misleading results; we recommend a multiple imputation method to handle missing covariates. A real data set from an AIDS clinical trial is analysed using the various covariate selection and missing data methods.

2.
Standard implementations of multiple imputation (MI) provide unbiased inferences under an assumption of underlying missing at random (MAR) mechanisms. In the presence of missing data generated by missing not at random (MNAR) mechanisms, however, MI is not satisfactory. Originating in an econometric context, Heckman's model, also called the sample selection model, deals with selected samples using two joint linear equations, termed the selection equation and the outcome equation. It has been successfully applied to MNAR outcomes. Nevertheless, the method only addresses missing outcomes, which is a strong limitation in clinical epidemiology settings, where covariates are also often missing. We propose to extend the validity of MI to some MNAR mechanisms by using Heckman's model as the imputation model, with a two-step estimation process. This provides a solution that can be used within an MI by chained equations framework to impute missing data (either outcomes or covariates) resulting from either a MAR or an MNAR mechanism, whenever the MNAR mechanism is compatible with a Heckman model. The approach is illustrated on a real dataset from a randomised trial in patients with seasonal influenza. Copyright © 2016 John Wiley & Sons, Ltd.
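The two-step estimation the abstract builds on can be sketched in a minimal, self-contained simulation. All coefficients, sample sizes, and the error correlation below are invented for illustration; a probit selection equation is fitted first, and the resulting inverse Mills ratio is added as a regressor to the outcome equation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                      # outcome-equation covariate
z = rng.normal(size=n)                      # selection-equation covariate
# Correlated errors make the missingness MNAR (rho = 0.6).
e = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
y_latent = 1.0 + 2.0 * x + e[:, 0]          # outcome equation
observed = (0.5 + 1.0 * z + e[:, 1]) > 0    # selection equation

# Step 1: probit model for the selection indicator, fitted by maximum likelihood.
def probit_nll(g):
    p = np.clip(norm.cdf(g[0] + g[1] * z), 1e-10, 1 - 1e-10)
    return -np.sum(np.where(observed, np.log(p), np.log(1 - p)))

g_hat = minimize(probit_nll, x0=[0.0, 0.0], method="BFGS").x
lin = g_hat[0] + g_hat[1] * z
imr = norm.pdf(lin) / norm.cdf(lin)         # inverse Mills ratio

# Step 2: least squares on the selected sample, augmented with the Mills ratio.
# (In real data only the selected outcomes are available, as here.)
X = np.column_stack([np.ones(observed.sum()), x[observed], imr[observed]])
beta, *_ = np.linalg.lstsq(X, y_latent[observed], rcond=None)
# beta holds [intercept, slope on x, coefficient on the Mills ratio];
# the last entry estimates rho * sigma of the error pair.
```

In the MI-by-chained-equations extension described in the abstract, draws from this corrected outcome model (rather than the point fit above) would be used as imputations.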

3.
Missing data are ubiquitous in longitudinal studies. In this paper, we propose an imputation procedure to handle dropouts in longitudinal studies. By taking advantage of the monotone missing pattern resulting from dropouts, our imputation procedure can be carried out sequentially, which substantially reduces the computational complexity. In addition, at each step of the sequential imputation, we set up a model selection mechanism that chooses between a parametric model and a nonparametric model to impute each missing observation. Unlike usual model selection procedures, which aim at finding a single model that fits the entire data set well, our model selection procedure is customized to find a suitable model for the prediction of each missing observation. Copyright © 2014 John Wiley & Sons, Ltd.
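The sequential step exploited here can be sketched for the purely parametric (linear) arm; the abstract's per-observation choice between a parametric and a nonparametric model is omitted, so this is only a skeleton of the idea.

```python
import numpy as np

def sequential_impute(Y):
    """Impute a monotone-missing (subjects x visits) matrix column by column.

    Assumes dropout: the baseline visit is complete, and once a subject
    drops out all later visits are NaN. Visit j is imputed from a linear
    regression on visits 0..j-1, fitted to the subjects observed at j;
    by monotonicity, the earlier columns are already complete (observed
    or previously imputed) when column j is reached.
    """
    Y = Y.copy()
    n, T = Y.shape
    for j in range(1, T):
        obs = ~np.isnan(Y[:, j])
        mis = ~obs
        if not mis.any():
            continue
        X_obs = np.column_stack([np.ones(obs.sum()), Y[obs, :j]])
        beta, *_ = np.linalg.lstsq(X_obs, Y[obs, j], rcond=None)
        X_mis = np.column_stack([np.ones(mis.sum()), Y[mis, :j]])
        Y[mis, j] = X_mis @ beta
    return Y
```

Because each column only ever looks left, one pass over the visits suffices, which is the computational saving the abstract refers to.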

4.
Multiple imputation is commonly used to impute missing covariates in the Cox semiparametric regression setting. It fills in each missing value with several plausible values via a Gibbs sampling procedure, specifying an imputation model for each missing variable. This imputation method is implemented in several software packages that offer imputation models chosen according to the type of the variable to be imputed, but all of these imputation models assume linear covariate effects. This assumption is often not verified in practice, as covariates can have nonlinear effects. Such a linearity assumption can lead to misleading conclusions, because the imputation model should be constructed to reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous time-invariant covariates in the imputation model, we propose a method based on B-spline functions. To assess its performance, we conducted a simulation study comparing multiple imputation with a Bayesian spline imputation model against multiple imputation with a Bayesian linear imputation model in a survival analysis setting. We evaluated the proposed method on the motivating data set, collected from HIV-infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method estimates hazard ratios well compared with the linear imputation methods when data are missing completely at random or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.
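The spline ingredient can be sketched with the standard Cox-de Boor recursion; the paper's Bayesian imputation machinery is not reproduced, and the sine relationship below is a made-up stand-in for a nonlinear covariate effect.

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """B-spline design matrix via the Cox-de Boor recursion.

    `knots` must include the repeated boundary knots; the returned matrix
    has len(knots) - degree - 1 columns.
    """
    x = np.asarray(x, float)[:, None]
    t = np.asarray(knots, float)
    # Degree-0 basis: indicators of the half-open knot spans.
    B = ((t[:-1] <= x) & (x < t[1:])).astype(float)
    for d in range(1, degree + 1):
        left_den = t[d:-1] - t[:-d - 1]
        right_den = t[d + 1:] - t[1:-d]
        # Zero-width spans (repeated knots) contribute 0 by convention.
        left = np.where(left_den > 0,
                        (x - t[:-d - 1]) / np.where(left_den > 0, left_den, 1.0),
                        0.0) * B[:, :-1]
        right = np.where(right_den > 0,
                         (t[d + 1:] - x) / np.where(right_den > 0, right_den, 1.0),
                         0.0) * B[:, 1:]
        B = left + right
    return B

# Nonlinear "imputation model": regress the incomplete covariate on a
# spline basis of an observed covariate, then predict the missing values.
x_obs = np.linspace(0.0, 0.99, 50)
knots = np.r_[0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1]   # clamped cubic knots
B = bspline_basis(x_obs, knots)
y_obs = np.sin(2 * np.pi * x_obs)            # the (unknown) nonlinear effect
coef, *_ = np.linalg.lstsq(B, y_obs, rcond=None)
y_hat = B @ coef
```

Inside the spline range the basis functions sum to one (partition of unity), which is a quick sanity check on the construction.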

5.
In behavioral, biomedical, and social‐psychological sciences, it is common to encounter latent variables and heterogeneous data. Mixture structural equation models (SEMs) are very useful methods to analyze these kinds of data. Moreover, the presence of missing data, including both missing responses and missing covariates, is an important issue in practical research. However, limited work has been done on the analysis of mixture SEMs with non‐ignorable missing responses and covariates. The main objective of this paper is to develop a Bayesian approach for analyzing mixture SEMs with an unknown number of components, in which a multinomial logit model is introduced to assess the influence of some covariates on the component probability. Results of our simulation study show that the Bayesian estimates obtained by the proposed method are accurate, and the model selection procedure via a modified DIC is useful in identifying the correct number of components and in selecting an appropriate missing mechanism in the proposed mixture SEMs. A real data set related to a longitudinal study of polydrug use is employed to illustrate the methodology. Copyright © 2010 John Wiley & Sons, Ltd.

6.
Analysis of health care cost data is often complicated by a high level of skewness, heteroscedastic variances, and the presence of missing data. Most of the existing literature on cost data analysis has focused on modeling the conditional mean. In this paper, we study a weighted quantile regression approach for estimating the conditional quantiles of health care cost data with missing covariates. The weighted quantile regression estimator is consistent, unlike the naive estimator, and asymptotically normal. Furthermore, we propose a modified BIC for variable selection in quantile regression when the covariates are missing at random. The quantile regression framework allows us to obtain a more complete picture of the effects of the covariates on health care cost and is naturally adapted to the skewness and heterogeneity of cost data. The method is semiparametric in the sense that it does not require specifying the likelihood function for the random error or the covariates. We investigate the weighted quantile regression procedure and the modified BIC via extensive simulations, and illustrate the application by analyzing a real data set from a health care cost study. Copyright © 2013 John Wiley & Sons, Ltd.
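A toy version of the weighted estimator: complete cases are reweighted by inverse selection probabilities and the pinball (check) loss is minimized. The simulated missingness model, the skewed "cost" outcome, and the use of the true selection probabilities (in practice they would be estimated, e.g. by logistic regression) are all simplifications for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
# Skewed cost-like outcome: log-normal errors, so median(y | x) = 2 + 2x.
y = 1.0 + 2.0 * x + rng.lognormal(0.0, 0.5, n)
# Covariate observed with probability depending on y (missing at random given y).
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 0.3 * (y - y.mean()))))
obs = rng.random(n) < p_obs
w = 1.0 / p_obs[obs]        # inverse-probability weights for the complete cases

def weighted_pinball(beta, tau):
    """Weighted check-loss for the conditional tau-quantile of y given x."""
    r = y[obs] - (beta[0] + beta[1] * x[obs])
    return np.sum(w * np.maximum(tau * r, (tau - 1.0) * r))

beta_hat = minimize(weighted_pinball, x0=[0.0, 0.0], args=(0.5,),
                    method="Nelder-Mead").x   # median regression
```

The unweighted complete-case fit would be biased here because missingness depends on the outcome; the weights restore a consistent estimating equation.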

7.
We are interested in developing integrative approaches for variable selection problems that incorporate external knowledge on a set of predictors of interest. In particular, we have developed an integrative Bayesian model uncertainty (iBMU) method, which formally incorporates multiple sources of data via a second‐stage probit model on the probability that any predictor is associated with the outcome of interest. Using simulations, we demonstrate that iBMU leads to an increase in power to detect true marginal associations over more commonly used variable selection techniques, such as least absolute shrinkage and selection operator and elastic net. In addition, iBMU leads to a more efficient model search algorithm over the basic BMU method even when the predictor‐level covariates are only modestly informative. The increase in power and efficiency of our method becomes more substantial as the predictor‐level covariates become more informative. Finally, we demonstrate the power and flexibility of iBMU for integrating both gene structure and functional biomarker information into a candidate gene study investigating over 50 genes in the brain reward system and their role with smoking cessation from the Pharmacogenetics of Nicotine Addiction and Treatment Consortium. Copyright © 2013 John Wiley & Sons, Ltd.

8.
A Bayesian toolkit for genetic association studies
We present a range of modelling components designed to facilitate Bayesian analysis of genetic-association-study data. A key feature of our approach is the ability to combine different submodels together, almost arbitrarily, for dealing with the complexities of real data. In particular, we propose various techniques for selecting the "best" subset of genetic predictors for a specific phenotype (or set of phenotypes). At the same time, we may control for complex, non-linear relationships between phenotypes and additional (non-genetic) covariates as well as accounting for any residual correlation that exists among multiple phenotypes. Both of these additional modelling components are shown to potentially aid in detecting the underlying genetic signal. We may also account for uncertainty regarding missing genotype data. Indeed, at the heart of our approach is a novel method for reconstructing unobserved haplotypes and/or inferring the values of missing genotypes. This can be deployed independently or, alternatively, it can be fully integrated into arbitrary genotype- or haplotype-based association models such that the missing data and the association model are "estimated" simultaneously. The impact of such simultaneous analysis on inferences drawn from the association model is shown to be potentially significant. Our modelling components are packaged as an "add-on" interface to the widely used WinBUGS software, which allows Markov chain Monte Carlo analysis of a wide range of statistical models. We illustrate their use with a series of increasingly complex analyses conducted on simulated data based on a real pharmacogenetic example.

9.
In many health services applications, research to determine the effectiveness of a particular treatment cannot be carried out using a controlled clinical trial. In settings such as these, observational studies must be used. Propensity score methods are useful tools for balancing the distribution of covariates between treatment groups and hence reducing the potential bias in treatment effect estimates in observational studies. A challenge in many health services research studies is the presence of missing data among the covariates that need to be balanced. In this paper, we compare three simple propensity models using data that examine the effectiveness of self-monitoring of blood glucose (SMBG) in reducing hemoglobin A1c in a cohort of 10,566 type 2 diabetics. The first propensity score model uses only subjects with complete case data (n=6,687), the second incorporates missing value indicators into the model, and the third fits separate propensity scores for each pattern of missing data. We compare the results of these methods and find that incorporating missing data into the propensity score model reduces the estimated effect of SMBG on hemoglobin A1c by more than 10%, although this reduction was not clinically significant. In addition, beginning with the complete data, we artificially introduce missing data using a nonignorable missing data mechanism and compare treatment effect estimates using the three propensity score methods and a simple analysis of covariance (ANCOVA) method. In these analyses, we find that the complete case analysis and the ANCOVA method both perform poorly, the missing value indicator model performs moderately well, and the pattern mixture model performs even better in estimating the original treatment effect observed in the complete data prior to the introduction of artificial missing data. We conclude that in observational studies one must not only adjust for potentially confounding variables using methods such as propensity scores, but one should also account for missing data in these models in order to allow causal inference to be applied more appropriately.
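The second propensity model in this comparison, missing-value indicators added to a logistic propensity model, can be sketched with a hand-rolled Newton-Raphson fit. The data below are simulated (not the SMBG cohort), and the fill-in value of zero plus an indicator column is the standard indicator-method construction.

```python
import numpy as np

def logistic_fit(X, y, iters=25):
    """Logistic regression by Newton-Raphson (no regularisation)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(2)
n = 3000
x = rng.normal(size=n)
treat = rng.random(n) < 1.0 / (1.0 + np.exp(-x))   # treatment depends on x
miss = rng.random(n) < 0.25                        # x missing for some subjects
x_filled = np.where(miss, 0.0, x)                  # fill, then flag with indicator
X = np.column_stack([np.ones(n), x_filled, miss.astype(float)])
beta = logistic_fit(X, treat.astype(float))
ps = 1.0 / (1.0 + np.exp(-X @ beta))               # propensity score for everyone
w_iptw = np.where(treat, 1.0 / ps, 1.0 / (1.0 - ps))   # inverse-probability weights
```

Unlike complete-case analysis, every subject receives a propensity score, which is exactly the property the paper's second model buys.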

10.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
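The FICO index itself is simple to compute. A sketch, assuming "incomplete case" means a case missing at least one of the other covariates (the data layout and column names are invented):

```python
import numpy as np

def fico(data, covariate):
    """Fraction of Incomplete Cases among the Observed values of a covariate.

    `data` maps column names to 1-D float arrays with NaN for missing values.
    """
    x = data[covariate]
    observed = ~np.isnan(x)
    others = np.column_stack([v for k, v in data.items() if k != covariate])
    incomplete = np.isnan(others).any(axis=1)   # case misses some other column
    return (observed & incomplete).sum() / observed.sum()

data = {"x": np.array([1.0, 2.0, 3.0, 4.0]),
        "z": np.array([1.0, np.nan, 3.0, np.nan])}
fico_x = fico(data, "x")   # 0.5: two of the four observed-x cases miss z
```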

11.
In linear mixed models, the influence of covariates is restricted to a strictly parametric form. With the rise of semi- and nonparametric regression, the mixed model has also been expanded to allow for additive predictors. The common approach uses the representation of additive models as mixed models. The alternative approach proposed in the present paper is likelihood-based boosting. Boosting originates in the machine learning community, where it was proposed as a technique to improve classification procedures by combining estimates from reweighted observations. Likelihood-based boosting is a general method that may be seen as an extension of L2 boosting. In additive mixed models, the advantage of boosting techniques, in the form of componentwise boosting, is that they are suitable for high-dimensional settings where many explanatory variables are present. They allow fitting additive models with many covariates, with implicit selection of relevant variables and automatic selection of smoothing parameters. Moreover, boosting techniques may be used to incorporate subject-specific variation of smooth influence functions by specifying 'random slopes' on smooth effects. This results in flexible semiparametric mixed models that are appropriate when a simple random intercept is unable to capture the variation of effects across subjects.

12.
A variety of methods and algorithms are available for estimating parameters in the class of generalized linear models in the presence of missing values. However, there is little information on how such an already-built model can be used for prediction on new observations with missing data in the covariates. Dropping the observations with missing values is a widespread practice with serious statistical and non-statistical implications. One solution is to fit separate regression models, or submodels, to each pattern of missing covariates; in practice, for any iterative regression method, this approach is computationally intensive. We propose a simple methodology to predict outcomes for individuals with incomplete information based on the estimated coefficients and covariance from the already-built model. This method does not require revisiting the original data set used to build the model and works by generating a first-order approximation of any submodel's coefficient estimates. This is achieved by applying the SWEEP operator to an augmented covariance matrix obtained from the original model. We refer to this approach as the one-step sweep (OSS) method. The methodology is demonstrated using data from the Department of Veterans Affairs Continuous Improvement in Cardiac Surgery Program (CICSP). These data contain 30-day mortality, the outcome of interest, and risk information for over 14,000 patients who underwent coronary artery bypass grafting (CABG) surgery over a four-year period. Using complete data from the first 3.5 years of the study period, a logistic regression model was built and then used to predict mortality for patients undergoing CABG in the most recent 6 months. To evaluate the performance of the OSS method, we randomly generated observations with missing covariates in the 6-month prediction database. We use this simulation to demonstrate that the computationally efficient OSS substantially reduces the error in risk-adjusted mortality created when cases with incomplete information are eliminated. Lastly, we derive the relationship between the OSS method and data imputation.
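The SWEEP operator at the heart of OSS can be sketched directly. Sweeping an augmented cross-product matrix on all predictor indices leaves the least-squares coefficients in the last column and the residual sum of squares in the corner; the tiny regression below is illustrative and the CICSP application is not reproduced.

```python
import numpy as np

def sweep(A, k):
    """Apply the SWEEP operator to symmetric matrix A on pivot index k."""
    A = np.asarray(A, float)
    d = A[k, k]
    out = A - np.outer(A[:, k], A[k, :]) / d   # a_ij - a_ik * a_kj / d
    out[k, :] = A[k, :] / d
    out[:, k] = A[:, k] / d
    out[k, k] = -1.0 / d
    return out

# Augmented cross-product matrix [[X'X, X'y], [y'X, y'y]] for y = x (slope 1).
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
M = np.block([[X.T @ X, (X.T @ y)[:, None]],
              [(y @ X)[None, :], np.array([[y @ y]])]])
M = sweep(sweep(M, 0), 1)   # sweep on both predictor indices
coefs = M[:2, 2]            # regression coefficients: [intercept, slope]
rss = M[2, 2]               # residual sum of squares
```

Sweeping is reversible and can be applied pivot by pivot, which is what lets OSS move between the full model and any missing-pattern submodel without refitting.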

13.
Several approaches exist for handling missing covariates in the Cox proportional hazards model. Multiple imputation (MI) is relatively easy to implement, with various software available, and results in consistent estimates if the imputation model is correct. On the other hand, the fully augmented weighted estimators (FAWEs) recover a substantial proportion of the efficiency and have the doubly robust property. In this paper, we compare the FAWEs and MI through a comprehensive simulation study. For MI, we consider multiple imputation by chained equations and focus on two imputation methods: Bayesian linear regression imputation and predictive mean matching. Simulation results show that the imputation methods can be rather sensitive to model misspecification and may have large bias when the censoring time depends on the missing covariates. In contrast, the FAWEs allow the censoring time to depend on the missing covariates and, owing to the doubly robust property, are remarkably robust as long as either the conditional expectations or the selection probability is correctly specified. The comparison suggests that the FAWEs are a competitive and attractive tool for the analysis of survival data with missing covariates. Copyright © 2010 John Wiley & Sons, Ltd.

14.
Although randomized experiments are widely regarded as the gold standard for estimating causal effects, missing data on the pretreatment covariates make it challenging to estimate subgroup causal effects. When the missing data mechanism of the covariates is nonignorable, the parameters of interest are generally not point identifiable, and we can only obtain bounds for them, which may be too wide for practical use. In some real cases, we have prior knowledge that certain restrictions are plausible. We show the identifiability of the causal effects and joint distributions under four interpretable missing data mechanisms and evaluate the performance of the statistical inference via simulation studies. One application of our methods to a real data set from a randomized clinical trial shows that one of the nonignorable missing data mechanisms fits better than the ignorable mechanism, and the results conform to the study's original expert opinions. We also illustrate the potential applications of our methods to observational studies using a data set from a job-training program. Copyright © 2013 John Wiley & Sons, Ltd.

15.
The treatment of missing data in comparative effectiveness studies with right-censored outcomes and time-varying covariates is challenging because of the multilevel structure of the data. In particular, the performance of an accessible method like multiple imputation (MI) under an imputation model that ignores the multilevel structure is unknown and has not been compared to complete-case (CC) and single imputation methods that are most commonly applied in this context. Through an extensive simulation study, we compared statistical properties among CC analysis, last value carried forward, mean imputation, the use of missing indicators, and MI-based approaches with and without auxiliary variables under an extended Cox model when the interest lies in characterizing relationships between non-missing time-varying exposures and right-censored outcomes. MI demonstrated favorable properties under a moderate missing-at-random condition (absolute bias <0.1) and outperformed CC and single imputation methods, even when the MI method did not account for correlated observations in the imputation model. The performance of MI decreased with increasing complexity such as when the missing data mechanism involved the exposure of interest, but was still preferred over other methods considered and performed well in the presence of strong auxiliary variables. We recommend considering MI that ignores the multilevel structure in the imputation model when data are missing in a time-varying confounder, incorporating variables associated with missingness in the MI models as well as conducting sensitivity analyses across plausible assumptions.
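Of the single-imputation comparators listed here, last value carried forward is easy to state precisely; a sketch for a (subjects × visits) matrix with NaN for missing entries:

```python
import numpy as np

def lvcf(Y):
    """Last value carried forward along the time axis of (subjects x visits).

    Each NaN inherits the most recent observed (or previously filled) value
    to its left; leading NaNs in the first column are left untouched.
    """
    Y = Y.copy()
    for j in range(1, Y.shape[1]):
        gap = np.isnan(Y[:, j])
        Y[gap, j] = Y[gap, j - 1]
    return Y
```

The simulation study's point is that such single imputations understate uncertainty relative to MI, even though they are this simple to apply.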

16.
Many model selection criteria proposed over the years have become common procedures in applied research. However, these procedures were designed for complete data, which are rare in applied statistics, particularly in medical, public health, and health policy settings. Incomplete data, another common problem in applied statistics, introduce their own complications, in light of which the task of model selection can become quite involved. Recently, a few authors have suggested model selection procedures for incomplete data, with varying degrees of success. In this paper we explore model selection by the Akaike Information Criterion (AIC) in the multivariate regression setting, with ignorable missing data accounted for via multiple imputation.

17.
In medical and health studies, longitudinal and cluster longitudinal data are often collected, where the response variable of interest is observed repeatedly over time along with a set of covariates. Model selection has become an active research topic but remains largely unexplored for such data because of their complex correlation structure. To address this important issue, in this paper we concentrate on model selection for cluster longitudinal data, especially when data are subject to missingness. Motivated by the expected weighted quadratic loss of a given model, data perturbation and bootstrapping methods are used to estimate the loss, and the model with the smallest expected loss is selected as the best model. To justify the proposed model selection method, we provide various numerical assessments, and a real application to an asthma data set is also analyzed for illustration.

18.
Logic Regression is a new adaptive regression methodology that attempts to construct predictors as Boolean combinations of (binary) covariates. In this paper we use this algorithm to deal with single‐nucleotide polymorphism (SNP) sequence data. The predictors that are found are interpretable as risk factors of the disease. Significance of these risk factors is assessed using techniques such as cross‐validation, permutation tests, and independent test sets. These model selection techniques remain valid when data are dependent, as is the case for the family data used here. In our analysis of the Genetic Analysis Workshop 12 data we identify the exact locations of mutations on gene 1 and gene 6 and a number of mutations on gene 2 that are associated with the affected status, without selecting any false positives. © 2001 Wiley‐Liss, Inc.
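A toy stand-in for the idea of Boolean predictors: exhaustively scoring two-SNP AND/OR rules by classification accuracy. Real logic regression searches much larger Boolean trees (typically by simulated annealing), so this sketch only illustrates what a fitted rule looks like; the simulated genotypes are invented.

```python
import numpy as np
from itertools import combinations

def best_pair_rule(snps, y):
    """Exhaustive search over AND/OR rules built from two binary SNPs."""
    best_rule, best_acc = None, -1.0
    for i, j in combinations(range(snps.shape[1]), 2):
        for name, pred in (("AND", snps[:, i] & snps[:, j]),
                           ("OR", snps[:, i] | snps[:, j])):
            acc = float((pred == y).mean())
            if acc > best_acc:
                best_rule, best_acc = (i, j, name), acc
    return best_rule, best_acc

rng = np.random.default_rng(3)
snps = rng.integers(0, 2, size=(200, 4))   # 200 subjects, 4 binary SNPs
y = snps[:, 0] & snps[:, 1]                # affected iff both risk alleles present
rule, acc = best_pair_rule(snps, y)
```

The recovered rule is directly readable as a risk factor ("SNP 0 AND SNP 1"), which is the interpretability the abstract emphasizes.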

19.
Survival analysis has conventionally been performed on a continuous time scale. In practice, the survival time is often recorded or handled on a discrete scale, in which case discrete-time survival analysis provides results more relevant to the actual data scale. Moreover, data on time-dependent covariates in survival analysis are usually collected through intermittent follow-ups, resulting in missing and mismeasured covariate data. In this work, we propose the sufficient discrete hazard (SDH) approach to discrete-time survival analysis with longitudinal covariates that are subject to missingness and mismeasurement. The SDH method employs the conditional score idea available for dealing with mismeasured covariates, and penalized least squares for estimating the missing covariate values using a regression spline basis. The SDH method is developed for single-event analysis with the logistic discrete hazard model, and for competing risks analysis with the multinomial logit model. Simulation results reveal good finite-sample performance of the proposed estimator and support the associated asymptotic theory. The proposed SDH method is applied to the scleroderma lung study data, where the time to medication withdrawal and time to death were recorded discretely in months, for illustration.
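The logistic discrete hazard model is typically fitted after a person-period expansion; a sketch of that expansion (variable names are illustrative, not from the paper, and the SDH machinery itself is not reproduced):

```python
import numpy as np

def person_period(times, events, x):
    """Expand one row per subject into one row per discrete interval at risk.

    Columns: interval index, covariate value, binary response. The response
    is 1 only in the interval where the event occurs; censored subjects get
    all zeros. A logistic regression of this response on interval dummies
    and covariates then estimates the discrete hazard.
    """
    rows = []
    for t, d, xi in zip(times, events, x):
        for j in range(1, t + 1):
            rows.append((float(j), xi, 1.0 if (j == t and d) else 0.0))
    return np.array(rows)

# Subject 1: event in month 3; subject 2: censored at month 2.
pp = person_period(times=[3, 2], events=[1, 0], x=[0.5, -1.0])
```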

20.
BACKGROUND AND OBJECTIVES: To illustrate the effects of different methods for handling missing data (complete case analysis, the missing-indicator method, single imputation of the unconditional and conditional mean, and multiple imputation, MI) in the context of multivariable diagnostic research aiming to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence. METHODS: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had (varying percentages of) missing values. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis. RESULTS: The area under the receiver operating characteristic curve for all diagnostic models was above 0.75. The predictors in the final models based on complete case analysis, and after using the missing-indicator method, were very different from those in the other models. The models based on MI did not differ much from the models derived after single conditional and unconditional mean imputation. CONCLUSION: In multivariable diagnostic research, complete case analysis and the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. In our example study the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.

