Similar articles
 20 similar articles found (search time: 15 ms)
1.
Linear regression is one of the most popular statistical techniques. In linear regression analysis, missing covariate data occur often. A recent approach to analysing such data is a weighted estimating equation. With weighted estimating equations, the contribution to the estimating equation from a complete observation is weighted by the inverse 'probability of being observed'. In this paper, we propose a weighted estimating equation that wrongly assumes the missing covariates are multivariate normal but still produces consistent estimates as long as the probability of being observed is correctly modelled. In simulations, these weighted estimating equations appear to be highly efficient when compared to the most efficient weighted estimating equation as proposed by Robins et al. and Lipsitz et al. Moreover, these weighted estimating equations, in which we wrongly assume that the missing covariates are multivariate normal, are much less computationally intensive than the weighted estimating equations given by Lipsitz et al. We compare the weighted estimating equations proposed in this paper to the efficient weighted estimating equations via an example and a simulation study. We only consider data that are missing at random; non-ignorably missing data are not addressed in this paper.
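As an illustration of the basic idea in the abstract above (each complete observation weighted by the inverse of its probability of being observed), the following is a minimal sketch for a single covariate. It is not the multivariate-normal estimating equation proposed in the paper; the function name `ipw_linear_fit` and the convention that a missing covariate is stored as `None` are assumptions made for this example, and the observation probabilities are taken as given rather than estimated.

```python
from typing import Optional, Sequence, Tuple


def ipw_linear_fit(
    x: Sequence[Optional[float]],
    y: Sequence[float],
    prob_observed: Sequence[float],
) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by inverse-probability-weighted least squares.

    Hypothetical helper: x[i] is None when the covariate is missing, and
    prob_observed[i] is the (assumed known) probability that x[i] is observed.
    Only complete cases contribute, each with weight 1 / prob_observed[i].
    """
    sw = swx = swy = swxx = swxy = 0.0
    for xi, yi, pi in zip(x, y, prob_observed):
        if xi is None:
            continue  # incomplete cases drop out of the estimating equation
        if not 0.0 < pi <= 1.0:
            raise ValueError("observation probabilities must lie in (0, 1]")
        w = 1.0 / pi
        sw += w
        swx += w * xi
        swy += w * yi
        swxx += w * xi * xi
        swxy += w * xi * yi

    denom = sw * swxx - swx * swx
    if denom == 0.0:
        raise ValueError("need at least two distinct observed covariate values")

    # Closed-form solution of the weighted normal equations.
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept, slope
```

In practice the observation probabilities would come from a fitted missingness model (for example a logistic regression on fully observed variables), and consistency depends on that model being correct.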

2.
We propose a semiparametric marginal modeling approach for longitudinal analysis of cohorts with data missing due to death and non‐response to estimate regression parameters interpreted as conditioned on being alive. Our proposed method accommodates outcomes and time‐dependent covariates that are missing not at random with non‐monotone missingness patterns via inverse‐probability weighting. Missing covariates are replaced by consistent estimates derived from a simultaneously solved inverse‐probability‐weighted estimating equation. Thus, we utilize data points with the observed outcomes and missing covariates beyond the estimated weights while avoiding numerical methods to integrate over missing covariates. The approach is applied to a cohort of elderly female hip fracture patients to estimate the prevalence of walking disability over time as a function of body composition, inflammation, and age. Copyright © 2010 John Wiley & Sons, Ltd.

3.
Missing responses are common problems in medical, social, and economic studies. When responses are missing at random, a complete case data analysis may result in biases. A popular debiasing method is inverse probability weighting, proposed by Horvitz and Thompson. To improve efficiency, Robins et al. proposed an augmented inverse probability weighting method. The augmented inverse probability weighting estimator has a double‐robustness property and achieves the semiparametric efficiency lower bound when the regression model and propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood‐based estimator as an alternative to Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied in the estimation of average treatment effect in observational causal inferences. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
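The augmented inverse probability weighting estimator mentioned in the abstract above can be sketched for the simplest target, the mean of a response that is missing for some subjects. This is the standard textbook form, not the empirical likelihood estimator the paper proposes; the function name `aipw_mean` and its argument names are assumptions for this example, and both the propensity scores and the outcome-model predictions are taken as already fitted.

```python
from typing import Optional, Sequence


def aipw_mean(
    y: Sequence[Optional[float]],
    propensity: Sequence[float],
    outcome_pred: Sequence[float],
) -> float:
    """Augmented IPW estimate of E[Y] when some responses are missing.

    Hypothetical helper: y[i] is None when the response is missing,
    propensity[i] is the modelled probability that y[i] is observed, and
    outcome_pred[i] is the working regression prediction of y[i].
    """
    if not (len(y) == len(propensity) == len(outcome_pred)) or not y:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for yi, pi, mi in zip(y, propensity, outcome_pred):
        if not 0.0 < pi <= 1.0:
            raise ValueError("propensity scores must lie in (0, 1]")
        r = 0.0 if yi is None else 1.0
        ipw_term = (yi / pi) if yi is not None else 0.0
        # The augmentation term has mean zero when the propensity model is
        # correct, and repairs the IPW term when the outcome model is correct.
        total += ipw_term - (r - pi) / pi * mi
    return total / len(y)
```

The double-robustness property shows up directly in the formula: if the outcome predictions are exactly right, the estimate reduces to the average prediction regardless of the propensity values supplied.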

4.
Missing data is a very common problem in medical and social studies, especially when data are collected longitudinally. It is a challenging problem to utilize observed data effectively. Many papers on missing data problems can be found in statistical literature. It is well known that the inverse weighted estimation is neither efficient nor robust. On the other hand, the doubly robust (DR) method can improve the efficiency and robustness. As is known, the DR estimation requires a missing data model (i.e., a model for the probability that data are observed) and a working regression model (i.e., a model for the outcome variable given covariates and surrogate variables). Because the DR estimating function has mean zero for any parameters in the working regression model when the missing data model is correctly specified, in this paper, we derive a formula for the estimator of the parameters of the working regression model that yields the optimally efficient estimator of the marginal mean model (the parameters of interest) when the missing data model is correctly specified. Furthermore, the proposed method also inherits the DR property. Simulation studies demonstrate the greater efficiency of the proposed method compared with the standard DR method. A longitudinal dementia data set is used for illustration. Copyright © 2013 John Wiley & Sons, Ltd.

5.
The augmented inverse weighting method is one of the most popular methods for estimating the mean of the response in causal inference and missing data problems. An important component of this method is the propensity score. Popular parametric models for the propensity score include the logistic, probit, and complementary log-log models. A common feature of these models is that the propensity score is a monotonic function of a linear combination of the explanatory variables. To avoid the need to choose a model, we model the propensity score via a semiparametric single-index model, in which the score is an unknown monotonic nondecreasing function of the given single index. Under this new model, the augmented inverse weighting estimator (AIWE) of the mean of the response is asymptotically linear, semiparametrically efficient, and more robust than existing estimators. Moreover, we have made a surprising observation: the inverse probability weighting estimators and AIWEs based on a correctly specified parametric model may perform worse than their counterparts based on a nonparametric model. A heuristic explanation of this phenomenon is provided. A real-data example is used to illustrate the proposed methods.

6.
Specific age‐related hypotheses are tested in population‐based longitudinal studies. At specific time intervals, both the outcomes of interest and the time‐varying covariates are measured. When participants are approached for follow‐up, some participants do not provide data. Investigations may show that many have died before the time of follow‐up whereas others refused to participate. Some of these non‐participants do not provide data at later follow‐ups. Few statistical methods for missing data distinguish between ‘non‐participation’ and ‘death’ among study participants. The augmented inverse probability‐weighted estimators are most commonly used in marginal structure models when data are missing at random. Treating non‐participation and death as the same, however, may lead to biased estimates and invalid inferences. To overcome this limitation, a multiple inverse probability‐weighted approach is presented to account for two types of missing data, non‐participation and death, when using a marginal mean model. Under certain conditions, the multiple weighted estimators are consistent and asymptotically normal. Simulation studies will be used to study the finite sample efficiency of the multiple weighted estimators. The proposed method will be applied to study the risk factors associated with the cognitive decline among the aging adults, using data from the Chicago Health and Aging Project (CHAP). Copyright © 2010 John Wiley & Sons, Ltd.

7.
Many diseases such as cancer and heart diseases are heterogeneous and it is of great interest to study the disease risk specific to the subtypes in relation to genetic and environmental risk factors. However, due to logistic and cost reasons, the subtype information for the disease is missing for some subjects. In this article, we investigate methods for multinomial logistic regression with missing outcome data, including a bootstrap hot deck multiple imputation (BHMI), simple inverse probability weighted (SIPW), augmented inverse probability weighted (AIPW), and expected estimating equation (EEE) estimators. These methods are important approaches for missing data regression. The BHMI modifies the standard hot deck multiple imputation method such that it can provide valid confidence interval estimation. When the covariates are discrete, the SIPW, AIPW, and EEE estimators are numerically identical. When the covariates are continuous, nonparametric smoothers can be applied to estimate the selection probabilities and the estimating scores. These methods perform similarly. Extensive simulations show that all of these methods yield unbiased estimators while the complete-case (CC) analysis can be biased if the missingness depends on the observed data. Our simulations also demonstrate that these methods can gain substantial efficiency compared with the CC analysis. The methods are applied to a colorectal cancer study in which cancer subtype data are missing among some study individuals.

8.
A popular method for analysing repeated‐measures data is generalized estimating equations (GEE). When response data are missing at random (MAR), two modifications of GEE use inverse‐probability weighting and imputation. The weighted GEE (WGEE) method involves weighting observations by their inverse probability of being observed, according to some assumed missingness model. Imputation methods involve filling in missing observations with values predicted by an assumed imputation model. WGEE are consistent when the data are MAR and the dropout model is correctly specified. Imputation methods are consistent when the data are MAR and the imputation model is correctly specified. Recently, doubly robust (DR) methods have been developed. These involve both a model for probability of missingness and an imputation model for the expectation of each missing observation, and are consistent when either is correct. We describe DR GEE, and illustrate their use on simulated data. We also analyse the INITIO randomized clinical trial of HIV therapy allowing for MAR dropout. Copyright © 2009 John Wiley & Sons, Ltd.

9.
Incomplete multi‐level data arise commonly in many clinical trials and observational studies. Because of multi‐level variations in this type of data, appropriate data analysis should take these variations into account. A random effects model can allow for the multi‐level variations by assuming random effects at each level, but the computation is intensive because high‐dimensional integrations are often involved in fitting models. Marginal methods such as the inverse probability weighted generalized estimating equations can involve simple estimation computation, but it is hard to specify the working correlation matrix for multi‐level data. In this paper, we introduce a latent variable method to deal with incomplete multi‐level data when the missing mechanism is missing at random, which fills the gap between the random effects model and marginal models. Latent variable models are built for both the response and missing data processes to incorporate the variations that arise at each level. Simulation studies demonstrate that this method performs well in various situations. We apply the proposed method to an Alzheimer's disease study. Copyright © 2012 John Wiley & Sons, Ltd.

10.
Several methods for the estimation and comparison of rates of change in longitudinal studies with staggered entry and informative drop-outs have been recently proposed. For multivariate normal linear models, REML estimation is used. There are various approaches to maximizing the corresponding log-likelihood; in this paper we use a restricted iterative generalized least squares method (RIGLS) combined with a nested EM algorithm. An important statistical problem in such approaches is the estimation of the standard errors adjusted for the missing data (observed data information matrix). Louis has provided a general technique for computing the observed data information in terms of completed data quantities within the EM framework. The multiple imputation (MI) method for obtaining variances can be regarded as an alternative to this. The aim of this paper is to develop, apply and compare the Louis and a modified MI method in the setting of longitudinal studies where the source of missing data is either death or disease progression (informative) or end of the study (assumed non-informative). Longitudinal data are simultaneously modelled with the missingness process. The methods are illustrated by modelling CD4 count data from an HIV-1 clinical trial and evaluated through simulation studies. Both methods, Louis and MI, are used with Monte Carlo simulations of the missing data using the appropriate conditional distributions, the former with 100 simulations, the latter with 5 and 10. It is seen that naive SEs based on the completed data likelihood can be seriously biased. This bias was largely corrected by Louis and modified MI methods, which gave broadly similar estimates. Given the relative simplicity of the modified MI method, it may be preferable.

11.
The attributable fraction (AF) is often used to explore the policy implications of an association between a disease and an exposure. To date, there have been no proposed estimators of AF in the context of partial questionnaire designs (PQD). The PQD, first proposed in a public health context by Wacholder, is often used to enhance response rates in questionnaires. It involves eliciting responses from each subject on preassigned subsets of questions, thereby reducing the burden of response. We propose a computationally efficient method of estimating logistic (or more generally, binary) regression parameters from a PQD model where there is non-response to the questionnaire and the rates of non-response differ between sub-populations. Assuming a log-linear model for the distribution of missing covariates, we employ the methods of Wacholder to motivate consistent estimating equations, and weight each subject's contribution to the estimating function by the inverse probability of responding to the questionnaire. We also propose techniques for goodness-of-fit to assist in model selection. We then use the PQD regression estimates to derive an estimate of AF similar to that proposed by Bruzzi. Finally, we demonstrate our methods using data obtained from a study on adult occupational asthma, conducted within a Massachusetts HMO. Although we concentrate on a particular type of missing data mechanism, other missing data techniques can be incorporated into AF estimation in a similar manner.

12.
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool‐adjacent‐violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood‐based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
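For readers unfamiliar with the pool‐adjacent‐violators algorithm named in the abstract above, here is a minimal sketch of the standard weighted version for a nondecreasing fit, with complete data only. It does not implement the paper's empirical likelihood weighting; the function name `pava` and its optional `weights` argument are assumptions for this example, though such weights are where observation-level weights would enter.

```python
from typing import List, Optional, Sequence


def pava(y: Sequence[float], weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares nondecreasing fit via pool-adjacent-violators.

    Hypothetical helper: y is ordered by the covariate, and weights
    (default all 1) must be positive and the same length as y.
    """
    if weights is None:
        weights = [1.0] * len(y)
    if len(weights) != len(y):
        raise ValueError("y and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")

    # Each block stores [weighted mean, total weight, number of points].
    blocks: List[List[float]] = []
    for yi, wi in zip(y, weights):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * int(count))
    return fitted
```

Pooling replaces each run of adjacent violators by its weighted mean, so the fitted values are nondecreasing by construction.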

13.
Although randomized experiments are widely regarded as the gold standard for estimating causal effects, missing data in the pretreatment covariates make it challenging to estimate subgroup causal effects. When the missing data mechanism of the covariates is nonignorable, the parameters of interest are generally not point identifiable, and we can only obtain bounds for them, which may be too wide for practical use. In some real cases, we have prior knowledge that some restrictions may be plausible. We show the identifiability of the causal effects and joint distributions for four interpretable missing data mechanisms and evaluate the performance of the statistical inference via simulation studies. One application of our methods to a real data set from a randomized clinical trial shows that one of the nonignorable missing data mechanisms fits better than the ignorable missing data mechanism, and the results conform to the study's original expert opinions. We also illustrate the potential applications of our methods to observational studies using a data set from a job‐training program. Copyright © 2013 John Wiley & Sons, Ltd.

14.
We extend the marginalized transition model of Heagerty to accommodate non-ignorable monotone drop-out. Using a selection model, weakly identified drop-out parameters are held constant and their effects evaluated through sensitivity analysis. For data missing at random (MAR), efficiency of inverse probability of censoring weighted generalized estimating equations (IPCW-GEE) is as low as 40 per cent compared to a likelihood-based marginalized transition model (MTM) with comparable modelling burden. MTM and IPCW-GEE regression parameters both display misspecification bias for MAR and non-ignorable missing data, and both reduce bias noticeably by improving model fit.

15.
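Objective: To introduce four methods for analysing multiple parallel mediator models, namely the pure regression method, inverse probability weighting, the extended natural effect model, and weighting-based imputation, and to discuss and compare them. Methods: For the multiple parallel mediator model, simulation experiments under three scenarios were used to compare how the methods perform in estimating direct and indirect effects, and a worked example was analysed using a UK Biobank data set. Results: The simulation experiments and the worked example show that the pure regression method and the inverse...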

16.
In longitudinal studies with potentially nonignorable drop-out, one can assess the likely effect of the nonignorability in a sensitivity analysis. Troxel et al. proposed a general index of sensitivity to nonignorability, or ISNI, to measure sensitivity of key inferences in a neighbourhood of the ignorable, missing at random (MAR) model. They derived detailed formulas for ISNI in the special case of the generalized linear model with a potentially missing univariate outcome. In this paper, we extend the method to longitudinal modelling. We use a multivariate normal model for the outcomes and a regression model for the drop-out process, allowing missingness probabilities to depend on an unobserved response. The computation is straightforward, and merely involves estimating a mixed-effects model and a selection model for the drop-out, together with some simple arithmetic calculations. We illustrate the method with three examples.

17.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
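The FICO index defined in the last sentence of the abstract above is simple enough to compute directly. The sketch below follows only that one-sentence definition; the function name `fico`, the dictionary-per-row data layout, and the use of `None` for a missing value are assumptions for this example, and the paper may apply further conventions not stated in the abstract.

```python
from typing import Dict, Optional, Sequence


def fico(rows: Sequence[Dict[str, Optional[float]]], covariate: str) -> float:
    """Fraction of incomplete cases among the observed values of a covariate.

    Hypothetical helper: each row maps variable names to values, with None
    marking a missing value. A case is incomplete if any variable is missing.
    """
    observed = [row for row in rows if row[covariate] is not None]
    if not observed:
        raise ValueError(f"covariate {covariate!r} is never observed")
    incomplete = sum(
        1 for row in observed if any(value is None for value in row.values())
    )
    return incomplete / len(observed)
```

A value near zero means almost every observed value of the covariate already sits in a complete case, so by the abstract's reasoning imputation would be expected to add little precision for that covariate.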

18.
The analysis of quality of life (QoL) data can be challenging due to the skewness of responses and the presence of missing data. In this paper, we propose a new weighted quantile regression method for estimating the conditional quantiles of QoL data with responses missing at random. The proposed method makes use of the correlation information within the same subject from an auxiliary mean regression model to enhance the estimation efficiency and takes the missing data mechanism into account. The asymptotic properties of the proposed estimator have been studied, and simulations are also conducted to evaluate its performance. The proposed method has also been applied to the analysis of the QoL data from a clinical trial on early breast cancer, which motivated this study.

19.
Propensity score (PS) methods have been used extensively to adjust for confounding factors in the statistical analysis of observational data in comparative effectiveness research. There are four major PS‐based adjustment approaches: PS matching, PS stratification, covariate adjustment by PS, and PS‐based inverse probability weighting. Though covariate adjustment by PS is one of the most frequently used PS‐based methods in clinical research, the conventional variance estimation of the treatment effect estimate under covariate adjustment by PS is biased. As Stampf et al. have shown, this bias in variance estimation is likely to lead to invalid statistical inference and could result in erroneous public health conclusions (e.g., food and drug safety and adverse events surveillance). To address this issue, we propose a two‐stage analytic procedure to develop a valid variance estimator for the covariate adjustment by PS analysis strategy. We also carry out a simple empirical bootstrap resampling scheme. Both proposed procedures are implemented in an R function for public use. Extensive simulation results demonstrate the bias in the conventional variance estimator and show that both proposed variance estimators offer valid estimates for the true variance, and they are robust to complex confounding structures. The proposed methods are illustrated for a post‐surgery pain study. Copyright © 2016 John Wiley & Sons, Ltd.

20.
It is common to have missing genotypes in practical genetic studies, but the exact underlying missing data mechanism is generally unknown to the investigators. Although some statistical methods can handle missing data, they usually assume that genotypes are missing at random, that is, at a given marker, different genotypes and different alleles are missing with the same probability. These include methods for haplotype frequency estimation and haplotype association analysis. However, it is likely that this simple assumption does not hold in practice, yet few studies to date have examined the magnitude of the effects when this simplifying assumption is violated. In this study, we demonstrate that the violation of this assumption may lead to serious bias in haplotype frequency estimates, and haplotype association analysis based on this assumption can induce both false-positive and false-negative evidence of association. To address this limitation in the current methods, we propose a general missing data model to characterize missing data patterns across a set of two or more markers simultaneously. We prove that haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers under our general missing data model. Simulation studies on the analysis of haplotypes consisting of two single nucleotide polymorphisms illustrate that our proposed model can reduce the bias in both haplotype frequency estimates and association analysis due to an incorrect assumption about the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set.
