Similar articles
Found 20 similar articles; search time 31 ms
1.
Missing responses are common problems in medical, social, and economic studies. When responses are missing at random, a complete case data analysis may result in biases. A popular debiasing method is inverse probability weighting proposed by Horvitz and Thompson. To improve efficiency, Robins et al. proposed an augmented inverse probability weighting method. The augmented inverse probability weighting estimator has a double‐robustness property and achieves the semiparametric efficiency lower bound when the regression model and propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood‐based estimator as an alternative to Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied in the estimation of average treatment effect in observational causal inference. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
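
As a rough illustration of the weighting ideas in this abstract, the sketch below contrasts the inverse probability weighted (IPW) mean with the augmented (AIPW) mean for a response that is missing at random. It is not the empirical likelihood estimator the paper proposes. The function names `ipw_mean` and `aipw_mean`, and the convention that the response probabilities `pi` and the regression predictions `m` have already been estimated elsewhere, are assumptions made for this example.

```python
import numpy as np


def ipw_mean(y, r, pi):
    """IPW estimate of E[Y].

    y  : responses (may be NaN where missing)
    r  : 1 if the response is observed, 0 otherwise
    pi : (estimated) probability that the response is observed
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # Zero out missing responses so NaN placeholders never enter the sum.
    y_obs = np.where(r == 1, y, 0.0)
    return float(np.mean(r * y_obs / pi))


def aipw_mean(y, r, pi, m):
    """AIPW estimate of E[Y].

    m : (estimated) regression prediction E[Y | X] for every subject,
        including those with a missing response.
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    pi = np.asarray(pi, dtype=float)
    m = np.asarray(m, dtype=float)
    y_obs = np.where(r == 1, y, 0.0)
    # IPW term minus the augmentation term built from the regression model.
    return float(np.mean(r * y_obs / pi - (r - pi) / pi * m))
```

When `m` is correct, the AIPW estimate stays close to the true mean even if `pi` is misspecified, and vice versa, which is the double-robustness property the abstract refers to.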

2.
The inverse probability weighted estimator is often applied to two-phase designs and regression with missing covariates. Inverse probability weighted estimators typically are less efficient than likelihood-based estimators but, in general, are more robust against model misspecification. In this paper, we propose a best linear inverse probability weighted estimator for two-phase designs and missing covariate regression. Our proposed estimator is the projection of the SIPW onto the orthogonal complement of the score space based on a working regression model of the observed covariate data. The efficiency gain is from the use of the association between the outcome variable and the available covariates, which is the working regression model. One advantage of the proposed estimator is that there is no need to calculate the augmented term of the augmented weighted estimator. The estimator can be applied to general missing data problems or two-phase design studies in which the second phase data are obtained in a subcohort. The method can also be applied to secondary trait case-control genetic association studies. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via extensive simulation studies. The methods are applied to a bladder cancer case-control study.

3.
Missing data is a very common problem in medical and social studies, especially when data are collected longitudinally. It is a challenging problem to utilize observed data effectively. Many papers on missing data problems can be found in statistical literature. It is well known that the inverse weighted estimation is neither efficient nor robust. On the other hand, the doubly robust (DR) method can improve the efficiency and robustness. As is known, the DR estimation requires a missing data model (i.e., a model for the probability that data are observed) and a working regression model (i.e., a model for the outcome variable given covariates and surrogate variables). Because the DR estimating function has mean zero for any parameters in the working regression model when the missing data model is correctly specified, in this paper, we derive a formula for the estimator of the parameters of the working regression model that yields the optimally efficient estimator of the marginal mean model (the parameters of interest) when the missing data model is correctly specified. Furthermore, the proposed method also inherits the DR property. Simulation studies demonstrate the greater efficiency of the proposed method compared with the standard DR method. A longitudinal dementia data set is used for illustration. Copyright © 2013 John Wiley & Sons, Ltd.

4.
Analysis of health care cost data is often complicated by a high level of skewness, heteroscedastic variances and the presence of missing data. Most of the existing literature on cost data analysis has focused on modeling the conditional mean. In this paper, we study a weighted quantile regression approach for estimating the conditional quantiles of health care cost data with missing covariates. The weighted quantile regression estimator is consistent, unlike the naive estimator, and asymptotically normal. Furthermore, we propose a modified BIC for variable selection in quantile regression when the covariates are missing at random. The quantile regression framework allows us to obtain a more complete picture of the effects of the covariates on the health care cost and is naturally adapted to the skewness and heterogeneity of the cost data. The method is semiparametric in the sense that it does not require specifying the likelihood function for the random error or the covariates. We investigate the weighted quantile regression procedure and the modified BIC via extensive simulations. We illustrate the application by analyzing a real data set from a health care cost study. Copyright © 2013 John Wiley & Sons, Ltd.

5.
Survival analysis has been conventionally performed on a continuous time scale. In practice, the survival time is often recorded or handled on a discrete scale; when this is the case, the discrete-time survival analysis would provide analysis results more relevant to the actual data scale. Besides, data on time-dependent covariates in the survival analysis are usually collected through intermittent follow-ups, resulting in missing and mismeasured covariate data. In this work, we propose the sufficient discrete hazard (SDH) approach to discrete-time survival analysis with longitudinal covariates that are subject to missingness and mismeasurement. The SDH method employs the conditional score idea available for dealing with mismeasured covariates, and the penalized least squares for estimating the missing covariate value using the regression spline basis. The SDH method is developed for the single event analysis with the logistic discrete hazard model, and for the competing risks analysis with the multinomial logit model. Simulation results reveal good finite-sample performance of the proposed estimator and the associated asymptotic theory. For illustration, the proposed SDH method is applied to the scleroderma lung study data, where the time to medication withdrawal and time to death were recorded discretely in months.

6.
Motivated by an epidemiological survey of fracture in elderly women, we develop a semiparametric regression analysis of current status data with incompletely observed covariate under the proportional odds model. To accommodate both the interval‐censored nature of current status failure time data and the incompletely observed covariate data, we propose an analysis based on the validation likelihood (VL), which is derived from likelihood pertaining to the validation sample, namely the subset of the sample where the data are completely observed. The missing data mechanism is assumed to be missing at random and is explicitly modeled and estimated in the VL approach. We propose implementing the VL method by integrating self‐consistency and Newton–Raphson algorithms. Asymptotic normality and standard error estimation for the proposed estimator of the regression parameter are guaranteed. Simulation results reveal good performance of the VL estimator. The VL method has some gain in efficiency compared with the naive complete case method. But the VL method leads to unbiased estimators, whereas the complete case method does not when missing covariates are not missing completely at random. Application of the VL approach to the fracture data confirms that osteoporosis (low bone density) is a strong risk factor for the age at onset of fracture in elderly women. Copyright © 2012 John Wiley & Sons, Ltd.

7.
Sato T 《Statistics in medicine》2001,20(17-18):2761-2774
When analysing repeated binary data from randomized trials, the model-based approaches, such as generalized estimating equations, are frequently used. Such methods ignore compliance information and give the model-based intention-to-treat estimate of treatment effect. In this paper, the design-based (randomization-based) semi-parametric estimation procedure is given in the estimation of causal risk difference. The resulting risk difference estimator is interpreted as an extension of the instrumental variables estimator for a binary outcome which has the causal interpretation. Extension of the proposed method to stratified analysis is given for data from stratified randomization or meta-analysis. It yields a Mantel-Haenszel type risk difference estimator. As a special case of stratified analysis, the pattern mixture model which stratifies the data by pattern of missing data is performed. Application of the proposed method to a trial in which endpoints were the occurrences of fever over three courses is provided. The same ideas are applied to the causal risk ratio estimation.

8.
BACKGROUND: Using an application and a simulation study, we show the bias induced by missing outcome data in longitudinal studies and discuss suitable statistical methods according to the type of missing responses when the variable under study is Gaussian. METHODS: The model used for the analysis of Gaussian longitudinal data is the linear mixed effects model. When the probability of response does not depend on the missing values of the outcome or on the parameters of the linear model, missing data are ignorable, and the parameters of the linear mixed effects model may be estimated by maximum likelihood with standard software. When the missing data are non-ignorable, several methods have been proposed. We describe the method proposed by Diggle and Kenward (1994) (DK method), for which software is available. This model combines a linear mixed effects model for the outcome variable with a logistic model for the probability of response, which depends on the outcome variable. RESULTS: A simulation study shows the efficacy of this method and its limits when the data are not normal. In this case, estimators obtained by the DK approach may be more biased than estimators obtained under the hypothesis of ignorable missing data, even if the data are non-ignorable. Data from the Paquid cohort on the evolution of scores on a neuropsychological test among elderly subjects show the bias of a naive analysis using all available data. Although missing responses are not ignorable in this study, estimates of the linear mixed effects model are not very different using the DK approach and the hypothesis of ignorable missing data. CONCLUSION: Statistical methods for longitudinal data including non-ignorable missing responses are sensitive to hypotheses that are difficult to verify. Thus, in practical applications it is better to perform an analysis under the hypothesis of ignorable missing responses and compare the results with those obtained using several approaches for non-ignorable missing data. However, such a strategy requires the development of new software.

9.
Individualized coefficient alpha is defined. It is item and subject specific and is used to measure the quality of test score data with heterogeneity among the subjects and items. A regression model is developed based on 3 sets of generalized estimating equations. The first set of generalized estimating equations models the expectation of the responses, the second set models the response's variance, and the third set is proposed to estimate the individualized coefficient alpha, defined and used to measure individualized internal consistency of the responses. We also use different techniques to extend our method to handle missing data. Asymptotic properties of the estimators are discussed, based on which inference on the coefficient alpha is derived. Performance of our method is evaluated through a simulation study and real data analysis. The real data application is from a health literacy study in Hunan province of China.

10.
Wang M  Long Q 《Statistics in medicine》2011,30(11):1278-1291
Generalized estimating equations (GEE) (Biometrika 1986; 73(1):13-22) is a general statistical method to fit marginal models for correlated or clustered responses, and it uses a robust sandwich estimator to estimate the variance-covariance matrix of the regression coefficient estimates. While this sandwich estimator is robust to the misspecification of the correlation structure of the responses, its finite sample performance deteriorates as the number of clusters or observations per cluster decreases. To address this limitation, Pan (Biometrika 2001; 88(3):901-906) and Mancl and DeRouen (Biometrics 2001; 57(1):126-134) investigated two modifications to the original sandwich variance estimator. Motivated by the ideas underlying these two modifications, we propose a novel robust variance estimator that combines the strengths of these estimators. Our theoretical and numerical results show that the proposed estimator attains better efficiency and achieves better finite sample performance compared with existing estimators. In particular, when the sample size or cluster size is small, our proposed estimator exhibits lower bias and the resulting confidence intervals for GEE estimates achieve better coverage rates. We illustrate the proposed method using data from a dental study.
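
To make the term "sandwich estimator" concrete, here is a minimal sketch for the simplest special case: a linear marginal model with identity link and an independence working correlation, where the GEE point estimate coincides with ordinary least squares. It is not the estimator proposed in this abstract. The function name `cluster_sandwich` and the `leverage_correction` flag are assumptions made for this example. The optional correction inflates each cluster's residuals by the inverse of (I - H_cc), which is a leverage-based adjustment in the spirit of the Mancl and DeRouen modification as commonly described, reduced here to the independence-working linear case.

```python
import numpy as np


def cluster_sandwich(X, y, cluster, leverage_correction=False):
    """OLS fit with a cluster-robust sandwich covariance.

    X       : (n, p) design matrix
    y       : (n,) responses
    cluster : (n,) cluster labels
    Returns (beta, cov) where cov = bread @ meat @ bread.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    p = X.shape[1]

    bread = np.linalg.inv(X.T @ X)      # model-based part
    beta = bread @ X.T @ y
    resid = y - X @ beta

    meat = np.zeros((p, p))             # empirical part, summed over clusters
    for c in np.unique(cluster):
        idx = cluster == c
        X_c, e_c = X[idx], resid[idx]
        if leverage_correction:
            # Inflate residuals by (I - H_cc)^{-1}, with H_cc the cluster's
            # block of the hat matrix.
            H_cc = X_c @ bread @ X_c.T
            e_c = np.linalg.solve(np.eye(len(e_c)) - H_cc, e_c)
        score = X_c.T @ e_c
        meat += np.outer(score, score)

    return beta, bread @ meat @ bread
```

With only a handful of clusters the corrected and uncorrected variances can differ substantially, which is the small-sample issue the abstract describes.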

11.
Propensity scores have been used widely as a bias reduction method to estimate the treatment effect in nonrandomized studies. Since many covariates are generally included in the model for estimating the propensity scores, the proportion of subjects with at least one missing covariate could be large. While many methods have been proposed for propensity score‐based estimation in the presence of missing covariates, little has been published comparing the performance of these methods. In this article we propose a novel method called multiple imputation missingness pattern (MIMP) and compare it with the naive estimator (ignoring propensity score) and three commonly used methods of handling missing covariates in propensity score‐based estimation (separate estimation of propensity scores within each pattern of missing data, multiple imputation and discarding missing data) under different mechanisms of missing data and degree of correlation among covariates. Simulation shows that all adjusted estimators are much less biased than the naive estimator. Under certain conditions MIMP provides benefits (smaller bias and mean‐squared error) compared with existing alternatives. Copyright © 2009 John Wiley & Sons, Ltd.

12.
The attributable fraction (AF) is often used to explore the policy implications of an association between a disease and an exposure. To date, there have been no proposed estimators of AF in the context of partial questionnaire designs (PQD). The PQD, first proposed in a public health context by Wacholder, is often used to enhance response rates in questionnaires. It involves eliciting responses from each subject on preassigned subsets of questions, thereby reducing the burden of response. We propose a computationally efficient method of estimating logistic (or more generally, binary) regression parameters from a PQD model where there is non-response to the questionnaire and the rates of non-response differ between sub-populations. Assuming a log-linear model for the distribution of missing covariates, we employ the methods of Wacholder to motivate consistent estimating equations, and weight each subject's contribution to the estimating function by the inverse probability of responding to the questionnaire. We also propose techniques for goodness-of-fit to assist in model selection. We then use the PQD regression estimates to derive an estimate of AF similar to that proposed by Bruzzi. Finally, we demonstrate our methods using data obtained from a study on adult occupational asthma, conducted within a Massachusetts HMO. Although we concentrate on a particular type of missing data mechanism, other missing data techniques can be incorporated into AF estimation in a similar manner.

13.
For longitudinal binary data with non‐monotone non‐ignorably missing outcomes over time, a full likelihood approach is complicated algebraically, and with many follow‐up times, maximum likelihood estimation can be computationally prohibitive. As alternatives, two pseudo‐likelihood approaches have been proposed that use minimal parametric assumptions. One formulation requires specification of the marginal distributions of the outcome and missing data mechanism at each time point, but uses an ‘independence working assumption,’ i.e. an assumption that observations are independent over time. Another method avoids having to estimate the missing data mechanism by formulating a ‘protective estimator.’ In simulations, these two estimators can be very inefficient, both for estimating time trends in the first case and for estimating both time‐varying and time‐stationary effects in the second. In this paper, we propose the use of the optimal weighted combination of these two estimators, and in simulations we show that the optimal weighted combination can be much more efficient than either estimator alone. Finally, the proposed method is used to analyze data from two longitudinal clinical trials of HIV‐infected patients. Copyright © 2010 John Wiley & Sons, Ltd.

14.
There has been a growing interest in developing methodologies to combine information from public domains to improve efficiency in the analysis of relatively small-scale studies that collect more detailed patient-level information. The auxiliary information is usually given in the form of summary statistics or regression coefficients. Thus, the question arises as to how to incorporate the summary information in the model estimation procedure. In this article, we consider statistical analysis of right-censored survival data when additional information about the covariate effects evaluated in a reduced Cox model is available. Recognizing that such external information can be summarized using population moments, we present a unified framework by employing the generalized method of moments to combine information from different sources for the analysis of survival data. The proposed estimator can be shown to be consistent and asymptotically normal; moreover, it is more efficient than the maximum partial likelihood estimator. We also consider incorporating uncertainty of the external information in the inference procedure. Simulation studies show that, by incorporating the additional summary information, the proposed estimators enjoy a substantial gain in efficiency over the conventional approach. A data analysis of a pancreatic cancer cohort study is presented to illustrate the methods and theory.

15.
In the literature on statistical analysis with missing data, there is a significant gap in statistical inference for missing data mechanisms, especially for nonmonotone missing data, which has essentially restricted the use of estimation methods that require estimating the missing data mechanisms. For example, the inverse probability weighting methods (Horvitz & Thompson, 1952; Little & Rubin, 2002), including the popular augmented inverse probability weighting (Robins et al., 1994), depend on sufficient models for the missing data mechanisms to reduce estimation bias while improving estimation efficiency. This research proposes a semiparametric likelihood method for estimating missing data mechanisms where an EM algorithm with closed form expressions for both the E-step and the M-step is used in evaluating the estimate (Zhao et al., 2009; Zhao, 2020). The asymptotic variance of the proposed estimator is estimated from the profile score function. The methods are general and robust. Simulation studies in various missing data settings are performed to examine the finite sample performance of the proposed method. Finally, we analyze the missing data mechanism of Duke cardiac catheterization coronary artery disease diagnostic data to illustrate the method.

16.
Correlation coefficient estimates are often attenuated for truncated samples in the sense that the estimates are biased towards zero. Motivated by real data collected in South Sudan, we consider correlation coefficient estimation with singly truncated bivariate data. By considering a linear regression model in which a truncated variable is used as an explanatory variable, a consistent estimator for the regression slope can be obtained from the ordinary least squares method. A consistent estimator of the correlation coefficient is then obtained by multiplying the regression slope estimator by the variance ratio of the two variables. Results from two limited simulation studies confirm the validity and robustness of the proposed method. The proposed method is applied to the South Sudanese children's anthropometric and nutritional data collected by World Vision. Copyright © 2017 John Wiley & Sons, Ltd.
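
The attenuation described in this abstract can be reproduced with a small simulation. The sketch below is illustrative only and does not reproduce the paper's estimator: the helper names (`simulate_truncated`, `ols_slope`, `correlation_from_slope`) are invented for this example, the data are bivariate normal with unit variances, and the untruncated standard deviations are simply assumed to be known. It relies on the identity rho = slope * sd_x / sd_y and on the fact that the least squares slope of y on x is unaffected by selection on x.

```python
import numpy as np


def simulate_truncated(n, rho, cutoff, seed=0):
    """Draw standard bivariate normal pairs with correlation rho and keep
    only those with x above `cutoff` (single truncation on x)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)
    keep = x > cutoff
    return x[keep], y[keep]


def ols_slope(x, y):
    """Least squares slope of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def correlation_from_slope(slope, sd_x, sd_y):
    """Convert a slope into a correlation, given the standard deviations
    of the two variables in the untruncated population (assumed known)."""
    return slope * sd_x / sd_y
```

For example, with `rho = 0.6` and truncation at `x > 0`, `np.corrcoef` on the truncated sample falls well below 0.6, while `correlation_from_slope(ols_slope(x, y), 1.0, 1.0)` stays near it.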

17.
In this paper we compare several methods for estimating population disease prevalence from data collected by two-phase sampling when there is non-response at the second phase. The traditional weighting type estimator requires the missing completely at random assumption and may yield biased estimates if the assumption does not hold. We review two approaches and propose one new approach to adjust for non-response assuming that the non-response depends on a set of covariates collected at the first phase: an adjusted weighting type estimator using estimated response probability from a response model; a modelling type estimator using predicted disease probability from a disease model; and a regression type estimator combining the adjusted weighting type estimator and the modelling type estimator. These estimators are illustrated using data from an Alzheimer's disease study in two populations.

18.
Hot-deck imputation is an intuitively simple and popular method of accommodating incomplete data. Users of the method will often use the usual multiple imputation variance estimator which is not appropriate in this case. However, no variance expression has yet been derived for this easily implemented method applied to missing covariates in regression models. The simple hot-deck method is in fact asymptotically equivalent to the mean-score method for the estimation of a regression model parameter, so that hot-deck can be understood in the context of likelihood methods. Both of these methods accommodate data where missingness may depend on the observed variables but not on the unobserved value of the incomplete covariate, that is, missing at random (MAR). The asymptotic properties of hot-deck are derived here for the case where the fully observed variables are categorical, though the incomplete covariate(s) may be continuous. Simulation studies indicate that the two methods compare well in small samples and for small numbers of imputations. Current users of hot-deck may now conduct their analysis using mean-score, which is a weighted likelihood method and can thus be implemented by a single pass through the data using any standard package which accommodates weighted regression models. Valid inference is now straightforward using the variance expression provided here. The equivalence of mean-score and hot-deck is illustrated using three clinical data sets where an important covariate is missing for a large number of study subjects. © 1997 by John Wiley & Sons, Ltd.

19.
In observational studies, estimation of average causal treatment effect on a patient's response should adjust for confounders that are associated with both treatment exposure and response. In addition, the response, such as medical cost, may have incomplete follow‐up. In this article, a double robust estimator is proposed for average causal treatment effect for right censored medical cost data. The estimator is double robust in the sense that it remains consistent when either the model for the treatment assignment or the regression model for the response is correctly specified. Double robust estimators increase the likelihood that the results will represent a valid inference. Asymptotic normality is obtained for the proposed estimator, and an estimator for the asymptotic variance is also derived. Simulation studies show good finite sample performance of the proposed estimator, and a real data analysis using the proposed method is provided as illustration. Copyright © 2016 John Wiley & Sons, Ltd.

20.
In behavioral, biomedical, and social‐psychological sciences, it is common to encounter latent variables and heterogeneous data. Mixture structural equation models (SEMs) are very useful methods to analyze these kinds of data. Moreover, the presence of missing data, including both missing responses and missing covariates, is an important issue in practical research. However, limited work has been done on the analysis of mixture SEMs with non‐ignorable missing responses and covariates. The main objective of this paper is to develop a Bayesian approach for analyzing mixture SEMs with an unknown number of components, in which a multinomial logit model is introduced to assess the influence of some covariates on the component probability. Results of our simulation study show that the Bayesian estimates obtained by the proposed method are accurate, and the model selection procedure via a modified DIC is useful in identifying the correct number of components and in selecting an appropriate missing mechanism in the proposed mixture SEMs. A real data set related to a longitudinal study of polydrug use is employed to illustrate the methodology. Copyright © 2010 John Wiley & Sons, Ltd.
