Similar articles
20 similar articles found (search time: 93 ms)
1.
Linear regression is one of the most popular statistical techniques. In linear regression analysis, missing covariate data occur often. A recent approach to analysing such data is a weighted estimating equation, in which the contribution to the estimating equation from a complete observation is weighted by the inverse 'probability of being observed'. In this paper, we propose a weighted estimating equation in which we wrongly assume that the missing covariates are multivariate normal, but which still produces consistent estimates as long as the probability of being observed is correctly modelled. In simulations, these weighted estimating equations appear to be highly efficient when compared to the most efficient weighted estimating equation as proposed by Robins et al. and Lipsitz et al. At the same time, they are much less computationally intensive than the weighted estimating equations given by Lipsitz et al. We compare the weighted estimating equations proposed in this paper to the efficient weighted estimating equations via an example and a simulation study. We only consider data which are missing at random; non-ignorably missing data are not addressed in this paper.
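The inverse-probability weighting idea in this abstract can be illustrated with a minimal sketch. It is not the estimator proposed in the paper (which additionally uses a working multivariate normal model for the missing covariates); it only shows the basic weighted complete-case estimating equation for a single covariate. The function name `ipw_simple_regression` and the toy data are hypothetical, and the observation probabilities are assumed to be known rather than estimated from a model.

```python
def ipw_simple_regression(x, y, observed, prob_observed):
    """Weighted complete-case fit of y = a + b*x (illustrative sketch).

    Solves sum_i (r_i / pi_i) * (1, x_i) * (y_i - a - b*x_i) = 0, where
    r_i indicates a complete observation and pi_i is the (assumed known)
    probability that observation i is complete.
    """
    # Keep complete cases only; each one is weighted by 1 / pi_i.
    rows = [
        (xi, yi, 1.0 / pi)
        for xi, yi, ri, pi in zip(x, y, observed, prob_observed)
        if ri
    ]
    total_w = sum(w for _, _, w in rows)
    x_bar = sum(w * xi for xi, _, w in rows) / total_w
    y_bar = sum(w * yi for _, yi, w in rows) / total_w
    sxx = sum(w * (xi - x_bar) ** 2 for xi, _, w in rows)
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for xi, yi, w in rows)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data: the third covariate value is missing (None).
x = [0.0, 1.0, None, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
observed = [True, True, False, True]
prob_observed = [0.5, 0.8, 0.4, 0.9]
intercept, slope = ipw_simple_regression(x, y, observed, prob_observed)
```

Because the toy data lie exactly on the line y = 1 + 2x, any choice of weights recovers the same coefficients; with noisy data and missingness that depends on the outcome, the weights are what remove the bias of an unweighted complete-case fit.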

2.
In this paper we compare several methods for estimating population disease prevalence from data collected by two-phase sampling when there is non-response at the second phase. The traditional weighting type estimator requires the missing completely at random assumption and may yield biased estimates if the assumption does not hold. We review two approaches and propose one new approach to adjust for non-response assuming that the non-response depends on a set of covariates collected at the first phase: an adjusted weighting type estimator using estimated response probability from a response model; a modelling type estimator using predicted disease probability from a disease model; and a regression type estimator combining the adjusted weighting type estimator and the modelling type estimator. These estimators are illustrated using data from an Alzheimer's disease study in two populations.

3.
Missing covariate values are prevalent in regression applications. While an array of methods has been developed for estimating parameters in regression models with missing covariate data for a variety of response types, minimal attention has been given to validation of the response model and influence diagnostics. Previous research has mainly focused on estimating residuals for observations with missing covariates using expected values, after which specialized techniques are needed to conduct proper inference. We suggest a multiple imputation strategy that allows for the use of standard methods for residual analyses on the imputed data sets or a stacked data set. We demonstrate the suggested multiple imputation method by analyzing the Sleep in Mammals data in the context of a linear regression model and the New York Social Indicators Status data with a logistic regression model.

4.
BACKGROUND: Non-response is an important potential source of bias in survey research. With evidence of falling response rates from GPs, it is of increasing importance when undertaking postal questionnaire surveys of GPs to seek to maximize response rates and evaluate the potential for non-response bias. OBJECTIVES: Our aim was to investigate the effectiveness of follow-up procedures when undertaking a postal questionnaire study of GPs, the use of publicly available data in assessing non-response bias and the development of regression models predicting responder behaviour. METHOD: A postal questionnaire study was carried out of a random sample of 600 GPs in Wales concerning their training and knowledge in palliative care. RESULTS: A cumulative response rate graph permitted optimal timing of follow-up mailings: a final response rate of 67.6% was achieved. Differences were found between responders and non-responders on several parameters and between sample and population on some parameters: some of these may bias the sample data. Logistic regression analysis indicated medical school of qualification and current membership of the Royal College of General Practitioners to be the only significant predictors of responders. Late responders were significantly more likely to have been qualified for longer. CONCLUSIONS: This study has several implications for future postal questionnaire studies of GPs. The optimal timing of reminders may be judged from plotting the cumulative response rate: it is worth sending at least three reminders. There are few parameters that significantly predict GPs who are unlikely to respond; more of these may be included in the sample, or they may be targeted for special attention. Publicly available data may be used readily in the analysis of non-response bias and generalizability.

5.
The analysis of quality of life (QoL) data can be challenging due to the skewness of responses and the presence of missing data. In this paper, we propose a new weighted quantile regression method for estimating the conditional quantiles of QoL data with responses missing at random. The proposed method makes use of the correlation information within the same subject from an auxiliary mean regression model to enhance the estimation efficiency and takes the missing data mechanism into account. The asymptotic properties of the proposed estimator are studied, and simulations are conducted to evaluate its performance. The proposed method is also applied to the analysis of the QoL data from a clinical trial on early breast cancer, which motivated this study.

6.
In this article we consider the problem of making inferences about the parameter β0 indexing the conditional mean of an outcome given a vector of regressors when a subset of the variables (outcome or covariates) are missing for some study subjects and the probability of non-response depends upon both observed and unobserved data values, that is, non-response is non-ignorable. We propose a new class of inverse probability of censoring weighted estimators that are consistent and asymptotically normal (CAN) for estimating β0 when the non-response probabilities can be parametrically modelled and a CAN estimator exists. The proposed estimators do not require full specification of the likelihood and their computation does not require numerical integration. We show that the asymptotic variance of the optimal estimator in our class attains the semi-parametric variance bound for the model. In some models, no CAN estimator of β0 exists. We provide a general algorithm for determining when CAN estimators of β0 exist. Our results follow after specializing a general representation described in the article for the efficient score and the influence function of regular, asymptotically linear estimators in an arbitrary semi-parametric model with non-ignorable non-response in which the probability of observing complete data is bounded away from zero and the non-response probabilities can be parametrically modelled. © 1997 by John Wiley & Sons, Ltd.

7.
Jung SH, Ahn CW. Statistics in Medicine, 2005, 24(17): 2583-2596
Controlled clinical trials often randomize subjects to two treatment groups and repeatedly evaluate them at baseline and at intervals across a treatment period of fixed duration. A popular primary objective in these trials is to compare the change rates in the repeated measurements between treatment groups. Repeated measurements usually involve missing data and a serial correlation within each subject. The generalized estimating equation (GEE) method has been widely used to fit the time trend in repeated measurements because of its robustness to random missingness and misspecification of the true correlation structure. In this paper, we propose a closed form sample size formula for comparing the change rates of binary repeated measurements using GEE for a two-group comparison. The sample size formula is derived incorporating missing patterns, such as independent missing and monotone missing, and correlation structures, such as the AR(1) model. We also propose an algorithm to generate correlated binary data with arbitrary marginal means and a Markov dependency and use it in simulation studies.

8.
The generalized estimating equation (GEE), a distribution‐free, or semi‐parametric, approach for modeling longitudinal data, is used in a wide range of behavioral, psychotherapy, pharmaceutical drug safety, and healthcare‐related research studies. Most popular methods for assessing model fit are based on the likelihood function for parametric models, rendering them inappropriate for distribution‐free GEE. One rare exception is a score statistic initially proposed by Tsiatis for logistic regression (1980) and later extended by Barnhart and Williamson to GEE (1998). Because GEE only provides valid inference under the missing completely at random assumption and missing values arising in most longitudinal studies do not follow such a restricted mechanism, this GEE‐based score test has very limited applications in practice. We propose extensions of this goodness‐of‐fit test to address missing data under the missing at random assumption, a more realistic model that applies to most studies in practice. We examine the performance of the proposed tests using simulated data and demonstrate the utility of such tests with data from a real study on geriatric depression and associated medical comorbidities. Copyright © 2013 John Wiley & Sons, Ltd.

9.
Genotype-based likelihood-ratio tests (LRT) of association that examine maternal and parent-of-origin effects have been previously developed in the framework of log-linear and conditional logistic regression models. In the situation where parental genotypes are missing, the expectation-maximization (EM) algorithm has been incorporated in the log-linear approach to allow incomplete triads to contribute to the LRT. We present an extension to this model, which we call the Combined_LRT, that incorporates additional information from the genotypes of unaffected siblings to improve assignment of incompletely typed families to mating type categories, thereby improving inference of missing parental data. Using simulations involving a realistic array of family structures, we demonstrate the validity of the Combined_LRT under the null hypothesis of no association and provide power comparisons under varying levels of missing data and using sibling genotype data. We demonstrate the improved power of the Combined_LRT compared with the family-based association test (FBAT), another widely used association test. Lastly, we apply the Combined_LRT to a candidate gene analysis in autism families, some of which have missing parental genotypes. We conclude that the proposed log-linear model will be an important tool for future candidate gene studies, for many complex diseases where unaffected siblings can often be ascertained and where epigenetic factors such as imprinting may play a role in disease etiology.

10.
Analysis of health care cost data is often complicated by a high level of skewness, heteroscedastic variances and the presence of missing data. Most of the existing literature on cost data analysis has focused on modeling the conditional mean. In this paper, we study a weighted quantile regression approach for estimating the conditional quantiles of health care cost data with missing covariates. The weighted quantile regression estimator is consistent, unlike the naive estimator, and asymptotically normal. Furthermore, we propose a modified BIC for variable selection in quantile regression when the covariates are missing at random. The quantile regression framework allows us to obtain a more complete picture of the effects of the covariates on the health care cost and is naturally adapted to the skewness and heterogeneity of the cost data. The method is semiparametric in the sense that it does not require specifying the likelihood function for the random error or the covariates. We investigate the weighted quantile regression procedure and the modified BIC via extensive simulations. We illustrate the application by analyzing a real data set from a health care cost study. Copyright © 2013 John Wiley & Sons, Ltd.

11.
Longitudinal binomial data are frequently generated from multiple questionnaires and assessments in various scientific settings, and such data are often overdispersed. The standard generalized linear mixed effects model may result in severe underestimation of standard errors of estimated regression parameters in such cases and hence potentially bias the statistical inference. In this paper, we propose a longitudinal beta‐binomial model for overdispersed binomial data and estimate the regression parameters under a probit model using the generalized estimating equation method. A hybrid algorithm combining Fisher scoring and the method of moments is implemented to compute the estimates. Extensive simulation studies are conducted to justify the validity of the proposed method. Finally, the proposed method is applied to analyze functional impairment in subjects who are at risk of Huntington disease from a multisite observational study of prodromal Huntington disease. Copyright © 2016 John Wiley & Sons, Ltd.

12.
Li L, Palta M, Shao J. Statistics in Medicine, 2004, 23(16): 2527-2536
We study a linear model in which one of the covariates is measured with error. The surrogate for this covariate is the event count in unit time. We model the event count by a Poisson distribution, the rate of which is the unobserved true covariate. We show that ignoring the measurement error leads to inconsistent estimators of the regression coefficients and propose a set of unbiased estimating equations to correct the bias. The method is computationally simple and does not require using supplemental data as is often the case in other measurement error analyses. No distributional assumption is made for the unobserved covariate. The proposed method is illustrated with an example from the Wisconsin Sleep Cohort Study.

13.
Missing covariates in regression analysis are a pervasive problem in medical, social, and economic research. We study empirical-likelihood confidence regions for unconstrained and constrained regression parameters in a nonignorable covariate-missing data problem. For an assumed conditional mean regression model, we assume that some covariates are fully observed but other covariates are missing for some subjects. By exploiting a probability model of missingness and a working conditional score model from a semiparametric perspective, we build a system of unbiased estimating equations, where the number of equations exceeds the number of unknown parameters. Based on the proposed estimating equations, we introduce unconstrained and constrained empirical-likelihood ratio statistics to construct empirical-likelihood confidence regions for the underlying regression parameters without and with constraints. We establish the asymptotic distributions of the proposed empirical-likelihood ratio statistics. Simulation results show that the proposed empirical-likelihood methods have better finite-sample performance than competing methods in terms of coverage probability and interval length. Finally, we apply the proposed empirical-likelihood methods to the analysis of a data set from the US National Health and Nutrition Examination Survey.

14.
Survival analysis has conventionally been performed on a continuous time scale. In practice, the survival time is often recorded or handled on a discrete scale; when this is the case, discrete-time survival analysis provides results more relevant to the actual data scale. In addition, data on time-dependent covariates in survival analysis are usually collected through intermittent follow-ups, resulting in missing and mismeasured covariate data. In this work, we propose the sufficient discrete hazard (SDH) approach to discrete-time survival analysis with longitudinal covariates that are subject to missingness and mismeasurement. The SDH method employs the conditional score idea available for dealing with mismeasured covariates, and penalized least squares for estimating the missing covariate value using the regression spline basis. The SDH method is developed for the single event analysis with the logistic discrete hazard model, and for the competing risks analysis with the multinomial logit model. Simulation results reveal good finite-sample performance of the proposed estimator and the associated asymptotic theory. For illustration, the proposed SDH method is applied to the scleroderma lung study data, where the time to medication withdrawal and time to death were recorded discretely in months.

15.
We propose a propensity score-based multiple imputation (MI) method to tackle incomplete data resulting from drop-outs and/or intermittent skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', but MI is free of that problem. Finally, if the data are missing at random, while the CC and GEE methods yield results with moderate to large bias, MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.
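As background for the multiple imputation methods mentioned in this and neighbouring abstracts, the following is a minimal sketch of how estimates from several imputed data sets are commonly pooled using Rubin's rules. The abstract does not state which pooling rule the authors used, so this is an assumption for illustration only; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules.

    Returns (pooled_estimate, total_variance), where
    total_variance = W + (1 + 1/m) * B,
    W is the average within-imputation variance and
    B is the between-imputation sample variance of the estimates.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical estimates of one regression coefficient from three imputed data sets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what distinguishes this from simply averaging the variances: it carries the extra uncertainty due to the missing values into the final standard error.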

16.
A variable is ‘systematically missing’ if it is missing for all individuals within particular studies in an individual participant data meta‐analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure–disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta‐analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between‐study random effects in the imputation model. Copyright © 2013 John Wiley & Sons, Ltd.

17.
BACKGROUND: Health related quality of life is becoming of greater importance in the medical field. Nevertheless, methodological problems persist, particularly when it comes to processing missing data on quality of life questionnaires. Missing data lead to three difficulties: (i) loss of power; (ii) bias; (iii) choice of the most adequate method for treating missing data. Prevention is the best recommendation in order to avoid unanswered questions. Unfortunately, this does not guarantee the absence of missing data. Therefore, the treatment of missing data depends on: (i) identification of the missing data mechanism and (ii) choice of the most appropriate method to correct the data. The main objective of this article is to illustrate the identification of the non-response mechanisms for the SF-36 questionnaire items in the SU.VI.MAX study. METHODS: A logistic regression on the characteristics of the subjects was used to distinguish between two missing data mechanisms: missing completely at random (MCAR) and missing at random (MAR). Two global chi-square tests of the MCAR mechanism were proposed. The missing not at random (MNAR) mechanism was also analysed considering the questionnaire features. RESULTS: The percentage of non-responses was small (1.7%), with a maximum equal to 3% for four questions of the General Health dimension (GH2 to GH5). Both global chi-square tests rejected the hypothesis that all SF-36 non-responses were MCAR. As to the 32 items with less than 2.3% of non-responses, the mechanisms were: MCAR for 29 items, MAR for 2 items, and probably MNAR for 1 item. The logistic regression indicates that the factors related to non-responses were gender (female), age (≥50 years), attention problem, and number of children (≥3). The hierarchical feature of item PF5 (climb one flight of stairs) in relation to PF4 (climb several flights of stairs) would generate MNAR non-responses. The "I don't know" response modality of block GH2 to GH5 would also generate non-responses of the MNAR type. CONCLUSION: The identification of missing data mechanisms through statistical analysis and through further reflection on the questionnaire's features is a necessary preliminary step in the treatment of non-responses.

18.
Most longitudinal studies of the elderly are characterized by substantial drop-out due to death and many other factors beyond the control of the investigators. In a two-phase longitudinal study of dementia, subjects with cognitive impairment skip the first phase survey in the next follow-up, leading to intermittent missing values for variables measured in that phase. In the context of analysing pre-dementia cognitive decline in an elderly population, both causes of non-response can potentially be informative in the sense that the missingness is dependent on the unobserved outcome. To take these factors into account, mixed-effects models are constructed to allow the outcome and the multiple causes of missing values to share the same 'random parameter' or random effect. The crucial assumption of our model is that the random effects of the model for the outcome and those of the model for the missing-data indicators are linked in a deterministic manner. It can be thought of as an approximation of a more general and realistic situation, in which the two models have distinct, yet dependent, random effects. We conduct a simulation study to investigate possible deviations of the estimates under such a scenario. A second simulation illustrates the magnitude of the bias in estimating the difference of decline rate between two groups when the random effects are linked in different manners for the two groups.

19.
We investigated the non-response rates to the question “I am satisfied with my sex life” in the Functional Assessment of Cancer Therapy – General questionnaire in Chinese (n = 769), Malay (n = 41) and Indian (n = 33) patients in Singapore, a multi-ethnic society whose residents are said to have a conservative sexual attitude. Non-response rates to the question were 44%, 22% and 24% in the three groups respectively. The rates were much higher than the rate reported previously in a US study (7%) and used in the associated simulation study of the simple mean imputation method. We further examined the Chinese respondents in detail. The odds of non-response and the scores among the responders were associated with several demographic and clinical characteristics. Using the checklist proposed by Fayers et al. [Stat Med 1998; 17: 679–696] to assess the data patterns, we found that the application of the simple mean imputation is questionable. We employed an alternative (multiple) imputation procedure that took into account covariates that predicted the odds of non-response and the observed response scores. We compared the analytic results based on different approaches to handling missing values, and found that analysis based on the simple mean imputation gave results similar to those based on multiply imputed data even in this quite extreme example.

20.
Suppose we use generalized estimating equations to estimate a marginal regression model for repeated binary observations. There are no established summary statistics available for assessing the adequacy of the fitted model. In this paper we propose a goodness-of-fit test statistic which has an approximate chi-squared distribution when we have specified the model correctly. The proposed statistic can be viewed as an extension of the Hosmer and Lemeshow goodness-of-fit statistic for ordinary logistic regression to marginal regression models for repeated binary responses. We illustrate the methods using data from a study of mental health service utilization by children. The repeated responses are a set of binary measures of service use. We fit a marginal logistic regression model to the data using generalized estimating equations, and we apply the proposed goodness-of-fit statistic to assess the adequacy of the fitted model.
