Similar literature
 20 similar articles found
1.
Attrition threatens the internal validity of cohort studies. Epidemiologists use various imputation and weighting methods to limit bias due to attrition. However, the ability of these methods to correct for attrition bias has not been tested. We simulated a cohort of 300 subjects using 500 computer replications to determine whether regression imputation, individual weighting, or multiple imputation is useful to reduce attrition bias. We compared these results to a complete subject analysis. Our logistic regression model included a binary exposure and two confounders. We generated 10, 25, and 40% attrition through three missing data mechanisms: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR), and used four covariance matrices to vary attrition. We compared true and estimated mean odds ratios (ORs), standard deviations (SDs), and coverage. With data MCAR and MAR for all attrition rates, the complete subject analysis produced results at least as valid as those from the imputation and weighting methods. With data MNAR, no method provided unbiased estimates of the OR at attrition rates of 25 or 40%. When observations are not MAR or MCAR, imputation and weighting methods may not effectively reduce attrition bias.
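
The three attrition mechanisms named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation: the cohort size of 300 and the 25% attrition rate come from the abstract, while the function name `simulate_attrition`, the single confounder, the outcome-model coefficients and the 1.5/0.5 drop-out multipliers are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_attrition(n=300, rate=0.25, mechanism="MCAR", seed=0):
    """Return a list of dicts, one per subject, with an 'observed' flag.

    All numeric settings below are illustrative assumptions, not values
    taken from the study.
    """
    if mechanism not in ("MCAR", "MAR", "MNAR"):
        raise ValueError("mechanism must be 'MCAR', 'MAR' or 'MNAR'")
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        confounder = rng.gauss(0.0, 1.0)
        exposure = 1 if rng.random() < 0.5 else 0
        # Hypothetical logistic outcome model.
        log_odds = -1.0 + 0.7 * exposure + 0.5 * confounder
        outcome = 1 if rng.random() < 1.0 / (1.0 + math.exp(-log_odds)) else 0

        if mechanism == "MCAR":
            # Drop-out is unrelated to any variable.
            p_drop = rate
        elif mechanism == "MAR":
            # Drop-out depends only on a fully observed covariate.
            p_drop = rate * (1.5 if confounder > 0 else 0.5)
        else:
            # MNAR: drop-out depends on the outcome that goes unobserved.
            p_drop = rate * (1.5 if outcome == 1 else 0.5)

        cohort.append({
            "exposure": exposure,
            "confounder": confounder,
            "outcome": outcome,
            "observed": rng.random() >= min(1.0, p_drop),
        })
    return cohort


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        rows = simulate_attrition(mechanism=mech, seed=1)
        lost = sum(not r["observed"] for r in rows)
        print(mech, "lost to follow-up:", lost, "of", len(rows))
```

Under the MNAR branch the overall attrition rate is only approximately `rate`, because it depends on how common the outcome is.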

2.
Loss to follow-up is problematic in most cohort studies and often leads to bias. Although guidelines suggest acceptable follow-up rates, the authors are unaware of studies that test the validity of these recommendations. The objective of this study was to determine whether the recommended follow-up thresholds of 60-80% are associated with biased effects in cohort studies. A simulation study was conducted using 1000 computer replications of a cohort of 500 observations. The logistic regression model included a binary exposure and three confounders. Varied correlation structures of the data represented various levels of confounding. Differing levels of loss to follow-up were generated through three mechanisms: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). The authors found no important bias with levels of loss that varied from 5 to 60% when loss to follow-up was related to MCAR or MAR mechanisms. However, when observations were lost to follow-up based on a MNAR mechanism, the authors found seriously biased estimates of the odds ratios with low levels of loss to follow-up. Loss to follow-up in cohort studies rarely occurs randomly. Therefore, when planning a cohort study, one should assume that loss to follow-up is MNAR and attempt to achieve the maximum follow-up rate possible.

3.

Background  

Missing data are classified as missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). Knowing the mechanism is useful in identifying the most appropriate analysis. The first aim was to compare different methods for identifying the missing data mechanism and to determine whether they gave consistent conclusions. The second aim was to investigate whether reminder-response data can be used to help identify the missing data mechanism.

4.
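Objective: To compare the statistical performance of the control-based pattern-mixture model (PMM), the mixed-effects model for repeated measures (MMRM), and multiple imputation (MI) in handling quantitative longitudinal data with missing values when several missingness mechanisms coexist. Methods: Monte Carlo simulation was used to generate quantitative longitudinal datasets with missing values arising from two or three of the mechanisms missing completely at random, missing at random, and missing not at random, and the statistical performance of the three approaches was evaluated. Results: Control-based PMM kept the type I error rate at a low level and had the lowest power. The type I error rates of MMRM and MI were controlled, and their power was higher than that of control-based PMM. When there was no difference in efficacy between the two groups, the estimation errors of all methods were comparable and control-based PMM had the highest 95% confidence interval coverage; when there was a difference, the performance of each method depended on the proportion of missing data that conformed to its assumed missingness mechanism. When data missing not at random were present, control-based PMM essentially did not overestimate the difference in efficacy and had the highest 95% confidence interval coverage, whereas MMRM and MI overestimated the difference in efficacy and had lower 95% confidence interval coverage. The 95% confidence interval widths of all methods were comparable. Conclusion: When analysing longitudinal data in which several missingness mechanisms coexist, especially data that include values missing not at random, the statistical performance of MMRM and MI declines; control-based PMM can be used for sensitivity analysis, but its specific assumptions must be kept in mind to avoid overly conservative estimates.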

5.
During drug development, a key step is the identification of relevant covariates predicting between-subject variations in drug response. The full random effects model (FREM) is one of the full-covariate approaches used to identify relevant covariates in nonlinear mixed effects models. Here we explore the ability of FREM to handle missing (both missing completely at random (MCAR) and missing at random (MAR)) covariate data and compare it to the full fixed-effects model (FFEM) approach, applied either with complete case analysis or mean imputation. A global health dataset (20,421 children) was used to develop a FREM describing the changes of height for age Z-score (HAZ) over time. Simulated datasets (n = 1000) were generated with variable rates of missing (MCAR) covariate data (0%-90%) and different proportions of missing (MAR) data conditional on either observed covariates or predicted HAZ. The three methods were used to re-estimate the model and were compared in terms of bias and precision. FREM had only minor increases in bias and minor loss of precision at increasing percentages of missing (MCAR) covariate data and performed similarly in the MAR scenarios. Conversely, the FFEM approaches either collapsed at 70% of missing (MCAR) covariate data (FFEM complete case analysis) or had large bias increases and loss of precision (FFEM with mean imputation). Our results suggest that FREM is an appropriate approach to covariate modeling for datasets with missing (MCAR and MAR) covariate data, such as in global health studies.

6.
Missing data arise in crossover trials, as they do in any form of clinical trial. Several papers have addressed the problems that missing data create, although almost all of these assume that the probability that a planned observation is missing does not depend on the value that would have been observed; that is, the data are missing at random (MAR). In many applications, this assumption is likely to be untenable; in which case, the data are missing not at random (MNAR). We investigate the effect on estimates of the treatment effect that assume data are MAR when data are actually MNAR. We also propose using the assumption of no carryover treatment effect, which is usually required for this design, to permit the estimation of a treatment effect when data are MNAR. The results are applied to a trial comparing two treatments for neuropathic pain and show that the estimate of treatment effect is sensitive to the assumption of MAR.

7.
BACKGROUND AND OBJECTIVE: Epidemiologic studies commonly estimate associations between predictors (risk factors) and outcome. Most software automatically excludes subjects with missing values. This commonly causes bias because missing values seldom occur completely at random (MCAR) but rather selectively based on other (observed) variables, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information including outcome is advocated to deal with selective missing values. This seems a self-fulfilling prophecy. METHODS: We tested this hypothesis using data from a study on diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs) estimated from the original sample were considered as "true" values. We assigned missing values to these predictors, both MCAR and MAR, and repeated this 1,000 times using simulations. Per simulation we multiply imputed the missing values without and with the outcome, and compared the regression coefficients and SEs to the truth. RESULTS: Regression coefficients based on MI including outcome were close to the truth. MI without outcome yielded very biased (underestimated) coefficients. SEs and coverage of the 90% confidence intervals were not different between MI with and without outcome. Results were the same for MCAR and MAR. CONCLUSION: For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without outcome and is no self-fulfilling prophecy.
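
Multiple imputation produces one coefficient and one SE per imputed data set, which then have to be combined into a single estimate. The abstract does not say how this was done; the Python sketch below assumes the standard Rubin's rules, in which the pooled estimate is the mean across imputations and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool m imputation-specific estimates with Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two estimates and matching SEs")
    pooled = mean(estimates)
    # Within-imputation variance: average of the squared SEs.
    within = mean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = variance(estimates)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


if __name__ == "__main__":
    # Made-up coefficients from three imputed data sets.
    coef, se = pool_rubin([1.0, 1.2, 0.8], [0.1, 0.1, 0.1])
    print(round(coef, 3), round(se, 3))
```

The pooled SE is larger than any single-imputation SE whenever the estimates differ across imputations, which is how the extra uncertainty due to the missing values enters the inference.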

8.
Standard implementations of multiple imputation (MI) approaches provide unbiased inferences based on an assumption of underlying missing at random (MAR) mechanisms. However, in the presence of missing data generated by missing not at random (MNAR) mechanisms, MI is not satisfactory. Originating in an econometric statistical context, Heckman's model, also called the sample selection method, deals with selected samples using two joined linear equations, termed the selection equation and the outcome equation. It has been successfully applied to MNAR outcomes. Nevertheless, such a method only addresses missing outcomes, and this is a strong limitation in clinical epidemiology settings, where covariates are also often missing. We propose to extend the validity of MI to some MNAR mechanisms through the use of Heckman's model as the imputation model and a two-step estimation process. This approach will provide a solution that can be used in an MI by chained equations framework to impute missing data (either outcomes or covariates) resulting from either a MAR or an MNAR mechanism, when the MNAR mechanism is compatible with Heckman's model. The approach is illustrated on a real dataset from a randomised trial in patients with seasonal influenza. Copyright © 2016 John Wiley & Sons, Ltd.

9.
We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.

10.
Fairclough  D.L.  Gagnon  D.D.  Zagari  M.J.  Marschner  N.  Dicato  M. 《Quality of life research》2003,12(8):1013-1027
Quality of life (QOL) endpoints from a randomized, placebo-controlled trial of anemic cancer patients treated with nonplatinum-containing chemotherapy who received epoetin alfa or placebo were subjected to a sensitivity analysis. Three QOL instruments were used: the Functional Assessment of Cancer Therapy-Anemia (FACT-An), the Cancer Linear Analog Scale (CLAS), and the Medical Outcomes Study Short Form-36 (SF-36). The seven primary endpoints chosen a priori for analysis were: the Functional Assessment of Cancer Therapy-General (FACT-G) Total, FACT-An fatigue subscale, CLAS energy, CLAS daily activities, CLAS overall QOL, and the SF-36 physical and mental component summary scales. Lower QOL scores were reported for patients who discontinued early, suggesting a nonrandom dropout process. Significant correlations (ranging from 0.37 to 0.77) between individual rates of change and the time to early termination of therapy or death supported this conclusion. Estimates of within-treatment-arm QOL change over time are more conservative with the missing not at random (MNAR) assumption as compared with the more optimistic estimates with the assumption that missing QOL data are missing at random (MAR). However, the between-treatment-arm comparisons were consistent across analyses, demonstrating statistically significant differences in favor of the epoetin alfa arm for four of the seven outcome measures.

11.
Out of sight, not out of mind: strategies for handling missing data   Cited by: 1 (self-citations: 0, citations by others: 1)
OBJECTIVE: To describe and illustrate missing data mechanisms (MCAR, MAR, NMAR) and missing data techniques (MDTs) and offer recommended best practices for addressing missingness. METHOD: We simulated data sets and employed ad hoc MDTs (deletion techniques, mean substitution) and sophisticated MDTs (full information maximum likelihood, Bayesian estimation, multiple imputation) in linear regression analyses. RESULTS: MCAR data yielded unbiased parameter estimates across all MDTs, but loss of power with deletion methods. NMAR results were biased towards larger values and greater significance. Under MAR the sophisticated MDTs returned estimates closer to their original values. CONCLUSION: State-of-the-art, readily available MDTs outperform ad hoc techniques.

12.
Liu LC 《Statistics in medicine》2008,27(30):6299-6309
In studies where multiple outcome items are repeatedly measured over time, missing data often occur. A longitudinal item response theory model is proposed for analysis of multivariate ordinal outcomes that are repeatedly measured. Under the MAR assumption, this model accommodates missing data at any level (missing item at any time point and/or missing time point). It allows for multiple random subject effects and the estimation of item discrimination parameters for the multiple outcome items. The covariates in the model can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is described utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher-scoring solution, which provides standard errors for all model parameters, is used. A data set from a longitudinal prevention study is used to motivate the application of the proposed model. In this study, multiple ordinal items of health behavior are repeatedly measured over time. Because of a planned missing design, subjects answered only two-thirds of all items at a given point.

13.
Many cohort studies and clinical trials are designed to compare rates of change over time in one or more disease markers in several groups. One major problem in such longitudinal studies is missing data due to patient drop-out. The bias and efficiency of six different methods to estimate rates of change in longitudinal studies with incomplete observations were compared: generalized estimating equation estimates (GEE) proposed by Liang and Zeger (1986); unweighted average of ordinary least squares (OLSE) of individual rates of change (UWLS); weighted average of OLSE (WLS); conditional linear model estimates (CLE), a covariate-type estimate proposed by Wu and Bailey (1989); random effect (RE), and joint multivariate RE (JMRE) estimates. The latter method combines a linear RE model for the underlying pattern of the marker with a log-normal survival model for the informative drop-out process. The performance of these methods in the presence of data missing completely at random (MCAR), missing at random (MAR) and non-ignorably missing (NIM) was compared in simulation studies. Data for the disease marker were generated under the linear random effects model with parameter values derived from realistic examples in HIV infection. Rates of drop-out, assumed to increase over time, were allowed to be independent of marker values or to depend either only on previous marker values or on both previous and current marker values. Under MCAR all six methods yielded unbiased estimates of both group mean rates and between-group difference. However, the cross-sectional view of the data in the GEE method resulted in seriously biased estimates under MAR and NIM drop-out processes. The bias in the estimates ranged from 30 per cent to 50 per cent. The degree of bias in the GEE estimates increases with the severity of non-randomness and with the proportion of MAR data. Under MCAR and MAR all the other five methods performed relatively well. RE and JMRE estimates were more efficient (that is, had smaller variance) than UWLS, WLS and CL estimates. Under NIM, WLS and particularly RE estimates tended to underestimate the average rate of marker change (bias approximately 10 per cent). Under NIM, UWLS, CL and JMRE performed better in terms of bias (3-5 per cent) with the JMRE giving the most efficient estimates. Given that markers are key variables related to disease progression, missing marker data are likely to be at least MAR. Thus, the GEE method may not be appropriate for analysing such longitudinal marker data. The potential biases due to incomplete data require greater recognition in reports of longitudinal studies. Sensitivity analyses to assess the effect of drop-outs on inferences about the target parameters are important.

14.
Aims Missing health-related quality of life (HRQOL) data in clinical trials can impact conclusions but the effect has not been thoroughly studied in HIV clinical trials. Despite repeated recommendations to avoid complete case (CC) analysis and last observation carried forward (LOCF), these approaches are commonly used to handle missing data. The goal of this investigation is to describe the use of different analytic methods under assumptions of missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) using HIV as an empirical example. Methods Medical Outcomes Study HIV (MOS-HIV) Health Survey data were combined from two large open-label multinational HIV clinical trials comparing treatments A and B over 48 weeks. Inclusion in the HRQOL analysis required completion of the MOS-HIV at baseline and at least one follow-up visit (weeks 8, 16, 24, 40, 48). Primary outcomes for the analysis were change from week 0 to 48 in mental health summary (MHS), physical health summary (PHS), pain and health distress scores analyzed using CC, LOCF, generalized estimating equations (GEE), direct likelihood and sensitivity analyses using joint mixed-effects model, and Markov chain Monte Carlo (MCMC) multiple imputation. Time and treatment were included in all models. Baseline and longitudinal variables (adverse event and reason for discontinuation) were only used in the imputation model. Results A total of 511 patients randomized to treatment A and 473 to treatment B completed the MOS-HIV at baseline and at least one follow-up visit. At week 48, 71% of patients on treatment A and 31% on treatment B completed the MOS-HIV survey. Examining changes within each treatment group, CC and MCMC generally produced the largest or most positive changes. The joint model was most conservative; direct likelihood and GEE produced intermediate results; LOCF showed no consistent trend. 
There was greater spread for within-group changes than between-group differences (within MHS scores for treatment A: −0.1 to 1.6, treatment B: 0.4 to 2.0; between groups: −0.7 to 0.4; within PHS scores for treatment A: −1.5 to 0.4, treatment B: −1.7 to −0.2; between groups: 0.1 to 1.1). The size of within-group changes and between-group differences was of similar magnitude for the pain and health distress scores. In all cases, the range of estimates was small, <0.2 SD (less than 2 points for the summary scores and 5 points for the subscale scores). Conclusions Use of the recommended likelihood-based models that do not require assumptions of MCAR was very feasible. Sensitivity analyses using auxiliary information can help to investigate the potential effect that missing data have on results but require planning to ensure that relevant data are prospectively collected.
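
The two discouraged approaches named in this abstract, complete case (CC) analysis and last observation carried forward (LOCF), differ in how they treat a patient whose final visit is missing. The Python sketch below illustrates this for one patient's scores across visits. The function names and the example scores are hypothetical and are not taken from the trials, and `None` is assumed to mark a missed visit.

```python
def locf(scores):
    """Fill each missed visit (None) with the most recent observed score."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


def change_from_baseline(scores, method):
    """Return final-visit score minus baseline score.

    method="cc":   None when baseline or final visit is missing, i.e. the
                   patient is excluded from a complete case analysis.
    method="locf": the last observed score stands in for the final visit.
    """
    if method == "locf":
        scores = locf(scores)
    elif method != "cc":
        raise ValueError("method must be 'cc' or 'locf'")
    baseline, final = scores[0], scores[-1]
    if baseline is None or final is None:
        return None
    return final - baseline


if __name__ == "__main__":
    # Hypothetical scores at weeks 0, 8, 16, 24, 40 and 48.
    patient = [50, 52, None, 55, None, None]
    print(change_from_baseline(patient, "cc"))    # None
    print(change_from_baseline(patient, "locf"))  # 5
```

CC discards the patient entirely, whereas LOCF treats the week 24 score as if it had been recorded at week 48; neither reflects the uncertainty about the unobserved final score.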

15.
In this paper, we develop methods to combine multiple biomarker trajectories into a composite diagnostic marker using functional data analysis (FDA) to achieve better diagnostic accuracy in monitoring disease recurrence in the setting of a prospective cohort study. In such studies, the disease status is usually verified only for patients with a positive test result in any biomarker and is missing in patients with negative test results in all biomarkers. Thus, the test result will affect disease verification, which leads to verification bias if the analysis is restricted only to the verified cases. We treat verification bias as a missing data problem. Under both missing at random (MAR) and missing not at random (MNAR) assumptions, we derive the optimal classification rules using the Neyman-Pearson lemma based on the composite diagnostic marker. We estimate thresholds adjusted for verification bias to dichotomize patients as test positive or test negative, and we evaluate the diagnostic accuracy using the verification bias corrected area under the ROC curves (AUCs). We evaluate the performance and robustness of the FDA combination approach and assess the consistency of the approach through simulation studies. In addition, we perform a sensitivity analysis of the dependency between the verification process and disease status for the approach under the MNAR assumption. We apply the proposed method on data from the Religious Orders Study and from a non-small cell lung cancer trial.

16.
Health economics studies with missing data are increasingly using approaches such as multiple imputation that assume that the data are “missing at random.” This assumption is often questionable because, even given the observed data, the probability that data are missing may reflect the true, unobserved outcomes, such as the patients' true health status. In these cases, methodological guidelines recommend sensitivity analyses to recognise data may be “missing not at random” (MNAR), and call for the development of practical, accessible approaches for exploring the robustness of conclusions to MNAR assumptions. Little attention has been paid to the problem that data may be MNAR in health economics in general and in cost-effectiveness analyses (CEA) in particular. In this paper, we propose a Bayesian framework for CEA where outcome or cost data are missing. Our framework includes a practical, accessible approach to sensitivity analysis that allows the analyst to draw on expert opinion. We illustrate the framework in a CEA comparing an endovascular strategy with open repair for patients with ruptured abdominal aortic aneurysm, and provide software tools to implement this approach.

17.
OBJECTIVE: To create an efficient imputation algorithm for imputing the SF-12 physical component summary (PCS) and mental component summary (MCS) scores when patients have one to eleven SF-12 items missing. STUDY SETTING: Primary data collection was performed between 1996 and 1998. STUDY DESIGN: Multi-pattern regression was conducted to impute the scores using only available SF-12 items (simple model), and then supplemented by demographics, smoking status and comorbidity (enhanced model) to increase the accuracy. A cut point of missing SF-12 items was determined for using the simple or the enhanced model. The algorithm was validated through simulation. DATA COLLECTION: A total of 30,308 patients from 63 physician groups were surveyed for a quality of care study in 1996, which collected the SF-12 and other information. The patients were classified as "chronic" patients if they reported that they had diabetes, heart disease, asthma/chronic obstructive pulmonary disease, or low back pain. A follow-up survey was conducted in 1998. PRINCIPAL FINDINGS: Thirty-one percent of the patients missed at least one SF-12 item. Means of variance of prediction and standard errors of the mean imputed scores increased with the number of missing SF-12 items. Correlations between the observed and the imputed scores derived from the enhanced models were consistently higher than those derived from the simple model and the increments were significant for patients with ≥6 missing SF-12 items (p<.03). CONCLUSION: Missing SF-12 items are prevalent and lead to reduced analytical power. Regression-based multi-pattern imputation using the available SF-12 items is efficient and can produce good estimates of the scores. The enhancement from the additional patient information can significantly improve the accuracy of the imputed scores for patients with ≥6 items missing, leading to estimated scores that are as accurate as those of patients with <6 missing items.

18.
Missing data in medical research is a common problem that has long been recognised by statisticians and medical researchers alike. In general, if the effect of missing data is not taken into account the results of the statistical analyses will be biased and the amount of variability in the data will not be correctly estimated. There are three main types of missing data pattern: Missing Completely At Random (MCAR), Missing At Random (MAR) and Not Missing At Random (NMAR). The type of missing data that a researcher has in their dataset determines the appropriate method to use in handling the missing data before a formal statistical analysis begins. The aim of this practice note is to describe these patterns of missing data and how they can occur, as well as describing the methods of handling them. Simple and more complex methods are described, including the advantages and disadvantages of each method as well as their availability in routine software. It is good practice to perform a sensitivity analysis employing different missing data techniques in order to assess the robustness of the conclusions drawn from each approach.

19.
Existing methods for power analysis for longitudinal study designs are limited in that they do not adequately address random missing data patterns. Although the pattern of missing data can be assessed during data analysis, it is unknown during the design phase of a study. The random nature of the missing data pattern adds another layer of complexity in addressing missing data for power analysis. In this paper, we model the occurrence of missing data with a two-state, first-order Markov process and integrate the modelling information into the power function to account for random missing data patterns. The Markov model is easily specified to accommodate different anticipated missing data processes. We develop this approach for the two most popular longitudinal models: the generalized estimating equations (GEE) and the linear mixed-effects model under the missing completely at random (MCAR) assumption. For GEE, we also limit our consideration to the working independence correlation model. The proposed methodology is illustrated with numerous examples that are motivated by real study designs.
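
A two-state, first-order Markov process for missingness, as mentioned in this abstract, means that whether a visit is missed depends only on whether the previous visit was missed. The Python sketch below is a minimal illustration of that idea, not the authors' power calculation. The function name, the two transition probabilities and the assumption that the first visit is always observed are hypothetical.

```python
import random


def simulate_missing_pattern(n_visits, p_miss_after_obs, p_miss_after_miss, rng):
    """Simulate one subject's visit pattern; True means the visit is observed.

    p_miss_after_obs:  probability a visit is missed given the previous
                       visit was observed.
    p_miss_after_miss: probability a visit is missed given the previous
                       visit was missed.
    The first visit is assumed to be observed (an illustrative choice).
    """
    pattern = [True]
    for _ in range(n_visits - 1):
        p_miss = p_miss_after_obs if pattern[-1] else p_miss_after_miss
        pattern.append(rng.random() >= p_miss)
    return pattern


if __name__ == "__main__":
    rng = random.Random(42)
    # Intermittent missingness: a missed visit can be followed by a return.
    print(simulate_missing_pattern(6, 0.2, 0.5, rng))
    # Monotone drop-out: once a visit is missed, all later visits are missed.
    print(simulate_missing_pattern(6, 0.2, 1.0, rng))
```

Setting `p_miss_after_miss` to 1.0 makes the missed state absorbing, which gives permanent drop-out; smaller values give intermittent missingness. This is one way a single Markov specification can cover different anticipated missing data processes.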

20.
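Objective: To evaluate the reliability and validity of the Chinese version of the SF-36 for assessing health-related quality of life in the elderly. Methods: From October to December 2007, uniformly trained interviewers used a questionnaire containing the Chinese version of the SF-36 to conduct face-to-face interviews with 4241 people aged 60 years and over in urban and rural areas of Zhejiang Province. Correlation analysis, reliability analysis, factor analysis, t tests and analysis of variance were used to evaluate the reliability and validity of the scale. Results: The Chinese version of the SF-36 had good split-half reliability (r=0.91, P<0.001). The internal consistency α coefficients exceeded 0.8 for all dimensions except vitality (α=0.65), social functioning (α=0.65) and mental health (α=0.40). The correlation coefficient between each item and its own dimension was >0.4 (except item 9-2) and was higher than the correlation coefficients between that item and the other dimensions (except item 9-8), indicating that the Chinese version of the SF-36 has good convergent and discriminant validity. The distribution of the 35 items across the 6 extracted common factors was broadly consistent with the hypothesised theoretical structure of the scale, with a cumulative contribution of 67.04%. All dimensions except mental health had good discriminative validity. Conclusion: The Chinese version of the SF-36 has good reliability and validity and is suitable for assessing health-related quality of life in the elderly; however, the reliability and validity of the mental health dimension are low, and items 9-2 and 9-8, as well as item 3-1 of the physical functioning dimension, are not suitable for the elderly Chinese population.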
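
The internal consistency α coefficient reported in this abstract is conventionally Cronbach's α, which for k items is k/(k−1) × (1 − sum of item variances / variance of the total score). The Python sketch below computes it under that assumption. The function name and the tiny example data are hypothetical and unrelated to the survey.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha.

    responses: list of respondents, each a list of k item scores
               (k >= 2, no missing values, at least two respondents).
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1.0 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Three respondents answering a hypothetical two-item dimension.
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Values near 1 indicate that the items of a dimension move together; a dimension whose items correlate weakly yields a low α.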
