Similar Literature
1.
Many cohort studies and clinical trials have designs which involve repeated measurements of disease markers. One problem in such longitudinal studies, when the primary interest is to estimate and to compare the evolution of a disease marker, is that planned measurements are often not collected, because of missed visits and/or withdrawal or attrition (for example, death). Several methods to analyse such data are available, provided that the data are missing at random. However, serious biases can occur when missingness is informative. In such cases, one needs to apply methods that simultaneously model the observed data and the missingness process. In this paper we consider the problem of estimating the rate of change of a disease marker in longitudinal studies in which some subjects drop out prematurely (informatively) due to attrition, while others experience a non-informative drop-out process (end of study, withdrawal). We propose a method which combines a linear random effects model for the underlying pattern of the marker with a log-normal survival model for the informative drop-out process. Joint estimates, equivalent to restricted maximum likelihood estimates, are obtained through the restricted iterative generalized least squares method. A nested EM algorithm is applied to deal with censored survival data. The advantages of this method are that it provides a unified approach to estimating all the model parameters; it can effectively deal with irregular data (that is, data measured at irregular time points), a complicated covariance structure and a complex underlying profile of the response variable; and it does not entail the complex computation that would be required to maximize the joint likelihood. The method is illustrated by modelling CD4 count data from a clinical trial in patients with advanced HIV infection, and its performance is assessed by simulation studies.
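
One common parameterization of such a joint model, shown here as a sketch consistent with the abstract rather than the authors' exact specification, couples the linear random effects model for the marker with a log-normal model for the informative drop-out time:

\[
Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,t_{ij} + \varepsilon_{ij},
\qquad (b_{0i}, b_{1i})^{\mathsf{T}} \sim N(\mathbf{0}, \mathbf{D}),
\quad \varepsilon_{ij} \sim N(0, \sigma^2),
\]
\[
\log T_i = \gamma_0 + \gamma_1 b_{0i} + \gamma_2 b_{1i} + \sigma_T\, w_i,
\qquad w_i \sim N(0, 1),
\]

where \(Y_{ij}\) is the marker value for subject \(i\) at time \(t_{ij}\) and \(T_i\) is the (possibly censored) informative drop-out time. Sharing the random effects \((b_{0i}, b_{1i})\) between the two submodels is what ties the marker's underlying trajectory to the drop-out process.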

2.
This paper describes the problem of informative censoring in longitudinal studies where the primary outcome is the rate of change in a continuous variable. Standard approaches based on the linear random effects model are valid only when the data are missing in an ignorable fashion. Informative censoring, which is a special type of non-ignorable missing data, occurs when the probability of early termination is related to an individual subject's true rate of change. When present, informative censoring causes bias in standard likelihood-based analyses, as well as in weighted averages of individual least-squares slopes. This paper reviews several methods proposed by others for the analysis of informatively censored longitudinal data, and outlines a new approach based on a log-normal survival model. Maximum likelihood estimates may be obtained via the EM algorithm. Advantages of this approach are that it allows general unbalanced data caused by staggered entry and unequally timed visits, it utilizes all available data, including data from patients with only a single measurement, and it provides a unified method for estimating all model parameters. Issues related to study design when informative censoring may occur are also discussed.

3.
We propose a propensity score-based multiple imputation (MI) method to handle missing data resulting from drop-outs and/or intermittently skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to convergence problems, but MI is free of that problem. Finally, if the data are missing at random, the CC and GEE methods yield results with moderate to large bias, while MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.
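
To make the idea concrete, here is a minimal sketch of propensity-score-based imputation for a single incomplete column; it is not the authors' algorithm (in particular, the approximate Bayesian bootstrap refinement commonly used in this setting is simplified to plain donor sampling), and it assumes the predictors are fully observed:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ps_impute(df, target, predictors, n_imputations=5, n_strata=5, seed=0):
    """Propensity-score-based multiple imputation for one incomplete column.

    Fits a model for P(target is missing | predictors), stratifies subjects
    on the estimated propensity, and fills each missing entry by sampling
    an observed value from the same stratum.
    """
    rng = np.random.default_rng(seed)
    miss = df[target].isna().to_numpy()
    ps = LogisticRegression().fit(df[predictors], miss).predict_proba(df[predictors])[:, 1]
    strata = pd.qcut(ps, n_strata, labels=False, duplicates="drop")
    completed = []
    for _ in range(n_imputations):
        filled = df[target].copy()
        for s in np.unique(strata):
            in_stratum = strata == s
            donors = df.loc[in_stratum & ~miss, target].to_numpy()
            need = int((in_stratum & miss).sum())
            if need and donors.size:
                filled.loc[in_stratum & miss] = rng.choice(donors, size=need)
        completed.append(filled)
    return completed  # analyse each completed column, then pool the results
```

In the trial setting of the abstract, the predictors would typically be baseline covariates and responses at earlier visits.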

4.
In longitudinal studies, a quantitative outcome (such as blood pressure) may be altered during follow-up by the administration of a non-randomized, non-trial intervention (such as anti-hypertensive medication), which may seriously bias the study results. Current methods mainly address this issue for cross-sectional studies; for longitudinal data, they are either restricted to a specific longitudinal data structure or valid only under special circumstances. We propose two new methods for estimating covariate effects on the underlying (untreated) longitudinal outcomes: a single imputation method employing a modified expectation-maximization (EM)-type algorithm, and a multiple imputation (MI) method utilizing a modified Monte Carlo EM-MI algorithm. Each method can be implemented as a one-step, two-step, or full-iteration algorithm. The methods combine the advantages of current statistical methods while relaxing their restrictive assumptions and generalizing them to realistic scenarios. They replace intractable numerical integration of a multi-dimensionally censored multivariate normal posterior distribution with a simplified, sufficiently accurate approximation, and are particularly attractive when outcomes reach a plateau after intervention. The methods are studied via simulation and applied to data from the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications study of treatment for type 1 diabetes. They proved robust to high dimensionality, large amounts of censored data, and low within-subject correlation, both when subjects receive the non-trial intervention only to treat the underlying condition (with high Y) and when it is given as treatment to the majority of subjects (with high Y) in combination with prevention for a small fraction of subjects (with normal Y). Copyright © 2013 John Wiley & Sons, Ltd.

5.
The rate of change in a continuous variable, measured serially over time, is often used as an outcome in longitudinal studies or clinical trials. When patients terminate the study before its scheduled end, there is a potential for bias in the estimation of the rate of change using standard methods which ignore the missing data mechanism. These methods include unweighted generalized estimating equations methods and likelihood-based methods assuming an ignorable missing data mechanism. We present a model for the analysis of informatively censored data, based on an extension of the two-stage linear random effects model, in which each subject's random intercept and slope are allowed to be associated with an underlying time to event. The joint distribution of the continuous responses and the time-to-event variable is then estimated via maximum likelihood using the EM algorithm, with standard errors calculated using the bootstrap. We illustrate this methodology and compare it to simpler approaches and to the usual maximum likelihood method using data from a multi-centre study of the effects of diet and blood pressure control on the progression of renal disease, the Modification of Diet in Renal Disease (MDRD) Study. Sensitivity analyses and simulations are used to evaluate the performance of this methodology in the context of the MDRD data, under various scenarios where the drop-out mechanism is ignorable as well as non-ignorable.
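
The bootstrap step is generic and easy to sketch. In the fragment below, subjects (not individual measurements) are resampled, which preserves within-subject correlation; a pooled least-squares slope stands in for the joint-model fit, which in the paper is obtained via the EM algorithm:

```python
import numpy as np

def bootstrap_se(subjects, fit_fn, n_boot=200, seed=0):
    """Subject-level nonparametric bootstrap SE for a scalar estimate.

    subjects: list of per-subject data (the resampling unit is the subject);
    fit_fn: maps such a list to a scalar estimate of interest.
    """
    rng = np.random.default_rng(seed)
    n = len(subjects)
    est = [fit_fn([subjects[i] for i in rng.integers(0, n, size=n)])
           for _ in range(n_boot)]
    return np.std(est, ddof=1)

# toy usage: 30 subjects, 4 visits each, true slope -0.5
rng = np.random.default_rng(1)
subjects = [(np.arange(4.0), 5 - 0.5 * np.arange(4.0) + rng.normal(0, 0.3, 4))
            for _ in range(30)]
pooled_slope = lambda subs: np.polyfit(np.concatenate([t for t, _ in subs]),
                                       np.concatenate([y for _, y in subs]), 1)[0]
print(bootstrap_se(subjects, pooled_slope))
```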

6.
The use of random-effects models for the analysis of longitudinal data with missing responses has been discussed by several authors. In this paper, we extend the non-linear random-effects model for a single response to the case of multiple responses, allowing for arbitrary patterns of observed and missing data. Parameters for this model are estimated via the EM algorithm and by the first-order approximation available in SAS Proc NLMIXED. The set of equations for this estimation procedure is derived and appropriately modified to deal with missing data. The methodology is illustrated with an example using data from a study of 161 pregnant women presenting to a private obstetrics clinic in Santiago, Chile.

7.
Recurrent event data are commonly encountered in health-related longitudinal studies. In this paper, time-to-event models for recurrent event data are studied under both non-informative and informative censoring. In the statistical literature, risk set methods have been confirmed to be an appropriate and efficient approach for analysing recurrent event data when censoring is non-informative. This approach produces biased results, however, when censoring is informative for the time-to-event outcomes. We compare the risk set methods with alternative non-parametric approaches which are robust to informative censoring. In particular, non-parametric procedures for the estimation of the cumulative occurrence rate function (CORF) and the occurrence rate function (ORF) are discussed in detail. A simulation study and an analysis of data from the AIDS Link to Intravenous Experiences Cohort Study are presented.
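
In one standard formulation (our notation, not necessarily the authors'), the risk set estimator of the cumulative occurrence rate function is of Nelson-Aalen type:

\[
\widehat{\mathrm{CORF}}(t) = \sum_{k:\, t_k \le t} \frac{dN(t_k)}{Y(t_k)},
\]

where \(dN(t_k)\) is the total number of recurrent events observed at time \(t_k\) and \(Y(t_k)\) is the number of subjects still under observation just before \(t_k\). When censoring is non-informative, the subjects remaining in the risk set are representative and the estimator is consistent; under informative censoring the risk set is selectively depleted, which is the source of the bias that motivates the robust alternatives studied in the paper.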

8.
The recent biostatistical literature contains a number of methods for handling the bias caused by 'informative censoring', which refers to drop-out from a longitudinal study after a number of visits scheduled at predetermined intervals. The same or related methods can be extended to situations where the missing pattern is intermittent. The pattern of missingness is often assumed to be related to the outcome through random effects which represent unmeasured individual characteristics such as health awareness. To date there is only limited experience with applying the methods for informative censoring in practice, mostly because of complicated modelling and difficult computations. In this paper, we propose an estimation method based on grouping the data. The proposed estimator is asymptotically unbiased in various situations under informative missingness. Several existing methods are reviewed and compared in simulation studies. We apply the methods to data from the Wisconsin Diabetes Registry Project, a longitudinal study tracking glycaemic control and acute and chronic complications from the diagnosis of type I diabetes.

9.
In the literature on statistical analysis with missing data, there is a significant gap in statistical inference for missing data mechanisms, especially for nonmonotone missing data; this has essentially restricted the use of estimation methods that require estimating the missing data mechanism. For example, the inverse probability weighting methods (Horvitz & Thompson, 1952; Little & Rubin, 2002), including the popular augmented inverse probability weighting (Robins et al., 1994), depend on adequate models for the missing data mechanism to reduce estimation bias while improving estimation efficiency. This research proposes a semiparametric likelihood method for estimating missing data mechanisms, in which an EM algorithm with closed-form expressions for both the E-step and the M-step is used to evaluate the estimate (Zhao et al., 2009; Zhao, 2020). The asymptotic variance of the proposed estimator is estimated from the profile score function. The methods are general and robust. Simulation studies in various missing data settings are performed to examine the finite-sample performance of the proposed method. Finally, we analyse the missing data mechanism of the Duke cardiac catheterization coronary artery disease diagnostic data to illustrate the method.
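
For concreteness, a minimal sketch of inverse probability weighting for the simplest (monotone, single-variable) case follows; the estimated missingness probabilities come from a plain logistic regression, whereas the abstract's semiparametric likelihood machinery for nonmonotone patterns is far more general:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_mean(y, X, observed):
    """Normalized Horvitz-Thompson (IPW) estimate of E[Y] when Y is
    missing at random given fully observed covariates X.
    observed: boolean array, True where y is observed."""
    pi = LogisticRegression().fit(X, observed).predict_proba(X)[:, 1]
    w = observed / pi                         # zero weight where y is missing
    return np.sum(w * np.where(observed, y, 0.0)) / np.sum(w)

# toy usage: missingness depends on X only (MAR), so weighting removes the bias
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([1.0, -1.0]) + rng.normal(size=500)
obs = rng.random(500) < 1 / (1 + np.exp(-(0.5 + X[:, 0])))
print(ipw_mean(y, X, obs))
```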

10.

Background

In molecular epidemiology studies, biospecimen data are collected, often with the purpose of evaluating the synergistic effect of a biomarker and another feature on an outcome. Typically, biomarker data are collected on only a proportion of subjects eligible for study, leading to a missing data problem. Missing data methods, however, are not customarily incorporated into analyses. Instead, complete-case (CC) analyses are performed, which can result in biased and inefficient estimates.

Methods

Through simulations, we characterized the performance of CC methods when interaction effects are estimated. We also investigated whether standard multiple imputation (MI) could improve estimation over CC methods when the data are not missing at random (NMAR) and auxiliary information may or may not exist.

Results

CC analyses were shown to result in considerable bias and loss of efficiency. While MI reduced bias and increased efficiency over CC methods under specific conditions, it too resulted in biased estimates, depending on the strength of the auxiliary data available and the nature of the missingness. In particular, CC performed better than MI when extreme values of the covariate were more likely to be missing, while MI outperformed CC when missingness of the covariate was related to both the covariate and the outcome. MI always improved performance when strong auxiliary data were available. In a real study, MI estimates of interaction effects were attenuated relative to those from a CC approach.

Conclusions

Our findings suggest the importance of incorporating missing data methods into the analysis. If the data are missing at random (MAR), standard MI is a reasonable method. Auxiliary variables may make this assumption more reasonable even if the data are NMAR. Under NMAR we emphasize caution when using standard MI and recommend it over CC only when strong auxiliary data are available. MI with the missing data mechanism specified is an alternative when the data are NMAR. In all cases, it is recommended to take advantage of MI's ability to account for the uncertainty of these assumptions.

11.
When competing risks data arise, information on the actual cause of failure may be missing for some subjects. A cause-specific proportional hazards model together with multiple imputation (MI) methods has therefore been used to analyse such data. Modelling the cumulative incidence function is also of interest, and thus we investigate the proportional subdistribution hazards model (the Fine and Gray model) together with MI methods as a modelling approach for competing risks data with missing cause of failure. Possible strategies for analysing such data include the complete-case analysis as well as an analysis in which the missing causes are classified as an additional failure type. These approaches, however, may produce misleading results in clinical settings. In the present work we investigate the bias of the parameter estimates when fitting the Fine and Gray model under the above approaches. We also apply the MI method and evaluate its comparative performance under various missing data scenarios. Results from simulation experiments showed that there is substantial bias in the estimates when fitting the Fine and Gray model with naive techniques for missing data when the cause of failure is missing at random. Compared to those techniques, the MI-based method gave estimates with much smaller biases and coverage probabilities of 95 per cent confidence intervals closer to the nominal level. All three methods were also applied to real data, modelling time to AIDS or non-AIDS cause of death in HIV-1 infected individuals.
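
A minimal sketch of the imputation step for a missing binary cause of failure is given below (our simplification: a logistic model fitted to failures with known cause, then independent Bernoulli draws per imputed dataset; a proper MI implementation would also perturb the fitted coefficients to reflect their estimation uncertainty, and the Fine and Gray fit itself is omitted):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def impute_causes(X_known, cause_known, X_missing, m=10, seed=0):
    """Draw m imputed 0/1 cause-of-failure vectors for failures with a
    missing cause, using a logistic model fitted on known-cause failures.
    X_* hold covariates for each failure (e.g., follow-up time, age)."""
    rng = np.random.default_rng(seed)
    p = LogisticRegression().fit(X_known, cause_known).predict_proba(X_missing)[:, 1]
    return [(rng.random(len(p)) < p).astype(int) for _ in range(m)]

# each of the m completed datasets is then analysed (e.g., with the Fine and
# Gray model) and the results are pooled with Rubin's rules
```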

12.
Cancer studies frequently yield multiple event times that correspond to landmarks in disease progression, including non‐terminal events (i.e., cancer recurrence) and an informative terminal event (i.e., cancer‐related death). Hence, we often observe semi‐competing risks data. Work on such data has focused on scenarios in which the cause of the terminal event is known. However, in some circumstances, the information on cause for patients who experience the terminal event is missing; consequently, we are not able to differentiate an informative terminal event from a non‐informative terminal event. In this article, we propose a method to handle missing data regarding the cause of an informative terminal event when analyzing the semi‐competing risks data. We first consider the nonparametric estimation of the survival function for the terminal event time given missing cause‐of‐failure data via the expectation–maximization algorithm. We then develop an estimation method for semi‐competing risks data with missing cause of the terminal event, under a pre‐specified semiparametric copula model. We conduct simulation studies to investigate the performance of the proposed method. We illustrate our methodology using data from a study of early‐stage breast cancer. Copyright © 2016 John Wiley & Sons, Ltd.

13.

Objective

The Mini-Mental State Examination (MMSE) is used to estimate current cognitive status and as a screen for possible dementia. Missing item-level data are commonly reported, so careful handling of missing data is particularly important. However, there are concerns that common procedures for dealing with missing data, for example, listwise deletion and mean item substitution, are inadequate.

Study Design and Setting

We used multiple imputation (MI) to estimate missing MMSE data in 17,303 participants who were drawn from the Dynamic Analyses to Optimize Aging project, a harmonization project of nine Australian longitudinal studies of aging.

Results

Our results indicated differences in mean MMSE scores between participants with and without missing data, a pattern consistent across age and gender levels. MI inflated MMSE scores, but differences between the imputed participants and those without missing data remained. A simulation model supported the efficacy of MI for estimating missing item-level data, although estimation deteriorated seriously when 50% or more of the item-level data were missing, particularly for the oldest participants.

Conclusions

Our adaptation of MI to obtain a probable estimate for missing MMSE item-level data provides a suitable method when the proportion of missing item-level data is not excessive.

14.
Multiple imputation (MI) is a technique that can be used for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, containing possibly different but reasonable replacements for the missing data. Users analyse the completed datasets separately with standard techniques and then combine the results using simple formulae in a way that allows the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose an approach that utilises MI to handle the problems of missing gestational ages and implausible birthweight-gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then multiply imputing missing and implausible gestational age values. Because missing and implausible gestational age values are not missing completely at random, multiple imputation, which incorporates both the existing relationships among the variables and the uncertainty added by the imputation, may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
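
The 'simple formulae' referred to above are Rubin's combining rules: with \(m\) completed datasets yielding point estimates \(\hat{Q}_j\) and within-imputation variances \(\hat{U}_j\),

\[
\bar{Q} = \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, \qquad
\bar{U} = \frac{1}{m}\sum_{j=1}^{m}\hat{U}_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j - \bar{Q}\bigr)^2,
\]

and the total variance attached to \(\bar{Q}\) is \(T = \bar{U} + (1 + 1/m)\,B\). The between-imputation component \(B\) is what carries the extra uncertainty due to the missing data.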

15.
Objective: Using HIV/AIDS blood sample test data, to identify the most accurate, efficient, and convenient imputation method. Methods: SPSS 17.0 and SAS 9.1 were used to analyse the missing data mechanism and missing data pattern. Three methods, expectation-maximization (EM), regression, and multiple imputation (MI), were used to fill in the missing data, and the distribution, precision, and accuracy of the imputed data were compared across methods. Results: The missing data mechanism was missing at random (χ²=1141.21, P<0.001), and the missing data pattern was arbitrary. Among MI settings, 10 imputations performed best. When the missing rate was below 10%, the EM and regression methods were more accurate than MI with 10 imputations, and, except for haemoglobin, the EM method was more accurate than the regression method. When the missing rate was around 20%, MI with 10 imputations was more accurate than the EM and regression methods, although for platelets and serum creatinine the EM method was more accurate than the regression method. The EM and regression methods were more precise than MI, with the EM method being the most precise. The skewness and kurtosis coefficients of the data imputed by the EM, regression, and MI methods were very close. Conclusion: For variables with a missing rate below 10%, the EM or regression method is more convenient, accurate, and precise; for variables with a missing rate around 20%, MI is more appropriate.

16.
When missing data occur in one or more covariates in a regression model, multiple imputation (MI) is widely advocated as an improvement over complete‐case analysis (CC). We use theoretical arguments and simulation studies to compare these methods with MI implemented under a missing at random assumption. When data are missing completely at random, both methods have negligible bias, and MI is more efficient than CC across a wide range of scenarios. For other missing data mechanisms, bias arises in one or both methods. In our simulation setting, CC is biased towards the null when data are missing at random. However, when missingness is independent of the outcome given the covariates, CC has negligible bias and MI is biased away from the null. With more general missing data mechanisms, bias tends to be smaller for MI than for CC. Since MI is not always better than CC for missing covariate problems, the choice of method should take into account what is known about the missing data mechanism in a particular substantive application. Importantly, the choice of method should not be based on comparison of standard errors. We propose new ways to understand empirical differences between MI and CC, which may provide insights into the appropriateness of the assumptions underlying each method, and we propose a new index for assessing the likely gain in precision from MI: the fraction of incomplete cases among the observed values of a covariate (FICO). Copyright © 2010 John Wiley & Sons, Ltd.
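
A sketch of how the proposed index might be computed, based on our reading of the definition quoted above (consult the paper for the exact formulation):

```python
import pandas as pd

def fico(df: pd.DataFrame, covariate: str) -> float:
    """Fraction of Incomplete Cases among the Observed values of a covariate:
    among rows where `covariate` is observed, the fraction that are
    incomplete cases (missing data in at least one other variable)."""
    observed = df[covariate].notna()
    incomplete = df.drop(columns=covariate).isna().any(axis=1)
    return (observed & incomplete).sum() / observed.sum()
```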

17.
Missing single nucleotide polymorphisms (SNPs) are quite common in genetic association studies. Subjects with missing SNPs are often discarded in analyses, which may seriously undermine inference of SNP-disease association. In this article, we develop two haplotype-based imputation approaches and one tree-based imputation approach for association studies. The emphasis is on evaluating the impact of imputation on parameter estimation, compared to the standard practice of ignoring missing data. The haplotype-based approaches build on haplotype reconstruction by the expectation-maximization (EM) algorithm or by a weighted EM (WEM) algorithm, depending on whether case-control status is taken into account. The tree-based approach uses a Gibbs sampler to iteratively sample from a full conditional distribution obtained from the classification and regression tree (CART) algorithm. We employ a standard multiple imputation procedure to account for the uncertainty of imputation. We apply the methods to simulated data as well as to a case-control study of developmental dyslexia. Our results suggest that imputation generally improves efficiency over the standard practice of ignoring missing data. The tree-based approach performs comparably well to the haplotype-based approaches, but has a computational advantage. The WEM approach yields the smallest bias at the price of increased variance.

18.
Objective: To compare the statistical performance of control-based pattern-mixture models (PMM), the mixed-effects model for repeated measures (MMRM), and multiple imputation (MI) in handling quantitative longitudinal data with multiple coexisting missingness mechanisms. Methods: Monte Carlo techniques were used to simulate quantitative longitudinal datasets containing two or three of the following missingness mechanisms: missing completely at random, missing at random, and missing not at random; the statistical performance of the three classes of methods was then evaluated. Results: The control-based PMM kept the type I error rate at a low level but had the lowest power. MMRM and MI controlled the type I error rate and had higher power than the control-based PMM. When there was no difference in efficacy between the two groups, the estimation errors of all methods were comparable and the control-based PMM had the highest 95% confidence interval coverage; when a difference existed, each method was affected by the proportion of missing data conforming to its assumed missingness mechanism. When the data contained values missing not at random, the control-based PMM essentially did not overestimate the treatment difference and had the highest 95% confidence interval coverage, whereas MMRM and MI overestimated the treatment difference and had lower coverage. The widths of the 95% confidence intervals were comparable across methods. Conclusion: When analysing longitudinal data with multiple coexisting missingness mechanisms, especially data containing values missing not at random, the statistical performance of MMRM and MI deteriorates; the control-based PMM can be used for sensitivity analysis, but attention must be paid to its specific assumptions to avoid overly conservative estimates.
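
For intuition, a stylized fragment of control-based ('copy reference') imputation for one treated subject who drops out is shown below; a real control-based PMM analysis would instead draw post-dropout values from the conditional multivariate normal distribution given the subject's observed history, under reference-arm parameters:

```python
import numpy as np

def copy_reference_impute(y_observed, mu_ref, sd_ref, rng):
    """Stylized control-based imputation for one treated subject.

    y_observed: outcomes up to dropout; mu_ref, sd_ref: reference (control)
    arm mean profile and residual SD over all scheduled visits. Post-dropout
    values are drawn around the control mean, i.e. the subject is assumed to
    behave like a control after dropping out."""
    n_visits, k = len(mu_ref), len(y_observed)
    y = np.empty(n_visits)
    y[:k] = y_observed
    y[k:] = mu_ref[k:] + rng.normal(0.0, sd_ref, size=n_visits - k)
    return y

rng = np.random.default_rng(0)
print(copy_reference_impute(np.array([10.2, 9.1]), np.full(5, 9.0), 1.0, rng))
```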

19.
Wu L. Statistics in Medicine 2007; 26(17): 3342-3357.
In recent years, HIV viral dynamic models have received great attention in AIDS studies. Subjects in these studies often drop out for various reasons, such as drug intolerance or drug resistance, and covariates may also contain missing data. Statistical analyses that ignore informative dropouts and missing covariates may lead to misleading results. We consider appropriate methods for HIV viral dynamic models with informative dropouts and missing covariates and evaluate these methods via simulation. A real data set is analysed, and the results show that the initial viral decay rate, which may reflect the efficacy of anti-HIV treatment, may be over-estimated if dropout patients are ignored. We also find that the current or immediately preceding viral load values may be most predictive of patient dropout. These results may be important for HIV/AIDS studies.
