Similar articles
1.
2.
Missing data in medical research is a common problem that has long been recognised by statisticians and medical researchers alike. In general, if the effect of missing data is not taken into account the results of the statistical analyses will be biased and the amount of variability in the data will not be correctly estimated. There are three main types of missing data pattern: Missing Completely At Random (MCAR), Missing At Random (MAR) and Not Missing At Random (NMAR). The type of missing data that a researcher has in their dataset determines the appropriate method to use in handling the missing data before a formal statistical analysis begins. The aim of this practice note is to describe these patterns of missing data and how they can occur, as well as describing the methods of handling them. Simple and more complex methods are described, including the advantages and disadvantages of each method as well as their availability in routine software. It is good practice to perform a sensitivity analysis employing different missing data techniques in order to assess the robustness of the conclusions drawn from each approach.
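
The three mechanisms named in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `complete_case_mean`, the covariate `x`, the outcome `y`, and the missingness probabilities (0.3, 0.6, 0.1) are all hypothetical and do not come from the practice note. The outcome has true mean 0, so any systematic departure of the complete-case mean from 0 reflects bias caused by the missingness mechanism.

```python
import random
import statistics


def complete_case_mean(mechanism, n=20000, seed=0):
    """Simulate (x, y) pairs, delete some y values, and return the mean of
    the y values that remain observed (a complete-case estimate)."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)        # covariate, always observed
        y = x + rng.gauss(0.0, 1.0)    # outcome, true mean 0
        if mechanism == "MCAR":
            p_missing = 0.3                      # unrelated to x or y
        elif mechanism == "MAR":
            p_missing = 0.6 if x > 0 else 0.1    # depends only on observed x
        elif mechanism == "NMAR":
            p_missing = 0.6 if y > 0 else 0.1    # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            observed.append(y)
    return statistics.fmean(observed)


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "NMAR"):
        print(mech, round(complete_case_mean(mech), 3))
```

In this toy setup the complete-case mean stays close to 0 under MCAR and is pulled below 0 under MAR and NMAR. Under MAR the bias could in principle be removed by adjusting for `x`, which is what distinguishes it from NMAR.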

3.
PURPOSE: In a cohort in which racial data are unknown for some persons, race-specific persons and person-years are imputed using a model-based iterative allocation algorithm (IAA). METHODS: An EM algorithm-based approach to address misclassification in a censored data regression setting can be adapted to estimate the probability that a person of unknown race is white. The corresponding race-specific person-years are obtained as a by-product of the estimation procedure. Variance estimates are computed using the bootstrap. The proposed approach is compared with the proportional allocation method (PAM). RESULTS: In an occupational cohort where racial data were missing for 41% of the workers, the age-time-race-specific person-years were estimated within a relative variation of approximately 20%, using the IAA. The deaths were less reliably estimated. The standardized mortality ratios (SMRs) for all-cause mortality estimated using the IAA and the PAM were more similar for the non-white workers than for a smaller subgroup of white workers. CONCLUSIONS: The IAA provides a method to reliably estimate race-specific person-year denominators in cohort studies with missing racial data. This method is applicable to other incompletely observed non-time-dependent categorical covariates. Internal cohort rates or SMRs can be computed and modeled, with bootstrap confidence intervals that account for the uncertainty in the determination of race.

4.
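Objective: Missing data are almost unavoidable in cohort studies. This simulation study compares the imputation performance of eight commonly used missing-data methods on longitudinal data with missing values, to provide a useful reference for handling such data. Methods: The simulation was implemented in R. Longitudinal data with missing values were generated by the Monte Carlo method, and the methods were compared on mean absolute deviation, mean relative deviation, and type I error in regression analysis, to assess both imputation performance and the impact on subsequent multivariable analysis. Results: Mean imputation, k-nearest neighbor (KNN) imputation, regression imputation, and random forest imputation performed similarly and stably; multiple imputation and hot-deck imputation ranked below these methods; K-means clustering and the EM algorithm performed worst and were the least stable. Mean imputation, the EM algorithm, random forest, KNN, and regression imputation controlled the type I error well, whereas multiple imputation, hot-deck imputation, and K-means clustering did not control it effectively. Conclusion: For longitudinal data under a missing-at-random mechanism, mean imputation, KNN, regression imputation, and random forest are all good choices; when the proportion of missing data is not too large, multiple imputation and hot-deck imputation also perform well; K-means clustering and the EM algorithm are not recommended.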
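
The evaluation idea in this abstract (impute, then compare imputed values with the known truth) can be sketched for the simplest of the eight methods, mean imputation. The original study was written in R and its exact metric definitions are not given in the abstract, so the helper names `mean_impute` and `imputation_errors` and the formulas used here for mean absolute deviation and mean relative deviation are assumptions made for illustration.

```python
def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def imputation_errors(true_values, masked, imputed):
    """Return (mean absolute deviation, mean relative deviation), computed
    only over positions that were missing in `masked`.

    Assumed definitions: |imputed - true| and |imputed - true| / |true|.
    The relative version requires non-zero true values.
    """
    missing_idx = [i for i, v in enumerate(masked) if v is None]
    abs_dev = [abs(imputed[i] - true_values[i]) for i in missing_idx]
    rel_dev = [abs(imputed[i] - true_values[i]) / abs(true_values[i])
               for i in missing_idx]
    return sum(abs_dev) / len(abs_dev), sum(rel_dev) / len(rel_dev)


if __name__ == "__main__":
    truth = [2.0, 4.0, 6.0, 8.0]
    masked = [2.0, None, 6.0, None]   # two values deleted for the experiment
    filled = mean_impute(masked)
    print(filled, imputation_errors(truth, masked, filled))
```

A Monte Carlo comparison in the spirit of the abstract would repeat this over many simulated datasets and over each imputation method, then average the two error measures.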

5.
Because of current techniques of determining gene mutation, investigators are now interested in estimating the odds ratio between genetic status (mutation, no mutation) and an outcome variable such as disease cell type (A, B). In this paper we consider the mutation of the RAS genetic family. To determine if the genes have mutated, investigators look at five specific locations on the RAS gene. RAS mutated is a mutation in at least one of the five gene locations and RAS non-mutated is no mutation in any of the five locations. Owing to limited time and financial resources, one cannot obtain a complete genetic evaluation of all five locations on the gene for all patients. We propose the use of maximum likelihood (ML) with a 2^6 multinomial distribution formed by cross-classifying the binary mutation status at five locations by binary disease cell type. This ML method includes all patients regardless of completeness of data, treats the locations not evaluated as missing data, and uses the EM algorithm to estimate the odds ratio between genetic mutation status and the disease type. We compare the ML method to complete case estimates, and to a method used by clinical investigators, which excludes patients with data on fewer than five locations who have no mutations on these sites.

6.
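Objective: To evaluate multiple imputation by the expectation-maximization with bootstrapping (EMB) algorithm for missing quantitative variables in cross-sectional health examination data, and to provide a basis for choosing an appropriate multiple imputation method for such data. Methods: Using measured cross-sectional health examination data from a population, EMB multiple imputation was carried out with the Amelia II package in R 3.5.0 on the records of 1,634 employees who had routine examinations at the Health Examination Center of Xijing Hospital, Xi'an, Shaanxi Province, between January and December 2013. Results: For cross-sectional quantitative health examination data, under three random-missingness scenarios with single-variable missing rates of < 10%, 20%, and 70%, EMB multiple imputation had lower estimation error than listwise deletion in every case. On the same data, imputation performance varied with the number of imputations; m = 10 was a suitable number for this dataset. Probability density plots before and after imputation showed that, at m = 10, the density curve of the imputed values agreed well with that of the observed values. Overimputation diagnostic plots further showed that, at m = 10, the 90% CIs for most observations of each variable contained the best-fit line and were narrow. The multivariable regression models built separately on the listwise-deletion dataset and on the EMB-imputed dataset contained different variables. Conclusion: For quantitative variables missing at random at different missing rates, EMB multiple imputation outperformed listwise deletion; the optimal number of imputations differs between datasets.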

7.
Existing methods for power analysis for longitudinal study designs are limited in that they do not adequately address random missing data patterns. Although the pattern of missing data can be assessed during data analysis, it is unknown during the design phase of a study. The random nature of the missing data pattern adds another layer of complexity in addressing missing data for power analysis. In this paper, we model the occurrence of missing data with a two-state, first-order Markov process and integrate the modelling information into the power function to account for random missing data patterns. The Markov model is easily specified to accommodate different anticipated missing data processes. We develop this approach for the two most popular longitudinal models: the generalized estimating equations (GEE) and the linear mixed-effects model under the missing completely at random (MCAR) assumption. For GEE, we also limit our consideration to the working independence correlation model. The proposed methodology is illustrated with numerous examples that are motivated by real study designs.
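
A two-state, first-order Markov process for missingness, as described in this abstract, can be sketched as follows. The abstract does not give the authors' parameterization, so everything here is an assumption for illustration: the names `simulate_missing_pattern` and `observed_fraction`, the convention that the first visit is always observed, and the two transition probabilities.

```python
import random


def simulate_missing_pattern(n_visits, p_miss_after_obs, p_miss_after_miss, rng):
    """Return a list of 0/1 flags for one subject (1 = visit missing).

    The state at each visit depends only on the state at the previous
    visit (first-order Markov). The first visit is assumed observed.
    """
    pattern = [0]
    for _ in range(n_visits - 1):
        p = p_miss_after_miss if pattern[-1] == 1 else p_miss_after_obs
        pattern.append(1 if rng.random() < p else 0)
    return pattern


def observed_fraction(n_subjects, n_visits, p_miss_after_obs,
                      p_miss_after_miss, seed=0):
    """Monte Carlo estimate of the fraction of scheduled visits observed."""
    rng = random.Random(seed)
    observed = 0
    for _ in range(n_subjects):
        pattern = simulate_missing_pattern(
            n_visits, p_miss_after_obs, p_miss_after_miss, rng)
        observed += n_visits - sum(pattern)
    return observed / (n_subjects * n_visits)


if __name__ == "__main__":
    # p_miss_after_miss = 1.0 makes "missing" absorbing, i.e. pure dropout;
    # values below 1.0 allow intermittent missingness.
    print(observed_fraction(1000, 5, 0.1, 1.0))
    print(observed_fraction(1000, 5, 0.1, 0.5))
```

Setting the second probability to 1.0 reproduces monotone dropout, while smaller values let subjects return after a missed visit, which is one way such a model can accommodate different anticipated missing-data processes at the design stage.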

8.
Liang and Zeger proposed a generalized estimating equations approach to the analysis of longitudinal data. Their models assume that missing observations are missing completely at random in the sense of Rubin. However, when this assumption does not hold, their analysis may yield biased results. In this paper, we develop a simple and practical procedure for testing this assumption. The proposed procedure is related to that of Park and Davis. © 1997 John Wiley & Sons, Ltd.

9.
Specific age‐related hypotheses are tested in population‐based longitudinal studies. At specific time intervals, both the outcomes of interest and the time‐varying covariates are measured. When participants are approached for follow‐up, some participants do not provide data. Investigations may show that many have died before the time of follow‐up whereas others refused to participate. Some of these non‐participants do not provide data at later follow‐ups. Few statistical methods for missing data distinguish between ‘non‐participation’ and ‘death’ among study participants. The augmented inverse probability‐weighted estimators are most commonly used in marginal structure models when data are missing at random. Treating non‐participation and death as the same, however, may lead to biased estimates and invalid inferences. To overcome this limitation, a multiple inverse probability‐weighted approach is presented to account for two types of missing data, non‐participation and death, when using a marginal mean model. Under certain conditions, the multiple weighted estimators are consistent and asymptotically normal. Simulation studies will be used to study the finite sample efficiency of the multiple weighted estimators. The proposed method will be applied to study the risk factors associated with the cognitive decline among the aging adults, using data from the Chicago Health and Aging Project (CHAP). Copyright © 2010 John Wiley & Sons, Ltd.

10.
BACKGROUND: Using an application and a simulation study we show the bias induced by missing data in the outcome in longitudinal studies and discuss suitable statistical methods according to the type of missing responses when the variable under study is Gaussian. METHODS: The model used for the analysis of Gaussian longitudinal data is the linear mixed effects model. When the probability of response does not depend on the missing values of the outcome or on the parameters of the linear model, missing data are ignorable, and the parameters of the linear mixed effects model may be estimated by maximum likelihood with standard software. When the missing data are non-ignorable, several methods have been proposed. We describe the method proposed by Diggle and Kenward (1994) (DK method), for which software is available. This model combines a linear mixed effects model for the outcome variable with a logistic model for the probability of response, which depends on the outcome variable. RESULTS: A simulation study shows the efficacy of this method and its limits when the data are not normal. In this case, estimators obtained by the DK approach may be more biased than estimators obtained under the hypothesis of ignorable missing data even if the data are non-ignorable. Data from the Paquid cohort on the evolution of scores on a neuropsychological test among elderly subjects show the bias of a naive analysis using all available data. Although missing responses are not ignorable in this study, estimates of the linear mixed effects model are not very different using the DK approach and the hypothesis of ignorable missing data. CONCLUSION: Statistical methods for longitudinal data including non-ignorable missing responses are sensitive to hypotheses that are difficult to verify. Thus, it is better in practical applications to perform an analysis under the hypothesis of ignorable missing responses and compare the results with those obtained with several approaches for non-ignorable missing data. However, such a strategy requires development of new software.

11.
12.
Despite the need for sensitivity analysis to nonignorable missingness in intensive longitudinal data (ILD), such analysis is greatly hindered by novel ILD features, such as large data volume and complex nonmonotonic missing-data patterns. Likelihood of alternative models permitting nonignorable missingness often involves very high-dimensional integrals, causing curse of dimensionality and rendering solutions computationally prohibitive to obtain. We aim to overcome this challenge by developing a computationally feasible method, nonlinear indexes of local sensitivity to nonignorability (NISNI). We use linear mixed effects models for the incomplete outcome and covariates. We use Markov multinomial models to describe complex missing-data patterns and mechanisms in ILD, thereby permitting missingness probabilities to depend directly on missing data. Using a second-order Taylor series to approximate likelihood under nonignorability, we develop formulas and closed-form expressions for NISNI. Our approach permits the outcome and covariates to be missing simultaneously, as is often the case in ILD, and can capture U-shaped impact of nonignorability in the neighborhood of the missing at random model without fitting alternative models or evaluating integrals. We evaluate performance of this method using simulated data and real ILD collected by the ecological momentary assessment method.

13.
Su L  Hogan JW 《Statistics in medicine》2008,27(17):3247-3268
Longitudinal studies with binary repeated measures are widespread in biomedical research. Marginal regression approaches for balanced binary data are well developed, whereas for binary process data, where measurement times are irregular and may differ by individuals, likelihood-based methods for marginal regression analysis are less well developed. In this article, we develop a Bayesian regression model for analyzing longitudinal binary process data, with emphasis on dealing with missingness. We focus on the settings where data are missing at random (MAR), which require a correctly specified joint distribution for the repeated measures in order to draw valid likelihood-based inference about the marginal mean. To provide maximum flexibility, the proposed model specifies both the marginal mean and serial dependence structures using nonparametric smooth functions. Serial dependence is allowed to depend on the time lag between adjacent outcomes as well as other relevant covariates. Inference is fully Bayesian. Using simulations, we show that adequate modeling of the serial dependence structure is necessary for valid inference of the marginal mean when the binary process data are MAR. Longitudinal viral load data from the HIV Epidemiology Research Study are analyzed for illustration.

14.
The multivariate linear mixed model (MLMM) has emerged as an important analytical tool for longitudinal data with multiple outcomes. However, the analysis of multivariate longitudinal data could be complicated by the presence of censored measurements because of a detection limit of the assay in combination with unavoidable missing values arising when subjects miss some of their scheduled visits intermittently. This paper presents a generalization of the MLMM approach, called the MLMM‐CM, for a joint analysis of the multivariate longitudinal data with censored and intermittent missing responses. A computationally feasible expectation maximization–based procedure is developed to carry out maximum likelihood estimation within the MLMM‐CM framework. Moreover, the asymptotic standard errors of fixed effects are explicitly obtained via the information‐based method. We illustrate our methodology by using simulated data and a case study from an AIDS clinical trial. Experimental results reveal that the proposed method is able to provide more satisfactory performance as compared with the traditional MLMM approach.

15.
Many cohort studies and clinical trials are designed to compare rates of change over time in one or more disease markers in several groups. One major problem in such longitudinal studies is missing data due to patient drop-out. The bias and efficiency of six different methods to estimate rates of change in longitudinal studies with incomplete observations were compared: generalized estimating equation estimates (GEE) proposed by Liang and Zeger (1986); unweighted average of ordinary least squares estimates (OLSE) of individual rates of change (UWLS); weighted average of OLSE (WLS); conditional linear model estimates (CLE), a covariate-type estimate proposed by Wu and Bailey (1989); random effect (RE), and joint multivariate RE (JMRE) estimates. The latter method combines a linear RE model for the underlying pattern of the marker with a log-normal survival model for the informative drop-out process. The performance of these methods in the presence of data missing completely at random (MCAR), at random (MAR) and non-ignorably (NIM) was compared in simulation studies. Data for the disease marker were generated under the linear random effects model with parameter values derived from realistic examples in HIV infection. Rates of drop-out, assumed to increase over time, were allowed to be independent of marker values or to depend either only on previous marker values or on both previous and current marker values. Under MCAR all six methods yielded unbiased estimates of both group mean rates and between-group difference. However, the cross-sectional view of the data in the GEE method resulted in seriously biased estimates under MAR and NIM drop-out processes. The bias in the estimates ranged from 30 per cent to 50 per cent. The degree of bias in the GEE estimates increases with the severity of non-randomness and with the proportion of MAR data. Under MCAR and MAR all the other five methods performed relatively well. RE and JMRE estimates were more efficient (that is, had smaller variance) than UWLS, WLS and CL estimates. Under NIM, WLS and particularly RE estimates tended to underestimate the average rate of marker change (bias approximately 10 per cent). Under NIM, UWLS, CL and JMRE performed better in terms of bias (3-5 per cent) with the JMRE giving the most efficient estimates. Given that markers are key variables related to disease progression, missing marker data are likely to be at least MAR. Thus, the GEE method may not be appropriate for analysing such longitudinal marker data. The potential biases due to incomplete data require greater recognition in reports of longitudinal studies. Sensitivity analyses to assess the effect of drop-outs on inferences about the target parameters are important.
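
Two of the six estimators compared in this abstract, the unweighted (UWLS) and weighted (WLS) averages of per-subject OLS slopes, are simple enough to sketch. The abstract does not state the weights used for WLS; the sketch below assumes each slope is weighted by the sum of squared deviations of that subject's measurement times, which is proportional to the inverse variance of the slope when residual variance is common across subjects. The function names and the toy data are hypothetical.

```python
def ols_slope(times, values):
    """Return (slope, sxx) for a simple linear regression of values on times,
    where sxx is the sum of squared deviations of the times."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx, sxx


def average_slopes(subjects):
    """Return (uwls, wls) from a list of (times, values) pairs.

    Subjects with fewer than two distinct time points are skipped because
    no slope can be estimated for them.
    """
    fits = [ols_slope(t, y) for t, y in subjects if len(set(t)) >= 2]
    uwls = sum(slope for slope, _ in fits) / len(fits)
    # Assumed weighting: sxx, so subjects followed longer count for more.
    wls = sum(slope * w for slope, w in fits) / sum(w for _, w in fits)
    return uwls, wls


if __name__ == "__main__":
    subjects = [
        ([0, 1, 2, 3], [0.0, 1.0, 2.0, 3.0]),  # completer, slope 1
        ([0, 1], [0.0, 3.0]),                  # early drop-out, slope 3
    ]
    print(average_slopes(subjects))
```

The toy data show why the two averages can diverge under drop-out: the early drop-out contributes equally to UWLS but very little to WLS, so the estimators respond differently when drop-out is related to the rate of change.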

16.
T Park  S Y Lee 《Statistics in medicine》1999,18(21):2933-2941
In longitudinal studies each subject is observed at several different times. Longitudinal studies are rarely balanced and complete due to occurrence of missing data. Little proposed pattern-mixture models for the analysis of incomplete multivariate normal data. Later, Little proposed an approach to modelling the drop-out mechanism based on the pattern-mixture models. We advocate the pattern-mixture models for analysing the longitudinal data with binary or Poisson responses in which the generalized estimating equations formulation of Liang and Zeger is sensible. The proposed method is illustrated with a real data set.

17.
Missing data are ubiquitous in longitudinal studies. In this paper, we propose an imputation procedure to handle dropouts in longitudinal studies. By taking advantage of the monotone missing pattern resulting from dropouts, our imputation procedure can be carried out sequentially, which substantially reduces the computation complexity. In addition, at each step of the sequential imputation, we set up a model selection mechanism that chooses between a parametric model and a nonparametric model to impute each missing observation. Unlike usual model selection procedures that aim at finding a single model fitting the entire data set well, our model selection procedure is customized to find a suitable model for the prediction of each missing observation. Copyright © 2014 John Wiley & Sons, Ltd.

18.
Missing data in longitudinal studies
When observations are made repeatedly over time on the same experimental units, unbalanced patterns of observations are a common occurrence. This complication makes standard analyses more difficult or inappropriate to implement, means loss of efficiency, and may introduce bias into the results as well. Some possible approaches to dealing with missing data include complete case analyses, univariate analyses with adjustments for variance estimates, two-step analyses, and likelihood based approaches. Likelihood approaches can be further categorized as to whether or not an explicit model is introduced for the non-response mechanism. This paper will review the use of likelihood based analyses for longitudinal data with missing responses, both from the point of view of ease of implementation and appropriateness in view of the non-response mechanism. Models for both measured and dichotomous outcome data will be discussed. The appropriateness of some non-likelihood based analyses is briefly considered.

19.
The statistical analysis of longitudinal quality of life data in the presence of missing data is discussed. In cancer trials missing data are generated due to the fact that patients die, drop out, or are censored. These missing data are problematic in the monitoring of the quality of life during the trial. However, by means of assuming that the cause of the missing data lies in the observed history of the patients and not in their unobserved future, the missing data are ignorable. Consequently, all available data can be used to estimate quality of life change patterns with time. The computations that are required are illustrated with real quality of life data and three commonly used computer packages for statistical analysis. Paper read at meeting of the EORTC Quality of Life Study Group, November 1991, Leicester, UK. This research was supported by a grant from the Dutch Science Foundation (NWO).

20.
Yang X  Li J  Shoptaw S 《Statistics in medicine》2008,27(15):2826-2849
Biomedical research is plagued with problems of missing data, especially in clinical trials of medical and behavioral therapies adopting longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts with or without further imputation. Depending on the choice of imputation model or measurement model, there exist various strategies that can be jointly applied to the same set of data to study the effect of treatment or intervention from multi-faceted perspectives. For illustration, the strategies were applied to a data set with continuous repeated measures from a smoking cessation clinical trial.
