1.

Power analyses for longitudinal study designs with missing data

Tu XM Zhang J Kowalski J Shults J Feng C Sun W Tang W 《Statistics in medicine》2007,26(15):2958-2981

Existing methods for power analysis for longitudinal study designs are limited in that they do not adequately address random missing data patterns. Although the pattern of missing data can be assessed during data analysis, it is unknown during the design phase of a study. The random nature of the missing data pattern adds another layer of complexity in addressing missing data for power analysis. In this paper, we model the occurrence of missing data with a two-state, first-order Markov process and integrate the modelling information into the power function to account for random missing data patterns. The Markov model is easily specified to accommodate different anticipated missing data processes. We develop this approach for the two most popular longitudinal models: the generalized estimating equations (GEE) and the linear mixed-effects model under the missing completely at random (MCAR) assumption. For GEE, we also limit our consideration to the working independence correlation model. The proposed methodology is illustrated with numerous examples that are motivated by real study designs.
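
The two-state, first-order Markov idea in this abstract can be illustrated with a minimal sketch. This is not the authors' power function; it only shows how a missingness pattern might be generated and how the marginal probability of being observed evolves across visits. The function names and the parameters `p_drop` (observed to missing), `p_return` (missing to observed) and `p_first` are assumptions introduced for illustration.

```python
import random


def observed_probabilities(n_visits, p_drop, p_return, p_first=1.0):
    """Marginal probability of being observed at each visit.

    Hypothetical parameters (not from the paper):
      p_drop   - P(missing at visit t | observed at visit t-1)
      p_return - P(observed at visit t | missing at visit t-1)
      p_first  - P(observed at the first visit)
    """
    probs = [p_first]
    for _ in range(n_visits - 1):
        prev = probs[-1]
        # First-order Markov update: only the previous state matters.
        probs.append(prev * (1.0 - p_drop) + (1.0 - prev) * p_return)
    return probs


def simulate_pattern(n_visits, p_drop, p_return, rng, p_first=1.0):
    """Draw one subject's observed (True) / missing (False) pattern."""
    observed = [rng.random() < p_first]
    for _ in range(n_visits - 1):
        if observed[-1]:
            observed.append(rng.random() >= p_drop)
        else:
            observed.append(rng.random() < p_return)
    return observed
```

Setting `p_return` to 0 gives monotone dropout, while a positive `p_return` allows intermittent missingness; the expected number of observed visits is the sum of the values returned by `observed_probabilities`.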

2.

The impact of exposure categorisation for grouped analyses of cohort data

Richardson DB Loomis D 《Occupational and environmental medicine》2004,61(11):930-935

Background: Poisson regression is routinely used in occupational and environmental epidemiology. For typical Poisson regression analyses, person-time and events are tabulated by categorising predictor variables that were originally measured on a continuous scale. In order to estimate a dose-response trend, a researcher must decide how to categorise exposures and how to assign scores to exposure groups. Aims: To investigate the impact on regression results of decisions about exposure categorisation and score assignment. Methods: Cohort data were generated by Monte Carlo simulation methods. Exposure categories were defined by quintiles or deciles of the exposure distribution. Scores were assigned to exposure groups based on category midpoint and mean exposure levels. Estimated exposure-disease trends derived via Poisson regression were compared to the "true" association specified for the simulation. Results: Under the assumption that exposures conform to a lognormal or exponential distribution, trend estimates tend to be negatively biased when scores are assigned based on category midpoints and positively biased when scores are assigned based on cell-specific mean values. The degree of bias was greater when exposure categories were defined by quintiles of the exposure distribution than when categories were defined by deciles of the exposure distribution. Conclusions: The routine practice of exposure categorisation and score assignment introduces exposure misclassification that may be differential with respect to disease status and, consequently, lead to biased exposure-disease trend estimates. When using the Poisson regression method to evaluate exposure-disease trends, such problems can be minimised (but not necessarily eliminated) by forming relatively refined exposure categories based on percentiles of the exposure distribution among cases, and by assigning scores to exposure categories that reflect person-time weighted mean exposure levels.
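
The difference between midpoint and mean scoring can be seen in a small sketch. It is an illustration only, not the authors' simulation: it splits exposures into equal-sized percentile groups and reports both candidate scores per group. The function name `category_scores` is hypothetical, the midpoint is taken (as an assumption) to be halfway between the smallest and largest exposure observed in the group, and person-time weighting is ignored.

```python
from statistics import mean


def category_scores(exposures, n_groups):
    """Return (midpoint, mean) score pairs for equal-sized exposure groups.

    Assumes len(exposures) >= n_groups so that no group is empty.
    n_groups=5 corresponds to quintiles, n_groups=10 to deciles.
    """
    ordered = sorted(exposures)
    n = len(ordered)
    scores = []
    for g in range(n_groups):
        lo = g * n // n_groups
        hi = (g + 1) * n // n_groups
        group = ordered[lo:hi]
        # Midpoint of the observed range in this group (an assumed definition).
        midpoint = (group[0] + group[-1]) / 2.0
        scores.append((midpoint, mean(group)))
    return scores
```

For a right-skewed sample, for example values drawn with `random.Random(0).lognormvariate(0, 1)`, the two scores for the highest group can be compared directly to see how far the midpoint is pulled toward the tail.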

3.

A simulation study of imputation methods for longitudinal missing data in cohort studies

Li YM (李业棉) Zhao P (赵芃) Yang YH (杨嵛惠) Wang JX (王静娴) Yan H (颜虹) Chen FY (陈方尧) 《Chinese Journal of Epidemiology (中华流行病学杂志)》2021,42(10):1889-1894

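Objective: Missing data are an almost unavoidable problem in cohort studies. This simulation study compares the imputation performance of eight commonly used missing-data methods on longitudinal missing data, to provide a useful reference for handling such data. Methods: The simulation was implemented in R. Longitudinal missing data were generated by the Monte Carlo method, and the imputation methods were compared on mean absolute deviation, mean relative deviation and the type I error of regression analysis, to evaluate both their imputation performance and their impact on subsequent multivariable analysis. Results: Mean imputation, k-nearest-neighbour imputation (KNN), regression imputation and random forest performed similarly and stably; multiple imputation and hot-deck imputation ranked below these methods; K-means clustering and the EM algorithm performed worst and were the least stable. Mean imputation, the EM algorithm, random forest, KNN and regression imputation controlled type I error well, whereas multiple imputation, hot-deck imputation and K-means clustering did not control type I error effectively. Conclusions: For longitudinal missing data under a missing at random mechanism, mean imputation, KNN, regression imputation and random forest can all serve as good imputation methods; when the proportion of missing data is not too large, multiple imputation and hot-deck imputation also perform well; K-means clustering and the EM algorithm are not recommended.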

4.

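A simulation comparison of methods for handling quantitative longitudinal missing data with multiple coexisting missingness mechanisms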

Chen LC (陈丽嫦) Heng ML (衡明莉) Wang J (王骏) Chen PY (陈平雁) 《Modern Preventive Medicine (现代预防医学)》2020,(20):3684-3687

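Objective: To compare the statistical performance of the control-based pattern-mixture model (PMM), the mixed-effects model for repeated measures (MMRM) and multiple imputation (MI) in handling quantitative longitudinal missing data in which multiple missingness mechanisms coexist. Methods: Monte Carlo techniques were used to simulate quantitative longitudinal data sets containing two or three of the mechanisms missing completely at random, missing at random and missing not at random, and the statistical performance of the three approaches was evaluated. Results: The control-based PMM kept the type I error rate at a low level but had the lowest power. MMRM and MI had controlled type I error rates and higher power than the control-based PMM. When the two groups did not differ in efficacy, all methods had comparable estimation error, and the control-based PMM had the highest 95% confidence interval coverage; when the groups differed, each method was affected by the proportion of missing data that conformed to its assumed missingness mechanism. When data missing not at random were present, the control-based PMM largely avoided overestimating the treatment difference and had the highest 95% confidence interval coverage, whereas MMRM and MI overestimated the treatment difference and had lower 95% confidence interval coverage. The 95% confidence interval widths were comparable across all methods. Conclusions: When analysing longitudinal missing data with multiple coexisting missingness mechanisms, especially data that include values missing not at random, the statistical performance of MMRM and MI deteriorates. The control-based PMM can be used for sensitivity analysis, but its specific assumptions need attention so that estimates are not overly conservative.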

5.

The impact of handling missing data on alcohol consumption estimates in the UK women cohort study

U. Nur N. T. Longford J. E. Cade D. C. Greenwood 《European journal of epidemiology》2009,24(10):589-595

We discuss methods for dealing with incomplete data in the United Kingdom Women’s Cohort Study. We demonstrate by example how important it is to address the issues related to missing data with statistical integrity, illustrate the deficiencies of a data-reduction and a single-imputation method, and discuss how the method of multiple imputation overcomes them. Although the method entails some complexity, the computational activities can be organized in such a way that efficient analyses can be conducted by analysts who are not acquainted with all the details of the imputation method and who wish to rely on software they use and regard as standard.

6.

Impact of missing data due to drop-outs on estimators for rates of change in longitudinal studies: a simulation study.

G Touloumi A G Babiker S J Pocock J H Darbyshire 《Statistics in medicine》2001,20(24):3715-3728

Many cohort studies and clinical trials are designed to compare rates of change over time in one or more disease markers in several groups. One major problem in such longitudinal studies is missing data due to patient drop-out. The bias and efficiency of six different methods to estimate rates of change in longitudinal studies with incomplete observations were compared: generalized estimating equation estimates (GEE) proposed by Liang and Zeger (1986); unweighted average of ordinary least squares estimates (OLSE) of individual rates of change (UWLS); weighted average of OLSE (WLS); conditional linear model estimates (CLE), a covariate-type estimate proposed by Wu and Bailey (1989); random effect (RE), and joint multivariate RE (JMRE) estimates. The latter method combines a linear RE model for the underlying pattern of the marker with a log-normal survival model for the informative drop-out process. The performance of these methods in the presence of data missing completely at random (MCAR), at random (MAR) and non-ignorably (NIM) was compared in simulation studies. Data for the disease marker were generated under the linear random effects model with parameter values derived from realistic examples in HIV infection. Rates of drop-out, assumed to increase over time, were allowed to be independent of marker values or to depend either only on previous marker values or on both previous and current marker values. Under MCAR all six methods yielded unbiased estimates of both group mean rates and between-group difference. However, the cross-sectional view of the data in the GEE method resulted in seriously biased estimates under MAR and NIM drop-out processes. The bias in the estimates ranged from 30 per cent to 50 per cent. The degree of bias in the GEE estimates increases with the severity of non-randomness and with the proportion of MAR data. Under MCAR and MAR all the other five methods performed relatively well. RE and JMRE estimates were more efficient (that is, had smaller variance) than UWLS, WLS and CL estimates. Under NIM, WLS and particularly RE estimates tended to underestimate the average rate of marker change (bias approximately 10 per cent). Under NIM, UWLS, CL and JMRE performed better in terms of bias (3-5 per cent) with the JMRE giving the most efficient estimates. Given that markers are key variables related to disease progression, missing marker data are likely to be at least MAR. Thus, the GEE method may not be appropriate for analysing such longitudinal marker data. The potential biases due to incomplete data require greater recognition in reports of longitudinal studies. Sensitivity analyses to assess the effect of drop-outs on inferences about the target parameters are important.
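
As a rough illustration of the simplest estimator named in this abstract, the unweighted average of individual OLS rates of change (UWLS), the sketch below fits a least-squares slope to each subject's available observations and averages the slopes. It is not the authors' code: the function names are hypothetical, missing values are represented as `None` by assumption, and subjects with fewer than two observations are simply skipped.

```python
def ols_slope(times, values):
    """Least-squares slope of values on times, using observed points only."""
    pairs = [(t, y) for t, y in zip(times, values) if y is not None]
    if len(pairs) < 2:
        return None  # a slope needs at least two observations
    t_bar = sum(t for t, _ in pairs) / len(pairs)
    y_bar = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((t - t_bar) ** 2 for t, _ in pairs)
    if sxx == 0:
        return None  # all observations at the same time point
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in pairs)
    return sxy / sxx


def unweighted_mean_slope(times, subjects):
    """Unweighted average of per-subject OLS slopes (UWLS-style estimate)."""
    slopes = [ols_slope(times, values) for values in subjects]
    slopes = [s for s in slopes if s is not None]
    return sum(slopes) / len(slopes) if slopes else None
```

Because every subject with an estimable slope counts equally, early drop-outs with only two or three noisy measurements have as much influence as completers, which is one reason the abstract also considers a weighted average.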

7.

Imputation of missing longitudinal data: a comparison of methods

Engels JM Diehr P 《Journal of clinical epidemiology》2003,56(10):968-976

BACKGROUND AND OBJECTIVES: Missing information is inevitable in longitudinal studies, and can result in biased estimates and a loss of power. One approach to this problem is to impute the missing data to yield a more complete data set. Our goal was to compare the performance of 14 methods of imputing missing data on depression, weight, cognitive functioning, and self-rated health in a longitudinal cohort of older adults. METHODS: We identified situations where a person had a known value following one or more missing values, and treated the known value as a "missing value." This "missing value" was imputed using each method and compared to the observed value. Methods were compared on the root mean square error, mean absolute deviation, bias, and relative variance of the estimates. RESULTS: Most imputation methods were biased toward estimating the "missing value" as too healthy, and most estimates had a variance that was too low. Imputed values based on a person's values before and after the "missing value" were superior to other methods, followed by imputations based on a person's values before the "missing value." Imputations that used no information specific to the person, such as using the sample mean, had the worst performance. CONCLUSIONS: We conclude that, in longitudinal studies where the overall trend is for worse health over time and where missing data can be assumed to be primarily related to worse health, missing data in a longitudinal sequence should be imputed from the available longitudinal data for that person.
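
The three families of imputation contrasted in this abstract can be sketched as follows. The abstract does not say how the before and after values were combined, so averaging the nearest observed neighbours is an assumption here, as are the function names and the use of `None` for missing values; this is not a reconstruction of any of the 14 methods compared.

```python
def last_before(series, i):
    """Nearest observed value before position i, or None."""
    for j in range(i - 1, -1, -1):
        if series[j] is not None:
            return series[j]
    return None


def first_after(series, i):
    """Nearest observed value after position i, or None."""
    for j in range(i + 1, len(series)):
        if series[j] is not None:
            return series[j]
    return None


def impute_before_after(series, i):
    """Average of the person's nearest values before and after (assumed rule)."""
    before, after = last_before(series, i), first_after(series, i)
    if before is None or after is None:
        return None
    return (before + after) / 2.0


def impute_before(series, i):
    """Carry the person's last observed value forward."""
    return last_before(series, i)


def impute_sample_mean(all_series):
    """Mean of every observed value in the sample; ignores the person."""
    observed = [v for s in all_series for v in s if v is not None]
    return sum(observed) / len(observed) if observed else None
```

The first two functions use only the person's own trajectory, while the third uses no person-specific information, which is the class of method the abstract reports as performing worst.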

8.

The impact of missing data on estimation of health-related quality of life outcomes: an analysis of a randomized longitudinal clinical trial

Hongyan Du Elizabeth A. Hahn David Cella 《Health services & outcomes research methodology》2011,11(3-4):134-144

Missing responses for health-related quality of life (HRQL) outcomes are common in clinical trials and may introduce bias as such data are often not missing at random. To evaluate the missingness (dropout) effect when comparing two treatment groups in a longitudinal randomized trial, we analyzed the Functional Assessment of Cancer Therapy Trial Outcome Index (TOI) change over 12 months for newly diagnosed patients with chronic myeloid leukemia. HRQL assessment was expected at baseline and months 1, 2, 3, 4, 5, 6, 9 and 12. We defined completers as those with baseline and month 12 TOI, and dropouts as all others as long as they had a baseline score. We defined censoring time as the time interval between baseline and the scheduled month 12 visit date, and approximate time-to-dropout as the time interval from baseline to the midpoint between the date of the last reported TOI and the scheduled next visit date. A mixed-effects model was first built to assess treatment effect; a pattern-mixture model and a joint model were then built to account for non-ignorable dropout. Intermittent missing data were assumed to be missing at random. A square root transformation of TOI scores was taken to fulfill the normality and homogeneity assumptions at each time point in all the models. The mixed-effects model revealed significant (P < 0.001) between-group differences at each visit except for baseline. The joint model generated parameter estimates similar to those of the separate longitudinal and survival sub-models, with a significant association parameter (P = 0.039) indicating a negative association between the slope of TOI and the hazard of dropout, and thus non-ignorable dropout. The pattern-mixture model parameter estimates were fairly similar to those generated from the joint model. When non-ignorable missing data exist in longitudinal studies, a joint model is useful to quantify the relationship between dropout and outcome. In addition, it is important to examine underlying assumptions and utilize multiple missing data models, including the pattern-mixture model, to assess the sensitivity of model-based inference to assumptions about missing mechanisms.

9.

Quantifying the impact of drug exposure misclassification due to restrictive drug coverage in administrative databases: a simulation cohort study

Gamble JM McAlister FA Johnson JA Eurich DT 《Value in health》2012,15(1):191-197

10.

The analysis of binary longitudinal data with time-dependent covariates

Guerra MW Shults J Amsterdam J Ten-Have T 《Statistics in medicine》2012,31(10):931-948

We consider longitudinal studies with binary outcomes that are measured repeatedly on subjects over time. The goal of our analysis was to fit a logistic model that relates the expected value of the outcomes with explanatory variables that are measured on each subject. However, additional care must be taken to adjust for the association between the repeated measurements on each subject. We propose a new maximum likelihood method for covariates that may be fixed or time varying. We also implement and make comparisons with two other approaches: generalized estimating equations, which may be more robust to misspecification of the true correlation structure, and alternating logistic regression, which models association via odds ratios that are subject to less restrictive constraints than are correlations. The proposed estimation procedure will yield consistent and asymptotically normal estimates of the regression and correlation parameters if the correlation on consecutive measurements on a subject is correctly specified. Simulations demonstrate that our approach can yield improved efficiency in estimation of the regression parameter; for equally spaced and complete data, the gains in efficiency were greatest for the parameter associated with a time-by-group interaction term and for stronger values of the correlation. For unequally spaced data and with dropout according to a missing-at-random mechanism, MARK1ML with correctly specified consecutive correlations yielded substantial improvements in terms of both bias and efficiency. We present an analysis to demonstrate application of the methods we consider. We also offer an R function for easy implementation of our approach.

11.

Sensitivity analysis to investigate the impact of a missing covariate on survival analyses using cancer registry data

Brian L. Egleston Yu‐Ning Wong 《Statistics in medicine》2009,28(10):1498-1511

Having substantial missing data is a common problem in administrative and cancer registry data. We propose a sensitivity analysis to evaluate the impact of a covariate that is potentially missing not at random in survival analyses using Weibull proportional hazards regressions. We apply the method to an investigation of the impact of missing grade on post-surgical mortality outcomes in individuals with metastatic kidney cancer. Data came from the Surveillance Epidemiology and End Results (SEER) registry which provides population-based information on those undergoing cytoreductive nephrectomy. Tumor grade is an important component of risk stratification for patients with both localized and metastatic kidney cancer. Many individuals in SEER with metastatic kidney cancer are missing tumor grade information. We found that surgery was protective, but that the magnitude of the effect depended on assumptions about the relationship of grade with missingness. Copyright © 2009 John Wiley & Sons, Ltd.

12.

Sources of data for a longitudinal birth cohort

Jean Golding Richard Jones 《Paediatric and perinatal epidemiology》2009,23(S1):51-62

This paper outlines the variety of data sources which can be utilised in a longitudinal study. Although a longitudinal study could be carried out using just one type of data, greater depth and accuracy can be achieved by including a variety of different sources of information.

13.

Latency and time-dependent exposure in a case-control study

L H Moulton M G Lê 《Journal of clinical epidemiology》1991,44(9):915-923

Detailed historical data are often elicited from subjects in retrospective studies, yielding time-dependent measures of exposures. Investigation of a hypothesized period of latency can be made by examining disease/exposure relationships in multiple time windows, either along the age or time-before-diagnosis axes. We suggest splitting the data into many time intervals and separately fitting regression models to the available data in each interval. Covariances between estimated coefficients from different intervals are empirically estimated, and used for assessing variability of specified functions of the time-specific coefficients. Alternative methods of interval formation and their consequences are discussed. We apply these methods to a French case-control study of oral contraceptive use and cervical cancer incidence, and compare the results to those of standard analyses.

14.

Choice of time-scale in Cox's model analysis of epidemiologic cohort data: a simulation study

Thiébaut AC Bénichou J 《Statistics in medicine》2004,23(24):3803-3820

Cox's regression model is widely used for assessing associations between potential risk factors and disease occurrence in epidemiologic cohort studies. Although age is often a strong determinant of disease risk, authors have frequently used time-on-study instead of age as the time-scale, as for clinical trials. Unless the baseline hazard is an exponential function of age, this approach can yield different estimates of relative hazards than using age as the time-scale, even when age is adjusted for. We performed a simulation study in order to investigate the existence and magnitude of bias for different degrees of association between age and the covariate of interest. Age to disease onset was generated from exponential, Weibull or piecewise Weibull distributions, and both fixed and time-dependent dichotomous covariates were considered. We observed no bias upon using age as the time-scale. Upon using time-on-study, we verified the absence of bias for exponentially distributed age to disease onset. For non-exponential distributions, we found that bias could occur even when the covariate of interest was independent from age. It could be severe in case of substantial association with age, especially with time-dependent covariates. These findings were illustrated on data from a cohort of 84,329 French women followed prospectively for breast cancer occurrence. In view of our results, we strongly recommend not using time-on-study as the time-scale for analysing epidemiologic cohort data.

15.

Analysis of longitudinal Gaussian data with missing data on the response variable

Jacqmin-Gadda H Commenges D Dartigues J 《Revue d'épidémiologie et de santé publique》1999,47(6):525-534

BACKGROUND: Using an application and a simulation study we show the bias induced by missing data in the outcome in longitudinal studies and discuss suitable statistical methods according to the type of missing responses when the variable under study is Gaussian. METHOD: The model used for the analysis of Gaussian longitudinal data is the mixed effects linear model. When the probability of response does not depend on the missing values of the outcome and on the parameters of the linear model, missing data are ignorable, and parameters of the mixed effects linear model may be estimated by the maximum likelihood method with classical software. When the missing data are non-ignorable, several methods have been proposed. We describe the method proposed by Diggle and Kenward (1994) (DK method), for which software is available. This model combines a linear mixed effects model for the outcome variable with a logistic model for the probability of response, which depends on the outcome variable. RESULTS: A simulation study shows the efficacy of this method and its limits when the data are not normal. In this case, estimators obtained by the DK approach may be more biased than estimators obtained under the hypothesis of ignorable missing data even if the missing data are non-ignorable. Data from the Paquid cohort on the evolution of scores on a neuropsychological test among elderly subjects show the bias of a naive analysis using all available data. Although missing responses are not ignorable in this study, estimates of the linear mixed effects model are not very different using the DK approach and the hypothesis of ignorable missing data. CONCLUSION: Statistical methods for longitudinal data including non-ignorable missing responses are sensitive to hypotheses that are difficult to verify. Thus, it will be better in practical applications to perform an analysis under the hypothesis of ignorable missing responses and compare the results with those obtained from several approaches for non-ignorable missing data. However, such a strategy requires the development of new software.

16.

The impact of culture change on elders' behavioral symptoms: a longitudinal study

Burack OR Weiner AS Reinhardt JP 《Journal of the American Medical Directors Association》2012,13(6):522-528

Objectives: Distressing behavioral symptoms often associated with dementia are not uncommon in the long term care setting. Culture change with its “person-centered approach to care” provides a potential nonpharmacological intervention to reduce these symptoms. The purpose of this study was to examine the relationship between a culture change initiative and nursing home elders’ behavioral symptoms. Design: Seven long term care communities (nursing units in 3 skilled nursing facilities) participated in a culture change intervention designed to transform the nursing home experience from a traditional hospital-model of care to one that is person-centered. Six comparison communities were matched to the intervention communities and continued to function along the typical nursing home organizational structure. Data were collected at baseline and 2 years later. Methods: Subjects were 101 elders (intervention group n = 50, comparison group n = 51). Each elder’s primary day certified nursing assistant completed the Cohen-Mansfield Agitation Inventory, examining frequency of behavioral symptoms, including verbal and physical agitation as well as more forceful behaviors (eg, hitting, kicking) at both data collection periods. Results: After controlling for functional status and race, a significant condition by time interaction was found for physical agitation and forceful behaviors with the person-centered group maintaining levels of behavioral symptoms as compared with a significant increase over time among the comparison group. A trend with the same pattern was found for verbal agitation. Conclusions: Person-centered care demonstrated potential as a nonpharmacological intervention for distressing behavioral symptoms. The positive impact of culture change appears to extend to elders with cognitive impairment who are less obvious beneficiaries of this model, featuring the central principles of autonomy and person-centered care.

17.

A method for imputing missing data in longitudinal studies

Youk AO Stone RA Marsh GM 《Annals of epidemiology》2004,14(5):354-361

PURPOSE: In a cohort in which racial data are unknown for some persons, race-specific persons and person-years are imputed using a model-based iterative allocation algorithm (IAA). METHODS: An EM algorithm-based approach to address misclassification in a censored data regression setting can be adapted to estimate the probability that a person of unknown race is white. The corresponding race-specific person-years are obtained as a by-product of the estimation procedure. Variance estimates are computed using the bootstrap. The proposed approach is compared with the proportional allocation method (PAM). RESULTS: In an occupational cohort where racial data were missing for 41% of the workers, the age-time-race-specific person-years were estimated within a relative variation of approximately 20%, using the IAA. The deaths were less reliably estimated. The standardized mortality ratios (SMRs) for all-cause mortality estimated using the IAA and the PAM were more similar for the non-white workers than for a smaller subgroup of white workers. CONCLUSIONS: The IAA provides a method to reliably estimate race-specific person-year denominators in cohort studies with missing racial data. This method is applicable to other incompletely observed non-time-dependent categorical covariates. Internal cohort rates or SMRs can be computed and modeled, with bootstrap confidence intervals that account for the uncertainty in the determination of race.

18.

Modelling the rate of change in a longitudinal study with missing data, adjusting for contact attempts

Akacha M Hutton JL 《Statistics in medicine》2011,30(10):1072-1089

The Collaborative Ankle Support Trial (CAST) is a longitudinal trial of treatments for severe ankle sprains in which interest lies in the rate of improvement, the effectiveness of reminders and potentially informative missingness. A model is proposed for continuous longitudinal data with non-ignorable or informative missingness, taking into account the nature of attempts made to contact initial non-responders. The model combines a non-linear mixed model for the outcome model with logistic regression models for the reminder processes. A sensitivity analysis is used to contrast this model with the traditional selection model, where we adjust for missingness by modelling the missingness process. The conclusions that recovery is slower, and less satisfactory with age and more rapid with below knee cast than with a tubular bandage do not alter materially across all models investigated. The results also suggest that phone calls are most effective in retrieving questionnaires.

19.

Factors encouraging cohort maintenance in a longitudinal study

J K Marmor S A Oliveria R P Donahue E J Garrahie M J White L L Moore R C Ellison 《Journal of clinical epidemiology》1991,44(6):531-535

Maintenance of the cohort is one of the primary challenges of a longitudinal study. At the end of 3 years of follow up in the Framingham Children's Study, a longitudinal study of young children and their parents, 100 of the original 106 families (94.3%) have remained in the study. A questionnaire was administered to identify factors contributing to the high rate of follow up to this point in the study. The attitudes of the staff, feedback to the subjects, the staff's handling of questions and problems, and association with the Framingham Heart Study emerged as the most important factors influencing continued participation in the study. In addition, 99% of the subjects stated that they believed the medical research to be important. We conclude that the quality of the communication with study participants and the subjects' perceived importance of the research have been the key factors in maintaining the cohort in this longitudinal epidemiologic study.

20.

The iBerry study: a longitudinal cohort study of adolescents at high risk of psychopathology

Grootendorst-van Mil Nina H. Bouter Diandra C. Hoogendijk Witte J. G. van Jaarsveld Stefanie F. L. M. Tiemeier Henning Mulder Cornelis L. Roza Sabine J. 《European journal of epidemiology》2021,36(4):453-464

European Journal of Epidemiology - The iBerry study is a population-based cohort study designed to investigate the transition from subclinical symptoms to a psychiatric disorder. Adolescents were...