Similar articles
1.
A variable is ‘systematically missing’ if it is missing for all individuals within particular studies in an individual participant data meta‐analysis. When a systematically missing variable is a potential confounder in observational epidemiology, standard methods either fail to adjust the exposure–disease association for the potential confounder or exclude studies where it is missing. We propose a new approach to adjust for systematically missing confounders based on multiple imputation by chained equations. Systematically missing data are imputed via multilevel regression models that allow for heterogeneity between studies. A simulation study compares various choices of imputation model. An illustration is given using data from eight studies estimating the association between carotid intima media thickness and subsequent risk of cardiovascular events. Results are compared with standard methods and also with an extension of a published method that exploits the relationship between fully adjusted and partially adjusted estimated effects through a multivariate random effects meta‐analysis model. We conclude that multiple imputation provides a practicable approach that can handle arbitrary patterns of systematic missingness. Bias is reduced by including sufficient between‐study random effects in the imputation model. Copyright © 2013 John Wiley & Sons, Ltd.
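
As an illustration of the kind of multilevel imputation step described above (not the authors' code), the following Python sketch fits a random‐intercept model to the studies in which the confounder was measured and uses it to draw values for studies where the confounder is systematically missing. The data layout and the column names (`study`, `y`, `exposure`, `z`) are assumptions, and a fully proper multiple imputation would also draw the regression parameters rather than only a study intercept and residual noise.

```python
# Hedged sketch: one imputation pass for a systematically missing confounder `z`,
# borrowing information across studies through a random study intercept.
# Column names and the simplifications noted below are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

def impute_once(ipd: pd.DataFrame) -> pd.DataFrame:
    observed = ipd[ipd["z"].notna()]
    missing = ipd[ipd["z"].isna()]

    # Multilevel imputation model: exposure and outcome predict the confounder,
    # with a random intercept allowing between-study heterogeneity.
    fit = smf.mixedlm("z ~ exposure + y", data=observed, groups="study").fit()

    mu = np.asarray(fit.predict(exog=missing))   # fixed-effect prediction
    tau = np.sqrt(fit.cov_re.iloc[0, 0])         # between-study SD
    sigma = np.sqrt(fit.scale)                   # residual SD

    # Studies lacking `z` were never observed for it, so their intercepts are
    # drawn from the estimated between-study distribution.
    study_effect = missing["study"].map(
        {s: rng.normal(0.0, tau) for s in missing["study"].unique()}
    ).to_numpy()

    out = ipd.copy()
    out.loc[missing.index, "z"] = mu + study_effect + rng.normal(0.0, sigma, len(missing))
    return out  # repeat M times, analyse each completed dataset, pool with Rubin's rules
```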

2.
The use of individual participant data (IPD) from multiple studies is an increasingly popular approach when developing a multivariable risk prediction model. Corresponding datasets, however, typically differ in important aspects, such as baseline risk. This has driven the adoption of meta‐analytical approaches for appropriately dealing with heterogeneity between study populations. Although these approaches provide an averaged prediction model across all studies, little guidance exists about how to apply or validate this model for new individuals or study populations outside the derivation data. We consider several approaches to develop a multivariable logistic regression model from an IPD meta‐analysis (IPD‐MA) with potential between‐study heterogeneity. We also propose strategies for choosing a valid model intercept when the model is to be validated or applied to new individuals or study populations. These strategies can be implemented by the IPD‐MA developers or future model validators. Finally, we show how model generalizability can be evaluated when external validation data are lacking using internal–external cross‐validation, and we extend our framework to count and time‐to‐event data. In an empirical evaluation, our results show how stratified estimation allows study‐specific model intercepts, which can then inform the intercept to be used when applying the model in practice, even to a population not represented by included studies. In summary, our framework allows the development (through stratified estimation), implementation in new individuals (through focused intercept choice), and evaluation (through internal–external validation) of a single, integrated prediction model from an IPD‐MA in order to achieve improved model performance and generalizability. Copyright © 2013 John Wiley & Sons, Ltd.
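
The internal–external cross‐validation idea in this abstract can be sketched as a simple leave‐one‐study‐out loop. This is an illustrative Python/scikit‐learn sketch, not the authors' implementation; the data frame layout and the column names (`study`, `y`) are assumptions, and only discrimination and calibration‐in‐the‐large are checked.

```python
# Hedged sketch of internal-external cross-validation in an IPD meta-analysis:
# each study is omitted in turn, the model is developed on the remaining studies,
# and its performance is evaluated in the omitted study.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def internal_external_cv(ipd, predictors, outcome="y", study_col="study"):
    results = {}
    for held_out in ipd[study_col].unique():
        train = ipd[ipd[study_col] != held_out]
        test = ipd[ipd[study_col] == held_out]
        model = LogisticRegression(max_iter=1000).fit(train[predictors], train[outcome])
        p = model.predict_proba(test[predictors])[:, 1]
        results[held_out] = {
            "auc": roc_auc_score(test[outcome], p),                       # discrimination
            "calibration_in_the_large": test[outcome].mean() - p.mean(),  # observed minus expected
        }
    return results
```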

3.
Recently, multiple imputation has been proposed as a tool for individual patient data meta‐analysis with sporadically missing observations, and it has been suggested that within‐study imputation is usually preferable. However, such within‐study imputation cannot handle variables that are completely missing within studies. Further, if some of the contributing studies are relatively small, it may be appropriate to share information across studies when imputing. In this paper, we develop and evaluate a joint modelling approach to multiple imputation of individual patient data in meta‐analysis, with an across‐study probability distribution for the study‐specific covariance matrices. This retains the flexibility to allow for between‐study heterogeneity when imputing while allowing (i) sharing information on the covariance matrix across studies when this is appropriate, and (ii) imputing variables that are wholly missing from studies. Simulation results show both equivalent performance to the within‐study imputation approach where this is valid, and good results in more general, practically relevant, scenarios with studies of very different sizes, non‐negligible between‐study heterogeneity and wholly missing variables. We illustrate our approach using data from an individual patient data meta‐analysis of hypertension trials. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

4.
Multiple imputation is a strategy for the analysis of incomplete data such that the impact of the missingness on the power and bias of estimates is mitigated. When data from multiple studies are collated, we can propose both within‐study and multilevel imputation models to impute missing data on covariates. It is not clear how to choose between imputation models or how to combine imputation and inverse‐variance weighted meta‐analysis methods. This is especially important as often different studies measure data on different variables, meaning that we may need to impute data on a variable which is systematically missing in a particular study. In this paper, we consider a simulation analysis of sporadically missing data in a single covariate with a linear analysis model and discuss how the results would be applicable to the case of systematically missing data. We find in this context that ensuring the congeniality of the imputation and analysis models is important to give correct standard errors and confidence intervals. For example, if the analysis model allows between‐study heterogeneity of a parameter, then we should incorporate this heterogeneity into the imputation model to maintain the congeniality of the two models. In an inverse‐variance weighted meta‐analysis, we should impute missing data and apply Rubin's rules at the study level prior to meta‐analysis, rather than meta‐analyzing each of the multiple imputations and then combining the meta‐analysis estimates using Rubin's rules. We illustrate the results using data from the Emerging Risk Factors Collaboration. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
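
The ordering recommended above (apply Rubin's rules within each study, then meta‐analyse the pooled study results by inverse‐variance weighting) can be written down in a few lines. The sketch below is purely illustrative and assumes the per‐study, per‐imputation estimates and variances have already been obtained from the analysis model.

```python
# Hedged sketch: Rubin's rules applied at the study level, followed by an
# inverse-variance weighted meta-analysis of the pooled study estimates.
import numpy as np

def rubin_pool(estimates, variances):
    """Combine M imputed analyses of a single study."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    qbar = estimates.mean()                 # pooled point estimate
    within = variances.mean()               # average within-imputation variance
    between = estimates.var(ddof=1)         # between-imputation variance
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    return qbar, total

def inverse_variance_meta(study_estimates, study_variances):
    """Fixed-effect meta-analysis of the study-level pooled results."""
    w = 1.0 / np.asarray(study_variances)
    estimate = np.sum(w * np.asarray(study_estimates)) / np.sum(w)
    return estimate, np.sqrt(1.0 / np.sum(w))
```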

5.
During recent decades, interest in prediction models has substantially increased, but approaches to synthesize evidence from previously developed models have failed to keep pace. This causes researchers to ignore potentially useful past evidence when developing a novel prediction model with individual participant data (IPD) from their population of interest. We aimed to evaluate approaches to aggregate previously published prediction models with new data. We consider the situation in which models are reported in the literature with predictors similar to those available in an IPD dataset. We adopt a two‐stage method and explore three approaches to calculate a synthesis model, relying on the principles of multivariate meta‐analysis. The first approach employs a naive pooling strategy, whereas the latter approaches account for within‐study and between‐study covariance. These approaches are applied to a collection of 15 datasets of patients with traumatic brain injury, and to five previously published models for predicting deep venous thrombosis. Here, we illustrate how the generally unrealistic assumption of consistency in the availability of evidence across included studies can be relaxed. Results from the case studies demonstrate that aggregation yields prediction models with improved discrimination and calibration in the vast majority of scenarios, and results in equivalent performance (compared with the standard approach) in a small minority of situations. The proposed aggregation approaches are particularly useful when few participant data are at hand. Assessing the degree of heterogeneity between IPD and literature findings remains crucial to determine the optimal approach in aggregating previous evidence into new prediction models. Copyright © 2012 John Wiley & Sons, Ltd.

6.
Missing data often occur in cross-sectional surveys and in longitudinal and experimental studies. The purpose of this study was to compare the prediction of self-rated health (SRH), a robust predictor of morbidity and mortality among diverse populations, before and after imputation of the missing variable “yearly household income.” We reviewed data from 4,162 participants of Mexican origin recruited from July 1, 2002, through December 31, 2005, who were enrolled in a population-based cohort study. Missing yearly income data were imputed using three different single imputation methods and one multiple imputation under a Bayesian approach. Of the 4,162 participants, 3,121 were randomly assigned to a training set (to derive the yearly income imputation methods and develop the health-outcome prediction models) and 1,041 to a testing set (to compare the areas under the receiver-operating characteristic curves (AUC) of the resulting health-outcome prediction models). The discriminatory power of the SRH prediction models was good (range, 69–72%), and compared with the prediction model obtained without imputation of missing yearly income, all imputation methods improved the prediction of SRH (P < 0.05 for all comparisons), with the model based on multiple imputation having the highest AUC (0.731). Furthermore, when yearly income was imputed using multiple imputation, the odds of reporting SRH as good or better increased by 11% for each $5,000 increment in yearly income. This study showed that although imputation of missing data for a key predictor variable can improve a risk health-outcome prediction model, further work is needed to illuminate the risk factors associated with SRH.

7.
Meta‐analysis of individual participant data (IPD) is increasingly utilised to improve the estimation of treatment effects, particularly among different participant subgroups. An important concern in IPD meta‐analysis relates to partially or completely missing outcomes for some studies, a problem exacerbated when interest is on multiple discrete and continuous outcomes. When leveraging information from incomplete correlated outcomes across studies, the fully observed outcomes may provide important information about the incompleteness of the other outcomes. In this paper, we compare two models for handling incomplete continuous and binary outcomes in IPD meta‐analysis: a joint hierarchical model and a sequence of full conditional mixed models. We illustrate how these approaches incorporate the correlation across the multiple outcomes and the between‐study heterogeneity when addressing the missing data. Simulations characterise the performance of the methods across a range of scenarios which differ according to the proportion and type of missingness, strength of correlation between outcomes and the number of studies. The joint model provided confidence interval coverage consistently closer to nominal levels and lower mean squared error compared with the fully conditional approach across the scenarios considered. Methods are illustrated in a meta‐analysis of randomised controlled trials comparing the effectiveness of implantable cardioverter‐defibrillator devices alone to implantable cardioverter‐defibrillator combined with cardiac resynchronisation therapy for treating patients with chronic heart failure. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

8.
Individual participant data (IPD) meta-analysis is a meta-analysis in which the individual-level data for each study are obtained and used for synthesis. A common challenge in IPD meta-analysis is when variables of interest are measured differently in different studies. The term harmonization has been coined to describe the procedure of placing variables on the same scale in order to permit pooling of data from a large number of studies. Using data from an IPD meta-analysis of 19 adolescent depression trials, we describe a multiple imputation approach for harmonizing 10 depression measures across the 19 trials by treating those depression measures that were not used in a study as missing data. We then apply diagnostics to address the fit of our imputation model. Even after reducing the scale of our application, we were still unable to produce accurate imputations of the missing values. We describe those features of the data that made it difficult to harmonize the depression measures and provide some guidelines for using multiple imputation for harmonization in IPD meta-analysis.

10.
There are many advantages to individual participant data meta‐analysis for combining data from multiple studies. These advantages include greater power to detect effects, increased sample heterogeneity, and the ability to perform more sophisticated analyses than meta‐analyses that rely on published results. However, a fundamental challenge is that it is unlikely that variables of interest are measured the same way in all of the studies to be combined. We propose that this situation can be viewed as a missing data problem in which some outcomes are entirely missing within some trials and use multiple imputation to fill in missing measurements. We apply our method to five longitudinal adolescent depression trials where four studies used one depression measure and the fifth study used a different depression measure. None of the five studies contained both depression measures. We describe a multiple imputation approach for filling in missing depression measures that makes use of external calibration studies in which both depression measures were used. We discuss some practical issues in developing the imputation model including taking into account treatment group and study. We present diagnostics for checking the fit of the imputation model and investigate whether external information is appropriately incorporated into the imputed values. Copyright © 2015 John Wiley & Sons, Ltd.
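
A stripped‐down version of the calibration idea in this abstract is sketched below: a regression of one depression scale on the other, fitted in an external calibration sample where both were administered, is used to stochastically impute the unmeasured scale in a trial. This is not the authors' code; the variable names (`scale_a`, `scale_b`, `treatment`) are hypothetical, and a fully proper multiple imputation would also draw the regression coefficients.

```python
# Hedged sketch of calibration-based imputation: `scale_b` was never measured in
# the trial at hand, but an external calibration sample measured both scales.
import numpy as np
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

def impute_scale_b(calibration, trial, n_imputations=20):
    """Stochastic regression imputation of the unmeasured depression scale."""
    fit = smf.ols("scale_b ~ scale_a + treatment", data=calibration).fit()
    sigma = np.sqrt(fit.mse_resid)
    completed_datasets = []
    for _ in range(n_imputations):
        mu = np.asarray(fit.predict(trial))
        completed = trial.copy()
        completed["scale_b"] = mu + rng.normal(0.0, sigma, len(trial))
        completed_datasets.append(completed)
    return completed_datasets  # analyse each completed dataset, then pool with Rubin's rules
```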

11.
Multiple imputation is a popular method for addressing missing data, but its implementation is difficult when data have a multilevel structure and one or more variables are systematically missing. This systematically missing data pattern commonly occurs in meta‐analysis of individual participant data, where some variables are never observed in some studies, but it may also arise in other hierarchical data settings. In these cases, valid imputation must account for both relationships between variables and correlation within studies. Proposed methods for multilevel imputation include specifying a full joint model and multiple imputation with chained equations (MICE). While MICE is attractive for its ease of implementation, there is little existing work describing conditions under which this is a valid alternative to specifying the full joint model. We present results showing that for multilevel normal models, MICE is rarely exactly equivalent to joint model imputation. Through a simulation study and an example using data from a traumatic brain injury study, we found that in spite of theoretical differences, MICE imputations often produce results similar to those obtained using the joint model. We also assess the influence of prior distributions in MICE imputation methods and find that when missingness is high, prior choices in MICE models tend to affect estimation of across‐study variability more than compatibility of conditional likelihoods. Copyright © 2017 John Wiley & Sons, Ltd.

12.
In this paper, we consider fitting semiparametric additive hazards models for case‐cohort studies using a multiple imputation approach. In a case‐cohort study, main exposure variables are measured only on some selected subjects, but other covariates are often available for the whole cohort. We consider this a special case of a covariate missing by design. We propose to employ a popular incomplete-data method, multiple imputation, for estimation of the regression parameters in additive hazards models. For the imputation models, an imputation modeling procedure based on rejection sampling is developed. A simple imputation model that can naturally be applied to a general missing-at-random situation is also considered and compared with the rejection sampling method via extensive simulation studies. In addition, misspecification of the imputation model is investigated. The proposed procedures are illustrated using a cancer data example. Copyright © 2015 John Wiley & Sons, Ltd.

13.
Background: Chronic kidney disease (CKD) is a global health concern that is increasing, mainly as a result of rising incidences of diabetes and hypertension. Furthermore, if left untreated, individuals with CKD may progress to end-stage kidney failure. Identifying individuals with undiagnosed CKD, or those at increased risk of developing CKD or of progressing to end-stage kidney disease (ESKD), is therefore an important challenge. We sought to systematically review and critically assess the conduct and reporting of methods used to develop risk prediction models for predicting the risk of having undiagnosed (prevalent) CKD or the future risk of developing (incident) CKD or end-stage kidney failure in adults. Methods: We conducted a systematic search of the PubMed database to identify studies published up until September 2011 that describe the development of models combining two or more variables to predict the risk of prevalent or incident CKD or ESKD. We extracted key information describing aspects of prediction model development, including the study design, data quality, sample size and number of events, outcome definition, risk predictor selection and coding, missing data, model-building strategies, and aspects of performance. Results: Eleven studies describing the development of 14 prediction models were included. Eight studies reported the development of 11 models to predict incident CKD or ESKD, whereas three studies developed models for prevalent CKD. A total of 97 candidate risk predictors were considered, and 43 different risk predictors featured in the 14 prediction models. Univariate screening based on statistical significance, an approach not recommended for selecting risk predictors for inclusion in the multivariable model, was used in six studies. Missing data were frequently handled and reported poorly, with no mention of missing data in four studies; four studies explicitly excluded individuals with missing data, and only two studies used multiple imputation to replace missing values. Conclusion: We found that prediction models for chronic kidney disease were often developed using inappropriate methods and were generally poorly reported. Using poor methods can affect the predictive ability of the models, whereas inadequate reporting hinders an objective evaluation of the potential usefulness of the model.

14.
BACKGROUND AND OBJECTIVES: To illustrate the effects of different methods for handling missing data (complete case analysis, the missing-indicator method, single imputation of the unconditional and conditional mean, and multiple imputation (MI)) in the context of multivariable diagnostic research aiming to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence. METHODS: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had missing values, in varying percentages. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis. RESULTS: The area under the receiver operating characteristic curve was above 0.75 for all diagnostic models. The predictors in the final models based on complete case analysis and on the missing-indicator method were very different from those in the other models. The models based on MI did not differ much from the models derived after single conditional and unconditional mean imputation. CONCLUSION: In multivariable diagnostic research, complete case analysis and the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. In our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values.

15.
We consider a study‐level meta‐analysis with a normally distributed outcome variable and possibly unequal study‐level variances, where the object of inference is the difference in means between a treatment and control group. A common complication in such an analysis is missing sample variances for some studies. A frequently used approach is to impute the weighted (by sample size) mean of the observed variances (mean imputation). Another approach is to include only those studies with variances reported (complete case analysis). Both mean imputation and complete case analysis are only valid under the missing‐completely‐at‐random assumption, and even then the inverse variance weights produced are not necessarily optimal. We propose a multiple imputation method employing gamma meta‐regression to impute the missing sample variances. Our method takes advantage of study‐level covariates that may be used to provide information about the missing data. Through simulation studies, we show that multiple imputation, when the imputation model is correctly specified, is superior to competing methods in terms of confidence interval coverage probability and type I error probability when testing a specified group difference. Finally, we describe a similar approach to handling missing variances in cross‐over studies. Copyright © 2016 John Wiley & Sons, Ltd.
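
As a rough illustration of the imputation model described above (not the authors' code), the sketch below fits a gamma regression with a log link to the observed study variances and draws imputations for the studies whose variances are missing. The study‐level covariate (`sample_size`) and the column names are assumptions.

```python
# Hedged sketch: gamma meta-regression imputation of missing sample variances,
# using a study-level covariate to inform the imputations.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

def impute_missing_variances(studies, n_imputations=20):
    obs = studies[studies["var_obs"].notna()]
    mis = studies[studies["var_obs"].isna()]

    X_obs = sm.add_constant(obs[["sample_size"]])
    X_mis = sm.add_constant(mis[["sample_size"]], has_constant="add")
    fit = sm.GLM(obs["var_obs"], X_obs,
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()

    phi = fit.scale                      # estimated dispersion
    mu = np.asarray(fit.predict(X_mis))  # predicted mean variance for each study
    # Draw from the fitted gamma distribution: mean mu, variance phi * mu**2.
    return [rng.gamma(shape=1.0 / phi, scale=mu * phi) for _ in range(n_imputations)]
```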

16.
It is widely acknowledged that the predictive performance of clinical prediction models should be studied in patients who were not part of the data in which the model was derived. Out-of-sample performance can be hampered when predictors are measured differently at derivation and at external validation. This may occur, for instance, when predictors are measured using different measurement protocols or when tests are produced by different manufacturers. Although such heterogeneity in predictor measurement between derivation and validation data is common, its impact on out-of-sample performance is not well studied. Using analytical and simulation approaches, we examined the out-of-sample performance of prediction models under various scenarios of heterogeneous predictor measurement. These scenarios were defined and clarified using an established taxonomy of measurement error models. The results of our simulations indicate that predictor measurement heterogeneity can induce miscalibration of predictions and can affect discrimination and overall predictive accuracy, to the extent that the prediction model may no longer be considered clinically useful. The measurement error taxonomy was found to be helpful in identifying and predicting the effects of heterogeneous predictor measurements between the settings of prediction model derivation and validation. Our work indicates that homogeneity of measurement strategies across settings is of paramount importance in prediction research.
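
The phenomenon in this abstract is easy to reproduce in a toy simulation. The sketch below (illustrative only, with arbitrary error parameters) derives a logistic model on one measurement of a predictor and validates it where the measurement is shifted, rescaled and noisier, then reports the validation AUC and calibration slope.

```python
# Hedged toy simulation: heterogeneous predictor measurement between derivation
# and validation degrades calibration (and, here, discrimination) of a model.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 20_000

# Derivation data: predictor measured under the derivation protocol.
x = rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x))))
model = LogisticRegression().fit(x.reshape(-1, 1), y)

# Validation data: same true predictor, but the measurement is shifted,
# rescaled and carries extra random error (arbitrary illustrative values).
x_true = rng.normal(size=n)
y_val = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x_true))))
x_measured = 0.5 + 0.8 * x_true + rng.normal(scale=0.5, size=n)

lp = model.decision_function(x_measured.reshape(-1, 1))            # linear predictor
slope = sm.Logit(y_val, sm.add_constant(lp)).fit(disp=0).params[1]
print(f"validation AUC: {roc_auc_score(y_val, lp):.3f}")
print(f"calibration slope: {slope:.2f} (1.0 indicates perfect calibration)")
```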

17.
BACKGROUND AND OBJECTIVE: Epidemiologic studies commonly estimate associations between predictors (risk factors) and outcome. Most software automatically excludes subjects with missing values. This commonly causes bias, because missing values seldom occur completely at random (MCAR) but rather selectively, based on other (observed) variables, that is, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information, including the outcome, is advocated to deal with selective missing values. This seems a self-fulfilling prophecy. METHODS: We tested this hypothesis using data from a study on the diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs) estimated from the original sample were considered the "true" values. We assigned missing values to these predictors, both MCAR and MAR, and repeated this 1,000 times using simulations. In each simulation, we multiply imputed the missing values without and with the outcome, and compared the regression coefficients and SEs with the truth. RESULTS: Regression coefficients based on MI including the outcome were close to the truth. MI without the outcome yielded severely biased (underestimated) coefficients. SEs and coverage of the 90% confidence intervals did not differ between MI with and without the outcome. Results were the same for MCAR and MAR. CONCLUSION: For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without the outcome and is no self-fulfilling prophecy.
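
A toy version of this simulation (not the authors' code; the sample size, coefficients and missingness fraction are arbitrary) is sketched below: a predictor is made missing completely at random and imputed by stochastic regression with and without the outcome, and the resulting coefficient is compared with the true value of 1.

```python
# Hedged toy simulation: imputing a missing predictor WITHOUT the outcome
# attenuates its regression coefficient; including the outcome removes the bias.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, true_beta = 5_000, 1.0

x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 * x1 + true_beta * x2 + rng.normal(size=n)
miss = rng.random(n) < 0.4                 # 40% of x2 missing completely at random

def impute_and_fit(use_outcome: bool) -> float:
    preds = np.column_stack([x1, y]) if use_outcome else x1.reshape(-1, 1)
    imp_fit = sm.OLS(x2[~miss], sm.add_constant(preds[~miss])).fit()
    mu = sm.add_constant(preds[miss]) @ imp_fit.params
    x2_imp = x2.copy()
    x2_imp[miss] = mu + rng.normal(scale=np.sqrt(imp_fit.mse_resid), size=miss.sum())
    # Analysis model on a single completed dataset (proper MI would repeat and pool).
    return sm.OLS(y, sm.add_constant(np.column_stack([x1, x2_imp]))).fit().params[2]

print("imputation with outcome:   ", round(impute_and_fit(True), 2))   # close to 1.0
print("imputation without outcome:", round(impute_and_fit(False), 2))  # attenuated
```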

18.
There is growing interest in developing clinical prediction models (CPMs) to aid local healthcare decision‐making. Frequently, these CPMs are developed in isolation across different populations, with repetitive de novo derivation a common modelling strategy. However, this fails to utilise all available information and does not respond to changes in health processes through time and space. Alternatively, model updating techniques have previously been proposed that adjust an existing CPM to suit the new population, but these techniques are restricted to a single model. We therefore aimed to develop a generalised method for updating and aggregating multiple CPMs. The proposed “hybrid method” re‐calibrates multiple CPMs using stacked regression while concurrently revising specific covariates using individual participant data (IPD) under a penalised likelihood. The performance of the hybrid method was compared with that of existing methods in a clinical example of mortality risk prediction after transcatheter aortic valve implantation, and in two simulation studies. The simulation studies explored the effect of sample size and between‐population heterogeneity on the method, each representing a situation with multiple distinct CPMs and one set of IPD. When the sample size of the IPD was small, stacked regression and the hybrid method had comparable performance, which was the highest across the modelling methods. Conversely, in large IPD samples, development of a new model and the hybrid method gave the highest performance. Hence, the proposed strategy can inform the choice between utilising existing CPMs or developing a model de novo, thereby incorporating IPD, existing research, and prior (clinical) knowledge into the modelling strategy.
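
The stacked‐regression component of the hybrid method can be caricatured in a few lines: the linear predictors of the existing CPMs are evaluated on the local IPD, and a logistic regression learns an intercept and weights for them. This is a simplified, hypothetical sketch; the actual hybrid method additionally revises selected covariate effects under a penalised likelihood.

```python
# Hedged sketch of stacked-regression re-calibration of several existing CPMs.
# `linear_predictors` holds each existing model's linear predictor evaluated on
# the local IPD (one column per model); `y` is the local binary outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

def stack_models(linear_predictors: np.ndarray, y: np.ndarray) -> LogisticRegression:
    # The fitted intercept re-calibrates to the local outcome rate, and the
    # coefficients act as stacking weights for the existing models.
    return LogisticRegression(max_iter=1000).fit(linear_predictors, y)
```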

19.
Pattern‐mixture models provide a general and flexible framework for sensitivity analyses of nonignorable missing data. The placebo‐based pattern‐mixture model (Little and Yau, Biometrics 1996; 52:1324–1333) treats missing data in a transparent and clinically interpretable manner and has been used as a sensitivity analysis for monotone missing data in longitudinal studies. The standard multiple imputation approach (Rubin, Multiple Imputation for Nonresponse in Surveys, 1987) is often used to implement the placebo‐based pattern‐mixture model. We show that Rubin's variance estimate of the multiple imputation estimator of the treatment effect can be overly conservative in this setting. As an alternative to multiple imputation, we derive an analytic expression of the treatment effect for the placebo‐based pattern‐mixture model and propose a posterior simulation or delta method for inference about the treatment effect. Simulation studies demonstrate that the proposed methods provide consistent variance estimates and outperform the imputation methods in terms of power for the placebo‐based pattern‐mixture model. We illustrate the methods using data from a clinical study of major depressive disorders. Copyright © 2013 John Wiley & Sons, Ltd.

20.
Objective: To illustrate the sequence of steps needed to develop and validate a clinical prediction model when missing predictor values have been multiply imputed. Study Design and Setting: We used data from consecutive primary care patients suspected of deep venous thrombosis (DVT) to develop and validate a diagnostic model for the presence of DVT. Missing values were imputed 10 times with the MICE conditional imputation method. After the selection of predictors and transformations for continuous predictors according to three different methods, we estimated regression coefficients and performance measures. Results: The three methods to select predictors and transformations of continuous predictors showed similar results. Rubin's rules could easily be applied to estimate regression coefficients and performance measures, once predictors and transformations were selected. Conclusion: We provide a practical approach for model development and validation with multiply imputed data.
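
For the pooling step mentioned in the results, a minimal sketch is given below: the selected diagnostic model is refitted in each completed dataset and the coefficients are combined with Rubin's rules. The formula and predictor names (`dvt`, `ddimer`, `swelling`, `age`) are hypothetical placeholders, not the variables of the study.

```python
# Hedged sketch: fit the selected model to each multiply imputed dataset and
# pool the coefficients with Rubin's rules. Formula/column names are hypothetical.
import numpy as np
import statsmodels.formula.api as smf

def pool_model(completed_datasets, formula="dvt ~ ddimer + swelling + age"):
    fits = [smf.logit(formula, data=d).fit(disp=0) for d in completed_datasets]
    coefs = np.array([f.params for f in fits])
    variances = np.array([np.diag(f.cov_params()) for f in fits])
    m = len(fits)
    qbar = coefs.mean(axis=0)               # pooled coefficients
    within = variances.mean(axis=0)         # within-imputation variance
    between = coefs.var(axis=0, ddof=1)     # between-imputation variance
    total = within + (1 + 1 / m) * between
    # Performance measures (e.g., the c-statistic) can likewise be estimated per
    # completed dataset and combined; a simple average is often reported.
    return qbar, np.sqrt(total)             # pooled coefficients and standard errors
```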
