Similar articles (20 results)
1.
The translation of human genome discoveries into health practice is one of the major challenges in the coming decades. The use of emerging genetic knowledge for early disease prediction, prevention, and pharmacogenetics will advance genome medicine and lead to more effective prevention/treatment strategies. For this reason, studies to assess the combined role of genetic and environmental discoveries in early disease prediction represent high-priority research projects, as manifested in the multiple risk prediction studies now underway. However, the risk prediction models developed to date lack sufficient accuracy for clinical use. Converging evidence suggests that diseases with the same or similar clinical manifestations can have different pathophysiological and etiological processes. When heterogeneous subphenotypes are treated as a single entity, the effect size of predictors can be reduced substantially, leading to a low-accuracy risk prediction model. The use of more refined subphenotypes facilitates the identification of new predictors and leads to improved risk prediction models. To account for the phenotypic heterogeneity, we have developed a multiclass likelihood-ratio approach, which simultaneously determines the optimum number of subphenotype groups and builds a risk prediction model for each group. Simulation results demonstrated that the new approach had more accurate and robust performance than existing approaches under various underlying disease models. An empirical study of type 2 diabetes (T2D) using data from the Genes and Environment Initiatives suggested a heterogeneous etiology underlying obese and nonobese T2D patients. Considering phenotypic heterogeneity in the analysis leads to improved risk prediction models for both obese and nonobese T2D subjects.
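A minimal sketch of the phenomenon the abstract describes, not the authors' multiclass likelihood-ratio method: when a genetic effect operates mainly in one subphenotype, pooling obese and nonobese cases attenuates the estimated effect, while subgroup-specific logistic models recover it. The data, variable names, and effect sizes below are synthetic assumptions.

```python
# Sketch: treating heterogeneous subphenotypes as a single entity vs. analyzing them
# separately. Not the authors' multiclass likelihood-ratio method; synthetic data only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
g = rng.normal(size=n)                      # genetic predictor
bmi = rng.normal(size=n)                    # environmental predictor
# Hypothetical disease model: the genetic effect operates mainly in the nonobese subgroup.
obese = bmi > 0.5
logit = -2 + np.where(obese, 0.2, 1.0) * g + 0.5 * bmi
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([g, bmi]))
pooled = sm.Logit(y, X).fit(disp=0)                      # single-entity analysis
strat = {lab: sm.Logit(y[m], X[m]).fit(disp=0)
         for lab, m in [("obese", obese), ("nonobese", ~obese)]}
print("pooled   genetic log-OR:", round(pooled.params[1], 2))
for lab, fit in strat.items():
    print(f"{lab:8s} genetic log-OR:", round(fit.params[1], 2))
```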

2.
Calibration, that is, whether observed outcomes agree with predicted risks, is important when evaluating risk prediction models. For dichotomous outcomes, several tools exist to assess different aspects of model calibration, such as calibration-in-the-large, logistic recalibration, and (non-)parametric calibration plots. We aim to extend these tools to prediction models for polytomous outcomes. We focus on models developed using multinomial logistic regression (MLR): outcome Y with k categories is predicted using k − 1 equations comparing each category i (i = 2, …, k) with reference category 1 using a set of predictors, resulting in k − 1 linear predictors. We propose a multinomial logistic recalibration framework that involves an MLR fit where Y is predicted using the k − 1 linear predictors from the prediction model. A non-parametric alternative may use vector splines for the effects of the linear predictors. The parametric and non-parametric frameworks can be used to generate multinomial calibration plots. Further, the parametric framework can be used for the estimation and statistical testing of calibration intercepts and slopes. Two illustrative case studies are presented, one on the diagnosis of malignancy of ovarian tumors and one on residual mass diagnosis in testicular cancer patients treated with cisplatin-based chemotherapy. The risk prediction models were developed on data from 2037 and 544 patients and externally validated on 1107 and 550 patients, respectively. We conclude that calibration tools can be extended to polytomous outcomes. The polytomous calibration plots are particularly informative through the visual summary of the calibration performance. Copyright © 2014 John Wiley & Sons, Ltd.
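A sketch of the parametric recalibration idea under stated assumptions (synthetic data, hypothetical linear predictors, statsmodels' MNLogit standing in for the authors' software): refit an MLR of the validation outcome on the k − 1 linear predictors of the developed model. Intercepts near zero and a slope matrix near the identity suggest good calibration.

```python
# Sketch of multinomial logistic recalibration: regress the validation outcome on the
# k-1 linear predictors of the developed model. Perfect calibration corresponds roughly
# to intercepts of 0 and a slope matrix equal to the identity. Hypothetical data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 2000, 3
# lp[:, j] is the developed model's linear predictor for category j+2 vs. category 1
lp = rng.normal(size=(n, k - 1))
# simulate an outcome that is only partly consistent with those linear predictors
eta = np.column_stack([np.zeros(n), 0.8 * lp[:, 0] - 0.3, 1.1 * lp[:, 1] + 0.2])
p = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
y = np.array([rng.choice(k, p=pi) for pi in p])

recal = sm.MNLogit(y, sm.add_constant(lp)).fit(disp=0)
print(recal.params)        # columns: categories 2..k vs. 1; rows: intercept, lp1, lp2
print(recal.summary())     # basis for testing calibration intercepts and slopes
```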

3.
Published clinical prediction models are often ignored during the development of novel prediction models despite similarities in populations and intended usage. The plethora of prediction models that arise from this practice may still perform poorly when applied in other populations. Incorporating prior evidence might improve the accuracy of prediction models and make them potentially more generalizable. Unfortunately, aggregation of prediction models is not straightforward, and methods to combine differently specified models are currently lacking. We propose two approaches for aggregating previously published prediction models when a validation dataset is available: model averaging and stacked regressions. These approaches yield user-friendly stand-alone models that are adjusted for the new validation data. Both approaches rely on weighting to account for model performance and between-study heterogeneity but adopt a different rationale (averaging versus combination) to combine the models. We illustrate their implementation in a clinical example and compare them with established methods for prediction modeling in a series of simulation studies. Results from the clinical datasets and simulation studies demonstrate that aggregation yields prediction models with better discrimination and calibration in a vast majority of scenarios, and results in equivalent performance (compared to developing a novel model from scratch) when validation datasets are relatively large. In conclusion, model aggregation is a promising strategy when several prediction models are available from the literature and a validation dataset is at hand. The aggregation methods do not require existing models to have similar predictors and can be applied when relatively few data are at hand. Copyright © 2014 John Wiley & Sons, Ltd.
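A minimal sketch of the stacked-regressions idea for a binary outcome, assuming two hypothetical published models and synthetic validation data; the authors' implementation additionally handles weighting details such as between-study heterogeneity and is not reproduced here.

```python
# Sketch of "stacked regressions" for aggregating published binary-risk models:
# use the validation data to weight each model's linear predictor. The fitted
# coefficients define a single stand-alone model. Hypothetical models and data;
# classical stacking would additionally constrain the weights to be non-negative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 1500
x1, x2, x3 = rng.normal(size=(3, n))
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.8 * x1 + 0.5 * x2 + 0.3 * x3))))

# Linear predictors of two (hypothetical) previously published models
lp_A = -0.9 + 0.7 * x1 + 0.6 * x2            # model A ignores x3
lp_B = -1.2 + 1.0 * x1 + 0.5 * x3            # model B ignores x2

stack = LogisticRegression().fit(np.column_stack([lp_A, lp_B]), y)
print("stacking weights:", stack.coef_.ravel(), "intercept:", stack.intercept_)
# The aggregated risk for a new patient is expit(intercept + w_A*lp_A + w_B*lp_B).
```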

4.
During recent decades, interest in prediction models has substantially increased, but approaches to synthesize evidence from previously developed models have failed to keep pace. This causes researchers to ignore potentially useful past evidence when developing a novel prediction model with individual participant data (IPD) from their population of interest. We aimed to evaluate approaches to aggregate previously published prediction models with new data. We consider the situation in which models are reported in the literature with predictors similar to those available in an IPD dataset. We adopt a two-stage method and explore three approaches to calculate a synthesis model, relying on the principles of multivariate meta-analysis. The first approach employs a naive pooling strategy, whereas the other two account for within-study and between-study covariance. These approaches are applied to a collection of 15 datasets of patients with traumatic brain injury, and to five previously published models for predicting deep venous thrombosis. Here, we illustrate how the generally unrealistic assumption of consistency in the availability of evidence across included studies can be relaxed. Results from the case studies demonstrate that aggregation yields prediction models with improved discrimination and calibration in a vast majority of scenarios, and results in equivalent performance (compared with the standard approach) in a small minority of situations. The proposed aggregation approaches are particularly useful when few participant data are at hand. Assessing the degree of heterogeneity between IPD and literature findings remains crucial for determining the optimal approach to aggregating previous evidence into new prediction models. Copyright © 2012 John Wiley & Sons, Ltd.
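A sketch of the naive pooling variant only, with hypothetical coefficients and standard errors; the full approach described in the abstract replaces this with multivariate meta-analysis that models within-study and between-study covariance.

```python
# Sketch of the naive first-stage pooling idea: average coefficients reported by several
# published models with inverse-variance weights, predictor by predictor. The paper's
# full approach also models within- and between-study covariance (multivariate
# meta-analysis). Coefficients and standard errors below are hypothetical.
import numpy as np

# rows = published models, columns = (intercept, age, D-dimer)  (hypothetical)
beta = np.array([[-1.9, 0.030, 0.55],
                 [-2.3, 0.025, 0.70],
                 [-2.0, 0.040, 0.60]])
se = np.array([[0.30, 0.010, 0.20],
               [0.25, 0.008, 0.15],
               [0.40, 0.012, 0.25]])

w = 1.0 / se**2                                   # inverse-variance weights
beta_pooled = (w * beta).sum(axis=0) / w.sum(axis=0)
se_pooled = np.sqrt(1.0 / w.sum(axis=0))          # fixed-effect standard errors
print("pooled coefficients:", np.round(beta_pooled, 3))
print("pooled SEs:         ", np.round(se_pooled, 3))
```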

5.
This paper proposes a risk prediction model using semi-varying coefficient multinomial logistic regression. We use a penalized local likelihood method to perform model selection and to estimate both the functional and constant coefficients in the selected model. The model can be used to improve predictive modelling when non-linear interactions between predictors are present. We conduct a simulation study to assess our method's performance, and the results show that the model selection procedure works well, with small average numbers of incorrectly selected or missed terms. We illustrate the use of our method by applying it to classify patients with early rheumatoid arthritis at baseline into different risk groups for future disease progression. We use a leave-one-out cross-validation method to assess its correct prediction rate and propose a recalibration framework to evaluate how reliable the predicted risks are. Copyright © 2016 John Wiley & Sons, Ltd.

6.
Semicompeting risks data arise when two types of events, non-terminal and terminal, are observed. When the terminal event occurs first, it censors the non-terminal event, but not vice versa. To account for possible dependent censoring of the non-terminal event by the terminal event and to improve prediction of the terminal event using the non-terminal event information, it is crucial to model their association properly. Motivated by a breast cancer clinical trial data analysis, we extend the well-known illness–death models to allow flexible random effects to capture heterogeneous association structures in the data. Our extension also represents a generalization of the popular shared frailty models that usually assume that the non-terminal event does not affect the hazards of the terminal event beyond a frailty term. We propose a unified Bayesian modeling approach that can utilize existing software packages for both model fitting and individual-specific event prediction. The approach is demonstrated via both simulation studies and a breast cancer data set analysis. Copyright © 2014 John Wiley & Sons, Ltd.

7.
Bivariate multinomial data, such as left- and right-eye retinopathy status data, are analyzed either by using a joint bivariate probability model or by exploiting certain odds ratio-based association models. However, the joint bivariate probability model yields marginal probabilities that are complicated functions of the marginal and association parameters for both variables, and the odds ratio-based association model treats the odds ratios involved in the joint probabilities as 'working' parameters, which are consequently estimated through certain arbitrary 'working' regression models. Also, this latter odds ratio-based model does not provide any easy interpretation of the correlations between the two categorical variables. On the basis of pre-specified marginal probabilities, in this paper, we develop a bivariate normal type linear conditional multinomial probability model to understand the correlations between two categorical variables. The parameters involved in the model are consistently estimated using optimal likelihood and generalized quasi-likelihood approaches. The proposed model and the inferences are illustrated through an extensive simulation study as well as an analysis of the well-known Wisconsin Diabetic Retinopathy status data. Copyright © 2014 John Wiley & Sons, Ltd.

8.
This paper proposes a new statistical approach for predicting postoperative morbidity, such as intensive care unit length of stay and number of complications, after cardiac surgery in children. In a recent multi-center study sponsored by the National Institutes of Health, 311 children undergoing cardiac surgery were enrolled. Morbidity data are count data in which the observations take only nonnegative integer values. Often, the number of zeros in the sample cannot be accommodated properly by a simple model, thus requiring a more complex model such as the zero-inflated Poisson regression model. We are interested in identifying important risk factors for postoperative morbidity among many candidate predictors. There is only limited methodological work on variable selection for zero-inflated regression models. In this paper, we consider regularized zero-inflated Poisson models through a penalized likelihood function and develop a new expectation–maximization algorithm for numerical optimization. Simulation studies show that the proposed method has better performance than some competing methods. Using the proposed methods, we analyzed the postoperative morbidity data, which improved the model fit and identified important clinical and biomarker risk factors. Copyright © 2014 John Wiley & Sons, Ltd.
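For orientation, a sketch of the underlying zero-inflated Poisson model fitted with statsmodels on synthetic data; the penalized likelihood and EM algorithm for variable selection proposed in the paper are not part of standard libraries and are not reproduced here.

```python
# Sketch: fitting a zero-inflated Poisson model to a count outcome with excess zeros.
# This only illustrates the underlying ZIP model, not the paper's penalized-likelihood
# EM variable-selection procedure. Synthetic data with hypothetical risk factors.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n = 311
x = rng.normal(size=(n, 2))                          # two candidate risk factors
mu = np.exp(0.3 + 0.6 * x[:, 0])                     # Poisson mean depends on x1 only
p_zero = 1 / (1 + np.exp(-(-0.5 + 0.8 * x[:, 1])))   # extra zeros depend on x2 only
y = np.where(rng.random(n) < p_zero, 0, rng.poisson(mu))

exog = sm.add_constant(x)
zip_fit = ZeroInflatedPoisson(y, exog, exog_infl=exog, inflation="logit").fit(disp=0)
print(zip_fit.summary())
```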

9.
An increasingly important data source for the development of clinical risk prediction models is electronic health records (EHRs). One of their key advantages is that they contain data on many individuals collected over time. This allows one to incorporate more clinical information into a risk model. However, traditional methods for developing risk models are not well suited to these irregularly collected clinical covariates. In this paper, we compare a range of approaches for using longitudinal predictors in a clinical risk model. Using data from an EHR for patients undergoing hemodialysis, we incorporate five different clinical predictors into a risk model for patient mortality. We consider different approaches for treating the repeated measurements, including use of summary statistics, machine learning methods, functional data analysis, and joint models. We follow up our empirical findings with a simulation study. Overall, our results suggest that simple approaches perform just as well, if not better, than more complex analytic approaches. These results have important implications for the development of risk prediction models with EHRs. Copyright © 2017 John Wiley & Sons, Ltd.
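A sketch of the simplest strategy compared in the paper: reduce each patient's irregular longitudinal measurements to per-patient summary statistics (mean, last value, within-patient slope) before fitting a standard Cox model. The simulated data, column names, and use of lifelines are assumptions for illustration.

```python
# Sketch of the "summary statistics" approach to irregular longitudinal predictors:
# collapse each patient's repeated lab values into a few per-patient features and feed
# them to a standard Cox model. Synthetic data, hypothetical column names.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n_pat, n_vis = 200, 4
pid = np.repeat(np.arange(n_pat), n_vis)
t_vis = rng.uniform(0, 365, size=n_pat * n_vis)            # irregular visit times (days)
level = rng.normal(3.5, 0.5, size=n_pat)                    # patient-specific lab level
trend = rng.normal(0.0, 0.002, size=n_pat)                  # patient-specific trend/day
labs = pd.DataFrame({"id": pid, "time": t_vis,
                     "value": level[pid] + trend[pid] * t_vis
                              + rng.normal(0, 0.2, n_pat * n_vis)})

def per_patient(g):
    b, _ = np.polyfit(g["time"], g["value"], 1)              # within-patient slope
    return pd.Series({"lab_mean": g["value"].mean(),
                      "lab_last": g.sort_values("time")["value"].iloc[-1],
                      "lab_slope": b})

feat = labs.groupby("id").apply(per_patient)

# hypothetical mortality process: higher level and steeper decline raise the hazard
lin = 0.8 * (level - 3.5) - 400 * trend
event_time = rng.exponential(2.0 * np.exp(-lin))
censor_time = rng.uniform(0.5, 3.0, size=n_pat)
surv = pd.DataFrame({"years": np.minimum(event_time, censor_time),
                     "died": (event_time <= censor_time).astype(int)},
                    index=pd.Index(np.arange(n_pat), name="id"))

CoxPHFitter().fit(surv.join(feat), duration_col="years", event_col="died").print_summary()
```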

10.
In long-term follow-up studies, irregular longitudinal data are observed when individuals are assessed repeatedly over time but at uncommon and irregularly spaced time points. Modeling the covariance structure for this type of data is challenging, as it requires specification of a covariance function that is positive definite. Moreover, in certain settings, careful modeling of the covariance structure for irregular longitudinal data can be crucial in order to ensure no bias arises in the mean structure. Two common settings where this occurs are studies with 'outcome-dependent follow-up' and studies with 'ignorable missing data'. 'Outcome-dependent follow-up' occurs when individuals with a history of poor health outcomes have more follow-up measurements and shorter intervals between the repeated measurements. When the follow-up time process depends only on previous outcomes, likelihood-based methods can still provide consistent estimates of the regression parameters, given that both the mean and covariance structures of the irregular longitudinal data are correctly specified, and no model for the follow-up time process is required. For 'ignorable missing data', the missing data mechanism does not need to be specified, but valid likelihood-based inference requires correct specification of the covariance structure. In both cases, flexible modeling approaches for the covariance structure are essential. In this paper, we develop a flexible approach to modeling the covariance structure for irregular continuous longitudinal data using the partial autocorrelation function and the variance function. In particular, we propose semiparametric non-stationary partial autocorrelation function models, which do not suffer from complex positive definiteness restrictions like the autocorrelation function. We describe a Bayesian approach, discuss computational issues, and apply the proposed methods to CD4 count data from a pediatric AIDS clinical trial. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

11.
Hong Zhu. Statistics in Medicine 2014; 33(14): 2467–2479
Regression methods for survival data with right censoring have been extensively studied under semiparametric transformation models such as the Cox regression model and the proportional odds model. However, their practical application could be limited because of possible violation of model assumptions or a lack of ready interpretation of the regression coefficients in some cases. As an alternative, in this paper, the proportional likelihood ratio model introduced by Luo and Tsai is extended to flexibly model the relationship between survival outcome and covariates. This model has a natural connection with many important semiparametric models such as the generalized linear model and the density ratio model and is closely related to biased sampling problems. Compared with the semiparametric transformation model, the proportional likelihood ratio model is appealing and practical in many ways because of its model flexibility and quite direct clinical interpretation. We present two likelihood approaches for the estimation and inference on the target regression parameters under independent and dependent censoring assumptions. Based on a conditional likelihood approach using uncensored failure times, a numerically simple estimation procedure is developed by maximizing a pairwise pseudo-likelihood. We also develop a full likelihood approach, and the most efficient maximum likelihood estimator is obtained by a profile likelihood. Simulation studies are conducted to assess the finite-sample properties of the proposed estimators and compare the efficiency of the two likelihood approaches. An application to survival data for bone marrow transplantation patients with acute leukemia is provided to illustrate the proposed method and other approaches for handling non-proportionality. The relative merits of these methods are discussed in concluding remarks. Copyright © 2014 John Wiley & Sons, Ltd.

12.
Most existing coronary risk assessment methods are based on baseline data only. The authors compared the predictive ability of coronary multivariable risk scores based on updated versus baseline risk factors and investigated the optimal frequency of updating. Data from 16 biennial examinations of 4,962 subjects from the original Framingham Heart Study (1948-1978) were used. The predictive ability of three multivariable risk scores was evaluated through 10-fold cross-validation. The baseline-only multivariable risk score was computed using baseline values of coronary risk factors applied to a Cox model estimated from baseline data. The two other approaches relied on updated risk factors and included them in the models estimated from, respectively, baseline and updated data. All analyses were stratified by sex and age. For 30, 14, and 10 years of follow-up, the predictive ability of the baseline-only multivariable risk score was substantially poorer than that of the models using updated risk factors. Between the two latter models, the one estimated from updated data ensured better prediction than the one estimated from baseline data for 30 years of follow-up among younger subjects only. The results suggest that coronary risk assessment can be improved by utilizing updated risk factors and that the optimal frequency of updating may vary across subpopulations.
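A sketch of the updated-risk-factor idea using a time-varying Cox model: each exam contributes a (start, stop] interval carrying the covariate values current at that exam. The simulated cohort, single covariate, and lifelines API usage are assumptions; this is not the Framingham analysis itself.

```python
# Sketch: updated (time-varying) risk factors in a Cox model, as opposed to a
# baseline-only model. Each biennial exam contributes a (start, stop] row with the
# covariate value measured at that exam. Synthetic cohort for illustration only.
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(5)
rows = []
for i in range(300):
    sbp = rng.normal(130, 15)                      # baseline systolic blood pressure
    for exam in range(3):                          # three biennial exams
        start, stop = 2 * exam, 2 * (exam + 1)
        sbp += rng.normal(3, 4)                    # risk factor drifts between exams
        p_event = 1 - np.exp(-0.02 * np.exp(0.03 * (sbp - 130)))   # 2-year CHD risk
        event = rng.random() < p_event
        rows.append((i, start, stop, sbp, int(event)))
        if event:
            break
long = pd.DataFrame(rows, columns=["id", "start", "stop", "sbp", "event"])

ctv = CoxTimeVaryingFitter()
ctv.fit(long, id_col="id", start_col="start", stop_col="stop", event_col="event")
ctv.print_summary()   # the coefficient for updated sbp recovers its effect on the hazard
```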

13.
The area under the receiver operating characteristic curve (AUC) is a widely used measure of discrimination in risk prediction models. Routinely, the Mann–Whitney statistic is used as an estimator of the AUC, while the change in AUC is tested by the DeLong test. However, very often, in settings where the model is developed and tested on the same dataset, the added predictor is statistically significantly associated with the outcome but fails to produce a significant improvement in the AUC. No conclusive resolution exists to explain this finding. In this paper, we show that the reason lies in the inappropriate application of the DeLong test in the setting of nested models. Using numerical simulations and a theoretical argument based on generalized U-statistics, we show that if the added predictor is not statistically significantly associated with the outcome, the null distribution is non-normal, contrary to the assumption of the DeLong test. Our simulations of different scenarios show that the loss of power caused by such misuse of the DeLong test leads to a conservative test for small and moderate effect sizes. This problem does not exist for predictors that are associated with the outcome or for non-nested models. We suggest that for nested models, only the test of association be performed for the new predictors, and if the result is significant, the change in AUC be estimated with an appropriate confidence interval, which can be based on the DeLong approach. Copyright © 2012 John Wiley & Sons, Ltd.
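A small simulation in the spirit of the paper's argument, with synthetic data: when a pure-noise predictor is added to a nested logistic model fitted and evaluated on the same data, the in-sample change in AUC is almost never negative, so its null distribution is far from the symmetric normal form assumed by the DeLong test.

```python
# Simulation sketch: distribution of the in-sample change in AUC when a noise predictor
# is added to a nested logistic model (same data used for fitting and evaluation).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n, n_sim = 400, 500
delta_auc = np.empty(n_sim)
for s in range(n_sim):
    x = rng.normal(size=(n, 1))                       # real predictor
    z = rng.normal(size=(n, 1))                       # added predictor, no association
    y = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
    auc_old = roc_auc_score(y, LogisticRegression().fit(x, y).predict_proba(x)[:, 1])
    Xnew = np.hstack([x, z])
    auc_new = roc_auc_score(y, LogisticRegression().fit(Xnew, y).predict_proba(Xnew)[:, 1])
    delta_auc[s] = auc_new - auc_old

print("mean in-sample dAUC under the null:", round(delta_auc.mean(), 4))
print("share of dAUC < 0:", (delta_auc < 0).mean())   # near 0, i.e. clearly non-normal
```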

14.
An important question in the evaluation of an additional risk prediction marker is how to interpret a small increase in the area under the receiver operating characteristic curve (AUC). Many researchers believe that a change in AUC is a poor metric because it increases only slightly with the addition of a marker with a large odds ratio. Because it is not possible on purely statistical grounds to choose between the odds ratio and AUC, we invoke decision analysis, which incorporates costs and benefits. For example, a timely estimate of the risk of later non-elective operative delivery can help a woman in labor decide if she wants an early elective cesarean section to avoid greater complications from possible later non-elective operative delivery. A basic risk prediction model for later non-elective operative delivery involves only antepartum markers. Because adding intrapartum markers to this risk prediction model increases AUC by 0.02, we questioned whether this small improvement is worthwhile. A key decision-analytic quantity is the risk threshold, here the risk of later non-elective operative delivery at which a patient would be indifferent between an early elective cesarean section and usual care. For a range of risk thresholds, we found that an increase in the net benefit of risk prediction requires collecting intrapartum marker data on 68 to 124 women for every correct prediction of later non-elective operative delivery. Because data collection is non-invasive, this test tradeoff of 68 to 124 is clinically acceptable, indicating the value of adding intrapartum markers to the risk prediction model. Copyright © 2014 John Wiley & Sons, Ltd.
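A sketch of the two decision-analytic quantities the abstract relies on: net benefit at a risk threshold and the test tradeoff (one divided by the increase in net benefit). The predicted risks, threshold, and event rate below are hypothetical; the 68 to 124 range quoted in the abstract comes from the authors' data, not from this sketch.

```python
# Sketch of net benefit at a risk threshold and the "test tradeoff", i.e. how many
# marker measurements are needed per additional correct prediction. Hypothetical inputs.
import numpy as np

def net_benefit(y, risk, threshold):
    """Net benefit = TP/n - FP/n * pt/(1-pt) when treating those with risk >= threshold."""
    treat = risk >= threshold
    n = len(y)
    tp = np.sum(treat & (y == 1)) / n
    fp = np.sum(treat & (y == 0)) / n
    return tp - fp * threshold / (1 - threshold)

rng = np.random.default_rng(7)
n = 10000
y = rng.binomial(1, 0.15, size=n)                     # later non-elective operative delivery
risk_base = np.clip(0.15 + rng.normal(0, 0.05, n) + 0.10 * y, 0.01, 0.99)  # antepartum only
risk_full = np.clip(0.15 + rng.normal(0, 0.05, n) + 0.16 * y, 0.01, 0.99)  # + intrapartum

pt = 0.25                                             # illustrative risk threshold
delta_nb = net_benefit(y, risk_full, pt) - net_benefit(y, risk_base, pt)
print("increase in net benefit:", round(delta_nb, 4))
print("test tradeoff (markers collected per extra true prediction):", round(1 / delta_nb))
```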

15.
In this paper, we investigate how the correlation structure of independent variables affects the discrimination of a risk prediction model. Using multivariate normal data and a binary outcome, we prove that zero correlation among predictors is often detrimental for discrimination in a risk prediction model and that negatively correlated predictors with positive effect sizes are beneficial. A very high multiple R-squared from regressing the new predictor on the old ones can also be beneficial. As a practical guide to new variable selection, we recommend selecting predictors that have negative correlation with the risk score based on the existing variables. This step is easy to implement even when the number of new predictors is large. We illustrate our results using real-life Framingham data, suggesting that the conclusions hold outside of normality. The findings presented in this paper might be useful for preliminary selection of potentially important predictors, especially in situations where the number of predictors is large. Copyright © 2013 John Wiley & Sons, Ltd.
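A quick simulation of the claim under the setup the abstract uses (predictors multivariate normal within each outcome class, equal positive effect sizes): negative correlation between the predictors raises the AUC of the combined risk score, zero correlation sits in between, and positive correlation is worst. Sample size and effect size below are arbitrary choices.

```python
# Simulation sketch: discrimination of the combined risk score as a function of the
# correlation between two predictors with equal positive effect sizes.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
n, d = 100000, 0.5                                 # cases/controls per group, effect size

for rho in (-0.5, 0.0, 0.5):
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x0 = rng.multivariate_normal([0, 0], cov, size=n)       # controls
    x1 = rng.multivariate_normal([d, d], cov, size=n)       # cases
    score = np.r_[x0, x1] @ np.linalg.solve(cov, [d, d])    # optimal (LDA) risk score
    y = np.r_[np.zeros(n), np.ones(n)]
    print(f"rho = {rho:+.1f}  AUC = {roc_auc_score(y, score):.3f}")
    # theory under this setup: AUC = Phi(d / sqrt(1 + rho)); smaller rho gives larger AUC
```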

16.
Risk prediction models have been widely applied for the prediction of long-term incidence of disease. Several parameters have been identified and estimators developed to quantify the predictive ability of models and to compare new models with traditional models. These estimators have not generally accounted for censoring in the survival data normally available for fitting the models. This paper remedies that problem. The primary parameters considered are net reclassification improvement (NRI) and integrated discrimination improvement (IDI). We have previously similarly considered a primary measure of concordance, the area under the ROC curve (AUC), also called the c-statistic. We also include here consideration of population attributable risk (PAR) and the ratio of predicted risk in the top quintile of risk to that in the bottom quintile. We evaluated estimators of these various parameters both with simulation studies and as applied to a prospective study of coronary heart disease (CHD). Our simulation studies showed that in general our estimators had little bias, and less bias and smaller variances than the traditional estimators. We have applied our methods to assessing improvement in risk prediction for each traditional CHD risk factor compared to a model without that factor. These traditional risk factors are considered valuable, yet when adding any of them to a risk prediction model that omits that factor, the improvement is generally small for any of the parameters. This experience should prepare us not to expect large values of these risk prediction improvement parameters for any new risk factor to be discovered. Copyright © 2010 John Wiley & Sons, Ltd.
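For reference, the standard uncensored forms of categorical NRI and IDI computed from two sets of predicted risks on toy data; the paper's contribution is estimators of these parameters that account for censoring in survival data, which this sketch does not attempt.

```python
# Uncensored categorical NRI and IDI from two sets of predicted risks. Toy inputs only;
# the censoring-adjusted estimators described in the abstract are not reproduced here.
import numpy as np

def category_nri(y, risk_old, risk_new, cutoffs):
    """Net up-classification among events plus net down-classification among non-events."""
    old_cat = np.digitize(risk_old, cutoffs)
    new_cat = np.digitize(risk_new, cutoffs)
    up, down = new_cat > old_cat, new_cat < old_cat
    ev, ne = y == 1, y == 0
    return (up[ev].mean() - down[ev].mean()) + (down[ne].mean() - up[ne].mean())

def idi(y, risk_old, risk_new):
    """Improvement in mean risk separation between events and non-events."""
    ev = y == 1
    return ((risk_new[ev].mean() - risk_new[~ev].mean())
            - (risk_old[ev].mean() - risk_old[~ev].mean()))

rng = np.random.default_rng(9)
n = 5000
y = rng.binomial(1, 0.2, n)
risk_old = np.clip(0.2 + 0.10 * (y - 0.2) + rng.normal(0, 0.08, n), 0, 1)
risk_new = np.clip(0.2 + 0.15 * (y - 0.2) + rng.normal(0, 0.08, n), 0, 1)

print("NRI:", round(category_nri(y, risk_old, risk_new, cutoffs=[0.1, 0.2]), 3))
print("IDI:", round(idi(y, risk_old, risk_new), 3))
```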

17.
This paper develops interval-censoring likelihood methods in the context of parametric proportional hazards (PH) and non-PH regression models in the longitudinal study setting to reanalyze the Medical Research Council's randomized controlled trial of teletherapy in age-related macular degeneration. We compare the performance of the interval-censoring likelihood with proxy likelihoods that were used to analyze the original data. It is shown, analytically, that the use of such proxy likelihoods in selected PH models leads to biased estimators. Such estimators are artificially precise; further, the magnitude of their percentage bias is quantified in a data-directed simulation study. For non-PH models, we demonstrate that these results obtained from PH models do not hold uniformly and explain the implications of this finding for the reanalysis of proxy-likelihood trial data. Our final analysis of the age-related macular degeneration trial data, based on fitting PH and non-PH models, reassuringly confirms the published findings from the original trial. Copyright © 2013 John Wiley & Sons, Ltd.
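For readers unfamiliar with the distinction, the generic interval-censoring likelihood contrasted with a proxy likelihood that imputes an exact event time. These are the standard textbook forms, not the paper's specific parametric PH and non-PH models, which enter through the survival function S(t | x; θ).

```latex
% Interval-censoring likelihood: delta_i = 1 means the event for subject i is only known
% to lie in (L_i, R_i]; delta_i = 0 means right-censoring at L_i.
L_{\mathrm{IC}}(\theta) \;=\; \prod_{i=1}^{n}
   \bigl[\, S(L_i \mid x_i;\theta) - S(R_i \mid x_i;\theta) \,\bigr]^{\delta_i}\,
   S(L_i \mid x_i;\theta)^{\,1-\delta_i}

% A proxy likelihood instead imputes an exact event time t_i^* (e.g., the right endpoint
% or the interval midpoint) and treats it as observed, which is the source of the bias
% quantified in the paper for selected PH models.
L_{\mathrm{proxy}}(\theta) \;=\; \prod_{i=1}^{n}
   f(t_i^{*} \mid x_i;\theta)^{\delta_i}\,
   S(t_i^{*} \mid x_i;\theta)^{\,1-\delta_i}
```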

18.
When fitting an econometric model, it is well known that we pick up part of the idiosyncratic characteristics of the data along with the systematic relationship between dependent and explanatory variables. This phenomenon is known as overfitting and generally occurs when a model is excessively complex relative to the amount of data available. Overfitting is a major threat to regression analysis in terms of both inference and prediction. We start by showing that the Copas measure becomes confounded by shrinkage or expansion arising from in-sample bias when applied to the untransformed scale of nonlinear models, which is typically the scale of interest when assessing behaviors or analyzing policies. We then propose a new measure of overfitting that is both expressed on the scale of interest and immune to this problem. We also show how to measure the respective contributions of in-sample bias and overfitting to the overall predictive bias when applying an estimated model to new data. We finally illustrate the properties of our new measure through both a simulation study and a real-data illustration based on inpatient healthcare expenditure data, which shows that the distinctions can be important. Copyright © 2013 John Wiley & Sons, Ltd.

19.
For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some diseases diagnosed at later visits are actually undiagnosed prevalent disease. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in the time of prevalent disease onset. We demonstrate that the naive Kaplan–Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence–incidence models. Parameters for parametric prevalence–incidence models, such as the logistic regression and Weibull survival (logistic–Weibull) model, are estimated by direct likelihood maximization or by an EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for the setting without covariates. We compare naive Kaplan–Meier, logistic–Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan–Meier provided poor estimates while the logistic–Weibull model was a close fit to the non-parametric estimates. Our findings support our use of logistic–Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
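A sketch of the cumulative risk implied by a logistic–Weibull prevalence–incidence mixture: a point mass of (possibly undiagnosed) prevalent disease at time zero plus a Weibull distribution for incident disease. The parameter values are hypothetical, not the Kaiser Permanente estimates.

```python
# Cumulative risk under a logistic-Weibull prevalence-incidence mixture.
# Hypothetical parameter values for illustration only.
import numpy as np

def cumulative_risk(t, logit_prev, weibull_shape, weibull_scale):
    """P(disease by time t) = p + (1 - p) * Weibull CDF(t)."""
    p = 1 / (1 + np.exp(-logit_prev))                    # prevalence (point mass at t = 0)
    incident_cdf = 1 - np.exp(-(t / weibull_scale) ** weibull_shape)
    return p + (1 - p) * incident_cdf

years = np.array([0, 1, 3, 5, 10])
risk = cumulative_risk(years, logit_prev=-4.0, weibull_shape=1.2, weibull_scale=40.0)
for yr, r in zip(years, risk):
    print(f"{yr:>2} y: cumulative risk = {r:.4f}")
# A naive Kaplan-Meier spreads the prevalent point mass over the first visits,
# underestimating early risk and overestimating later risk.
```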

20.
Continuous predictors are routinely encountered when developing a prognostic model. Investigators, who are often non-statisticians, must decide how to handle continuous predictors in their models. Categorising continuous measurements into two or more categories has been widely discredited, yet is still frequently done because of its simplicity, investigator ignorance of the potential impact and of suitable alternatives, or to facilitate model uptake. We examine the effect of three broad approaches for handling continuous predictors on the performance of a prognostic model: various methods of categorising predictors, modelling a linear relationship between the predictor and outcome, and modelling a nonlinear relationship using fractional polynomials or restricted cubic splines. We compare the performance (measured by the c-index, calibration and net benefit) of prognostic models built using each approach, evaluating them using separate data from that used to build them. We show that categorising continuous predictors produces models with poor predictive performance and poor clinical usefulness. Categorising continuous predictors is unnecessary, biologically implausible and inefficient and should not be used in prognostic model development. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
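A sketch contrasting dichotomisation, a linear term, and a natural cubic spline for a continuous predictor with a smooth nonlinear effect, with performance compared by out-of-sample AUC. The data are synthetic, and patsy's cr() basis (assumed available through the statsmodels formula interface) stands in for the restricted cubic splines discussed in the abstract.

```python
# Three ways to handle one continuous predictor: dichotomised at the median, linear,
# and a natural cubic spline. Synthetic nonlinear data; held-out AUC as the c-index.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(10)
n = 4000
x = rng.normal(size=n)
logit = -1 + 1.5 * np.tanh(2 * x)                     # smooth nonlinear effect
df = pd.DataFrame({"x": x, "y": rng.binomial(1, 1 / (1 + np.exp(-logit)))})
train, test = df.iloc[: n // 2], df.iloc[n // 2:]

cut = train["x"].median()
m_cat = smf.logit("y ~ I(x > cut)", data=train).fit(disp=0)      # dichotomised
m_lin = smf.logit("y ~ x", data=train).fit(disp=0)               # linear
m_spl = smf.logit("y ~ cr(x, df=4)", data=train).fit(disp=0)     # natural cubic spline

for name, m in [("dichotomised", m_cat), ("linear", m_lin), ("spline", m_spl)]:
    auc = roc_auc_score(test["y"], m.predict(test))
    print(f"{name:12s} c-index on held-out data: {auc:.3f}")
```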

