Similar articles
20 similar articles found.
1.
Survival regression is commonly applied in biomedical studies and clinical trials, and evaluating its predictive performance plays an essential role in model diagnosis and selection. The presence of censored data, particularly if the censoring is informative, poses additional challenges for the assessment of predictive accuracy. Existing literature mainly focuses on prediction of survival probabilities, with limited work on survival time. In this work, we focus on accuracy measures of predicted survival times adjusted for a potentially informative censoring mechanism (i.e., coarsening at random (CAR) or non-CAR) by adopting the technique of inverse probability of censoring weighting. Our proposed predictive metric can be adapted to various survival regression frameworks including but not limited to accelerated failure time models and proportional hazards models. Moreover, we provide the asymptotic properties of the inverse probability of censoring weighting estimators under CAR. As extensions, we consider the settings of high-dimensional data under CAR or non-CAR. The performance of the proposed method is evaluated through extensive simulation studies and analysis of real data from the Critical Assessment of Microarray Data Analysis.

2.
This study developed and evaluated a method for ascertaining a newly diagnosed breast cancer case using multiple sources of data from the Medicare claims system. Predictors of an incident case were operationally defined as codes for breast cancer-related diagnoses and procedures from hospital inpatient, hospital outpatient, and physician claims. The optimal combination of predictors was then determined from a logistic regression model using 1992 data from the linked SEER registries-Medicare claims database and a sample of noncancer controls drawn from the SEER areas. While the ROC curve demonstrates that the model can produce levels of sensitivity and specificity above 90%, the positive predictive value is comparatively low (67-70%). This low predictive value is largely the result of the model's limitation in distinguishing recurrent and secondary malignancies from incident cases and possibly from the model identifying true incident cases not identified by SEER. Nevertheless, the logistic regression approach is a useful method for ascertaining incident cases because it allows for greater flexibility in changing the performance characteristics by selecting different cut-points depending on the application (e.g., high sensitivity for registry validation, high specificity for outcomes research). It also allows us to make specific adjustments to population-based estimates of breast cancer incidence with claims.
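How sensitivity and specificity above 90% can coexist with a much lower positive predictive value (PPV) can be illustrated with Bayes' rule. The sketch below is a generic calculation, not the study's model: the function name `ppv` and the 20% case fraction are hypothetical values chosen for illustration, and the abstract itself attributes the low PPV mainly to recurrent and secondary malignancies.

```python
def ppv(sensitivity, specificity, case_fraction):
    """Positive predictive value from Bayes' rule.

    case_fraction is the share of true incident cases in the screened
    sample (a hypothetical input, not a figure from the abstract).
    """
    true_pos = sensitivity * case_fraction
    false_pos = (1.0 - specificity) * (1.0 - case_fraction)
    return true_pos / (true_pos + false_pos)


# Assumed illustration: 90% sensitivity and specificity, 20% true cases.
print(round(ppv(0.90, 0.90, 0.20), 3))  # about 0.692
```

Raising the cut-point trades sensitivity for specificity, which is the flexibility the abstract describes: a higher specificity shrinks the false-positive term and raises the PPV.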

3.
Disaggregation regression has become an important tool in spatial disease mapping for making fine-scale predictions of disease risk from aggregated response data. By including high resolution covariate information and modeling the data generating process on a fine scale, it is hoped that these models can accurately learn the relationships between covariates and response at a fine spatial scale. However, validating these high resolution predictions can be a challenge, as often no data are observed at this spatial scale. In this study, disaggregation regression was performed on simulated data in various settings and the resulting fine-scale predictions were compared to the simulated ground truth. Performance was investigated with varying numbers of data points, sizes of aggregated areas and levels of model misspecification. The effectiveness of cross-validation on the aggregate level as a measure of fine-scale predictive performance was also investigated. Predictive performance improved as the number of observations increased and as the size of the aggregated areas decreased. When the model was well-specified, fine-scale predictions were accurate even with small numbers of observations and large aggregated areas. Under model misspecification predictive performance was significantly worse for large aggregated areas but remained high when response data were aggregated over smaller regions. Cross-validation correlation on the aggregate level was a moderately good predictor of fine-scale predictive performance. While these simulations are unlikely to capture the nuances of real-life response data, this study gives insight into the effectiveness of disaggregation regression in different contexts.

4.
The analysis of quality of life (QoL) data can be challenging due to the skewness of responses and the presence of missing data. In this paper, we propose a new weighted quantile regression method for estimating the conditional quantiles of QoL data with responses missing at random. The proposed method makes use of the correlation information within the same subject from an auxiliary mean regression model to enhance the estimation efficiency and takes the missing data mechanism into account. The asymptotic properties of the proposed estimator are studied, and simulations are conducted to evaluate its performance. The proposed method is also applied to the analysis of the QoL data from a clinical trial on early breast cancer, which motivated this study.

5.
The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with a size between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance, with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model.

6.
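Objective: To analyze the risk factors for infection after surgical resection in patients with liver cancer and to establish a prediction model for infection after hepatectomy. Methods: We retrospectively analyzed patients who underwent surgical resection for liver cancer in the Department of General Surgery, the First Affiliated Hospital of Soochow University, between February 2017 and October 2019. General characteristics, laboratory data, surgical data, and postoperative infection status were collected. Univariate χ² tests and multivariate logistic regression were used to identify independent risk factors for postoperative infection...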

7.
Clinicians and health service researchers are frequently interested in predicting patient-specific probabilities of adverse events (e.g. death, disease recurrence, post-operative complications, hospital readmission). There is an increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split-sample validation: the data were randomly divided into derivation and validation samples. Predictive models were estimated using the derivation sample and the predictive accuracy of the resultant model was assessed using the area under the receiver operating characteristic (ROC) curve in the validation sample. This process was repeated 1000 times: the initial data set was randomly divided into derivation and validation samples 1000 times, and the predictive accuracy of each method was assessed each time. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
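The repeated split-sample procedure described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the names `auc`, `repeated_split_auc`, and `fit_direction` are hypothetical, the toy "model" only learns the sign of a single predictor, and splits that leave either sample without both outcome classes are skipped.

```python
import random


def auc(scores, labels):
    """ROC area as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_split_auc(x, y, fit, n_repeats=100, frac=0.5, seed=0):
    """Mean validation AUC over repeated random derivation/validation splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = int(frac * len(idx))
        train, valid = idx[:cut], idx[cut:]
        # Skip splits where either sample lacks events or non-events.
        if len({y[i] for i in train}) < 2 or len({y[i] for i in valid}) < 2:
            continue
        score = fit([x[i] for i in train], [y[i] for i in train])
        aucs.append(auc([score(x[i]) for i in valid], [y[i] for i in valid]))
    return sum(aucs) / len(aucs)


def fit_direction(xs, ys):
    """Toy stand-in for a model: score by the predictor, oriented toward events."""
    m1 = sum(v for v, t in zip(xs, ys) if t == 1) / ys.count(1)
    m0 = sum(v for v, t in zip(xs, ys) if t == 0) / ys.count(0)
    sign = 1.0 if m1 >= m0 else -1.0
    return lambda v: sign * v


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Any fitting routine that returns a scoring function could be passed as `fit`, so the same harness would compare several model types on identical splits.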

8.
After chemotherapy for metastatic non-seminomatous testicular cancer, surgical resection is a generally accepted treatment to remove remnants of the initial metastases, since residual tumour may still be present (mature teratoma or viable cancer cells). In this paper, we review the development and external validation of a logistic regression model to predict the absence of residual tumour. Three sources of information were used. First, a quantitative review identified six relevant predictors from 19 published studies (996 resections). Second, a development data set included individual data of 544 patients from six centres. This data set was used to assess the predictive relationships of five continuous predictors, which resulted in dichotomization for two, and a log, square root, and linear transformation for three other predictors. The multiple logistic regression coefficients were reduced with a shrinkage factor (0.95) to improve calibration, based on a bootstrapping procedure. Third, a validation data set included 172 more recently treated patients. The model showed adequate calibration and good discrimination in the development and in the validation sample (areas under the ROC curve 0.83 and 0.82). This study illustrates that a careful modelling strategy may result in an adequate predictive model. Further study of model validity may stimulate application in clinical practice.
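The effect of a uniform shrinkage factor on logistic regression coefficients can be shown with a small sketch. Only the factor 0.95 comes from the abstract; the intercept, slopes, and patient values below are hypothetical, and leaving the intercept unchanged is a simplifying assumption (it is often re-estimated after shrinkage).

```python
import math


def shrink_slopes(slopes, factor=0.95):
    """Multiply every slope by the same shrinkage factor."""
    return [factor * b for b in slopes]


def predict(intercept, slopes, x):
    """Predicted probability from a logistic model."""
    lp = intercept + sum(b * v for b, v in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical coefficients and predictor values, for illustration only.
intercept = -1.0
slopes = [0.8, -0.5]
patient = [2.0, 1.0]

shrunk = shrink_slopes(slopes)
print(predict(intercept, slopes, patient))  # original prediction
print(predict(intercept, shrunk, patient))  # pulled toward the intercept-only value
```

Shrinking pulls predictions toward the average risk, which counteracts the overly extreme predictions an overfitted model tends to make in new patients.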

9.
The generalized odds-rate model is a class of semiparametric regression models, which includes the proportional hazards and proportional odds models as special cases. There are few works on estimation of the generalized odds-rate model with interval censored data because of the challenges in maximizing the complex likelihood function. In this paper, we propose a gamma-Poisson data augmentation approach to develop an Expectation Maximization algorithm, which can be used to fit the generalized odds-rate model to interval censored data. The proposed Expectation Maximization algorithm is easy to implement and is computationally efficient. The performance of the proposed method is evaluated by comprehensive simulation studies and illustrated through applications to datasets from breast cancer and hemophilia studies. In order to make the proposed method easy to use in practice, an R package 'ICGOR' was developed. Copyright © 2016 John Wiley & Sons, Ltd.

10.
Multivariate interval-censored failure time data arise commonly in many studies of epidemiology and biomedicine. Analysis of this type of data is more challenging than that of right-censored data. We propose a simple multiple imputation strategy to recover the order of occurrences based on the interval-censored event times using a conditional predictive distribution function derived from a parametric gamma random effects model. Imputing the interval-censored failure times makes it possible to estimate the regression and dependence parameters of a gamma frailty proportional hazards model using the well-developed EM algorithm. A robust estimator for the covariance matrix is suggested to adjust for the possible misspecification of the parametric baseline hazard function. The finite sample properties of the proposed method are investigated via simulation. The performance of the proposed method is highly satisfactory, whereas the computational burden is minimal. The proposed method is also applied to the diabetic retinopathy study (DRS) data for illustration purposes, and the estimates are compared with those based on other existing methods for bivariate grouped survival data. Copyright © 2010 John Wiley & Sons, Ltd.

11.
Additive regression models are preferred over multiplicative models in the analysis of relative survival data. Such preferences are mainly grounded in practical experience, mostly with cancer registry data, where the basic assumption of the additivity of hazards is more likely to be met. Also, the interpretation of coefficients is more meaningful in additive than in multiplicative models. Nonetheless, the question of goodness of fit of the assumed model must still be addressed, and while there is an abundance of methods to check the goodness of fit of multiplicative models, the respective arsenal for additive models is almost empty. We propose here a variety of procedures for testing the null hypothesis of a good fit. These are based on partial residuals defined similarly to the Schoenfeld residuals familiar from Cox model diagnostics. The tests have appropriate sizes under the null hypothesis, and good power under different alternatives. We investigate their performance through simulations and apply the methods to data from a study into survival of colon cancer patients.

12.
The inverse probability weighted estimator is often applied to two-phase designs and regression with missing covariates. Inverse probability weighted estimators typically are less efficient than likelihood-based estimators but, in general, are more robust against model misspecification. In this paper, we propose a best linear inverse probability weighted estimator for two-phase designs and missing covariate regression. Our proposed estimator is the projection of the simple inverse probability weighted (SIPW) estimator onto the orthogonal complement of the score space based on a working regression model of the observed covariate data. The efficiency gain comes from the use of the association between the outcome variable and the available covariates, which is captured by the working regression model. One advantage of the proposed estimator is that there is no need to calculate the augmented term of the augmented weighted estimator. The estimator can be applied to general missing data problems or two-phase design studies in which the second phase data are obtained in a subcohort. The method can also be applied to secondary trait case-control genetic association studies. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via extensive simulation studies. The methods are applied to a bladder cancer case-control study.

13.
Objectives: To develop a mapping algorithm for estimating EuroQol five-dimensional (EQ-5D) questionnaire values from the prostate cancer–specific health-related quality-of-life (HRQOL) instrument Functional Assessment of Cancer Therapy–Prostate (FACT-P). Methods: The EQ-5D questionnaire and FACT-P instrument data were collected for a subset of patients with metastatic castration-resistant prostate cancer in a multicenter, randomized, double-blind, placebo-controlled phase 3 trial. We compared three statistical techniques to estimate patients’ EQ-5D questionnaire index scores determined by using the UK tariff: 1) generalized estimating equations, 2) a two-part model combining logistic regression and a generalized estimating equation, and 3) separate mapping algorithms for patients with poor health defined as a FACT-P score of 76 or less (group-specific model). Four different sets of explanatory variables were compared. The models were cross-validated by using a 10-fold in-sample cross-validation. Results: Values for both instruments were available for 236 patients with metastatic castration-resistant prostate cancer. The group-specific model including the FACT-P subscale scores and baseline variables had the best predictive performance with R² 0.718, root mean square error 0.162, and mean absolute error 0.117. The two-part model and the generalized estimating equation model including the FACT-P subdomain scores and baseline variables also had good predictive performance. Conclusions: The developed algorithms for mapping the FACT-P instrument to the EQ-5D questionnaire enable the estimation of preference-based health-related quality-of-life scores for use in cost-effectiveness analyses when directly elicited EQ-5D questionnaire data are missing.

14.
We develop a new genetic prediction method, smooth-threshold multivariate genetic prediction, using single nucleotide polymorphism (SNP) data in genome-wide association studies (GWASs). Our method consists of two stages. At the first stage, unlike the usual discontinuous SNP screening used in the gene score method, our method continuously screens SNPs based on the output from standard univariate analysis for marginal association of each SNP. At the second stage, the predictive model is built by a generalized ridge regression simultaneously using the screened SNPs, with SNP weights determined by the strength of marginal association. Continuous SNP screening by smooth thresholding not only makes prediction stable but also leads to a closed-form expression of the generalized degrees of freedom (GDF). The GDF leads to Stein's unbiased risk estimation (SURE), which enables a data-dependent choice of the optimal SNP screening cutoff without using cross-validation. Our method is very rapid because the computationally expensive genome-wide scan is required only once, in contrast to penalized regression methods including the lasso and elastic net. Simulation studies that mimic real GWAS data with quantitative and binary traits demonstrate that the proposed method outperforms the gene score method and genomic best linear unbiased prediction (GBLUP), and shows comparable or sometimes improved performance relative to the lasso and elastic net, which are known to have good predictive ability but heavy computational cost. Application to whole-genome sequencing (WGS) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) shows that the proposed method has higher predictive power than the gene score and GBLUP methods.

15.
16.
In clinical trials, patients with different biomarker features may respond differently to new treatments or drugs. In personalized medicine, it is important to study the interaction between treatment and biomarkers in order to clearly identify patients who benefit from the treatment. With the local partial-likelihood estimation (LPLE) method proposed by Fan, Lin, and Zhou (Local partial-likelihood estimation for lifetime data. The Annals of Statistics 2006; 34(1):290–325), the treatment effect can be modeled as a flexible function of the biomarker. In this paper, we propose a bootstrap test method for survival outcome data based on the LPLE, for assessing whether the treatment effect is constant among all patients or varies as a function of the biomarker. The test method is called local partial-likelihood bootstrap (LPLB) and is developed by bootstrapping the martingale residuals. The test statistic measures the amount of change in treatment effects across the entire range of the biomarker and is derived based on asymptotic theories for martingales. The LPLB method is nonparametric and is shown in simulations and data analysis examples to be flexible enough to identify treatment effects in a biomarker-defined subset and more powerful in detecting treatment-biomarker interactions of complex forms than the Cox regression model with a simple interaction. We use data from a breast cancer and a prostate cancer clinical trial to illustrate the proposed LPLB test. Copyright © 2015 John Wiley & Sons, Ltd.

17.
18.
BACKGROUND AND OBJECTIVE: The utility of predictive models depends on their external validity, that is, their ability to maintain accuracy when applied to patients and settings different from those on which the models were developed. We report a simulation study that compared the external validity of standard logistic regression (LR1), logistic regression with piecewise-linear and quadratic terms (LR2), classification trees, and neural networks (NNETs). METHODS: We developed predictive models on data simulated from a specified population and on data from perturbed forms of the population not representative of the original distribution. All models were tested on new data generated from the population. RESULTS: The performance of LR2 was superior to that of the other model types when the models were developed on data sampled from the population (mean receiver operating characteristic [ROC] areas 0.769, 0.741, 0.724, and 0.682, for LR2, LR1, NNETs, and trees, respectively) and when they were developed on nonrepresentative data (mean ROC areas 0.734, 0.713, 0.703, and 0.667). However, when the models developed using nonrepresentative data were compared with models developed from data sampled from the population, LR2 had the greatest loss in performance. CONCLUSION: Our results highlight the necessity of external validation to test the transportability of predictive models.

19.
Objective. To develop and validate a clinically informed algorithm that uses solely Medicare claims to identify, with a high positive predictive value, incident breast cancer cases.
Data Source. Population-based Surveillance, Epidemiology, and End Results (SEER) Tumor Registry data linked to Medicare claims, and Medicare claims from a 5 percent random sample of beneficiaries in SEER areas.
Study Design. An algorithm was developed using claims from 1995 breast cancer patients from the SEER-Medicare database, as well as 1995 claims from Medicare control subjects. The algorithm was validated on claims from breast cancer subjects and controls from 1994. The algorithm development process used both clinical insight and logistic regression methods.
Data Extraction. Training set: Claims from 7,700 SEER-Medicare breast cancer subjects diagnosed in 1995, and 124,884 controls. Validation set: Claims from 7,607 SEER-Medicare breast cancer subjects diagnosed in 1994, and 120,317 controls.
Principal Findings. A four-step prediction algorithm was developed and validated. It has a positive predictive value of 89 to 93 percent, and a sensitivity of 80 percent for identifying incident breast cancer. The sensitivity is 82–87 percent for stage I or II, and lower for other stages. The sensitivity is 82–83 percent for women who underwent either breast-conserving surgery or mastectomy, and is similar across geographic sites. A cohort identified with this algorithm will have 89–93 percent incident breast cancer cases, 1.5–6 percent cancer-free cases, and 4–5 percent prevalent breast cancer cases.
Conclusions. This algorithm has better performance characteristics than previously proposed algorithms. The ability to examine national patterns of breast cancer care using Medicare claims data would open new avenues for the assessment of quality of care.

20.
Many countries, including the USA, publish predicted numbers of cancer incidence and death in current and future years for the whole country. These predictions provide important information on the cancer burden for cancer control planners, policymakers and the general public. Based on evidence from several empirical studies, the joinpoint (segmented-line linear regression) model (JPM) has been adopted by the American Cancer Society to estimate the number of new cancer cases in the USA and in individual states since 2007. Recently, local policymakers have become increasingly interested in cancer incidence in smaller geographic regions such as counties and Federal Information Processing Standard code regions. The natural extension is to apply the JPM directly to county-level cancer incidence data. However, this direct application has several drawbacks and its performance has not been evaluated. To address these concerns, we developed a spatial random-effects JPM for county-level cancer incidence data. The proposed model was used to predict both cancer incidence rates and counts at the county level. The standard JPM and the proposed method were compared through a validation study. The proposed method outperformed the standard JPM for almost all cancer sites, especially for moderate or rare cancer sites and for counties with small population sizes. As an application, we predicted county-level prostate cancer incidence rates and counts for the year 2011 in Connecticut. Published 2013. This article is a US Government work and is in the public domain in the USA.
