Similar Articles
20 similar articles found (search time: 0 ms)
1.
OBJECTIVE: Models that predict mortality after acute myocardial infarction (AMI) contain different predictors and are based on different populations. We studied the agreement and validity of predictions for individual patients. STUDY DESIGN AND SETTING: We compared predictions from five predictive logistic regression models for short-term mortality after AMI. Three models were developed previously, and two models were developed in the GUSTO-I data, where all five models were applied (n = 40,830; 7.0% 30-day mortality). Agreement was studied with weighted kappa statistics of categorized predictions. Validity was assessed by comparing observed frequencies with predictions (indicating calibration) and by the area under the receiver operating characteristic curve (AUC), indicating discriminative ability. RESULTS: The predictions from the five models varied considerably for individual patients, with low agreement between most (kappa <0.6). Risk predictions from the three previously developed models were on average too high, which could be corrected by re-calibration of the model intercept. The AUC ranged from 0.76 to 0.78 and increased to 0.78 to 0.79 with re-estimated regression coefficients that were optimal for the GUSTO-I patients. The two more detailed GUSTO-I based models performed better (AUC approximately 0.82). CONCLUSION: Models with different predictors may have similar validity while the agreement between predictions for individual patients is poor. The main concerns in the applicability of predictive models for AMI should relate to the selected predictors and average calibration.
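The intercept re-calibration described in entry 1 can be sketched in a few lines: holding the linear predictor of an existing model fixed, a single offset is re-estimated so that the average predicted risk matches the observed event rate. The data and the true offset of 0.7 below are invented for illustration, not taken from the GUSTO-I study.

```python
import numpy as np

def recalibrate_intercept(lp, y, iters=50):
    """Re-estimate only the intercept offset for a fixed linear predictor lp,
    by one-parameter Newton iterations on the logistic log-likelihood."""
    a = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(lp + a)))
        a += np.sum(y - p) / np.sum(p * (1.0 - p))  # Newton step: score / information
    return a

rng = np.random.default_rng(0)
lp = rng.normal(-1.0, 1.0, 5000)                        # linear predictor of an "old" model
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(lp - 0.7))))  # true risks are lower on average
a = recalibrate_intercept(lp, y)
p_new = 1.0 / (1.0 + np.exp(-(lp + a)))
print(round(float(np.mean(y)), 3), round(float(np.mean(p_new)), 3))
```

After the update, the mean predicted risk equals the observed event rate by construction (calibration-in-the-large), while discrimination is untouched because the ranking of patients is unchanged.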

2.
OBJECTIVE: To compare the variable-screening ability of three penalized logistic regressions (L1-regularized, L2-regularized, and elastic net) on SNP data. METHODS: SNP datasets were simulated under a range of parameter settings, and the variable-screening ability of the three penalized logistic regressions was evaluated in terms of the correct rate, the error rate, and the correct index (Youden's index). RESULTS: For the correct rate, L2-regularized > elastic net > L1-regularized penalized logistic regression; for the error rate, likewise L2-regularized > elastic net > L1-regularized; for the correct index, elastic net > L1-regularized > L2-regularized. CONCLUSION: Overall, the elastic net has the better screening ability. By combining the ideas of L1 and L2 regularization, it preserves model sparsity in high-dimensional data analysis, which eases interpretation, while also resolving the problem that correlated covariates cannot enter the model together.
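As an illustrative sketch of the comparison in entry 2 (not the authors' code), the three penalties can be fit with scikit-learn on simulated SNP-like genotypes: the L2 fit keeps every coefficient non-zero, while the L1 and elastic-net fits zero out most of the noise SNPs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # SNP genotypes coded 0/1/2
beta = np.zeros(p)
beta[:5] = 1.0                                       # only 5 causal SNPs
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta - 1.5))))

fits = {
    "L1": LogisticRegression(penalty="l1", C=0.1, solver="saga", max_iter=5000),
    "L2": LogisticRegression(penalty="l2", C=0.1, solver="saga", max_iter=5000),
    "EN": LogisticRegression(penalty="elasticnet", l1_ratio=0.5, C=0.1,
                             solver="saga", max_iter=5000),
}
for name, m in fits.items():
    m.fit(X, y)
    print(name, "non-zero coefficients:", int(np.sum(np.abs(m.coef_) > 1e-8)))
```

The C and l1_ratio values are arbitrary; in practice they would be chosen by cross-validation, which is where the trade-off the abstract reports (sparsity versus keeping correlated SNPs) becomes visible.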

3.
Clinicians and health service researchers are frequently interested in predicting patient-specific probabilities of adverse events (e.g. death, disease recurrence, post-operative complications, hospital readmission). There is an increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split-sample validation: the data were randomly divided into derivation and validation samples. Predictive models were estimated using the derivation sample and the predictive accuracy of the resultant model was assessed using the area under the receiver operating characteristic (ROC) curve in the validation sample. This process was repeated 1000 times: the initial data set was randomly divided into derivation and validation samples 1000 times, and the predictive accuracy of each method was assessed each time. The mean ROC curve area for the regression tree models in the 1000 validation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
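The repeated split-sample design in entry 3 can be mimicked on synthetic data (a sketch with far fewer repeats than the paper's 1000, and invented data rather than the Ontario cohort): when the true outcome model is additive and linear on the logit scale, plain logistic regression out-performs an unpruned tree on validation AUC.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n, p = 1000, 5
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ np.ones(p))))  # linear logit, all 5 covariates matter

auc = {"logistic": [], "tree": []}
for seed in range(20):  # repeated split-sample validation (20 repeats keeps the sketch fast)
    Xd, Xv, yd, yv = train_test_split(X, y, test_size=0.3, random_state=seed)
    lr = LogisticRegression(max_iter=1000).fit(Xd, yd)
    tr = DecisionTreeClassifier(random_state=seed).fit(Xd, yd)
    auc["logistic"].append(roc_auc_score(yv, lr.predict_proba(Xv)[:, 1]))
    auc["tree"].append(roc_auc_score(yv, tr.predict_proba(Xv)[:, 1]))

print({k: round(float(np.mean(v)), 3) for k, v in auc.items()})
```

The gap narrows or reverses when the true model contains strong interactions or thresholds, which is the usual caveat to the paper's conclusion.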

4.
Analysis of proportionate mortality data using logistic regression models
When only proportionate mortality data are available to an investigator studying the effect of an exposure on a particular cause of death, controls must be selected from among persons dying of other causes believed to be uninfluenced by the exposure under study. When qualitative or quantitative estimates of exposure history can be obtained for the deceased individuals, it is shown that one can use logistic regression models for the mortality odds to efficiently estimate the effect of exposure while controlling for relevant confounding factors by incorporating a priori information on baseline mortality rates available from US life tables. The proposed method is used to reanalyze data from a cohort of arsenic-exposed workers in a Montana copper smelter.

5.
An empirical screening level approach was developed to assess the probability of toxicity to benthic organisms associated with contaminated sediment exposure. The study was based on simple logistic regression models (LRMs) of matching sediment chemistry and toxicity data retrieved from a large database of field-collected sediment samples contaminated with multiple chemicals. Three decisions were made to simplify the application of LRMs to sediment samples contaminated with multiple chemicals. First, percent mortality information associated with each sediment sample was condensed into a dichotomous response (i.e., toxic or nontoxic). Second, each LRM assumed that toxicity was attributable to a single contaminant. Third, individual contaminants present at low concentrations were excluded from toxic sediment samples. Based on an analysis of the National Sediment Inventory database, the LRM approach classified 55% of nontoxic sediments as toxic (i.e., false-positives). Because this approach has been used to assess the probability of benthic toxicity as reported by the U.S. Environmental Protection Agency (U.S. EPA), the resultant estimates of potential toxicity convey a misleading impression of the increased hazard that sediments pose to the health of aquatic organisms at many sites in the United States. This could result in important resources needlessly being diverted from truly contaminated sites to evaluate and possibly remediate sediments at uncontaminated sites.

6.
One way to monitor patient access to emergent health care services is to use patient characteristics to predict arrival time at the hospital after onset of symptoms. This predicted arrival time can then be compared with actual arrival time to allow monitoring of access to services. Predicted arrival time could also be used to estimate potential effects of changes in health care service availability, such as closure of an emergency department or an acute care hospital. Our goal was to determine the best statistical method for prediction of arrival intervals for patients with acute myocardial infarction (AMI) symptoms. We compared the performance of multinomial logistic regression (MLR) and discriminant analysis (DA) models. Models for MLR and DA were developed using a dataset of 3,566 male veterans hospitalized with AMI in 81 VA Medical Centers in 1994-1995 throughout the United States. The dataset was randomly divided into a training set (n = 1,846) and a test set (n = 1,720). Arrival times were grouped into three intervals on the basis of treatment considerations: <6 hours, 6-12 hours, and >12 hours. One model for MLR and two models for DA were developed using the training dataset. One DA model had equal prior probabilities, and one DA model had proportional prior probabilities. Predictive performance of the models was compared using the test (n = 1,720) dataset. In the test dataset, the proportions of patients in the three arrival time groups were 60.9% for <6 hours, 10.3% for 6-12 hours, and 28.8% for >12 hours after symptom onset. Whereas the overall predictive performance of MLR and of DA with proportional priors was higher, the DA model with equal priors performed much better in the smaller groups. Correct classifications were 62.6% by MLR, 62.4% by DA using proportional prior probabilities, and 48.1% by DA using equal prior probabilities. The misclassification rates by MLR for the three groups were 9.5%, 100.0%, and 74.2% for each time interval, respectively. Misclassification rates by the DA models were 9.8%, 100.0%, and 74.4% for the model with proportional priors and 47.6%, 79.5%, and 51.0% for the model with equal priors. The choice of MLR, DA with proportional priors, or DA with equal priors for monitoring predicted hospital arrival time intervals for a population should depend on the consequences of misclassification errors.
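The MLR-versus-DA contrast in entry 6, including the effect of equal versus proportional priors, can be sketched with scikit-learn on three artificial overlapping groups whose sizes roughly match the study's 61/10/29 split (the group structure below is invented for illustration).

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Three overlapping Gaussian groups; the middle group is small, as in the study
sizes = {0: 600, 1: 100, 2: 300}
X = np.vstack([rng.normal(loc=m, size=(s, 2))
               for m, s in zip([0.0, 0.8, 1.6], sizes.values())])
y = np.concatenate([np.full(s, k) for k, s in sizes.items()])

mlr = LogisticRegression(max_iter=1000).fit(X, y)
da_prop = LinearDiscriminantAnalysis().fit(X, y)                          # proportional priors
da_eq = LinearDiscriminantAnalysis(priors=[1/3, 1/3, 1/3]).fit(X, y)      # equal priors

for name, m in [("MLR", mlr), ("DA prop", da_prop), ("DA equal", da_eq)]:
    hits = int(np.sum(m.predict(X)[y == 1] == 1))
    print(name, "middle-group correct:", hits)
```

Raising the small group's prior from its sample proportion to 1/3 enlarges the region where that group wins the posterior comparison, which is why equal-prior DA catches more of the minority interval at the cost of overall accuracy.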

7.
Armstrong and Sloan have reviewed two types of ordinal logistic models for epidemiologic data: the cumulative-odds model and the continuation-ratio model. I review here certain aspects of these models not emphasized previously, and describe a third type, the stereotype model, which in certain situations offers greater flexibility coupled with interpretational advantages. I illustrate the models in an analysis of pneumoconiosis among coal miners.
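Of the ordinal models entry 7 reviews, the continuation-ratio model is the easiest to sketch because it decomposes into ordinary binary logits, one per stage, each fit among subjects who have reached that stage. The simulated data below are illustrative, not the coal-miner pneumoconiosis data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=(n, 1))
latent = x[:, 0] + rng.logistic(size=n)
y = np.digitize(latent, [-0.5, 1.0])   # ordinal categories 0 < 1 < 2

# Continuation-ratio model: binary logits for P(Y = k | Y >= k)
slopes = []
for k in range(2):                     # the top category needs no model
    at_risk = y >= k
    fit = LogisticRegression().fit(x[at_risk], (y[at_risk] == k).astype(int))
    slopes.append(float(fit.coef_[0, 0]))
print("per-stage slopes:", [round(b, 2) for b in slopes])
```

With a positive effect of x on severity, both per-stage slopes come out negative (higher x makes stopping at the current level less likely); the cumulative-odds and stereotype models instead require joint estimation across categories.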

8.
Validation techniques for logistic regression models
This paper presents a comprehensive approach to the validation of logistic prediction models. It reviews measures of overall goodness-of-fit, and indices of calibration and refinement. Using a model-based approach developed by Cox, we adapt logistic regression diagnostic techniques for use in model validation. This allows identification of problematic predictor variables in the prediction model as well as influential observations in the validation data that adversely affect the fit of the model. In appropriate situations, recommendations are made for correction of models that provide poor fit.
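The calibration indices reviewed in entry 8 come down to comparing observed outcome frequencies with mean predicted risks across groups of increasing predicted risk. A minimal sketch on simulated, perfectly calibrated predictions:

```python
import numpy as np

def calibration_table(p, y, groups=10):
    """Observed event rate vs. mean predicted risk in equal-sized
    groups of increasing predicted risk (Hosmer-Lemeshow-style grouping)."""
    order = np.argsort(p)
    rows = []
    for chunk in np.array_split(order, groups):
        rows.append((float(np.mean(p[chunk])), float(np.mean(y[chunk]))))
    return rows

rng = np.random.default_rng(5)
p = rng.uniform(0.01, 0.6, 4000)
y = rng.binomial(1, p)                 # outcomes drawn from the stated risks
for pred, obs in calibration_table(p, y):
    print(f"predicted {pred:.3f}  observed {obs:.3f}")
```

When a validated model is miscalibrated, the two columns diverge systematically (e.g. predicted always above observed indicates overestimated risk), which is the pattern the paper's correction recommendations target.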

9.
When modeling the risk of a disease, the very act of selecting the factors to be included can heavily impact the results. This study compares the performance of several variable selection techniques applied to logistic regression. We performed realistic simulation studies to compare five methods of variable selection: (1) a confidence interval (CI) approach for significant coefficients, (2) backward selection, (3) forward selection, (4) stepwise selection, and (5) Bayesian stochastic search variable selection (SSVS) using both informed and uninformed priors. We defined our simulated diseases mimicking odds ratios for cancer risk found in the literature for environmental factors, such as smoking; dietary risk factors, such as fiber; genetic risk factors, such as XPD; and interactions. We modeled the distribution of our covariates, including correlation, after the reported empirical distributions of these risk factors. We also used a null data set to calibrate the priors of the Bayesian method and evaluate its sensitivity. Of the standard methods (95 per cent CI, backward, forward, and stepwise selection), the CI approach resulted in the highest average per cent of correct associations and the lowest average per cent of incorrect associations. SSVS with an informed prior had a higher average per cent of correct associations and a lower average per cent of incorrect associations than the CI approach. This study shows that the Bayesian methods offer a way to use prior information to both increase power and decrease false-positive results when selecting factors to model complex disease risk.

10.
11.
We consider a general model for anomaly detection in a longitudinal cohort mortality pattern based on logistic joinpoint regression with unknown joinpoints. We discuss backward and forward sequential procedures for selecting both the locations and the number of joinpoints. Estimation of the model parameters and the selection algorithms are illustrated with longitudinal data on cancer mortality in a cohort of chemical workers.
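A minimal version of the joinpoint location estimation in entry 11 is a profile search: for each candidate joinpoint, fit a logistic model with a hinge term and keep the location with the highest log-likelihood. The simulated cohort below is invented, and the paper's sequential procedures for choosing the number of joinpoints are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
t = rng.uniform(0, 10, 3000)                          # follow-up time
true_tau = 6.0
logit = -2.0 + 0.1 * t + 0.8 * np.maximum(t - true_tau, 0)  # slope change at tau
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

best = None
for tau in np.arange(1.0, 9.01, 0.25):                # profile over candidate joinpoints
    X = np.column_stack([t, np.maximum(t - tau, 0)])  # hinge basis for one joinpoint
    m = LogisticRegression(max_iter=1000).fit(X, y)
    ll = float(np.sum(np.log(m.predict_proba(X)[np.arange(len(y)), y])))
    if best is None or ll > best[0]:
        best = (ll, float(tau))
print("estimated joinpoint:", best[1])
```

Adding a second joinpoint means profiling over pairs of locations, which is where the backward/forward sequential selection the abstract describes becomes necessary.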

12.
Logistic regression is the standard method for assessing predictors of diseases. In logistic regression analyses, a stepwise strategy is often adopted to choose a subset of variables. Inference about the predictors is then made based on the chosen model constructed of only those variables retained in that model. This method subsequently ignores both the variables not selected by the procedure, and the uncertainty due to the variable selection procedure. This limitation may be addressed by adopting a Bayesian model averaging approach, which considers a number of the possible models and uses the posterior probabilities of these models to perform all inferences and predictions. This study compares the Bayesian model averaging approach with the stepwise procedures for selection of predictor variables in logistic regression using simulated data sets and the Framingham Heart Study data. The results show that in most cases Bayesian model averaging selects the correct model and out-performs stepwise approaches at predicting an event of interest.
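The Bayesian model averaging idea in entry 12 can be approximated with BIC weights: enumerate candidate models, convert BIC differences into approximate posterior model probabilities, and rank or average models by them. This is a standard approximation, not the study's exact implementation, and the data below are simulated.

```python
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 1500
X = rng.normal(size=(n, 4))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] + 0.8 * X[:, 1]))))  # only x0, x1 matter

bics = {}
for k in range(5):
    for S in itertools.combinations(range(4), k):
        if S:
            Xs = X[:, list(S)]
            m = LogisticRegression(C=1e6, max_iter=2000).fit(Xs, y)  # near-unpenalized MLE
            ll = float(np.sum(np.log(m.predict_proba(Xs)[np.arange(n), y])))
        else:
            q = float(np.mean(y))                                    # intercept-only model
            ll = n * (q * np.log(q) + (1 - q) * np.log(1 - q))
        bics[S] = -2 * ll + (len(S) + 1) * np.log(n)                 # BIC with intercept counted

best_bic = min(bics.values())
w = {S: np.exp(-0.5 * (b - best_bic)) for S, b in bics.items()}
tot = sum(w.values())
post = {S: v / tot for S, v in w.items()}                            # approx. posterior model probs
top = max(post, key=post.get)
print("highest-posterior model:", top, round(post[top], 3))
```

Averaged predictions weight each model's predicted probability by its posterior probability, which is what protects inference from the winner-take-all behaviour of stepwise selection.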

13.
Although a wide variety of change-point models are available for continuous outcomes, few models are available for dichotomous outcomes. This paper introduces transition methods for logistic regression models in which the dose-response relationship follows two different straight lines, which may intersect or may present a jump at an unknown change-point. In these models, the logit includes a differentiable transition function that provides parametric control of the sharpness of the transition at the change-point, allowing for abrupt changes or more gradual transitions between the two different linear trends, as well as for estimation of the location of the change-point. Linear-linear logistic models are particular cases of the proposed transition models. We present a modified iteratively reweighted least squares algorithm to estimate model parameters, and we provide inference procedures including a test for the existence of the change-point. These transition models are explored in a simulation study, and they are used to evaluate the existence of a change-point in the association between plasma glucose after an oral glucose tolerance test and mortality using data from the Mortality Follow-up of the Second National Health and Nutrition Examination Survey.
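A sketch of the transition models in entry 13: the logit contains a differentiable sigmoid transition whose scale parameter controls how abruptly the slope changes at the change-point. The fit below uses a generic optimizer rather than the authors' modified IRLS algorithm, and all parameter values are invented.

```python
import numpy as np
from scipy.optimize import minimize

def transition_logit(params, t):
    b0, b1, b2, tau, lg = params
    gamma = np.exp(lg)                        # sharpness > 0; small gamma -> abrupt change
    s = 1.0 / (1.0 + np.exp(-(t - tau) / gamma))  # smooth transition function
    return b0 + b1 * t + b2 * (t - tau) * s   # two linear trends joined at tau

def nll(params, t, y):
    eta = transition_logit(np.asarray(params, dtype=float), t)
    return float(np.sum(np.logaddexp(0.0, eta) - y * eta))  # stable logistic neg. log-lik

rng = np.random.default_rng(8)
t = rng.uniform(0, 10, 4000)
eta_true = transition_logit(np.array([-2.0, 0.1, 0.9, 6.0, np.log(0.1)]), t)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

start = [0.0, 0.0, 0.0, 5.0, np.log(0.5)]
fit = minimize(nll, x0=start, args=(t, y), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-6})
print("estimated change-point:", round(float(fit.x[3]), 2))
```

Because the likelihood surface of change-point models can be multimodal, a practical fit would restart from several initial values of tau; letting gamma shrink toward zero recovers the abrupt linear-linear special case the abstract mentions.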

14.
Correlation is inherent in longitudinal studies due to the repeated measurements on subjects, as well as due to time-dependent covariates in the study. In the National Longitudinal Study of Adolescent to Adult Health (Add Health), data were repeatedly collected on children in grades 7-12 across four waves. Thus, observations obtained on the same adolescent were correlated, while predictors were correlated with current and future outcomes such as obesity status, among other health issues. Previous methods, such as the generalized method of moments (GMM) approach have been proposed to estimate regression coefficients for time-dependent covariates. However, these approaches combined all valid moment conditions to produce an averaged parameter estimate for each covariate and thus assumed that the effect of each covariate on the response was constant across time. This assumption is not necessarily optimal in applications such as Add Health or health-related data. Thus, we depart from this assumption and instead use the Partitioned GMM approach to estimate multiple coefficients for the data based on different time periods. These extra regression coefficients are obtained using a partitioning of the moment conditions pertaining to each respective relationship. This approach offers a deeper understanding and appreciation into the effect of each covariate on the response. We conduct simulation studies, as well as analyses of obesity in Add Health, rehospitalization in Medicare data, and depression scores in a clinical study. The Partitioned GMM methods exhibit benefits over previously proposed models with improved insight into the nonconstant relationships realized when analyzing longitudinal data.

15.
We examine the properties of several tests for goodness-of-fit for multinomial logistic regression. One test is based on a strategy of sorting the observations according to the complement of the estimated probability for the reference outcome category and then grouping the subjects into g equal-sized groups. A g x c contingency table, where c is the number of values of the outcome variable, is constructed. The test statistic, denoted as Cg, is obtained by calculating the Pearson chi2 statistic where the estimated expected frequencies are the sum of the model-based estimated logistic probabilities. Simulations compare the properties of Cg with those of the ungrouped Pearson chi2 test (X2) and its normalized test (z). The null distribution of Cg is well approximated by the chi2 distribution with (g-2) x (c-1) degrees of freedom. The sampling distribution of X2 is compared with a chi2 distribution with n x (c-1) degrees of freedom but shows erratic behavior. With a few exceptions, the sampling distribution of z adheres reasonably well to the standard normal distribution. Power simulations show that Cg has low power for a sample of 100 observations, but satisfactory power for a sample of 400. The tests are illustrated using data from a study of cytological criteria for the diagnosis of breast tumors.
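The grouped statistic Cg in entry 15 is straightforward to compute by hand. The sketch below fits a three-category multinomial logit to simulated data (class 0 taken as the reference category, g = 10 groups); the data and model are hypothetical, not the breast-tumor study.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression

def cg_statistic(model, X, y, g=10):
    """Cg: sort by 1 - P(reference category), split into g equal-sized groups,
    and compare observed with model-expected counts in the g x c table."""
    P = model.predict_proba(X)
    c = P.shape[1]
    order = np.argsort(1.0 - P[:, 0])           # reference category = class 0
    stat = 0.0
    for chunk in np.array_split(order, g):
        exp = P[chunk].sum(axis=0)              # expected counts per category
        obs = np.bincount(y[chunk], minlength=c).astype(float)
        stat += float(np.sum((obs - exp) ** 2 / exp))
    df = (g - 2) * (c - 1)                      # null distribution: chi2 on (g-2)(c-1) df
    return stat, df, float(1.0 - chi2.cdf(stat, df))

rng = np.random.default_rng(9)
X = rng.normal(size=(3000, 2))
eta = np.column_stack([np.zeros(3000), X @ [1.0, 0.0], X @ [0.0, 1.0]])
pi = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=row) for row in pi])

m = LogisticRegression(max_iter=2000).fit(X, y)
stat, df, pval = cg_statistic(m, X, y)
print(f"Cg = {stat:.1f} on {df} df, p = {pval:.2f}")
```

With a correctly specified model the p-value is roughly uniform; systematic lack of fit inflates Cg well beyond its (g-2)(c-1) degrees of freedom.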

16.
There is increasing interest in the identification of predictors of risk for in-hospital mortality due to acute myocardial infarction (AMI). This study identified significant predictors of in-hospital mortality among AMI patients using a patient level clinical database. The study population consisted of 4167 cases admitted between October 1999 and April 2001 with a principal diagnosis of AMI to 36 hospitals in three US states. Of the 182 available variables in the clinical data set, 30 variables were used as candidate predictors, and 19 showed significant univariate association with AMI in-hospital mortality. By applying multiple logistic regression and stepwise selection, a final prediction model for AMI in-hospital mortality was developed. Variables included in the final model were age, arrival from a cardiac rehabilitation centre, cardiopulmonary resuscitation (CPR) on arrival, Killip class, AMI with co-morbid conditions, AMI with complications, percutaneous transluminal coronary angioplasty (PTCA) performed, beta-blockers given, angiotensin-converting enzyme (ACE) inhibitors given, and Plavix given. A 10-variable in-hospital mortality prediction model for AMI patients, which includes both risk factors and beneficial treatment procedures, was developed. A chi-squared goodness-of-fit test suggested a good fit for the model.

17.
This paper considers the problem of selecting a set of regressors when the response variable is distributed according to a specified parametric model and observations are censored. Under a Bayesian perspective, the most widely used tools are Bayes factors (BFs), which are undefined when improper priors are used. In order to overcome this issue, fractional (FBF) and intrinsic (IBF) BFs have become common tools for model selection. Both depend on the size, Nt, of a minimal training sample (MTS), while the IBF also depends on the specific MTS used. In the case of regression with censored data, the definition of an MTS is problematic, because only uncensored data allow the improper prior to be turned into a proper posterior, and because full exploration of the space of the MTSs, which also includes censored observations, is needed to avoid bias in model selection. To address this concern, a sequential MTS was proposed, but it has the drawback that the number of possible MTSs increases as Nt becomes random. For this reason, we explore the behaviour of the FBF, contextualizing its definition to censored data. We show that these are consistent, and we provide the corresponding fractional prior. Finally, a large simulation study and an application to real data are used to compare the IBF, the FBF and the well-known Bayesian information criterion. Copyright © 2014 John Wiley & Sons, Ltd.

18.
Variable selection in regression with very big numbers of variables is challenging both in terms of model specification and computation. We focus on genetic studies in the field of survival, and we present a Bayesian-inspired penalized maximum likelihood approach appropriate for high-dimensional problems. In particular, we employ a simple, efficient algorithm that seeks maximum a posteriori (MAP) estimates of regression coefficients. The latter are assigned a Laplace prior with a sharp mode at zero, and non-zero posterior mode estimates correspond to significant single nucleotide polymorphisms (SNPs). Using the Laplace prior reflects a prior belief that only a small proportion of the SNPs significantly influence the response. The method is fast and can handle datasets arising from imputation or resequencing. We demonstrate the localization performance, power and false-positive rates of our method in large simulation studies of dense-SNP datasets and sequence data, and we compare the performance of our method to the univariate Cox regression and to a recently proposed stochastic search approach. In general, we find that our approach improves localization and power slightly, while the biggest advantage is in false-positive counts and computing times. We also apply our method to a real prospective study, and we observe potential association between candidate ABC transporter genes and epilepsy treatment outcomes.
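The link in entry 18 between the Laplace prior and penalized likelihood can be written out explicitly. With independent Laplace priors $\pi(\beta_j) \propto \exp(-\lvert\beta_j\rvert/b)$ on the SNP coefficients and log-likelihood $\ell(\beta)$, the log posterior is the log-likelihood minus an L1 penalty, so the MAP estimate solves a Lasso-type problem:

```latex
\hat{\beta}_{\mathrm{MAP}}
  = \arg\max_{\beta}\Bigl[\, \ell(\beta) - \lambda \sum_{j} \lvert\beta_j\rvert \,\Bigr],
  \qquad \lambda = \frac{1}{b}.
```

The sharp mode of the prior at zero is what produces exact zeros in the posterior mode, so non-zero components can be read directly as selected SNPs; a smaller prior scale $b$ (larger $\lambda$) encodes a stronger belief that most SNPs are non-influential.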

19.
Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis, it is often necessary to discriminate between influential and noninfluential markers. It is common to perform univariate screening using Cox scores, which quantify the associations between survival and each of the markers to provide a ranking. Since Cox scores do not account for dependencies between the markers, their use is suboptimal in the presence of highly correlated markers. Methods: As an alternative to the Cox score, we propose the correlation-adjusted regression survival (CARS) score for right-censored survival outcomes. By removing the correlations between the markers, the CARS score quantifies the associations between the outcome and the set of “decorrelated” marker values. Estimation of the scores is based on inverse probability weighting, which is applied to log-transformed event times. For high-dimensional data, estimation is based on shrinkage techniques. Results: The consistency of the CARS score is proven under mild regularity conditions. In simulations with high correlations, survival models based on CARS score rankings achieved higher areas under the precision-recall curve than competing methods. Two example applications on prostate and breast cancer confirmed these results. CARS scores are implemented in the R package carSurv. Conclusions: In research applications involving high-dimensional genetic data, the use of CARS scores for marker selection is a favorable alternative to Cox scores even when correlations between covariates are low. Having a straightforward interpretation and low computational requirements, CARS scores are an easy-to-use screening tool in personalized medicine research.

20.
The question of whether two disorders cluster together, or coaggregate, within families often arises. This paper considers how to analyze familial aggregation of two disorders and presents two multivariate logistic regression methods that model both disorder outcomes simultaneously. The first, a proband predictive model, predicts a relative's outcomes (the presence or absence of each of the two disorders) by using the proband's disorder status. The second, a family predictive model derived from the quadratic exponential model, predicts a family member's outcomes by using all of the remaining family members' disorder statuses. The models are more realistic, flexible, and powerful than univariate models. Methods for estimation and testing account for the correlation of outcomes among family members and can be implemented by using commercial software.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号