Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Validation techniques for logistic regression models (total citations: 4; self-citations: 0; citations by others: 4)
This paper presents a comprehensive approach to the validation of logistic prediction models. It reviews measures of overall goodness-of-fit, and indices of calibration and refinement. Using a model-based approach developed by Cox, we adapt logistic regression diagnostic techniques for use in model validation. This allows identification of problematic predictor variables in the prediction model as well as influential observations in the validation data that adversely affect the fit of the model. In appropriate situations, recommendations are made for correction of models that provide poor fit.

2.
Quality of life has been increasingly emphasized in public health research in recent years. Typically, quality of life outcomes are measured by means of ordinal scales. In these situations, specific statistical methods are necessary because procedures such as dichotomization, or incorrect assumptions about the distribution of the outcome variable, may complicate the inferential process. Ordinal logistic regression models are appropriate in many of these situations. This article presents a review of the proportional odds model, partial proportional odds model, continuation ratio model, and stereotype model. The fit, statistical inference, and comparisons between models are illustrated with data from a study on quality of life in 273 patients with schizophrenia. All tested models showed good fit, but the proportional odds or partial proportional odds models proved to be the best choice due to the nature of the data and ease of interpretation of the results. Ordinal logistic models perform differently depending on categorization of outcome, adequacy in relation to assumptions, goodness-of-fit, and parsimony.
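
For readers unfamiliar with the proportional odds model named in this abstract, the sketch below shows how category probabilities follow from the usual parameterization logit P(Y <= j | x) = c_j - beta * x. The cutpoints, the coefficient, and the function names are hypothetical illustrations and are not taken from the schizophrenia study.

```python
import math


def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def proportional_odds_probs(cutpoints, beta, x):
    """Category probabilities under logit P(Y <= j | x) = cutpoints[j] - beta * x.

    cutpoints must be strictly increasing; beta and x are scalars here.
    Returns a list of len(cutpoints) + 1 probabilities.
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [expit(c - beta * x) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


# Hypothetical values chosen only for illustration.
cutpoints = [-1.0, 0.5, 2.0]
beta = 0.8
p0 = proportional_odds_probs(cutpoints, beta, x=0.0)
p1 = proportional_odds_probs(cutpoints, beta, x=1.0)
```

Under this parameterization a one-unit increase in x multiplies the odds of falling at or below any cutpoint by exp(-beta), which is the "proportional odds" property; the partial proportional odds model relaxes it by letting beta vary across cutpoints.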

3.
Conditional logistic regression was developed to avoid "sparse-data" biases that can arise in ordinary logistic regression analysis. Nonetheless, it is a large-sample method that can exhibit considerable bias when certain types of matched sets are infrequent or when the model contains too many parameters. Sparse-data bias can cause misleading inferences about confounding, effect modification, dose response, and induction periods, and can interact with other biases. In this paper, the authors describe these problems in the context of matched case-control analysis and provide examples from a study of electrical wiring and childhood leukemia and a study of diet and glioma. The same problems can arise in any likelihood-based analysis, including ordinary logistic regression. The problems can be detected by careful inspection of data and by examining the sensitivity of estimates to category boundaries, variables in the model, and transformations of those variables. One can also apply various bias corrections or turn to methods less sensitive to sparse data than conditional likelihood, such as Bayesian and empirical-Bayes (hierarchical regression) methods.
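
As a small illustration of why infrequent matched sets matter, consider the simplest special case: 1:1 matched pairs with one binary exposure. There the conditional maximum likelihood odds ratio is the ratio of the two discordant-pair counts, so only discordant pairs inform the estimate. The sketch below is not the authors' analysis; the counts are hypothetical and the Wald interval is only a rough large-sample device.

```python
import math


def matched_pair_odds_ratio(n_case_only, n_control_only, z=1.96):
    """Conditional ML odds ratio for 1:1 matched pairs with a binary exposure.

    n_case_only: pairs in which only the case is exposed.
    n_control_only: pairs in which only the control is exposed.
    Concordant pairs carry no information and are not needed.
    Returns (odds_ratio, lower, upper) with a Wald interval on the log scale.
    """
    if n_case_only <= 0 or n_control_only <= 0:
        raise ValueError("both discordant counts must be positive")
    odds_ratio = n_case_only / n_control_only
    se = math.sqrt(1.0 / n_case_only + 1.0 / n_control_only)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: same point estimate, very different information.
dense = matched_pair_odds_ratio(30, 10)
sparse = matched_pair_odds_ratio(3, 1)
```

Both calls return the same point estimate, but the second rests on four informative pairs, so its interval is far wider and the large-sample approximation behind it is least trustworthy exactly there.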

4.
5.
6.
A logistic regression model may be used to provide predictions of outcome for individual patients at a centre other than the one where the model was developed. When empirical data are available from this centre, the validity of predictions can be assessed by comparing observed outcomes and predicted probabilities. Subsequently, the model may be updated to improve predictions for future patients. As an example, we analysed 30-day mortality after acute myocardial infarction in a large data set (GUSTO-I, n = 40,830). We validated and updated a previously published model from another study (TIMI-II, n = 3,339) in validation samples ranging from small (200 patients, 14 deaths) to large (10,000 patients, 700 deaths). Updated models were tested on independent patients. Updating methods included re-calibration (re-estimation of the intercept or slope of the linear predictor) and more structural model revisions (re-estimation of some or all regression coefficients, model extension with more predictors). We applied heuristic shrinkage approaches in the model revision methods, such that regression coefficients were shrunken towards their re-calibrated values. Parsimonious updating methods were found preferable to more extensive model revisions, which should only be attempted with relatively large validation samples in combination with shrinkage.
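
The re-calibration step mentioned in this abstract, re-estimating the intercept and slope of the linear predictor, can be sketched as a two-parameter logistic fit of the observed outcomes on the original model's linear predictor. The code below is a minimal sketch using plain Newton-Raphson; the function name and the toy data are hypothetical and unrelated to GUSTO-I or TIMI-II.

```python
import math


def expit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def recalibrate(linear_predictor, outcomes, iterations=25):
    """Fit logit P(y = 1) = a + b * lp by Newton-Raphson and return (a, b).

    a is the calibration intercept and b the calibration slope; a = 0 and
    b = 1 would mean the original predictions need no adjustment.
    """
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for lp, y in zip(linear_predictor, outcomes):
            p = expit(a + b * lp)
            w = p * (1.0 - p)
            g0 += y - p              # score for the intercept
            g1 += (y - p) * lp       # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * lp
            h11 += w * lp * lp
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("information matrix is singular")
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


# Toy validation sample: 10 patients at each of four linear-predictor values,
# with 1, 2, 4 and 7 events respectively (hypothetical numbers).
lp = [-3.0] * 10 + [-2.0] * 10 + [-1.0] * 10 + [0.0] * 10
y = ([1] * 1 + [0] * 9) + ([1] * 2 + [0] * 8) + ([1] * 4 + [0] * 6) + ([1] * 7 + [0] * 3)
a, b = recalibrate(lp, y)
updated = [expit(a + b * v) for v in lp]
```

Holding b fixed at 1 and updating only a would give the intercept-only variant. The shrinkage of revised coefficients towards their re-calibrated values, which the abstract also describes, is not attempted here.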

7.
8.
9.
Although a wide variety of change-point models are available for continuous outcomes, few models are available for dichotomous outcomes. This paper introduces transition methods for logistic regression models in which the dose-response relationship follows two different straight lines, which may intersect or may present a jump at an unknown change-point. In these models, the logit includes a differentiable transition function that provides parametric control of the sharpness of the transition at the change-point, allowing for abrupt changes or more gradual transitions between the two different linear trends, as well as for estimation of the location of the change-point. Linear-linear logistic models are particular cases of the proposed transition models. We present a modified iteratively reweighted least squares algorithm to estimate model parameters, and we provide inference procedures including a test for the existence of the change-point. These transition models are explored in a simulation study, and they are used to evaluate the existence of a change-point in the association between plasma glucose after an oral glucose tolerance test and mortality using data from the Mortality Follow-up of the Second National Health and Nutrition Examination Survey.

10.
Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation-study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.

11.
OBJECTIVE: Ordinal scales often generate scores with skewed data distributions. The optimal method of analyzing such data is not entirely clear. The objective was to compare four statistical multivariable strategies for analyzing skewed health-related quality of life (HRQOL) outcome data. HRQOL data were collected at 1 year following catheterization using the Seattle Angina Questionnaire (SAQ), a disease-specific quality of life and symptom rating scale. STUDY DESIGN AND SETTING: In this methodological study, four regression models were constructed. The first model used linear regression. The second and third models used logistic regression with two different cutpoints and the fourth model used ordinal regression. To compare the results of these four models, odds ratios, 95% confidence intervals, and 95% confidence interval widths (i.e., ratios of upper to lower confidence interval endpoints) were assessed. RESULTS: Relative to the two logistic regression analyses, the linear regression model and the ordinal regression model produced more stable parameter estimates with smaller confidence interval widths. CONCLUSION: A combination of analysis results from both of these models (adjusted SAQ scores and odds ratios) provides the most comprehensive interpretation of the data.

12.
BACKGROUND/OBJECTIVES: A population-based retrospective cohort study of triplet pregnancies was conducted to estimate individual probabilities of neonatal mortality (death within 28 days of birth) conditional on the number of neonatal deaths experienced by other infants in the triplet set. METHODS: Data on 4,697 triplet sets (14,091 births) were derived from the U.S. 1995-1997 matched multiple birth file assembled by the National Center for Health Statistics. Response conditional multivariate logistic regression was used to model the association of neonatal mortality among cotriplets. To account for the correlation of the outcomes among cotriplets, regression parameters were estimated by the methodology of generalized estimating equations with robust variance estimates. RESULTS: Compared with a triplet where both cotriplets survived the neonatal period, the adjusted odds ratio and 95% confidence interval (CI) for a neonatal death associated with one and two cotriplet neonatal deaths were 1.80 (95% CI 1.06, 3.04), and 13.41 (95% CI 2.31, 77.7), respectively, after adjusting for birthweight and gestational age. CONCLUSIONS: These results show strong evidence of clustering of neonatal deaths in triplet pregnancies.

13.
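Multinomial logistic regression analysis of factors influencing esophageal lesions (total citations: 1; self-citations: 0; citations by others: 1)
OBJECTIVE: To explore the association of a family history of esophageal cancer, smoking, and alcohol drinking with esophagitis, hyperplasia, and early esophageal cancer. METHODS: In a cohort aged 40-69 years from a community with a high incidence of esophageal cancer, subjects with normal endoscopic findings served as the control group, and patients with inflammation, hyperplasia, and early cancer served as separate case groups; all cases were confirmed by chromoendoscopy and pathological biopsy. A multinomial logistic regression model was used, with the odds ratio (OR) as the measure of the strength of association. RESULTS: There were 71 confirmed cases of early cancer, 266 of hyperplasia, and 144 of inflammation, and 2818 subjects had normal esophageal mucosa. A family history of esophageal cancer was significantly associated with early esophageal cancer, hyperplasia, and inflammation. The ORs for esophageal cancer in a first-degree relative in the three groups were 2.6 (95% CI 1.36-4.85), 2.1 (95% CI 1.43-2.99), and 1.8 (95% CI 1.13-3.03), respectively, whereas the ORs for esophageal cancer in a second-degree relative were not statistically significant in any of the three groups. Among subjects with a family history of esophageal cancer, especially those with an affected first-degree relative, smoking and/or alcohol drinking significantly increased the risk of esophageal cancer. CONCLUSION: In areas with a high incidence of esophageal cancer, people with a family history of the disease should abstain from smoking and alcohol to prevent benign and malignant esophageal disease.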

14.
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
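
One way to read the recipe in this abstract for logistic regression is sketched below: find two group probabilities whose log odds differ by 2 * slope * SD and whose average equals the overall response probability, then apply a two-sample formula. This is an interpretation of the abstract, not the authors' code. The function names are hypothetical, and the sample-size expression is a common normal-approximation formula for two proportions that may differ from the one used in the paper.

```python
import math
from statistics import NormalDist


def logit(p):
    return math.log(p / (1.0 - p))


def equivalent_two_sample(beta, sd_x, p_overall):
    """Return (p1, p2) with logit(p2) - logit(p1) = 2 * beta * sd_x
    and (p1 + p2) / 2 = p_overall, found by bisection."""
    delta = 2.0 * beta * sd_x
    if delta == 0.0:
        return p_overall, p_overall
    target = abs(delta)
    lo = max(0.0, 2.0 * p_overall - 1.0)   # keeps the higher probability below 1
    hi = p_overall
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        gap = logit(2.0 * p_overall - mid) - logit(mid)
        if gap > target:
            lo = mid   # groups too far apart: raise the lower probability
        else:
            hi = mid
    p_low = 0.5 * (lo + hi)
    p_high = 2.0 * p_overall - p_low
    return (p_low, p_high) if delta > 0 else (p_high, p_low)


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two proportions."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = 0.5 * (p1 + p2)
    numerator = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2)))
    return math.ceil((numerator / (p1 - p2)) ** 2)


# Hypothetical inputs: log odds ratio 0.5 per unit of x, SD of x equal to 1,
# overall response probability 0.2.
p1, p2 = equivalent_two_sample(beta=0.5, sd_x=1.0, p_overall=0.2)
n = n_per_group(p1, p2)
total_n = 2 * n
```

Under this reading, the total sample size for the regression problem is approximated by `total_n`, the combined size of the two equally sized groups.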

15.
We compare parameter estimates from the proportional hazards model, the cumulative logistic model and a new modified logistic model (referred to as the person-time logistic model), with the use of simulated data sets and with the following quantities varied: disease incidence, risk factor strength, length of follow-up, the proportion censored, non-proportional hazards, and sample size. Parameter estimates from the person-time logistic regression model closely approximated those from the Cox model when the survival time distribution was close to exponential, but could differ substantially in other situations. We found parameter estimates from the cumulative logistic model similar to those from the Cox and person-time logistic models when the disease was rare, the risk factor moderate, and censoring rates similar across the covariates. We also compare the models with analysis of a real data set that involves the relationship of age, race, sex, blood pressure, and smoking to subsequent mortality. In this example, the length of follow-up among survivors varied from 5 to 14 years and the Cox and person-time logistic approaches gave nearly identical results. The cumulative logistic results had somewhat larger p-values but were substantively similar for all but one coefficient (the age-race interaction). The latter difference reflects differential censoring rates by age, race and sex.

16.
17.
Analysis of proportionate mortality data using logistic regression models (total citations: 1; self-citations: 0; citations by others: 1)
When only proportionate mortality data are available to an investigator studying the effect of an exposure on a particular cause of death, controls must be selected from among persons dying of other causes believed to be uninfluenced by the exposure under study. When qualitative or quantitative estimates of exposure history can be obtained for the deceased individuals, it is shown that one can use logistic regression models for the mortality odds to efficiently estimate the effect of exposure while controlling for relevant confounding factors by incorporating a priori information on baseline mortality rates available from US life tables. The proposed method is used to reanalyze data from a cohort of arsenic-exposed workers in a Montana copper smelter.

18.
Many diseases such as cancer and heart diseases are heterogeneous and it is of great interest to study the disease risk specific to the subtypes in relation to genetic and environmental risk factors. However, due to logistical and cost reasons, the subtype information for the disease is missing for some subjects. In this article, we investigate methods for multinomial logistic regression with missing outcome data, including a bootstrap hot deck multiple imputation (BHMI), simple inverse probability weighted (SIPW), augmented inverse probability weighted (AIPW), and expected estimating equation (EEE) estimators. These methods are important approaches for missing data regression. The BHMI modifies the standard hot deck multiple imputation method such that it can provide valid confidence interval estimation. Under the situation when the covariates are discrete, the SIPW, AIPW, and EEE estimators are numerically identical. When the covariates are continuous, nonparametric smoothers can be applied to estimate the selection probabilities and the estimating scores. These methods perform similarly. Extensive simulations show that all of these methods yield unbiased estimators while the complete-case (CC) analysis can be biased if the missingness depends on the observed data. Our simulations also demonstrate that these methods can gain substantial efficiency compared with the CC analysis. The methods are applied to a colorectal cancer study in which cancer subtype data are missing among some study individuals.

19.
A threshold effect takes place in situations where the relationship between an outcome variable and a predictor variable changes as the predictor value crosses a certain threshold/change point. Threshold effects are often plausible in a complex biological system, especially in defining immune responses that are protective against infections such as HIV‐1, which motivates the current work. We study two hypothesis testing problems in change point models. We first compare three different approaches to obtaining a p‐value for the maximum of scores test in a logistic regression model with change point variable as a main effect. Next, we study the testing problem in a logistic regression model with the change point variable both as a main effect and as part of an interaction term. We propose a test based on the maximum of likelihood ratios test statistic and obtain its reference distribution through a Monte Carlo method. We also propose a maximum of weighted scores test that can be more powerful than the maximum of likelihood ratios test when we know the direction of the interaction effect. In simulation studies, we show that the proposed tests have a correct type I error and higher power than several existing methods. We illustrate the application of change point model‐based testing methods in a recent study of immune responses that are associated with the risk of mother to child transmission of HIV‐1. Copyright © 2015 John Wiley & Sons, Ltd.

20.