Similar Articles
20 similar articles found.
1.
Armstrong and Sloan have reviewed two types of ordinal logistic models for epidemiologic data: the cumulative-odds model and the continuation-ratio model. I review here certain aspects of these models not emphasized previously, and describe a third type, the stereotype model, which in certain situations offers greater flexibility coupled with interpretational advantages. I illustrate the models in an analysis of pneumoconiosis among coal miners.
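For reference, the three model types can be written as follows for an ordinal outcome Y with categories 0, ..., K and covariate vector x; this is a generic sketch, and sign conventions and the normalization of the stereotype scores vary between presentations:

```latex
% Cumulative-odds (proportional-odds) model
\operatorname{logit}\Pr(Y \ge k \mid x) = \alpha_k + \beta' x, \qquad k = 1,\dots,K
% Continuation-ratio model
\operatorname{logit}\Pr(Y = k \mid Y \ge k,\, x) = \alpha_k + \beta' x, \qquad k = 1,\dots,K
% Stereotype model: category-specific scores \phi_k scale a common effect
\log\frac{\Pr(Y = k \mid x)}{\Pr(Y = 0 \mid x)} = \alpha_k + \phi_k\, \beta' x,
\qquad 0 = \phi_0 \le \phi_1 \le \dots \le \phi_K = 1
```

The ordering constraint on the scores φ_k is what gives the stereotype model its ordinal interpretation while letting the data determine the spacing between categories.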

2.
Validation techniques for logistic regression models (cited 4 times: 0 self-citations, 4 citations by others)
This paper presents a comprehensive approach to the validation of logistic prediction models. It reviews measures of overall goodness-of-fit, and indices of calibration and refinement. Using a model-based approach developed by Cox, we adapt logistic regression diagnostic techniques for use in model validation. This allows identification of problematic predictor variables in the prediction model as well as influential observations in the validation data that adversely affect the fit of the model. In appropriate situations, recommendations are made for correction of models that provide poor fit.
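As a rough illustration of the calibration component of such a validation (a sketch, not the authors' exact procedure), one can regress the validation-sample outcomes on the linear predictor produced by the development model; a calibration intercept near 0 and slope near 1 indicate good calibration, while a slope below 1 suggests overfitting. All variable and function names below are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def calibration_intercept_slope(y_val, lin_pred_val):
    """Refit the validation outcomes on the linear predictor from the
    development model: logit(p) = a + b * lp.  a near 0 and b near 1 suggest
    good calibration; b < 1 suggests overfitting (too-extreme predictions)."""
    fit = sm.Logit(y_val, sm.add_constant(lin_pred_val)).fit(disp=False)
    a, b = fit.params
    return a, b

# toy development / validation samples
rng = np.random.default_rng(0)
x_dev, x_val = rng.normal(size=500), rng.normal(size=500)
y_dev = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.2 * x_dev))))
y_val = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.2 * x_val))))

dev_fit = sm.Logit(y_dev, sm.add_constant(x_dev)).fit(disp=False)
lp_val = dev_fit.params[0] + dev_fit.params[1] * x_val   # linear predictor in validation data
print(calibration_intercept_slope(y_val, lp_val))
```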

3.
Hu B, Palta M, Shao J. Statistics in Medicine 2006, 25(8): 1383-1395
Various R² statistics have been proposed for logistic regression to quantify the extent to which the binary response can be predicted by a given logistic regression model and covariates. We study the asymptotic properties of three popular variance-based R² statistics. We find that two variance-based R² statistics, the sum of squares and the squared Pearson correlation, have identical asymptotic distribution whereas the third one, Gini's concentration measure, has a different asymptotic behaviour and may overstate the predictivity of the model and covariates when the model is mis-specified. Our result not only provides a theoretical basis for the findings in previous empirical and numerical work, but also leads to asymptotic confidence intervals. Statistical variability can then be taken into account when assessing the predictive value of a logistic regression model.
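The variance-based R² statistics can be computed directly from the fitted probabilities, along the following lines. The sum-of-squares and squared-correlation forms are standard; the particular expression used for Gini's concentration measure below is an assumption for illustration and may not match the paper's definition exactly.

```python
import numpy as np
import statsmodels.api as sm

def variance_based_r2(y, p_hat):
    """Three variance-based R^2 measures for a fitted logistic model.
    y: 0/1 outcomes; p_hat: fitted probabilities."""
    ybar = y.mean()
    ss_total = np.sum((y - ybar) ** 2)
    # sum-of-squares R^2
    r2_ss = 1.0 - np.sum((y - p_hat) ** 2) / ss_total
    # squared Pearson correlation between outcome and fitted probability
    r2_corr = np.corrcoef(y, p_hat)[0, 1] ** 2
    # Gini-concentration-style R^2 (variation of fitted probabilities relative
    # to the outcome variance); this particular form is an assumption
    r2_gini = np.sum((p_hat - ybar) ** 2) / ss_total
    return r2_ss, r2_corr, r2_gini

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.8 * x))))
fit = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
print(variance_based_r2(y, fit.predict()))
```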

4.
5.
6.
In cross-sectional studies or studies based on questionnaires, errors in exposures and misclassification of health status may be related. The reason may be that some subjects tend to over- or underreport both exposure and disease. The author investigated the effects of such dependent misclassification from a threshold-model point of view, in that an assumption was made of an underlying linear relation between a continuous exposure and response, both measured with error, and where these errors are correlated. Allowance is also made for covariates measured without error. This approach enables the derivation of explicit expressions for bias in the estimated association between exposure and outcome in different situations. It is shown that, dependent on the true effect of the exposure, the effect of the errors can be both an over- and an underestimation of the true relation. In addition, a study design from which the true effect can be consistently estimated is also provided.
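One way to write the setup described here (a sketch in generic notation): a continuous response R depends linearly on a continuous exposure X, both are observed with correlated errors, and reported disease status is obtained by thresholding the error-prone response:

```latex
R = \alpha + \beta X + \varepsilon, \qquad
X^{*} = X + u, \qquad R^{*} = R + v, \qquad \operatorname{corr}(u, v) = \rho \ne 0,
\qquad D^{*} = \mathbf{1}\{R^{*} > \tau\}.
```

With ρ ≠ 0, a subject's tendency to over- or under-report affects exposure and outcome jointly, which is what produces the dependent misclassification analysed in the paper.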

7.
Correlation is inherent in longitudinal studies due to the repeated measurements on subjects, as well as due to time-dependent covariates in the study. In the National Longitudinal Study of Adolescent to Adult Health (Add Health), data were repeatedly collected on children in grades 7-12 across four waves. Thus, observations obtained on the same adolescent were correlated, while predictors were correlated with current and future outcomes such as obesity status, among other health issues. Previous methods, such as the generalized method of moments (GMM) approach, have been proposed to estimate regression coefficients for time-dependent covariates. However, these approaches combined all valid moment conditions to produce an averaged parameter estimate for each covariate and thus assumed that the effect of each covariate on the response was constant across time. This assumption is not necessarily optimal in applications such as Add Health or health-related data. Thus, we depart from this assumption and instead use the Partitioned GMM approach to estimate multiple coefficients for the data based on different time periods. These extra regression coefficients are obtained using a partitioning of the moment conditions pertaining to each respective relationship. This approach offers deeper insight into the effect of each covariate on the response. We conduct simulation studies, as well as analyses of obesity in Add Health, rehospitalization in Medicare data, and depression scores in a clinical study. The Partitioned GMM methods exhibit benefits over previously proposed models with improved insight into the nonconstant relationships realized when analyzing longitudinal data.

8.
We examine the properties of several tests for goodness-of-fit for multinomial logistic regression. One test is based on a strategy of sorting the observations according to the complement of the estimated probability for the reference outcome category and then grouping the subjects into g equal-sized groups. A g × c contingency table, where c is the number of values of the outcome variable, is constructed. The test statistic, denoted as Cg, is obtained by calculating the Pearson χ² statistic where the estimated expected frequencies are the sum of the model-based estimated logistic probabilities. Simulations compare the properties of Cg with those of the ungrouped Pearson χ² test (X²) and its normalized test (z). The null distribution of Cg is well approximated by the χ² distribution with (g − 2) × (c − 1) degrees of freedom. The sampling distribution of X² is compared with a χ² distribution with n × (c − 1) degrees of freedom but shows erratic behavior. With a few exceptions, the sampling distribution of z adheres reasonably well to the standard normal distribution. Power simulations show that Cg has low power for a sample of 100 observations, but satisfactory power for a sample of 400. The tests are illustrated using data from a study of cytological criteria for the diagnosis of breast tumors.
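A minimal sketch of the Cg construction described above, using a multinomial logistic fit from scikit-learn; the choice of reference category, the number of groups g, and all names are illustrative, and no small-expected-count safeguards are applied.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression

def cg_statistic(X, y, g=10, ref=0):
    """Grouped Pearson chi-square statistic (Cg) for a multinomial logistic fit.
    Sort by 1 - P_hat(reference category), split into g equal-sized groups,
    then compare observed vs model-expected counts in the g x c table."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    probs = model.predict_proba(X)                # n x c fitted probabilities
    classes = model.classes_
    ref_col = int(np.where(classes == ref)[0][0])
    order = np.argsort(1.0 - probs[:, ref_col])   # sort by complement of reference prob
    groups = np.array_split(order, g)             # g (nearly) equal-sized groups

    c = len(classes)
    cg = 0.0
    for idx in groups:
        expected = probs[idx].sum(axis=0)         # sums of fitted probabilities
        observed = np.array([(y[idx] == k).sum() for k in classes])
        cg += np.sum((observed - expected) ** 2 / expected)
    df = (g - 2) * (c - 1)
    return cg, df, chi2.sf(cg, df)

# toy three-category example
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
lin = X @ np.array([[0.0, 0.8, -0.5], [0.0, -0.3, 0.6]])
p = np.exp(lin) / np.exp(lin).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=pi) for pi in p])
print(cg_statistic(X, y, g=10, ref=0))
```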

9.
Although a wide variety of change-point models are available for continuous outcomes, few models are available for dichotomous outcomes. This paper introduces transition methods for logistic regression models in which the dose-response relationship follows two different straight lines, which may intersect or may present a jump at an unknown change-point. In these models, the logit includes a differentiable transition function that provides parametric control of the sharpness of the transition at the change-point, allowing for abrupt changes or more gradual transitions between the two different linear trends, as well as for estimation of the location of the change-point. Linear-linear logistic models are particular cases of the proposed transition models. We present a modified iteratively reweighted least squares algorithm to estimate model parameters, and we provide inference procedures including a test for the existence of the change-point. These transition models are explored in a simulation study, and they are used to evaluate the existence of a change-point in the association between plasma glucose after an oral glucose tolerance test and mortality using data from the Mortality Follow-up of the Second National Health and Nutrition Examination Survey.
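To make the structure concrete, one plausible parameterization of such a transition logit (an illustrative sketch, not necessarily the authors' exact formulation) adds a change in slope that is switched on by a smooth transition function centred at the change-point τ:

```latex
\operatorname{logit}\, p(x) \;=\; \beta_0 + \beta_1 x + \beta_2\,(x - \tau)\,
F\!\left(\frac{x - \tau}{\gamma}\right),
```

where F is a differentiable cdf-like transition function (for example the standard logistic function) and γ controls the sharpness of the transition; as γ approaches 0 the model approaches a sharp linear-linear (broken-stick) logit with a slope change of β₂ at τ.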

10.
Synthesis analysis refers to a statistical method that integrates multiple univariate regression models and the correlation between each pair of predictors into a single multivariate regression model. The practical application of such a method could be developing a multivariate disease prediction model where a dataset containing the disease outcome and every predictor of interest is not available. In this study, we propose a new version of synthesis analysis that is specific to binary outcomes. We show that our proposed method possesses desirable statistical properties. We also conduct a simulation study to assess the robustness of the proposed method and compare it to a competing method.

11.
We analyze data obtained from a study designed to evaluate training effects on the performance of certain motor activities of Parkinson's disease patients. Maximum likelihood methods were used to fit beta-binomial/Poisson regression models tailored to evaluate the effects of training on the numbers of attempted and successful specified manual movements in 1-minute periods, controlling for disease stage and use of the preferred hand. We extend models previously considered by other authors in univariate settings to account for the repeated measures nature of the data. The results suggest that the expected numbers of attempts and successes increase with training, except for patients with advanced stages of the disease using the non-preferred hand.

12.
Aalen's additive hazards regression model is a useful alternative to the proportional hazards model for censored data regression. When used to compare treatments this approach leads to weighted comparisons of the crude estimate of the hazard rate of each group as compared to a baseline group. This is contrasted to the weighted log rank test from the proportional hazards model which compares each treatment's rate to the pooled rate. We show in this brief note that Aalen's suggestion for weights in this test leads to inconsistent tests in the sense that the test statistic depends on which group we pick for a baseline group. We show that 'consistent' tests are obtained by using common weight functions for all comparisons and we make some suggestions.

13.
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
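A rough sketch of the equivalent two-sample calculation for a logistic regression slope, under the rule stated above: the two groups differ by 2·β·SD(x) on the logit scale while the overall event probability is held fixed, and the usual two-proportion sample-size formula is then applied. The function name and the specific two-proportion formula are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def logit(p):  return np.log(p / (1 - p))
def expit(z):  return 1 / (1 + np.exp(-z))

def equivalent_two_sample_n(beta, sd_x, p_overall, alpha=0.05, power=0.80):
    """Approximate n per group for a logistic-regression slope beta on a
    covariate with standard deviation sd_x, via the 'equivalent two-sample
    problem': the groups differ by 2*beta*sd_x on the logit scale while the
    overall event probability stays at p_overall."""
    d = 2.0 * beta * sd_x
    # find p2 such that the average of the two group probabilities equals p_overall
    f = lambda p2: (expit(logit(p2) + d) + p2) / 2.0 - p_overall
    p2 = brentq(f, 1e-8, 1 - 1e-8)
    p1 = expit(logit(p2) + d)
    pbar = (p1 + p2) / 2.0
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n = (za * np.sqrt(2 * pbar * (1 - pbar))
         + zb * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
    return int(np.ceil(n)), (p1, p2)

print(equivalent_two_sample_n(beta=np.log(1.5), sd_x=1.0, p_overall=0.2))
```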

14.
15.
We compare parameter estimates from the proportional hazards model, the cumulative logistic model and a new modified logistic model (referred to as the person-time logistic model), with the use of simulated data sets and with the following quantities varied: disease incidence, risk factor strength, length of follow-up, the proportion censored, non-proportional hazards, and sample size. Parameter estimates from the person-time logistic regression model closely approximated those from the Cox model when the survival time distribution was close to exponential, but could differ substantially in other situations. We found parameter estimates from the cumulative logistic model similar to those from the Cox and person-time logistic models when the disease was rare, the risk factor moderate, and censoring rates similar across the covariates. We also compare the models with analysis of a real data set that involves the relationship of age, race, sex, blood pressure, and smoking to subsequent mortality. In this example, the length of follow-up among survivors varied from 5 to 14 years and the Cox and person-time logistic approaches gave nearly identical results. The cumulative logistic results had somewhat larger p-values but were substantively similar for all but one coefficient (the age-race interaction). The latter difference reflects differential censoring rates by age, race and sex.
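A minimal sketch of the person-time logistic construction on simulated survival data: each subject's follow-up is expanded into one record per person-year, with the event indicator equal to 1 only in the final year of a subject whose follow-up ended in the event, and an ordinary logistic model is then fitted to the expanded records. Column names, the yearly time unit, and the simulated data are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def expand_person_time(df, time_col="time", event_col="event"):
    """Expand one row per subject into one row per person-year of follow-up.
    The person-year outcome 'y' is 1 only in the final year of a subject whose
    follow-up ended with the event; a function of 'year' can be added to the
    model to relax the constant-risk-per-year assumption."""
    rows = []
    for _, r in df.iterrows():
        years = int(np.ceil(r[time_col]))
        for yr in range(1, years + 1):
            rows.append({"age": r["age"], "smoker": r["smoker"], "year": yr,
                         "y": int(r[event_col] == 1 and yr == years)})
    return pd.DataFrame(rows)

# toy cohort: exponential event times depending on age and smoking,
# administratively censored between 5 and 14 years of follow-up
rng = np.random.default_rng(3)
n = 500
age = rng.normal(55, 10, n)
smoker = rng.binomial(1, 0.3, n)
rate = 0.02 * np.exp(0.03 * (age - 55) + 0.5 * smoker)
t_event = rng.exponential(1 / rate)
t_cens = rng.uniform(5, 14, n)
cohort = pd.DataFrame({"age": age, "smoker": smoker,
                       "time": np.minimum(t_event, t_cens),
                       "event": (t_event <= t_cens).astype(int)})

pt = expand_person_time(cohort)
fit = smf.logit("y ~ age + smoker", data=pt).fit(disp=False)
print(fit.params)   # per-person-year log-odds; close to log hazard ratios when yearly risk is small
```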

16.
17.
OBJECTIVE: Ordinal scales often generate scores with skewed data distributions. The optimal method of analyzing such data is not entirely clear. The objective was to compare four statistical multivariable strategies for analyzing skewed health-related quality of life (HRQOL) outcome data. HRQOL data were collected at 1 year following catheterization using the Seattle Angina Questionnaire (SAQ), a disease-specific quality of life and symptom rating scale. STUDY DESIGN AND SETTING: In this methodological study, four regression models were constructed. The first model used linear regression. The second and third models used logistic regression with two different cutpoints, and the fourth model used ordinal regression. To compare the results of these four models, odds ratios, 95% confidence intervals, and 95% confidence interval widths (i.e., ratios of upper to lower confidence interval endpoints) were assessed. RESULTS: Relative to the two logistic regression analyses, the linear regression model and the ordinal regression model produced more stable parameter estimates with smaller confidence interval widths. CONCLUSION: A combination of analysis results from both of these models (adjusted SAQ scores and odds ratios) provides the most comprehensive interpretation of the data.
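A sketch of the four-model comparison on simulated skewed scores, using statsmodels: linear regression on the raw score, logistic regression at two illustrative cutpoints, and a proportional-odds ordinal model on a coarsened version of the score. The cutpoints, category boundaries, and variable names are assumptions, not those of the SAQ analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

# toy skewed 0-100 score (stand-in for an SAQ-like scale) and a binary exposure
rng = np.random.default_rng(4)
n = 600
trt = rng.binomial(1, 0.5, n)
score = np.clip(100 - rng.gamma(shape=2.0, scale=8.0 - 2.0 * trt, size=n), 0, 100)
X = sm.add_constant(trt)

ols = sm.OLS(score, X).fit()                                        # 1) linear regression
logit_75 = sm.Logit((score >= 75).astype(int), X).fit(disp=False)   # 2) cutpoint 75
logit_90 = sm.Logit((score >= 90).astype(int), X).fit(disp=False)   # 3) cutpoint 90

# 4) proportional-odds ordinal regression on a coarsened, ordered version of the score
cats = pd.cut(score, bins=[-0.1, 50, 75, 90, 100], labels=False)
ordinal = OrderedModel(cats, pd.DataFrame({"trt": trt}), distr="logit").fit(
    method="bfgs", disp=False)

def coef_and_ci_width(fit, idx):
    """Coefficient and CI width; exp(width) is the ratio of the upper to the
    lower endpoint of the corresponding odds-ratio confidence interval."""
    lo, hi = np.asarray(fit.conf_int())[idx]
    return np.asarray(fit.params)[idx], hi - lo

for label, fit, idx in [("linear", ols, 1), ("logit >=75", logit_75, 1),
                        ("logit >=90", logit_90, 1), ("ordinal", ordinal, 0)]:
    est, width = coef_and_ci_width(fit, idx)
    print(f"{label:<11} coef={est:6.3f}  CI width={width:5.3f}")
```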

18.
Clinicians and health service researchers are frequently interested in predicting patient-specific probabilities of adverse events (e.g. death, disease recurrence, post-operative complications, hospital readmission). There is an increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split-sample validation: the data were randomly divided into derivation and validation samples. Predictive models were estimated using the derivation sample and the predictive accuracy of the resultant model was assessed using the area under the receiver operating characteristic (ROC) curve in the validation sample. This process was repeated 1000 times: the initial data set was randomly divided into derivation and validation samples 1000 times, and the predictive accuracy of each method was assessed each time. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
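A compact sketch of the repeated split-sample comparison using synthetic data in place of the AMI cohort; the scikit-learn model settings, the number of repetitions, and the 50/50 split are illustrative (the paper used 1000 repetitions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# synthetic stand-in for the cohort: rare-ish binary outcome, 10 predictors
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)

n_splits = 100   # the paper used 1000 repetitions
auc_lr, auc_tree = [], []
for seed in range(n_splits):
    X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=0.5,
                                                  random_state=seed, stratify=y)
    lr = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
    tree = DecisionTreeClassifier(min_samples_leaf=50, random_state=seed).fit(X_dev, y_dev)
    auc_lr.append(roc_auc_score(y_val, lr.predict_proba(X_val)[:, 1]))
    auc_tree.append(roc_auc_score(y_val, tree.predict_proba(X_val)[:, 1]))

print("mean validation AUC, logistic:", np.mean(auc_lr))
print("mean validation AUC, tree:    ", np.mean(auc_tree))
```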

19.
Assessing goodness-of-fit in logistic regression models can be problematic, in that commonly used deviance or Pearson chi-square statistics do not have approximate chi-square distributions, under the null hypothesis of no lack of fit, when continuous covariates are modelled. We present two easy to implement test statistics similar to the deviance and Pearson chi-square tests that are appropriate when continuous covariates are present. The methodology uses an approach similar to that incorporated by the Hosmer and Lemeshow goodness-of-fit test in that observations are classified into distinct groups according to fitted probabilities, allowing sufficient cell sizes for chi-square testing. The major difference is that the proposed tests perform this grouping within the cross-classification of all categorical covariates in the model and, in some situations, allow for a more powerful assessment of where model predicted and observed counts may differ. A variety of simulations are performed comparing the proposed tests to the Hosmer-Lemeshow test.
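A rough sketch of the grouping strategy: within each cell of the cross-classification of the categorical covariates, subjects are further grouped on their fitted probabilities, and a Pearson-type statistic is accumulated over all resulting groups. The reference distribution and degrees of freedom derived in the paper are not reproduced here, and all names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def grouped_pearson_gof(df, fit, cat_cols, prob_col="p_hat", y_col="y", g=10):
    """Pearson-type goodness-of-fit statistic obtained by grouping on the
    fitted probabilities *within* each cross-classification of the
    categorical covariates (sketch of the idea described in the abstract)."""
    df = df.copy()
    df[prob_col] = fit.predict(df)
    stat, n_groups = 0.0, 0
    for _, cell in df.groupby(cat_cols):
        cell = cell.copy()
        # decile-style grouping of fitted probabilities within this cell
        cell["grp"] = pd.qcut(cell[prob_col].rank(method="first"),
                              q=min(g, len(cell)), labels=False)
        for _, grp in cell.groupby("grp"):
            o1, e1 = grp[y_col].sum(), grp[prob_col].sum()
            o0, e0 = len(grp) - o1, len(grp) - e1
            stat += (o1 - e1) ** 2 / e1 + (o0 - e0) ** 2 / e0
            n_groups += 1
    return stat, n_groups

# toy example: one continuous and two categorical covariates
rng = np.random.default_rng(5)
n = 2000
d = pd.DataFrame({"x": rng.normal(size=n),
                  "sex": rng.integers(0, 2, n),
                  "site": rng.integers(0, 3, n)})
p = 1 / (1 + np.exp(-(-1.0 + 0.8 * d["x"] + 0.5 * d["sex"] - 0.3 * d["site"])))
d["y"] = rng.binomial(1, p)
fit = smf.logit("y ~ x + C(sex) + C(site)", data=d).fit(disp=False)
print(grouped_pearson_gof(d, fit, cat_cols=["sex", "site"]))
```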

20.
The aim of this study was to use Monte Carlo simulations to compare logistic regression with propensity scores in terms of bias, precision, empirical coverage probability, empirical power, and robustness when the number of events is low relative to the number of confounders. The authors simulated a cohort study and performed 252,480 trials. In the logistic regression, the bias decreased as the number of events per confounder increased. In the propensity score, the bias decreased as the strength of the association of the exposure with the outcome increased. Propensity scores produced estimates that were less biased, more robust, and more precise than the logistic regression estimates when there were seven or fewer events per confounder. The logistic regression empirical coverage probability increased as the number of events per confounder increased. The propensity score empirical coverage probability decreased after eight or more events per confounder. Overall, the propensity score exhibited more empirical power than logistic regression. Propensity scores are a good alternative to control for imbalances when there are seven or fewer events per confounder; however, empirical power could range from 35% to 60%. Logistic regression is the technique of choice when there are at least eight events per confounder.
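A minimal sketch of a single simulation arm in this spirit: a cohort with several confounders and a fairly rare outcome is generated, and the exposure log-odds ratio is estimated by (a) multivariable logistic regression and (b) logistic regression adjusting for an estimated propensity score. Covariate adjustment for the score is only one of several propensity-score methods and, like the parameter values and names, is an assumption here.

```python
import numpy as np
import statsmodels.api as sm

def one_trial(n=1000, k=8, beta_exposure=np.log(2.0), rng=None):
    """One simulated cohort: k confounders, a binary exposure, and a fairly
    rare binary outcome.  Returns the exposure log-odds-ratio estimate from
    (a) multivariable logistic regression on exposure + confounders and
    (b) logistic regression on exposure + estimated propensity score."""
    rng = rng or np.random.default_rng()
    C = rng.normal(size=(n, k))                                      # confounders
    A = rng.binomial(1, 1 / (1 + np.exp(-(C @ np.full(k, 0.3)))))    # exposure
    lin = -3.5 + beta_exposure * A + C @ np.full(k, 0.3)
    Y = rng.binomial(1, 1 / (1 + np.exp(-lin)))                      # outcome (few events)

    # (a) conventional multivariable logistic regression
    Xa = sm.add_constant(np.column_stack([A, C]))
    b_logistic = sm.Logit(Y, Xa).fit(disp=False, method="bfgs").params[1]

    # (b) propensity score P(A=1 | C), then outcome model on exposure + score
    ps = sm.Logit(A, sm.add_constant(C)).fit(disp=False).predict()
    Xb = sm.add_constant(np.column_stack([A, ps]))
    b_ps = sm.Logit(Y, Xb).fit(disp=False, method="bfgs").params[1]
    return b_logistic, b_ps

rng = np.random.default_rng(6)
est = np.array([one_trial(rng=rng) for _ in range(200)])
print("true exposure log-OR:", round(np.log(2.0), 3))
print("mean estimates (logistic, propensity score):", est.mean(axis=0).round(3))
```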
