Similar Articles
1.
Applied researchers frequently use automated model selection methods, such as backwards variable elimination, to develop parsimonious regression models. Statisticians have criticized the use of these methods for several reasons, amongst them are the facts that the estimated regression coefficients are biased and that the derived confidence intervals do not have the advertised coverage rates. We developed a method to improve estimation of regression coefficients and confidence intervals which employs backwards variable elimination in multiple bootstrap samples. In a given bootstrap sample, predictor variables that are not selected for inclusion in the final regression model have their regression coefficient set to zero. Regression coefficients are averaged across the bootstrap samples, and non-parametric percentile bootstrap confidence intervals are then constructed for each regression coefficient. We conducted a series of Monte Carlo simulations to examine the performance of this method for estimating regression coefficients and constructing confidence intervals for variables selected using backwards variable elimination. We demonstrated that this method results in confidence intervals with superior coverage compared with those developed from conventional backwards variable elimination. We illustrate the utility of our method by applying it to a large sample of subjects hospitalized with a heart attack.
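
The procedure described in this abstract is simple to sketch. The following Python snippet is a hypothetical illustration, not the authors' implementation: the p-value threshold for backwards elimination, the number of bootstrap replicates, and the toy data are all assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def backwards_elimination(X, y, alpha_stay=0.05):
    """Drop the weakest predictor until every remaining p-value < alpha_stay."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = np.asarray(fit.pvalues)[1:]          # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha_stay:
            return cols, np.asarray(fit.params)[1:]
        cols.pop(worst)
    return [], np.array([])                          # everything eliminated

def bootstrap_selection(X, y, n_boot=500):
    n, p = X.shape
    coefs = np.zeros((n_boot, p))                    # non-selected coefficients stay at zero
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                  # bootstrap resample
        cols, beta = backwards_elimination(X[idx], y[idx])
        coefs[b, cols] = beta
    estimate = coefs.mean(axis=0)                    # average across bootstrap samples
    ci = np.percentile(coefs, [2.5, 97.5], axis=0)   # percentile bootstrap intervals
    return estimate, ci

# Toy data: five candidate predictors, only the first two have non-zero effects.
X = rng.normal(size=(200, 5))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)
estimate, ci = bootstrap_selection(X, y)
print(np.round(estimate, 2), np.round(ci, 2), sep="\n")
```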

2.
The current study examined the impact of a censored independent variable, after adjusting for a second independent variable, when estimating regression coefficients using ‘naïve’ ordinary least squares (OLS), ‘partial’ OLS and full-likelihood models. We used Monte Carlo simulations to determine the bias associated with all three regression methods. We demonstrated that substantial bias was introduced in the estimation of the regression coefficient associated with the variable subject to a ceiling effect when naïve OLS regression was used. Furthermore, minor bias was transmitted to the estimation of the regression coefficient associated with the second independent variable. High correlation between the two independent variables improved estimation of the censored variable's coefficient at the expense of estimation of the other coefficient. The use of ‘partial’ OLS and maximum-likelihood estimation were shown to result in, at most, negligible bias in estimation. Furthermore, we demonstrated that the full-likelihood method was robust under misspecification of the joint distribution of the independent random variables. Lastly, we provided an empirical example using National Population Health Survey (NPHS) data to demonstrate the practical implications of our main findings and the simple methods available to circumvent the bias identified in the Monte Carlo simulations. Our results suggest that researchers need to be aware of the bias associated with the use of naïve ordinary least-squares estimation when estimating regression models in which at least one independent variable is subject to a ceiling effect. Copyright © 2004 John Wiley & Sons, Ltd.
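
A minimal simulation conveys the nature of the bias. This is a hedged sketch, not the study's simulation design: the ceiling point, sample size and coefficients are assumptions, and 'partial' OLS is interpreted here as OLS restricted to observations below the ceiling.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, b1, b2, ceiling = 5000, 1.0, 0.5, 0.5

x1 = rng.normal(size=n)                      # true value of the covariate subject to the ceiling
x2 = 0.4 * x1 + rng.normal(size=n)           # correlated second covariate
y = b1 * x1 + b2 * x2 + rng.normal(size=n)
x1_obs = np.minimum(x1, ceiling)             # ceiling effect on x1

# 'Naive' OLS: use the censored covariate as if it were fully observed.
naive = sm.OLS(y, sm.add_constant(np.column_stack([x1_obs, x2]))).fit()

# 'Partial' OLS (as interpreted here): drop observations at the ceiling.
keep = x1_obs < ceiling
partial = sm.OLS(y[keep], sm.add_constant(np.column_stack([x1_obs[keep], x2[keep]]))).fit()

print("naive  :", np.round(naive.params[1:], 2))    # biased for b1, mildly biased for b2
print("partial:", np.round(partial.params[1:], 2))  # approximately (1.0, 0.5)
```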

3.
Objective: To extend the relative weights index to logistic regression analysis so that the relative importance of independent variables can be evaluated more precisely. Methods: A least-squares orthogonal transformation of the original variables yields a new set of mutually uncorrelated variables that are maximally correlated with the originals; the dependent variable is regressed on this new variable set to obtain standardized regression coefficients β, and the orthogonal variables are then regressed back onto the original variables to obtain a set of coefficients λ. Summing the products of the squares of these two sets of parameters gives each independent variable's proportional contribution to the dependent variable. Results: The relative weights sum to the model's total explained variation R², effectively apportioning each independent variable's contribution to the dependent variable. Conclusion: When collinearity is present, relative weights provide a precise quantitative index of the relative importance of the independent variables, and offer a feasible estimation method for researchers analysing categorical data who wish to determine the relative importance of their predictors.
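
The mechanics can be illustrated for the linear-model case, where the relative weights sum exactly to R²; the logistic extension discussed in the article replaces the final regression step with a logistic fit and a pseudo-R². The sketch below uses simulated data and is not the authors' program.

```python
import numpy as np

def relative_weights(X, y):
    """Relative weights for a linear model (illustrative sketch)."""
    Xs = (X - X.mean(0)) / X.std(0)                  # standardize predictors
    ys = (y - y.mean()) / y.std()
    R = np.corrcoef(Xs, rowvar=False)                # predictor correlation matrix
    rxy = Xs.T @ ys / len(ys)                        # predictor-outcome correlations
    evals, Q = np.linalg.eigh(R)
    Lam = Q @ np.diag(np.sqrt(evals)) @ Q.T          # R^(1/2): loadings of X on the orthogonal variables
    beta = np.linalg.solve(Lam, rxy)                 # coefficients of y on the orthogonal variables
    eps = (Lam ** 2) @ (beta ** 2)                   # relative weight of each original predictor
    return eps                                       # eps.sum() equals the model R^2

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0, 0], [[1, .6, .3], [.6, 1, .4], [.3, .4, 1]], size=1000)
y = 0.8 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(size=1000)
w = relative_weights(X, y)
print(np.round(w, 3), round(w.sum(), 3))             # weights and their sum (= R^2)
```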

4.
Growth models are commonly used in life course epidemiology to describe growth trajectories and their determinants or to relate particular patterns of change to later health outcomes. However, methods to analyse relationships between two or more change processes occurring in parallel, in particular to assess evidence for causal influences of change in one variable on subsequent changes in another, are less developed. We discuss linear spline multilevel models with a multivariate response and show how these can be used to relate rates of change in a particular time period in one variable to later rates of change in another variable by using the variances and covariances of individual-level random effects for each of the splines. We describe how regression coefficients can be calculated for these associations and how these can be adjusted for other parameters such as random effect variables relating to baseline values or rates of change in earlier time periods, and compare different methods for calculating the standard errors of these regression coefficients. We also show that these models can equivalently be fitted in the structural equation modelling framework and apply each method to weight and mean arterial pressure changes during pregnancy, obtaining similar results for multilevel and structural equation models. This method improves on the multivariate linear growth models that have been used previously to model parallel processes, because it enables nonlinear patterns of change to be modelled and the temporal sequence of multivariate changes to be determined, with adjustment for change in earlier time periods. Copyright © 2012 John Wiley & Sons, Ltd.

5.
Quan H, Zhang J. Statistics in Medicine 2003; 22(17): 2723-2736
Analyses of study variables are frequently based on log transformations. To calculate the power for detecting the between-treatment difference in the log scale, we need an estimate of the standard deviation of the log-transformed variable. However, in many situations a literature search only provides the arithmetic means and the corresponding standard deviations. Without individual log-transformed data to directly calculate the sample standard deviation, we need alternative methods to estimate it. This paper presents methods for estimating and constructing confidence intervals for the standard deviation of a log-transformed variable given the mean and standard deviation of the untransformed variable. It also presents methods for estimating the standard deviation of change from baseline in the log scale given the means and standard deviations of the untransformed baseline value, on-treatment value and change from baseline. Simulations and examples are provided to assess the performance of these estimates.
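
For reference, under a lognormal assumption the standard deviation of the log-transformed variable can be recovered from the untransformed mean and standard deviation through the coefficient of variation. The snippet below is a quick sanity check of that identity under assumed parameters, not code from the paper.

```python
import numpy as np

def sd_of_log(mean, sd):
    """SD of log(X) when X is lognormal with the given arithmetic mean and SD."""
    return np.sqrt(np.log1p((sd / mean) ** 2))

rng = np.random.default_rng(3)
x = rng.lognormal(mean=2.0, sigma=0.5, size=1_000_000)   # log(X) has SD 0.5
print(round(sd_of_log(x.mean(), x.std()), 3))            # ~0.5, recovered from mean and SD of X
print(round(np.log(x).std(), 3))                         # ~0.5, computed directly
```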

6.
BACKGROUND AND OBJECTIVE: A continued controversy exists as to whether the assessment of the influence of low birth weight on adult blood pressure necessitates adjustment for adult weight in analyses based on the fetal origins of adult diseases hypothesis. Here we first explain the difficulty in understanding an adjusted multivariate regression model, and then propose another way of writing the regression model to make the interpretation of the separate influence of birth weight and changes in weight later in life more straightforward. STUDY DESIGN AND SETTING: We used a multivariate regression model containing birth weight (standard deviation score; SDS) and residual adult weight (SDS) to explore the effect on blood pressure (or any other outcome) separately. Residual adult weight was calculated as the difference between actual adult weight and the expected adult weight (SDS) given a certain birth weight (SDS). RESULTS: The coefficients of birth weight and residual adult weight show directly the effect on the analyzed outcome variable. CONCLUSIONS: We prefer to use this regression model with unexplained residuals when the adjusted variable is in the causal pathway in the analyses of data referring to the fetal origins of adult diseases hypothesis.
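
The reparameterization is easy to reproduce: regress adult weight (SDS) on birth weight (SDS), take the residual as the "unexplained" part of adult weight, and enter both terms into the outcome model. The sketch below uses simulated data; the variable names and effect sizes are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
birth_sds = rng.normal(size=n)                                # birth weight, SD score
adult_sds = 0.5 * birth_sds + rng.normal(scale=0.9, size=n)   # adult weight, SD score
bp = -2.0 * birth_sds + 3.0 * adult_sds + rng.normal(scale=5, size=n)  # blood pressure

# Residual adult weight: actual adult weight minus the value expected from birth weight.
expected = sm.OLS(adult_sds, sm.add_constant(birth_sds)).fit().fittedvalues
resid_adult = adult_sds - expected

# Both coefficients now read directly: the effect of birth weight, and the effect of
# growing more than expected given one's birth weight.
model = sm.OLS(bp, sm.add_constant(np.column_stack([birth_sds, resid_adult]))).fit()
print(np.round(model.params[1:], 2))
```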

7.
Wall MM, Li R. Statistics in Medicine 2003; 22(23): 3671-3685
In the areas of epidemiology, psychology, sociology, and other social and behavioural sciences, researchers often encounter situations where there are not only many variables contributing to a particular phenomenon, but there are also strong relationships among many of the predictor variables of interest. By using the traditional multiple regression on all the predictor variables, it is possible to have problems with interpretation and multicollinearity. As an alternative to multiple regression, we explore the use of a latent variable model that can address the relationship among the predictor variables. We consider two different methods for estimation and prediction for this model: one that uses multiple regression on factor score estimates and the other that uses structural equation modelling. The first method uses multiple regression but on a set of predicted underlying factors (i.e. factor scores), and the second method is a full-information maximum-likelihood technique that incorporates the complete covariance structure of the data. In this tutorial, we will explain the model and each estimation method, including how to carry out prediction. A data example will be used for demonstration, where respiratory disease death rates by county in Minnesota are predicted by five county-level census variables. A simulation study is performed to evaluate the efficiency of prediction using the two latent variable modelling techniques compared to multiple regression.
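
A compact sketch of the first strategy (regression on estimated factor scores) follows; the number of factors, the simulated data, and the use of scikit-learn's FactorAnalysis are assumptions made for illustration, and the full-information route would instead be fitted in dedicated SEM software.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
n = 500
f = rng.normal(size=n)                                       # one underlying factor
X = np.column_stack([f + rng.normal(scale=0.5, size=n) for _ in range(5)])  # five correlated predictors
y = 2.0 * f + rng.normal(size=n)

scores = FactorAnalysis(n_components=1).fit_transform(X)     # estimated factor scores
fit = sm.OLS(y, sm.add_constant(scores)).fit()               # regression on the scores
print(np.round(fit.params, 2), round(fit.rsquared, 2))
```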

8.
In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach consisting of substituting an estimated conditional expectation of the true covariate given observed data in the logistic regression. The other is a novel two-stage approach when the logistic regression is fitted to multiple surrogates, and then a linear combination of estimated slopes is formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors asserted superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures. Copyright © 2012 John Wiley & Sons, Ltd.
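
The "standard regression calibration approach" mentioned here has a simple generic form when a validation subsample with the true covariate is available. The sketch below is a generic illustration under that assumption and simulated data; it is neither the Weller et al. implementation nor this paper's code.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 2000
x = rng.normal(size=n)                               # true exposure
W = np.column_stack([x + rng.normal(scale=s, size=n) for s in (0.5, 0.8, 1.0)])  # three surrogates
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.0 * x))))

val = np.arange(n) < 300                             # assumed validation subsample where x is observed

# Stage 1: calibration model for E[X | W], fitted in the validation data.
calib = sm.OLS(x[val], sm.add_constant(W[val])).fit()
x_hat = calib.predict(sm.add_constant(W))            # expected exposure for everyone

# Stage 2: logistic regression of the outcome on the calibrated exposure.
rc_fit = sm.Logit(y, sm.add_constant(x_hat)).fit(disp=0)
print(np.round(rc_fit.params, 2))                    # slope approximately recovers the true value 1.0
```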

9.
Multiple imputation is commonly used to impute missing covariates in the Cox semiparametric regression setting. It fills in each missing value with plausible values via a Gibbs sampling procedure, specifying an imputation model for each incomplete variable. This imputation method is implemented in several software packages that offer imputation models chosen according to the type of the variable to be imputed, but all these imputation models assume that covariates have linear effects. However, this assumption is often not satisfied in practice, as covariates can have nonlinear effects. Such a linearity assumption can lead to misleading conclusions, because the imputation model should be constructed to reflect the true distributional relationship between the missing and the observed values. To estimate nonlinear effects of continuous time-invariant covariates in the imputation model, we propose a method based on B-spline functions. To assess the performance of this method, we conducted a simulation study in which we compared multiple imputation using a Bayesian spline imputation model with multiple imputation using a Bayesian linear imputation model in the survival analysis setting. We evaluated the proposed method on the motivating data set, collected from HIV-infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that our method performs well in estimating the hazard ratio compared with the linear imputation methods when data are missing completely at random or missing at random. Copyright © 2013 John Wiley & Sons, Ltd.

10.
Dominance analysis of the relative importance of independent variables in logistic regression models
Objective: To apply an extended dominance analysis to logistic regression models, giving researchers an alternative way of determining the relative importance of the independent variables in a model. Methods: The relative importance of an independent variable is evaluated by computing and comparing the average incremental contribution ΔR² across all possible sub-models involving that variable (i.e., every combination of the other variables with and without it), and the approach is illustrated with a worked example. Results: The sum of each variable's overall average contribution from the dominance analysis equals the coefficient of determination of the final model; the resulting importance ranking differs from that based on standardized regression coefficients, and R²M and R²E are more suitable as indices for dominance analysis. Conclusion: Dominance analysis decomposes each independent variable's contribution to the total variance of the dependent variable into percentages of explained variance; it does not depend on a single fitted model and measures relative importance precisely.
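
A brute-force sketch of the averaging step for a logistic model with a handful of predictors follows, using McFadden's pseudo-R² as the fit index. The data, the choice of pseudo-R², and the coding are illustrative assumptions rather than the article's program.

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

def mcfadden_r2(y, X):
    """McFadden pseudo-R^2 of a logistic regression of y on X (plus intercept)."""
    fit = sm.Logit(y, sm.add_constant(X, has_constant='add')).fit(disp=0)
    return 1 - fit.llf / fit.llnull

def dominance(y, X):
    """General dominance weights: average, over sub-model sizes, of the mean gain in pseudo-R^2."""
    p = X.shape[1]
    weights = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        size_means = []
        for size in range(p):                              # sub-models of every size, including the empty one
            gains = []
            for subset in combinations(others, size):
                base = mcfadden_r2(y, X[:, list(subset)]) if subset else 0.0
                gains.append(mcfadden_r2(y, X[:, list(subset) + [j]]) - base)
            size_means.append(np.mean(gains))
        weights[j] = np.mean(size_means)
    return weights

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3))
X[:, 2] = 0.7 * X[:, 0] + 0.3 * rng.normal(size=500)       # induce collinearity
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.4 * X[:, 1]))))
w = dominance(y, X)
print(np.round(w, 3), round(w.sum(), 3))                   # weights and their sum
print(round(mcfadden_r2(y, X), 3))                         # full-model pseudo-R^2, equal to the sum of the weights
```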

11.
Analysis of interactions involving continuous variables in logistic regression models
Rothman argued that biological interaction should be assessed on the additive scale, i.e., by testing for additive interaction, whereas the product term in a logistic regression model reflects multiplicative interaction. The literature, both in China and abroad, has so far discussed additive interaction between two factors in logistic regression mainly for two binary variables. This article introduces a bootstrap method for estimating confidence intervals for the additive interaction between two continuous variables, or between a continuous and a categorical variable, using a case-control study of lung cancer in Hong Kong men as an example and providing an implementation in the free software R, as a reference for researchers analysing interactions.
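
The additive-interaction measure typically involved is the relative excess risk due to interaction (RERI), which for continuous exposures must be evaluated for specified increments. The sketch below illustrates the bootstrap percentile-interval idea in Python (the article itself supplies R code); the simulated data, the one-unit increments, and the number of replicates are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)

def reri(params, d1=1.0, d2=1.0):
    """RERI for increases of d1 and d2 from the reference, model: b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b1, b2, b3 = params[1], params[2], params[3]
    return np.exp(b1 * d1 + b2 * d2 + b3 * d1 * d2) - np.exp(b1 * d1) - np.exp(b2 * d2) + 1

def fit_reri(x1, x2, y):
    X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
    return reri(sm.Logit(y, X).fit(disp=0).params)

# Simulated case-control-style data with two continuous exposures.
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p = 1 / (1 + np.exp(-(-1.0 + 0.5 * x1 + 0.4 * x2 + 0.3 * x1 * x2)))
y = rng.binomial(1, p)

est = fit_reri(x1, x2, y)
boot = np.array([fit_reri(x1[i], x2[i], y[i])
                 for i in (rng.integers(0, n, n) for _ in range(500))])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(round(est, 2), (round(lo, 2), round(hi, 2)))   # RERI and its 95% percentile bootstrap CI
```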

12.
Logistic regression analysis in the presence of multi-factor collinearity in a study of risk factors for gastric cancer
Objective: To explore risk factors for gastric cancer and to examine ways of handling multicollinearity among multiple factors in such studies. Methods: A case-control design was used to collect epidemiological data on 50 gastric cancer patients and 50 controls; individual genotypes were determined by PCR. Three diagnostic tools from linear regression were applied to assess collinearity among the study factors, and a logistic regression model improved by principal component analysis was derived and interpreted. Results: The multivariable logistic regression results were inconsistent with the univariable analyses; collinearity diagnostics showed generally large variance inflation factors, indicating multicollinearity among factors such as GSTM1 genotype and family history of cancer. Fitting the data with the logistic regression model improved by principal component analysis not only reduced the standard errors of the regression coefficients but also allowed more factors to enter the model. Conclusions: Genetic susceptibility and environmental factors act jointly in the development of gastric cancer. When logistic regression is used to analyse disease risk factors, multicollinearity among the original variables should be diagnosed first and combined with principal component analysis to obtain a more reasonable regression model.
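
A generic sketch of the two steps combined here: variance-inflation-factor diagnostics followed by a logistic regression on leading principal components, with coefficients mapped back to the original (standardized) variables. The simulated data and the number of retained components are assumptions, not the study's analysis.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(scale=0.3, size=n),    # two strongly collinear risk factors
                     z + rng.normal(scale=0.3, size=n),
                     rng.normal(size=n)])                  # plus one independent factor
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.5 * X[:, 2]))))

# Step 1: collinearity diagnosis -- large VIFs flag the problem.
Xc = sm.add_constant(X)
print([round(variance_inflation_factor(Xc, j), 1) for j in range(1, Xc.shape[1])])

# Step 2: logistic regression on leading principal components of the standardized predictors.
Xs = (X - X.mean(0)) / X.std(0)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
k = 2                                                      # number of components retained (assumed)
pcs = Xs @ Vt[:k].T
pc_fit = sm.Logit(y, sm.add_constant(pcs)).fit(disp=0)
beta_std = Vt[:k].T @ pc_fit.params[1:]                    # back-transform to the standardized variables
print(np.round(beta_std, 2))
```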

13.
In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the “gold standard” for diagnosing temporomandibular disorder (TMD) is a physical examination by a trained clinician. In large studies, examining all participants in this manner is infeasible. Instead, it is common to use questionnaires to screen for incidence of TMD and perform the “gold standard” examination only on participants who screen positively. Unfortunately, some participants may leave the study before receiving the “gold standard” examination. Within the framework of survival analysis, this results in missing failure indicators. Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose a method for parameter estimation in survival models with missing failure indicators. We estimate the probability of being an incident case for those lacking a “gold standard” examination using logistic regression. These estimated probabilities are used to generate multiple imputations of case status for each missing examination that are combined with observed data in appropriate regression models. The variance introduced by the procedure is estimated using multiple imputation. The method can be used to estimate both regression coefficients in Cox proportional hazard models as well as incidence rates using Poisson regression. We simulate data with missing failure indicators and show that our method performs as well as or better than competing methods. Finally, we apply the proposed method to data from the OPPERA study. Copyright © 2015 John Wiley & Sons, Ltd.
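
A minimal sketch of the imputation idea, using Poisson regression for the incidence analysis and Rubin's rules to combine imputations; the screening variable, the missingness mechanism, and the number of imputations are assumptions, and this is not the OPPERA analysis code.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n, m = 3000, 20                                        # subjects, number of imputations

x = rng.binomial(1, 0.5, n)                            # risk factor of interest
screen = rng.normal(size=n) + 0.8 * x                  # questionnaire screening score
true_case = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 1.0 * x + 0.5 * screen))))
examined = rng.binomial(1, 0.7, n).astype(bool)        # 30% never receive the gold-standard exam
case_obs = np.where(examined, true_case, np.nan)

# Model P(case | screen, x) among those with a gold-standard examination.
obs = examined
prob_fit = sm.Logit(case_obs[obs],
                    sm.add_constant(np.column_stack([screen[obs], x[obs]]))).fit(disp=0)
p_miss = prob_fit.predict(sm.add_constant(np.column_stack([screen[~obs], x[~obs]])))

ests, vars_ = [], []
X_out = sm.add_constant(x)
for _ in range(m):                                     # multiply impute case status, refit each time
    y_imp = case_obs.copy()
    y_imp[~obs] = rng.binomial(1, p_miss)
    fit = sm.GLM(y_imp, X_out, family=sm.families.Poisson()).fit()
    ests.append(fit.params[1]); vars_.append(fit.bse[1] ** 2)

qbar = np.mean(ests)                                   # Rubin's rules
T = np.mean(vars_) + (1 + 1 / m) * np.var(ests, ddof=1)
print(round(qbar, 2), round(np.sqrt(T), 2))            # log incidence-rate ratio and its SE
```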

14.
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
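
For logistic regression, the recipe translates into a short calculation: find two response probabilities whose logits differ by twice the slope times the covariate standard deviation while keeping the overall response probability fixed, then apply a standard two-sample comparison of proportions. The sketch below illustrates that recipe under assumed inputs and a simple normal-approximation power formula; it is not the paper's code.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm
from scipy.special import logit, expit

def power_logistic(beta, sd_x, p_bar, n, alpha=0.05):
    """Approximate power for testing slope beta in logistic regression with covariate SD sd_x."""
    delta = 2 * beta * sd_x                       # logit difference between the two equivalent groups
    # Choose p1 so that the two groups average to the overall response probability p_bar
    # (bracketing assumes a positive slope).
    p1 = brentq(lambda p: (p + expit(logit(p) + delta)) / 2 - p_bar, 1e-9, p_bar)
    p2 = expit(logit(p1) + delta)
    m = n / 2                                     # two equally sized groups
    se = np.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / m)
    z = abs(p2 - p1) / se
    return norm.cdf(z - norm.ppf(1 - alpha / 2))

print(round(power_logistic(beta=0.5, sd_x=1.0, p_bar=0.3, n=200), 2))
```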

15.
Estimating and testing interactions in a linear regression model when normally distributed explanatory variables are subject to classical measurement error is complex, since the interaction term is a product of two variables and involves errors of more complex structure. Our aim is to develop simple methods, based on the method of moments (MM) and regression calibration (RC), that yield consistent estimators of the regression coefficients and their standard errors when the model includes one or more interactions. In contrast to previous work using the structural equations models framework, our methods allow errors that are correlated with each other and can deal with measurements of relatively low reliability. Using simulations, we show that, under the normality assumptions, the RC method yields estimators with negligible bias and is superior to MM in both bias and variance. We also show that the RC method yields the correct type I error rate of the test of the interaction. However, when the true covariates are not normally distributed, we recommend using MM. We provide an example relating homocysteine to serum folate and B12 levels.

16.
Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. However, variable selection on multiply-imputed data remains an important and longstanding statistical problem. If a variable selection method is applied to each imputed dataset separately, it may select different variables for different imputed datasets, which makes it difficult to interpret the final model or draw scientific conclusions. In this paper, we propose a novel multiple imputation-least absolute shrinkage and selection operator (MI-LASSO) variable selection method as an extension of the least absolute shrinkage and selection operator (LASSO) method to multiply-imputed data. The MI-LASSO method treats the estimated regression coefficients of the same variable across all imputed datasets as a group and applies the group LASSO penalty to yield a consistent variable selection across multiply-imputed datasets. We use a simulation study to demonstrate the advantage of the MI-LASSO method compared with the alternatives. We also apply the MI-LASSO method to the University of Michigan Dioxin Exposure Study to identify important circumstances and exposure factors that are associated with human serum dioxin concentration in Midland, Michigan. Copyright © 2013 John Wiley & Sons, Ltd.

17.
Taylor JM, Wang L, Li Z. Statistics in Medicine 2007; 26(18): 3443-3458
We consider the situation of two ordered categorical variables and a binary outcome variable, where one or both of the categorical variables may have missing values. The goal is to estimate the probability of response of the outcome variable for each cell of the contingency table of categorical variables while incorporating the fact that the categorical variables are ordered. The probability of response is assumed to change monotonically as each of the categorical variables changes level. A probability model is used in which the response is binomial with parameters p(ij) for each cell (i, j) and the number of observations in each cell is multinomial. Estimation approaches that incorporate Gibbs sampling with order restrictions on p(ij) induced via a prior distribution, two-dimensional isotonic regression and multiple imputation to handle missing values are considered. The methods are compared in a simulation study. Using a fully Bayesian approach with a strong prior distribution to induce ordering can lead to large gains in efficiency, but can also induce bias. Utilizing isotonic regression can lead to modest gains in efficiency, while minimizing bias and guaranteeing that the order constraints are satisfied. A hybrid of isotonic regression and Gibbs sampling appears to work well across a variety of scenarios. The methods are applied to a pancreatic cancer case-control study with two biomarkers.

18.
We have investigated different methods of controlling for asthma epidemics in the time series regression of the relationship between air pollution and asthma emergency visits in Barcelona, Spain. The relationship between air pollution and asthma emergency room visits was modelled using autoregressive Poisson models. We examined the effect of using no control for epidemics, and of modelling asthma epidemics with a single dummy variable, six dummy variables, and a dummy variable for each epidemic day. Air pollution coefficients increased when controlling asthma epidemics with six dummy variables instead of a single variable. They further increased when autocorrelation was allowed for. Standard errors were relatively unaffected when either the epidemics or the autocorrelation were included in the model. Black smoke, nitrogen dioxide and ozone were statistically significantly associated with asthma emergency visits after using six dummy variables to control for asthma epidemics. We have shown that different models, including different confounding variables, give markedly different estimates of the effect of a pollutant on health. Care is needed in the interpretation of such models, and in careful reporting so that it is clear how the confounding variables have been modelled.
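
The modelling choice at issue can be illustrated with a small Poisson sketch comparing no control for epidemics against epidemic dummy variables. This is a plain (non-autoregressive) GLM on simulated data in which epidemics happen to fall on low-pollution days; the data, the single pollutant, and the number of dummies are assumptions rather than the Barcelona analysis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
days = 365
epidemic1 = np.zeros(days); epidemic1[100:106] = 1          # two short asthma epidemics
epidemic2 = np.zeros(days); epidemic2[250:253] = 1
pollution = rng.normal(size=days) - 0.8 * (epidemic1 + epidemic2)   # epidemics coincide with low pollution here
visits = rng.poisson(np.exp(1.5 + 0.05 * pollution + 1.0 * (epidemic1 + epidemic2)))

designs = {
    "no control": sm.add_constant(pollution),
    "epidemic dummies": sm.add_constant(np.column_stack([pollution, epidemic1, epidemic2])),
}
for name, X in designs.items():
    fit = sm.GLM(visits, X, family=sm.families.Poisson()).fit()
    print(f"{name:>16}: pollution coef = {fit.params[1]:.3f} (SE {fit.bse[1]:.3f})")
```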
