Similar literature
 19 similar documents found (search time: 218 ms)
1.
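Objective: To estimate the relative importance of the independent variables in a multiple linear regression model. Methods: The relative weight method was used to estimate the relative importance of each variable and was applied to evaluating the factors that affect the predicted survival time of liver surgery patients. Results: The relative weights of hemagglutinin, prognostic index, and enzyme function were 0.142, 0.341, and 0.489, respectively; the proportional contributions of these variables to survival time were 14.6%, 35.1%, and 50.3%, respectively. Conclusion: Enzyme function had the greatest influence on the predicted survival time of liver surgery patients, followed by prognostic index, with hemagglutinin the smallest. When the independent variables are correlated, the relative importance estimates given by the relative weight method are more stable and precise, and agree better with the actual situation.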
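
The step from relative weights to contribution proportions in the abstract above can be checked with a few lines of Python. This is a minimal sketch that assumes the proportions are simply the weights rescaled to sum to 100%; the function name `contribution_percentages` is illustrative and does not come from the paper.

```python
# Relative weights reported in the abstract, in the order
# hemagglutinin, prognostic index, enzyme function.
relative_weights = [0.142, 0.341, 0.489]


def contribution_percentages(weights):
    """Rescale raw relative weights so that they sum to 100%."""
    total = sum(weights)  # 0.972 for the reported weights
    return [round(100 * w / total, 1) for w in weights]


print(contribution_percentages(relative_weights))  # [14.6, 35.1, 50.3]
```

The rescaled values match the 14.6%, 35.1%, and 50.3% reported in the abstract.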

2.
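Dominance analysis of the relative importance of independent variables in logistic regression models   Total citations: 1 (self-citations: 0, citations by others: 1)
Objective: To apply extended dominance analysis to logistic regression models, giving researchers an alternative method for determining the relative importance of the independent variables in a model. Methods: The relative importance of an independent variable was evaluated by computing and comparing its average incremental contribution ΔR² across all possible submodels involving that variable (i.e., the different combinations that contain it), and the method was applied to a worked example. Results: The sum of the overall average contributions of the variables obtained by dominance analysis equaled the coefficient of determination of the final model; the resulting importance ranking differed from the ranking by standardized regression coefficients; and R²_M and R²_E were better suited as indices for dominance analysis. Conclusion: Dominance analysis can decompose the contribution of each independent variable to the total variance of the dependent variable into percentages of explained variance, is independent of the model, and can precisely measure the relative importance of independent variables.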
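
The averaging over submodels described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the R² (or pseudo-R²) of every submodel has already been computed, and the predictor names and R² values are hypothetical.

```python
from itertools import combinations


def general_dominance(r2, predictors):
    """Average incremental R^2 of each predictor over all submodels.

    r2 maps a frozenset of predictor names to that submodel's R^2.
    Increments are averaged within each submodel size first, then
    across sizes.
    """
    p = len(predictors)
    result = {}
    for x in predictors:
        others = [v for v in predictors if v != x]
        size_means = []
        for k in range(p):  # submodels holding k of the other predictors
            gains = [r2[frozenset(s) | {x}] - r2[frozenset(s)]
                     for s in combinations(others, k)]
            size_means.append(sum(gains) / len(gains))
        result[x] = sum(size_means) / p
    return result


# Hypothetical R^2 values for every submodel of three predictors.
r2 = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.30,
    frozenset({"x2"}): 0.25,
    frozenset({"x3"}): 0.10,
    frozenset({"x1", "x2"}): 0.45,
    frozenset({"x1", "x3"}): 0.35,
    frozenset({"x2", "x3"}): 0.30,
    frozenset({"x1", "x2", "x3"}): 0.50,
}

weights = general_dominance(r2, ["x1", "x2", "x3"])
print(weights)
print(sum(weights.values()))  # matches the full-model R^2 up to rounding
```

The sketch is agnostic to which R² analogue is supplied, so an index such as R²_M or R²_E could be substituted for a logistic model. The final line illustrates the property stated in the results: the contributions sum to the R² of the full model.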

3.
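Objective: To compare and evaluate differences among common estimation methods when estimating the relative importance of independent variables under different experimental conditions, and to explore the factors that drive the differences among the methods' estimates. Methods: Factors including the degree of correlation, the level of collinearity among the independent variables, and the number of independent variables were varied, and an improved large-scale simulation study was used to observe the importance estimates produced by the different methods. Results: For dominance analysis, relative weights, and the product measure, the difference between the sum of the importance estimates and the model R² was smaller than for squared standardized regression coefficients and squared simple correlation coefficients. Among 2400 importance index values, the product measure produced 229 negative values (9.54%). Estimates from squared correlation coefficients were smaller than those from dominance analysis. Squared standardized regression coefficients produced many extreme values. The level of collinearity among the independent variables explained 4% to 25% of the variation in the mean Kendall τ value, sample size explained 20% to 77%, and the number of independent variables explained 14% to 60%. Conclusion: The two factors with the greatest influence on importance estimates were sample size and the number of independent variables, followed by the level of collinearity and the degree of correlation between the independent variables and the dependent variable. Squared standardized regression coefficients showed the greatest variability, and the estimates from relative weights and dominance analysis were relatively "biased".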

4.
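Objective: To estimate the relative importance of each independent variable in a multiple linear regression model and to explore interval estimation methods. Methods: With correlated independent variables, the dominance analysis method proposed by Budescu (1993) and Azen (2003) was used to estimate the importance of the factors affecting the predicted survival time of liver surgery cases, and the bootstrap was used to explore interval estimation in order to evaluate the variability of the estimates. Results: The relative contributions of hemagglutinin, prognostic index, and enzyme function to predicted survival time were 0.1415, 0.3408, and 0.490, with bootstrap 95% confidence intervals of (0.0573, 0.2744), (0.2359, 0.4545), and (0.3411, 0.6090), respectively. Conclusion: Enzyme function had the greatest influence on the predicted survival time of liver surgery cases, followed by prognostic index, with hemagglutinin the smallest. When the independent variables are correlated, dominance analysis gives more precise and stable estimates of relative importance and deserves wider application.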

5.
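Dai Luyan (代鲁燕). 《浙江预防医学》 (Zhejiang Preventive Medicine), 2012, 24(2): 20-22, 37
Linear regression analysis has two main tasks: first, to build a linear regression equation and use the independent variables to predict the dependent variable; second, to analyze and interpret the role and significance of each independent variable for the dependent variable. Estimating the relative importance of each independent variable, and analyzing and interpreting the role and significance of the independent variables for the dependent variable, is the primary task of linear regression analysis. In medicine, biology…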

6.
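Statistical interpretation and significance of partial correlation and partial regression coefficients   Total citations: 1 (self-citations: 0, citations by others: 1)
This study proposes a method for interpreting the partial correlation and partial regression coefficients that describe the relationship between each independent variable and the dependent variable in multiple linear regression, covering both the estimation of the coefficients and the plotting of a scatter plot of each independent variable against the dependent variable. This makes it possible, in the multivariable setting, to draw scatter plots of the relationships between variables just as in simple regression, giving deeper insight into the reliability of the regression results and assisting regression diagnostics.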

7.
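Measuring the relative importance of independent variables in linear regression models   Total citations: 1 (self-citations: 0, citations by others: 1)
Linear regression models are used frequently in practice, and researchers often need to determine which of several independent variables has a large effect on y and which has a small one, that is, to measure the relative importance of the independent variables. In practical work, standardized regression coefficients, t statistics, and P values are commonly used indices. Partial regression coefficients, correlation coefficients and their squares, semipartial correlation coefficients and their squares, and partial regression sums of squares are all related to the relative importance of independent variables. Dalington [1] holds that, if the aim is to explore influencing factors, the partial regression coefficient is a better measure of influence…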

8.
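Objective: In high-dimensional omics studies, confounders often impair the ability of random forests to select variables associated with the study outcome, so controlling for confounders is very important. Methods: Through simulation experiments and a real-data example, we compared how well the following four methods control for confounders when selecting variables associated with the study outcome: random forest (RF); Ranger; weighted Ranger, in which every confounder is given a weight of 100%; and the residual method, in which the dependent and independent variables with the confounders removed are entered into the Ranger analysis as the new dependent and independent variables. The proportion of runs in which the risk factor ranked first by importance score was used as the evaluation metric. Results: Based on extensive simulations, we found that the residual method and weighted Ranger effectively increased the proportion of runs in which the risk factor ranked first by importance score. A GWAS example confirmed that, after adjusting for confounders with these two methods, the risk factor moved up in the ranking. Conclusion: Adjusting for confounders is essential when selecting variables associated with the study outcome; the residual method outperformed weighted Ranger in confounder adjustment, while RF and Ranger provided almost no confounder adjustment.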

9.
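Objective: To observe and evaluate the effect of sample size on importance estimation methods, and the effect of repeated sampling on the stability of each method's estimates, by drawing from a single population both data sets of different sample sizes and repeated samples of the same size. Methods: Several existing importance evaluation methods are briefly introduced; the SAS procedure PROC SURVEYSELECT was used to sample repeatedly from the same population, the effects of sample size and of repeated sampling on the importance estimates were observed, and the stability of each method was evaluated. Results: When the sample size was small, the importance estimates of all methods varied widely; as the sample size increased, the estimates gradually stabilized. For dominance analysis, relative weights, and the product measure (βr), the difference between the sum of the importance estimates and the model R² was smaller than for squared standardized regression coefficients (β²) and squared simple correlation coefficients (R²), and dominance analysis was the most stable. Conclusion: Among the common importance estimation methods currently available, dominance analysis gives the most stable importance estimates; the relative weight method, although closest to dominance analysis, still has shortcomings.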

10.
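Application of Excel in path analysis   Total citations: 7 (self-citations: 0, citations by others: 7)
Path analysis is a multivariate statistical method for measuring the relative importance of causal variables to an outcome variable. The literature commonly uses software such as SAS and SPSS for path analysis; the author found that carrying out path analysis in Excel is simple and fast, and this article introduces the approach with a worked example.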

11.
Multiple linear regression analysis is widely used in many scientific fields, including public health, to evaluate how an outcome or response variable is related to a set of predictors. As a result, researchers often need to assess "relative importance" of a predictor by comparing the contributions made by other individual predictors in a particular regression model. Hence, development of valid statistical methods to estimate the relative importance of a set of predictors is of great interest. In this research, the authors considered the relative importance of a predictor when defined by that portion of the squared multiple correlation explained by the contribution of each predictor in the final model of interest. Here, a number of suggested relative importance indices motivated by this definition are reviewed, including the squared zero-order correlation, squared semipartial correlation, Product Measure (i.e., Pratt's Index), General Dominance Index, and Johnson's Relative Weight. The authors compared these indices using data sets from an occupational health study in which human inhalation exposure to styrene was measured and from a laboratory animal study on risk factors for atherosclerosis, and statistical properties using bootstrap methods were examined. The analysis suggests that the General Dominance Index and Johnson's Relative Weight are preferred methods for quantifying the relative importance of predictors in a multiple linear regression model. Johnson's Relative Weight involves significantly less computational burden than the General Dominance Index when the number of predictors in the final model is large.

12.
Consider a case-control study in which prevalent cases of a given disease define the index series and members of the base population without the disease are sampled to provide the referent series. Information on a set of explanatory variables (e.g., genotypes) is collected at great cost for cases and controls. The objective of the study is to evaluate the relationship between case status and the explanatory variables. Subsequently, an investigator notes that the prevalence of a second disease was measured for the members of the index and referent series. The investigator wishes to make efficient use of the available data by assessing the relationship between this second disease and the set of explanatory variables. In this paper, we discuss two analytic approaches that might be used to assess associations between the explanatory variables and an outcome other than the original disease. The first is the inclusion of a design variable for original disease status as a covariate, and the second is weighted logistic regression using the inverse of the sampling fractions as the weights. The latter approach allows the investigator to derive an estimate of association between the explanatory variables and the second disease without adjustment for the first disease. Weighted logistic regression methods are readily implemented using available statistical packages.

13.
The analytical effect of the number of events per variable (EPV) in a proportional hazards regression analysis was evaluated using Monte Carlo simulation techniques for data from a randomized trial containing 673 patients and 252 deaths, in which seven predictor variables had an original significance level of p < 0.10. The 252 deaths and 7 variables correspond to 36 events per variable analyzed in the full data set.

Five hundred simulated analyses were conducted for these seven variables at EPVs of 2, 5, 10, 15, 20, and 25. For each simulation, a random exponential survival time was generated for each of the 673 patients, and the simulated results were compared with their original counterparts. As EPV decreased, the regression coefficients became more biased relative to the true value; the 90% confidence limits about the simulated values did not have a coverage of 90% for the original value; large sample properties did not hold for variance estimates from the proportional hazards model, and the Z statistics used to test the significance of the regression coefficients lost validity under the null hypothesis.

Although a single boundary level for avoiding problems is not easy to choose, the value of EPV = 10 seems most prudent. Below this value for EPV, the results of proportional hazards regression analyses should be interpreted with caution because the statistical model may not be valid.
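
The EPV arithmetic in this abstract can be reproduced with a short Python sketch. The helper names are illustrative assumptions, and the default threshold of 10 simply reflects the value the abstract calls most prudent.

```python
import math


def events_per_variable(n_events, n_variables):
    """Number of outcome events per candidate predictor variable."""
    return n_events / n_variables


def min_events_needed(n_variables, target_epv=10):
    """Smallest number of events that reaches the target EPV."""
    return math.ceil(n_variables * target_epv)


print(events_per_variable(252, 7))  # 36.0, as reported for the full data set
print(min_events_needed(7))         # 70 events for EPV = 10 with 7 variables
```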


14.
In a case-control study, a sample of post-neonatal deaths from pneumonia occurring in the Metropolitan Area of Rio de Janeiro, Brazil (1986-1987) was compared with healthy controls who lived in the same neighborhood. Risk factors investigated were variables related to the mother's pregnancy history and the child's birth, to the family's social condition and to the use of health services. Using the univariate logistic regression model, the coefficients of each independent variable, the relative risk and its confidence limits were first estimated. Birth weight and age of weaning were strongly associated with the dependent variable. After adjustment by means of the multiple logistic regression model, only 4 variables remained statistically associated with mortality: age of weaning, birth weight, overcrowding, and BCG vaccination. Based on the available data, it was concluded that mortality from pneumonia in children under 1 year of age is significantly related to the social condition of the family, particularly to that of the mother.

15.
Coding ordinal independent variables in multiple regression analyses   Total citations: 8 (self-citations: 0, citations by others: 8)
The authors present a coding scheme for ordinal independent variables which may be used in various forms of regression analysis. The scheme is useful in dose-response analyses, when the objective is to identify contrasts in the dependent (or response) variable between successive levels of the independent variable, or to identify critical threshold values of the independent variables at which significant changes occur in the response. An example is given of evaluating the survival of lung cancer patients according to their stage of symptomatology. The authors discuss the interpretation of the regression coefficients when this coding scheme is used with linear regression, logistic regression, or in the proportional hazards regression model.

16.
Modeling of the uncertainty of multiple input variables for a complex decision problem complicates sensitivity analysis. A method of analysis comprising stochastic simulation of the model and logistic regression of the simulated dichotomous decision variable against all of the input variables yields a direct measure of the importance of input variables to the decision. This method is demonstrated on a previously analyzed clinical decision either to continue observation or to immediately treat with anticoagulants a woman presenting with deep vein thrombosis in the first trimester of pregnancy. A relative measure of the importance of each input variable in causing a change of decision is estimated by calculating the change in the log odds attributable to variation of each input variable over its range of uncertain values compared with the total change of log odds from variation of all input variables. This method is compared with alternative measures of input variable importance, and is found to be a simple yet powerful tool for gaining quantitative insight into the nuances of a decision model.
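
One way to read the relative measure described in this abstract is sketched below. It rests on assumptions the abstract does not confirm: that each input's log-odds change is its fitted logistic coefficient times the width of its uncertainty range, and that absolute changes are summed to form the total. The input names, coefficients, and ranges are hypothetical.

```python
def log_odds_shares(coefs, ranges):
    """Share of the total log-odds change attributed to each input.

    coefs:  logistic regression coefficient per input (hypothetical values)
    ranges: (low, high) range of uncertain values per input
    """
    changes = {name: abs(coefs[name] * (high - low))
               for name, (low, high) in ranges.items()}
    total = sum(changes.values())
    return {name: change / total for name, change in changes.items()}


# Hypothetical coefficients and uncertainty ranges for two model inputs.
coefs = {"p_embolism": 4.0, "p_bleed": -2.0}
ranges = {"p_embolism": (0.1, 0.4), "p_bleed": (0.0, 0.2)}

shares = log_odds_shares(coefs, ranges)
print(shares)  # roughly 0.75 for p_embolism and 0.25 for p_bleed
```

Under these assumptions the shares sum to 1, so they can be read as each input's fraction of the total influence on the decision.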

17.
Statistical prediction methods typically require some form of fine-tuning of tuning parameter(s), with K-fold cross-validation as the canonical procedure. For ridge regression, there exist numerous procedures, but common for all, including cross-validation, is that one single parameter is chosen for all future predictions. We propose instead to calculate a unique tuning parameter for each individual for which we wish to predict an outcome. This generates an individualized prediction by focusing on the vector of covariates of a specific individual. The focused ridge (fridge) procedure is introduced with a 2-part contribution: First we define an oracle tuning parameter minimizing the mean squared prediction error of a specific covariate vector, and then we propose to estimate this tuning parameter by using plug-in estimates of the regression coefficients and error variance parameter. The procedure is extended to logistic ridge regression by using parametric bootstrap. For high-dimensional data, we propose to use ridge regression with cross-validation as the plug-in estimate, and simulations show that fridge gives smaller average prediction error than ridge with cross-validation for both simulated and real data. We illustrate the new concept for both linear and logistic regression models in 2 applications of personalized medicine: predicting individual risk and treatment response based on gene expression data. The method is implemented in the R package fridge.

18.
We performed a Monte Carlo study to evaluate the effect of the number of events per variable (EPV) analyzed in logistic regression analysis. The simulations were based on data from a cardiac trial of 673 patients in which 252 deaths occurred and seven variables were cogent predictors of mortality; the number of events per predictive variable was (252/7 = 36) for the full sample. For the simulations, at values of EPV = 2, 5, 10, 15, 20, and 25, we randomly generated 500 samples of the 673 patients, chosen with replacement, according to a logistic model derived from the full sample. Simulation results for the regression coefficients for each variable in each group of 500 samples were compared for bias, precision, and significance testing against the results of the model fitted to the original sample.

For EPV values of 10 or greater, no major problems occurred. For EPV values less than 10, however, the regression coefficients were biased in both positive and negative directions; the large sample variance estimates from the logistic model both overestimated and underestimated the sample variance of the regression coefficients; the 90% confidence limits about the estimated values did not have proper coverage; the Wald statistic was conservative under the null hypothesis; and paradoxical associations (significance in the wrong direction) were increased. Although other factors (such as the total number of events, or sample size) may influence the validity of the logistic model, our findings indicate that low EPV can lead to major problems.


19.
OBJECTIVES: Automated variable selection methods are frequently used to determine the independent predictors of an outcome. The objective of this study was to determine the reproducibility of logistic regression models developed using automated variable selection methods. STUDY DESIGN AND SETTING: An initial set of 29 candidate variables was considered for predicting mortality after acute myocardial infarction (AMI). We drew 1,000 bootstrap samples from a dataset consisting of 4,911 patients admitted to hospital with an AMI. Using each bootstrap sample, logistic regression models predicting 30-day mortality were obtained using backward elimination, forward selection, and stepwise selection. The agreement between the different model selection methods and the agreement across the 1,000 bootstrap samples were compared. RESULTS: Using 1,000 bootstrap samples, backward elimination identified 940 unique models for predicting mortality. Similar results were obtained for forward and stepwise selection. Three variables were identified as independent predictors of mortality among all bootstrap samples. Over half the candidate prognostic variables were identified as independent predictors in less than half of the bootstrap samples. CONCLUSION: Automated variable selection methods result in models that are unstable and not reproducible. The variables selected as independent predictors are sensitive to random fluctuations in the data.
