Similar articles
 20 similar articles found
1.
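Abstract: Objective: To analyze the factors influencing kidney stones in a health check-up population using the random forest algorithm. Methods: From a health check-up population, 955 patients with kidney stones and 1,670 individuals without kidney stones were selected, and their biochemical indicators were collected. Random forest was first used for dimensionality reduction, and the reduced set of variables was then analyzed with conventional logistic regression. Results: The random forest algorithm selected the 8 variables with the highest importance scores and the lowest error rate for inclusion in a classical logistic regression model. The variables that finally entered the logistic regression model were sex, age, body mass index, systolic blood pressure, low-density lipoprotein, and total bilirubin. Conclusion: The occurrence of kidney stones is associated with sex, age, and several biochemical indicators.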

2.
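Objective: To analyze the factors influencing sarcopenia in older adults and to provide a reference for sarcopenia interventions in this population. Methods: A questionnaire survey was conducted among 439 older adults from a community and a hospital in Luzhou. The random forest algorithm was used to rank the influencing factors by importance and to reduce dimensionality. The reduced set of variables was then entered into a logistic regression model to analyze the direction of effect and the relative risk of each factor. Results: The random forest algorithm showed that the out-of-bag estimated error rate was lowest with 7 variables, which were, in order...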

3.
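Objective: To explore the application of random forest (RF) variable hunting methods to variable selection in high-dimensional data. Methods: Using simulation experiments and real data analysis, two variable hunting methods (vh.md, vh.vimp) were compared with a stepwise elimination method (varSelRF). The methods were evaluated by the number of variables selected, the model prediction error rate (PE), and the area under the receiver operating characteristic curve (AUC). Results: The simulations showed that when variables had joint effects, interaction effects, or weak independent effects, the variable hunting methods clearly outperformed both varSelRF and ranking by VIMP over all variables. The real data analysis showed that variable hunting gave stable selection results while maintaining good predictive performance. Conclusion: Variable hunting methods are suitable for variable selection in high-dimensional data and have practical value.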

4.
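Random forest regression analysis and its application in the study of metabolic regulatory relationships   Total citations: 1 (self-citations: 0, citations by others: 1)
李贞子, 张涛, 武晓岩, 李康. 《中国卫生统计》 (Chinese Journal of Health Statistics), 2012, 29(2): 158-160, 163
Objective: To examine how well random forest regression handles nonlinear data with interaction effects, and to apply it to variable selection for metabolic networks in high-dimensional metabolomics data. Methods: Simulation experiments were used to verify the performance of random forest regression in the presence of interactions and nonlinearity, and the method was applied to metabolomics data for distinguishing benign from malignant ovarian tumors. Results: In the simulations, for data with interaction effects and other nonlinear relationships, the random forest regression model clearly outperformed the multiple linear regression model. In the ovarian cancer metabolomics data, random forest regression gave more satisfactory results. Conclusion: As a nonparametric regression technique, random forest regression can, given a sufficient sample size (e.g., n100), effectively analyze high-dimensional data with interaction effects and nonlinear relationships.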

5.
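Objective: To explore lipoprotein and metabolite factors associated with the onset of Alzheimer's disease using machine learning algorithms. Methods: A total of 314 participants diagnosed in 2012 as cognitively normal (CN) or as having Alzheimer's disease (AD) were selected from the ADNI database, and their lipoprotein and metabolite data were collected. Three methods (random forest, LASSO regression, and XGBoost) were used to rank variables by importance and to select variables. The variables selected by the three methods were combined with the participants' sex, age, and marital status to build a random forest model for predicting the important factors influencing AD onset. Results: The three methods together selected 12 lipoprotein and metabolite variables; with age, sex, and marital status, 15 variables in total entered the random forest model. The model had an accuracy of 84.13%, a sensitivity of 93.75%, a specificity of 53.33%, a Kappa value of 0.5183, and an AUC (95% CI) of 0.735 (0.600-0.871). The top five variables by Mean Decrease Accuracy and the top five by Mean Decrease Gini in the random forest model both included the following four variables: the ratio of phospholipids to total lipids in large very-low-density lipoprotein (L_VLDL_PL_PCT), age (AGE), chylomicrons and extremely large very-low-density lipoprotein...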
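The performance figures reported in entry 5 (accuracy, sensitivity, specificity, Kappa) are all functions of a single 2x2 confusion matrix. The sketch below shows how they relate. The counts are an assumption: they are one table of 63 test cases (AD as the positive class) that happens to reproduce the reported values, and they are not given in the abstract.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 table."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the predicted and actual class totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only because they match the reported figures.
metrics = classification_metrics(tp=45, fn=3, fp=7, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a table like this, a high sensitivity next to a much lower specificity is what one would expect when the positive class dominates the test set, which is also why Kappa is far below the raw accuracy.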

6.
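Objective: To explore the use of the synthetic minority oversampling technique (SMOTE) combined with machine learning models for predicting whether older adults have adequate health literacy. Methods: Univariate screening was used to select variables associated with health literacy status. With the selected variables as inputs and health literacy status as the outcome, logistic regression, random forest, and SVM models were built on the data set before and after SMOTE processing, and model performance was evaluated with the receiver operating characteristic (ROC) curve. Results: On the test set before SMOTE processing, the accuracies of logistic regression, random forest, and SVM were 0.833, 0.600, and 0.636, and the areas under the ROC curve (AUC) were 0.723, 0.815, and 0.728, respectively. On the test set after SMOTE processing, the accuracies were 0.936, 0.908, and 0.890, and the AUCs were 0.896, 0.944, and 0.897, respectively. Conclusion: The random forest model is of considerable value for assessing whether older adults have adequate health literacy.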
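As a rough illustration of the oversampling step in entry 6, the sketch below generates synthetic minority-class points in the way SMOTE is usually described: pick a minority sample, pick one of its k nearest minority neighbours, and interpolate between the two. The function name, the toy data, and the parameter defaults are assumptions for illustration; the abstract does not describe the implementation that was used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (hypothetical values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

In practice the oversampling is usually applied to the training split only, so that test-set metrics still reflect the original class balance.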

7.
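Objective: To apply random forest and support vector machine (SVM) algorithms to breast cancer gene data in order to screen for genes that differ between triple-negative and non-triple-negative breast cancer, providing additional candidate targets for clinical use. Methods: Using TCGA breast cancer gene data, dimensionality was reduced with t-tests and random forest. Variables were then ranked by importance using SVM, SVM recursive feature elimination, and random forest. Random forest and SVM were each combined with forward variable selection to build prediction models and complete the final variable selection, and model performance was evaluated by holdout validation. Results: After dimensionality reduction by t-tests with FDR control, 18,702 genes remained; after reduction by random forest, 6,326 genes remained. Prediction models were built on the reduced data as ranked by the three methods, and evaluation metrics such as the Youden index were obtained for each model. A literature search on the top-ranked genes found that most were related to the metastasis or prognosis of triple-negative breast cancer. Conclusion: For variable selection in high-dimensional gene expression data, the best results were obtained by using t-tests with FDR control for dimensionality reduction, random forest for ranking and selecting variables, and SVM for prediction. Most of the top-ranked genes are related to triple-negative breast cancer, but some have no published studies linking them to it, and the relationship between these genes and triple-negative breast cancer merits investigation.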

8.
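Objective: To explore the dimensionality reduction performance of random survival forests on follow-up data from a large-scale sequencing study of lung cancer, as a basis for building prognostic prediction models. Methods: Random survival forests were used to reduce the dimensionality of 399 single nucleotide polymorphism (SNP) loci in 120 lung cancer patients, selecting a subset of SNPs with high importance scores and a low misclassification rate. A multivariable Cox proportional hazards model was then fitted to this subset, and its predictive performance was evaluated by cross-validation. Results: Random survival forests selected 25 important SNPs. A multivariable Cox proportional hazards model controlling for clinical covariates (clinical stage, whether surgery was performed, and histopathological type) showed that 4 loci were statistically significant. Cross-validation showed that the mean accuracy of the model was 83.63%. Conclusion: For high-dimensional association study data, using random survival forests to remove noise and reduce dimensionality before further analysis helps in building subsequent prognostic prediction models.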

9.
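A random forest stepwise discriminant analysis method for gene expression data   Total citations: 3 (self-citations: 2, citations by others: 3)
Objective: To present a new random forest algorithm that automatically selects variables during model building and establishes an "optimal" discriminant model. Methods: Variable importance scores and a stepwise iterative algorithm were used to select the variables that contribute. The method was assessed on real gene expression data, and its validity was verified with simulation experiments programmed in R. Results: Discriminant models for gene expression data from three diseases achieved satisfactory classification with only a small number of genes. The simulations showed that when the classes were well separated, random forest stepwise discriminant analysis performed clearly well: it retained the contributing variables in the model and improved discrimination. When the classes were not well separated, the improvement in classification was not appreciable. Conclusion: Random forest stepwise discriminant analysis can be applied effectively to gene selection and classification in gene expression data, but particular attention should be paid to the influence of random fluctuation on the results.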

10.
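Objective: To build a dengue risk assessment tool at a small spatial scale based on random forest regression, as a basis for dengue prevention and control. Methods: Dengue case data and related factors from January 2012 to September 2014 were used as the training set to build separate random forest regression models for three risk indicators: epidemic frequency, duration, and intensity. Dengue case data and related factors from October 2014 to December 2015 were used as the validation set to evaluate the models. Results: The correlation coefficients of the frequency, duration, and intensity indicators with the case count indicator were all >0.7. The random forest regression models for frequency, duration, and intensity built on the training set explained 96.72%, 91.98%, and 90.1% of the variance, respectively, indicating a good fit. Under cross-validation, the mean squared errors of the models were 0.0019, 1.4246, and 1.8811, respectively, all at a low level. When the accuracy of random forest regression, support vector regression, generalized linear models, and generalized additive models was compared, the mean squared errors of the machine learning models (random forest regression and support vector regression) were far lower than those of the generalized linear and generalized additive models. Conclusion: Random forest regression models with dengue frequency, duration, and intensity as outcomes and with meteorological, environmental, and socioeconomic characteristics as predictors are accurate, and they can serve as a dengue risk assessment tool in support of dengue prevention and control.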

11.
Background: Smoking increases the risk of many diseases, and it is also linked to blood DNA methylation changes that may be important in disease etiology. Objectives: We sought to identify novel CpG sites associated with cigarette smoking. Methods: We used two epigenome-wide data sets from the Sister Study to identify and confirm CpG sites associated with smoking. One included 908 women with methylation measurements at 27,578 CpG sites using the HumanMethylation27 BeadChip; the other included 200 women with methylation measurements for 473,844 CpG sites using the HumanMethylation450 BeadChip. Significant CpGs from the second data set that were not included in the 27K assay were validated by pyrosequencing in a subset of 476 samples from the first data set. Results: Our study successfully confirmed smoking associations for 9 previously established CpGs and identified 2 potentially novel CpGs: cg26764244 in GNG12 (p = 9.0 × 10⁻¹⁰) and cg22335340 in PTPN6 (p = 2.9 × 10⁻⁵). We also found strong evidence of an association between smoking status and cg02657160 in CPOX (p = 7.3 × 10⁻⁷), which has not been previously reported. All 12 CpGs were undermethylated in current smokers and showed an increasing percentage of methylation in former and never-smokers. Conclusions: We identified 2 potentially novel smoking related CpG sites, and provided independent replication of 10 previously reported CpGs sites related to smoking, one of which is situated in the gene CPOX. The corresponding enzyme is involved in heme biosynthesis, and smoking is known to increase heme production. Our study extends the evidence base for smoking-related changes in DNA methylation. Citation: Harlid S, Xu Z, Panduri V, Sandler DP, Taylor JA. 2014. CpG sites associated with cigarette smoking: analysis of epigenome-wide data from the Sister Study. Environ Health Perspect 122:673–678; http://dx.doi.org/10.1289/ehp.1307480

12.

Background:

Smoking is a risk factor for many human diseases. DNA methylation has been related to smoking, but genome-wide methylation data for smoking in Chinese populations is limited.

Objectives:

We aimed to investigate epigenome-wide methylation in relation to smoking in a Chinese population.

Methods:

We measured the methylation levels at > 485,000 CpG sites (CpGs) in DNA from leukocytes using a methylation array and conducted a genome-wide meta-analysis of DNA methylation and smoking in a total of 596 Chinese participants. We further evaluated the associations of smoking-related CpGs with internal polycyclic aromatic hydrocarbon (PAH) biomarkers and their correlations with the expression of corresponding genes.

Results:

We identified 318 CpGs whose methylation levels were associated with smoking at a genome-wide significance level (false discovery rate < 0.05), among which 161 CpGs annotated to 123 genes were not associated with smoking in recent studies of Europeans and African Americans. Of these smoking-related CpGs, methylation levels at 80 CpGs showed significant correlations with the expression of corresponding genes (including RUNX3, IL6R, PTAFR, ANKRD11, CEP135 and CDH23), and methylation at 15 CpGs was significantly associated with urinary 2-hydroxynaphthalene, the most representative internal monohydroxy-PAH biomarker for smoking.

Conclusion:

We identified DNA methylation markers associated with smoking in a Chinese population, including some markers that were also correlated with gene expression. Exposure to naphthalene, a byproduct of tobacco smoke, may contribute to smoking-related methylation.

Citation:

Zhu X, Li J, Deng S, Yu K, Liu X, Deng Q, Sun H, Zhang X, He M, Guo H, Chen W, Yuan J, Zhang B, Kuang D, He X, Bai Y, Han X, Liu B, Li X, Yang L, Jiang H, Zhang Y, Hu J, Cheng L, Luo X, Mei W, Zhou Z, Sun S, Zhang L, Liu C, Guo Y, Zhang Z, Hu FB, Liang L, Wu T. 2016. Genome-wide analysis of DNA methylation and cigarette smoking in Chinese. Environ Health Perspect 124:966–973; http://dx.doi.org/10.1289/ehp.1509834
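Entry 12 reports significance at a false discovery rate below 0.05 without naming the procedure. A common choice is the Benjamini-Hochberg step-up rule, sketched below as an assumption rather than as the authors' method; the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values for eight CpG sites.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_values))
```

Because the rule is step-up, a p-value that fails its own threshold can still be rejected when a larger p-value further down the ranking passes, which is why the loop keeps the last passing rank rather than stopping at the first failure.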

13.
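Objective: To assess the feasibility of using the methylation level of the human papillomavirus type 16 (HPV16) genome to determine the stage of cervical lesions. Methods: Seventy-three HPV16-positive specimens were collected from women undergoing cervical cancer screening. Bisulfite pyrosequencing was used to analyze methylation sites in the L1 gene open reading frame (ORF) and the upstream regulatory region (URR) of the HPV16 genome. The methylation results were compared with pathological findings to analyze the relationship between HPV16 methylation and cervical lesions. Results: Sites 31, 37, 43, 52, and 58 of the HPV16 genome were moderately methylated in normal specimens, showed low methylation in cervical intraepithelial neoplasia grade 1 (CIN1) to CIN3 specimens, and were highly methylated in cervical cancer (CA) specimens. Pairwise comparisons of methylation levels at each site among normal, CIN, and CA specimens were all statistically significant (all P<0.05), whereas differences among CIN1, CIN2, and CIN3 were not statistically significant. Methylation levels at sites 3887, 3927, 3941, and 5602 rose with increasing severity of the cervical lesion, and pairwise comparisons among normal, CIN, and CA specimens all showed differences (all P<0.05). Methylation levels at sites 5608, 5709, 5611, 5617, 5762, 6367, and 6389 did not differ significantly among normal, CIN1, CIN2, and CIN3 specimens, but differed significantly between normal and CA specimens and between CIN and CA specimens (all P<0.05). Conclusion: Methylation of the HPV16 genome correlates with the severity of cervical lesions, and testing HPV16 DNA methylation can assist in the clinical diagnosis of cervical lesions.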

14.
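Objective: To evaluate how well logistic regression and the random forest algorithm predict glycemic control after 3 months in patients with type 2 diabetes, and to explore the factors influencing glycemic control. Methods: Baseline survey and follow-up information was collected from patients with type 2 diabetes in Shunyi and Tongzhou districts. With glycated hemoglobin above 6.5% after 3 months as the binary outcome, prediction models were built with the random forest algorithm and with logistic regression, and their performance was compared using the area under the receiver operating characteristic curve (AUC), sensitivity, and other metrics. Results: Seven factors influenced glycemic control: baseline fasting blood glucose (P < 0.001), disease duration (P < 0.001), smoking (P=0.026), sedentary time (P=0.006), body mass index (overweight P=0.002, obese P=0.011), wristband use (P=0.028), and diabetic diet (P=0.002). The logistic regression model had an AUC of 0.738, a sensitivity of 72.9%, a specificity of 68.1%, and an accuracy of 71.2%; the random forest model had an AUC of 0.756, a sensitivity of 74.5%, a specificity of 69.5%, and an accuracy of 72.8%. Conclusion: The random forest algorithm predicted better than the logistic regression model and can be used to predict glycemic control and to support the management of patients with diabetes.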
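Entry 14 compares models by AUC. One way to read that number: it is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the scores are hypothetical and the quadratic loop is chosen for clarity, not speed.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0   # positive case ranked above negative case
            elif sp == sn:
                wins += 0.5   # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities of poor glycemic control.
poor_control = [0.9, 0.8, 0.55, 0.4]   # patients with the outcome
good_control = [0.7, 0.5, 0.3, 0.2]    # patients without the outcome
print(auc(poor_control, good_control))
```

Unlike accuracy, sensitivity, and specificity, this quantity does not depend on a classification threshold, which is one reason it is reported alongside them.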

15.
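Objective: To explore risk factors for hyperuricemia (HUA) in the population of the Liangshan area using the random forest algorithm. Methods: Using data from a project funded by the Danone Institute China Diet Nutrition Research and Communication Grant (DIC2013-03), a random forest model was used to rank by importance, and reduce the dimensionality of, the independent variables that were statistically significant in univariate analysis. The set of variables with the lowest out-of-bag estimated error rate was entered into a logistic regression model to analyze the direction of effect and the relative risk of each variable. Results: Stepwise random forest analysis showed that the out-of-bag estimated error rate was lowest with 6 variables. The six most important variables were, in order, age, body mass index (BMI), daily mushroom intake, sex, daily livestock and poultry meat intake, and hypertriglyceridemia (high TG). Logistic regression showed that the risk of HUA in the 18-34 and ≥65 age groups was 1.557 and 1.496 times that of the 35-44 age group, respectively; the risk in men was 2.755 times that in women; the risk in the overweight (24≤BMI<28) and obese (BMI≥28) groups was 1.822 and 2.534 times that of the normal BMI group, respectively; the risk in the high TG group was 2.379 times that of the group without high TG; the risk in the high mushroom intake group (more than 10 g/d) was 1.420 times that of the low intake group (less than 5 g/d); and the risk in the high livestock and poultry meat intake group (more than 75 g/d) was 1.300 times that of the low intake group (less than 40 g/d). Conclusion: The six leading risk factors for HUA in the Liangshan population are, in order, age, BMI, daily mushroom intake, sex, daily livestock and poultry meat intake, and high TG. Education on healthy lifestyles and dietary habits should be stepped up, and monitoring and control of BMI and high TG should be strengthened.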

16.
Objective: This study aimed to investigate, in Han children in the Guangxi Zhuang Autonomous Region, the cytosine-phosphate-guanine (CpG)... of the interleukin 6 (IL6), interleukin 12A/B (IL12A/B), and interleukin 13 (IL13) genes

17.

Background

With epigenome-wide mapping of DNA methylation, a number of novel smoking-associated loci have been identified.

Objectives

We aimed to assess dose–response relationships of methylation at the top hits from the epigenome-wide methylation studies with smoking exposure as well as with total and cause-specific mortality.

Methods

In a population-based prospective cohort study in Germany, methylation was quantified in baseline blood DNA of 1,000 older adults by the Illumina 450K assay. Deaths were recorded during a median follow-up of 10.3 years. Dose–response relationships of smoking exposure with methylation at nine CpGs were modeled by restricted cubic spline regression. Associations of individual and aggregate methylation patterns with all-cause, cardiovascular, and cancer mortality were assessed by multiple Cox regression.

Results

Clear dose–response relationships with respect to current and lifetime smoking intensity were consistently observed for methylation at six of the nine CpGs. Seven of the nine CpGs were also associated with mortality outcomes to various extents. A methylation score based on the top two CpGs (cg05575921 and cg06126421) showed the strongest associations with all-cause, cardiovascular, and cancer mortality, with adjusted hazard ratios (95% CI) of 3.59 (2.10, 6.16), 7.41 (2.81, 19.54), and 2.48 (1.01, 6.08), respectively, for participants with methylation levels in the lowest quartile at both CpGs. Adding methylation at those two CpGs into a model that included the variables of the Systematic Coronary Risk Evaluation chart for fatal cardiovascular risk prediction improved the predictive discrimination.

Conclusion

The novel methylation biomarkers are highly informative for both smoking exposure and smoking-related mortality outcomes. In particular, these biomarkers may substantially improve cardiovascular risk prediction. Nevertheless, the findings of the present study need to be further validated in additional large longitudinal studies.

Citation

Zhang Y, Schöttker B, Florath I, Stock C, Butterbach K, Holleczek B, Mons U, Brenner H. 2016. Smoking-associated DNA methylation biomarkers and their predictive value for all-cause and cardiovascular mortality. Environ Health Perspect 124:67–74; http://dx.doi.org/10.1289/ehp.1409020
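The hazard ratios in entry 17 come with 95% confidence intervals. Assuming these are the usual Wald intervals from Cox regression, which are symmetric on the log scale, the standard error of the log hazard ratio can be recovered from the interval, and the geometric mean of the limits should land on the point estimate. The sketch below checks this for the three reported ratios; the symmetry assumption is ours and is not stated in the abstract.

```python
import math


def log_hr_standard_error(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# (hazard ratio, lower limit, upper limit) as reported in the abstract.
reported = {
    "all-cause": (3.59, 2.10, 6.16),
    "cardiovascular": (7.41, 2.81, 19.54),
    "cancer": (2.48, 1.01, 6.08),
}

for outcome, (hr, lower, upper) in reported.items():
    se = log_hr_standard_error(lower, upper)
    midpoint = math.sqrt(lower * upper)  # geometric mean of the limits
    print(f"{outcome}: HR={hr}, implied SE(log HR)={se:.3f}, "
          f"geometric midpoint={midpoint:.2f}")
```

The much wider interval for cardiovascular mortality corresponds to a larger implied standard error, as would be expected if that outcome had fewer events.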

18.
Logistic regression is the standard method for assessing predictors of diseases. In logistic regression analyses, a stepwise strategy is often adopted to choose a subset of variables. Inference about the predictors is then made based on the chosen model constructed of only those variables retained in that model. This method subsequently ignores both the variables not selected by the procedure, and the uncertainty due to the variable selection procedure. This limitation may be addressed by adopting a Bayesian model averaging approach, which selects a number of all possible such models, and uses the posterior probabilities of these models to perform all inferences and predictions. This study compares the Bayesian model averaging approach with the stepwise procedures for selection of predictor variables in logistic regression using simulated data sets and the Framingham Heart Study data. The results show that in most cases Bayesian model averaging selects the correct model and out-performs stepwise approaches at predicting an event of interest.
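To make the idea of weighting models by posterior probability in entry 18 concrete, the sketch below uses a widely used approximation: with equal prior probabilities, the posterior probability of model k is roughly proportional to exp(-BIC_k / 2). The abstract does not say how the posterior probabilities were computed, so this approximation, the predictor names, and the BIC values are all assumptions for illustration.

```python
import math


def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values."""
    best = min(bics.values())
    # Subtract the best BIC first so the exponentials stay well scaled.
    raw = {model: math.exp(-0.5 * (bic - best)) for model, bic in bics.items()}
    total = sum(raw.values())
    return {model: w / total for model, w in raw.items()}


def inclusion_probabilities(weights):
    """Posterior probability that each predictor appears in the model."""
    probs = {}
    for model, weight in weights.items():
        for predictor in model:
            probs[predictor] = probs.get(predictor, 0.0) + weight
    return probs


# Hypothetical candidate logistic models (tuples of predictors) and their BICs.
bics = {
    ("age",): 212.0,
    ("age", "sbp"): 208.0,
    ("age", "sbp", "chol"): 209.5,
    ("sbp",): 220.0,
}
weights = bma_weights(bics)
for predictor, prob in sorted(inclusion_probabilities(weights).items()):
    print(f"{predictor}: {prob:.3f}")
```

A stepwise procedure would report only the single selected model, whereas here a predictor that appears only in lower-ranked models still receives a nonzero inclusion probability, which is how the selection uncertainty is carried into the inference.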

19.
Background: Epigenetic modifications, such as DNA methylation, due to in utero exposures may play a critical role in early programming for childhood and adult illness. Maternal smoking is a major risk factor for multiple adverse health outcomes in children, but the underlying mechanisms are unclear. Objective: We investigated epigenome-wide methylation in cord blood of newborns in relation to maternal smoking during pregnancy. Methods: We examined maternal plasma cotinine (an objective biomarker of smoking) measured during pregnancy in relation to DNA methylation at 473,844 CpG sites (CpGs) in 1,062 newborn cord blood samples from the Norwegian Mother and Child Cohort Study (MoBa) using the Infinium HumanMethylation450 BeadChip (450K). Results: We found differential DNA methylation at epigenome-wide statistical significance (p-value < 1.06 × 10⁻⁷) for 26 CpGs mapped to 10 genes. We replicated findings for CpGs in AHRR, CYP1A1, and GFI1 at strict Bonferroni-corrected statistical significance in a U.S. birth cohort. AHRR and CYP1A1 play a key role in the aryl hydrocarbon receptor signaling pathway, which mediates the detoxification of the components of tobacco smoke. GFI1 is involved in diverse developmental processes but has not previously been implicated in responses to tobacco smoke. Conclusions: We identified a set of genes with methylation changes present at birth in children whose mothers smoked during pregnancy. This is the first study of differential methylation across the genome in relation to maternal smoking during pregnancy using the 450K platform. Our findings implicate epigenetic mechanisms in the pathogenesis of the adverse health outcomes associated with this important in utero exposure.
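The epigenome-wide threshold quoted in entry 19 is consistent with a Bonferroni correction of a 0.05 significance level across the 473,844 CpG sites tested. The short check below makes the arithmetic explicit; that the authors derived the threshold exactly this way is an assumption based on the two numbers in the abstract.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(473_844)
print(f"{threshold:.2e}")  # 1.06e-07
```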

20.
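Objective: To investigate the association between epigenetic modification of the estrogen receptor 2 gene (Estrogen Receptor beta, ESR2) and the onset of autism, as a basis for research into the etiology of autism. Methods: Fifty-four boys with autism were recruited from the Children's Developmental and Behavioral Research Center of Harbin Medical University, and 54 typically developing boys were randomly recruited as controls, matched on age and sex. Bisulfite sequencing PCR (BSP) was used to measure DNA methylation in the 5' proximal regulatory region of the ESR2 gene in peripheral blood cells. DNA methylation levels in the case and control groups were compared with the Mann-Whitney U test. Results: The overall methylation level of the 5' proximal regulatory region of ESR2 did not differ significantly between the autism and control groups (P>0.05). However, of the 15 CpG sites in the promoter CpG island (Prom CGI), 7 sites (CpG 5, 6, 8, 9, 10, 11, and 12) showed markedly higher methylation in the autism group (all P<0.05). Some of these sites lie within conserved sequences of transcription factor binding sites, including those of USF2, ZBTB33, REL, ESR2, and TFEC. In addition, the exon CpG island (Exon CGI) contains 26 CpG sites, of which CpG41 showed a higher methylation level in the autism group, (31.30±2.74)%, than in the control group, (24.07±2.59)% (P<0.05). Conclusion: Epigenetic modification of the 5' proximal regulatory region of the ESR2 gene is clearly associated with the onset of autism.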
