Found 20 similar articles; search time 472 ms.
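Comparison of principal component analysis and cluster analysis applied to the study of ethnic differentiation   Total citations: 10, self-citations: 0, citations by others: 10
Objective: To compare the results of classifying 13 populations with two methods, principal component analysis and cluster analysis. Methods: Both numerical classification methods were applied to biallelic frequency data for 12 Y-chromosome haplotypes in order to classify 13 populations, including the Korean ethnic group, to analyze the relationships among the populations, and to clarify the origins of the ethnic groups. Results: The two methods gave results that were not entirely the same. Principal component analysis can reduce the influence of irrelevant indicators, but information may be lost as the data are simplified and the dimensionality is reduced. Cluster analysis makes full use of the original data but cannot exclude the "noise" introduced by irrelevant indicators. Conclusion: Both methods are suitable for classifying complex multidimensional data, but in practice the results of the two methods should be combined with domain knowledge to reach an objective and reasonable conclusion.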

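Principal component cluster analysis is a new method that combines principal component analysis with cluster analysis. It overcomes the collinearity among many indicators and extracts composite indicators that represent most of the information; clustering on the composite scores can then reasonably and accurately highlight each category's …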

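This paper is the second in a series on classifying regional health status in China. Taking the districts and counties of 50 national disease surveillance points as the study units, it uses multivariate statistical methods (principal component analysis, factor analysis, and K-means clustering) to analyze, summarize, screen, and simplify indicators. It compares the differences among several classification results, identifies the simplest, fastest, and most scientifically sound indicator system and classification method for evaluating regional health status, and derives a composite scoring formula for the overall evaluation of health status.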

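Evaluation methods for clustering results of gene expression data   Total citations: 3, self-citations: 0, citations by others: 3
Objective: To explore methods for evaluating clustering results for gene expression data and to provide a criterion for identifying the best clustering result. Methods: Clustering results were judged from two perspectives, data structure (internal information) and functional classification (external information). On the one hand, an Entropy (information entropy) criterion was used to examine how well the clustering results agree with the classification of genes whose functions are partly known; on the other hand, the adjust-FOM criterion was used to evaluate the results from the structure of the data itself. Combining the two gives a new evaluation method, which we call the Entropy-FOM method. Results: The method was applied to Iyer's serum dataset and Ferea's yeast dataset to evaluate clustering results, and adjust-FOM and Entropy-FOM plots are given for six clustering methods. Conclusion: Extensive computation suggests that the spectral clustering SOM method and the fuzzy clustering method have relatively high clustering performance.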
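
The external (Entropy) criterion described in the abstract above can be sketched as follows. This is only an illustrative reading, assuming the size-weighted average entropy of known functional classes within each cluster; the abstract does not give the exact formula, and the function name `clustering_entropy` is hypothetical.

```python
import math
from collections import Counter

def clustering_entropy(cluster_labels, known_classes):
    """Size-weighted average entropy (in bits) of known classes per cluster.

    Assumed definition, not taken from the paper: 0.0 means every cluster
    contains genes of a single known class; larger values mean more mixing.
    """
    n = len(cluster_labels)
    members = {}
    for cluster, klass in zip(cluster_labels, known_classes):
        members.setdefault(cluster, []).append(klass)
    total = 0.0
    for classes in members.values():
        size = len(classes)
        counts = Counter(classes).values()
        h = -sum((c / size) * math.log2(c / size) for c in counts)
        total += (size / n) * h
    return total

# Hypothetical example: four genes, two known functional classes.
pure = clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])
mixed = clustering_entropy([0, 0, 0, 0], ["a", "a", "b", "b"])
```

Under this reading, a lower value indicates better agreement between the clusters and the known functional classes.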

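Regional classification indicators for health resources in Yunnan Province   Total citations: 14, self-citations: 3, citations by others: 11
Objective: To select regional classification indicators suitable for optimizing the allocation of health resources in Yunnan Province, and to use them to classify the province's 16 prefectures and cities appropriately. Methods: Expert consultation scoring, the dispersion-tendency method, principal component analysis, and cluster analysis were used. Results: From more than 50 influencing factors, 7 indicators were selected that are representative for regional classification, highly sensitive, strongly independent, and practical. Hierarchical clustering of the 1999 data for the 16 prefectures and cities on these 7 indicators divided them into five classes of regions. Conclusion: This regional classification is consistent with the actual regional situation in Yunnan Province, and the results provide a scientific basis for regional classification and for setting standards for regional health resource allocation in the province.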

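A comprehensive evaluation method for building hygienic towns   Total citations: 2, self-citations: 0, citations by others: 2
This paper proposes a method for the comprehensive evaluation of hygienic-town development. Guided by the overall goal of creating hygienic towns, indicators are first screened with the Delphi method; on that basis, several mathematical methods (principal component analysis, cluster analysis, the coefficient-of-variation method, and the correlation-coefficient method) are used to verify or further screen them. After the indicators are standardized, multivariate statistical methods (factor analysis and cluster analysis) are used to build a mathematical model for comprehensive evaluation, and the composite score computed from the model is used to classify the towns. The resulting evaluation method can assess the health status of towns objectively, efficiently, simply, sensitively, and reliably, and can be used for inspections, surveys, and assessments of hygienic towns.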

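Objective: To explore consensus-clustering subtypes based on immune-related genes in primary malignant lymphoma of the female reproductive system, and the sensitivity to immunotherapy of the resulting groups. Methods: Whole-transcriptome RNA-seq data for malignant lymphoma were downloaded from The Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO) database (GSE138734, GSE168422); the datasets whose primary site was the female reproductive system were extracted, batch-corrected, and log-transformed. Immune gene information was extracted from the IMMPORT database. Differential gene analysis was performed in R with the edgeR package, using a fold change (FC) greater than 4, that is |log_2FC| > 2, and P < 0.05 as the screening criteria. Univariate COX analysis was used to screen immune genes associated with prognosis. All samples were clustered hierarchically with Ward's method. Univariate and multivariate COX analyses were used to explore independent factors associated with patient prognosis. Based on the hierarchical clustering results, together with the whole-transcriptome data of the samples, principal component analysis (PCA) was performed. PD-L1 expression was compared between clusters. Results: In total, 102 immune-related genes were differentially expressed between normal tissue and primary malignant lymphoma tissue of the female reproductive system, of which 55 were up-regulated and 47 down-regulated in lymphoma tissue. Univariate COX analysis showed that 20 immune genes were associated with patient prognosis (P < 0.05): for 16 genes, higher expression meant worse prognosis (HR > 1); for 4 genes, higher expression meant better prognosis (HR < 1). Consensus clustering divided the 176 patients into group 1 (113 cases) and group 2 (63 cases). PCA showed good separation between group 1 and group 2, so subtypes based on differentially expressed immune-related genes can characterize the samples. Univariate COX analysis showed that the cluster group was associated with the prognosis of patients with malignant lymphoma of the female reproductive system, with group 1 having a better prognosis than group 2 (P < 0.05, HR1); multivariate COX analysis showed that age and cluster group were independent risk factors associated with prognosis in these patients (P < 0.05, HR1). Sensitivity analysis of the immunotherapy target showed that PD-L1 expression in group 1 (11.31±5.14) was significantly higher than in group 2 (3.76±1.41), a statistically significant difference (t=19.487, P < 0.05); group 1 is more sensitive than group 2 to immunotherapy targeting PD-L1. Conclusion: Consensus-clustering subtypes based on differentially expressed immune genes separate the samples well and can characterize them. Group 1 is more sensitive than group 2 to immunotherapy targeting PD-L1.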
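
The screening rule quoted in the abstract above (fold change greater than 4, that is |log2 FC| > 2, together with P < 0.05) can be written out as a small check. This is a minimal sketch, not the edgeR workflow the authors used; the function name `passes_screen`, the gene names, and all values are hypothetical.

```python
import math

def passes_screen(fold_change, p_value, min_abs_log2_fc=2.0, alpha=0.05):
    """True when |log2(FC)| exceeds the threshold and P is below alpha.

    Taking the absolute value on the log scale treats up-regulation
    (FC > 4) and down-regulation (FC < 1/4) symmetrically.
    """
    return abs(math.log2(fold_change)) > min_abs_log2_fc and p_value < alpha

# Hypothetical genes: name -> (fold change, P value).
genes = {
    "geneA": (8.0, 0.001),   # up-regulated, significant
    "geneB": (0.125, 0.01),  # down-regulated, significant
    "geneC": (4.0, 0.001),   # exactly on the threshold, so excluded
    "geneD": (16.0, 0.2),    # large change but not significant
}
selected = sorted(name for name, (fc, p) in genes.items() if passes_screen(fc, p))
```

Because the comparison is strict, a gene with a fold change of exactly 4 is not selected in this sketch.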

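Objective: To explore a method that reflects the ability and level of clinical department directors and evaluates their overall quality and ability scientifically and reasonably. Methods: Using principal component cluster analysis, 12 indicators covering personal qualifications, medical competence, teaching and research, and department development were selected for 29 academic leaders of a hospital; the indicators were first analyzed and ranked by principal component analysis, and the results were then clustered. Results: The 12 indicators largely reflect a director's clinical ability, influence, department management, and research level, and the principal component clustering method can assess both the strongest abilities and the overall ability of each director. Conclusion: Evaluating clinical department directors with principal component clustering avoids one-sided assessment, suits multi-factor comprehensive evaluation, and is a method worth applying and promoting.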
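
The two-stage idea in the abstract above (principal component analysis first, clustering of the results second) can be sketched in plain Python. This is a simplified illustration under stated assumptions: only the first principal component is extracted (by power iteration on the correlation matrix), and the clustering step is reduced to cutting the sorted scores at the widest gap, which gives two groups. The data and all function names are hypothetical, and the study itself may have used several components and a different clustering algorithm.

```python
import math

def standardize(rows):
    """Z-score each column (sample standard deviation, n - 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / (n - 1))
           for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)] for row in rows]

def first_component(z, iterations=200):
    """Leading eigenvector of the correlation matrix by power iteration.

    Returns (loadings, share of total variance explained).
    """
    n, m = len(z), len(z[0])
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(m)]
            for a in range(m)]
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * corr[a][b] * v[b] for a in range(m) for b in range(m))
    return v, eigenvalue / m  # the trace of a correlation matrix equals m

def split_at_largest_gap(scores):
    """Two clusters in one dimension: cut the sorted scores at the widest gap."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    gaps = [scores[order[k + 1]] - scores[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    low = set(order[:cut + 1])
    return [0 if i in low else 1 for i in range(len(scores))]

# Hypothetical data: 6 units scored on 3 positively correlated indicators.
data = [[1, 2, 1.5], [2, 1, 2], [1.5, 1.5, 1],
        [8, 9, 8.5], [9, 8, 9], [8.5, 8.5, 8]]
z = standardize(data)
loadings, share = first_component(z)
scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]
labels = split_at_largest_gap(scores)
```

Standardizing first keeps indicators on different scales from dominating the component, and clustering on the component scores rather than the raw indicators is what lets the method sidestep collinearity among indicators.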

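Comprehensive evaluation of health status in 29 provinces, municipalities, and autonomous regions (Part 1)   Total citations: 4, self-citations: 1, citations by others: 3
This paper collected 150 indicators of social health status for 29 provinces (municipalities) nationwide. Drawing on subject knowledge and the purpose of the evaluation, 60 indicators reflecting these aspects were selected; the variance method, the correlation-coefficient method, cluster analysis, and principal component analysis were then each used to screen the indicators further, leaving 27. Reasonable weights were then assigned to the indicators by qualitative ranking and quantitative conversion, and the rank-sum ratio (RSR) method was used to combine them into four composite indicators: socioeconomic status (RSR1), health services (RSR2), health resources (RSR3), and health status (RSR4). With these four indicators as classification variables, K-means clustering divided the 29 provinces (municipalities) into 5 classes, and the relationships among the composite indicators were also analyzed.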
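
The rank-sum ratio used in the abstract above can be illustrated with a short sketch. It assumes the common textbook form, in which each indicator is ranked across the n units and the weighted RSR of a unit is the weighted sum of its ranks divided by n, with weights summing to 1; the abstract does not state the exact variant used, and the function names and data are hypothetical.

```python
def ranks(values):
    """Ascending ranks starting at 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def rank_sum_ratio(table, weights=None):
    """table[i][j] is the value of indicator j for unit i (higher = better).

    Returns the weighted RSR of each unit; equal weights 1/m by default.
    """
    n, m = len(table), len(table[0])
    if weights is None:
        weights = [1.0 / m] * m
    column_ranks = [ranks([row[j] for row in table]) for j in range(m)]
    return [sum(weights[j] * column_ranks[j][i] for j in range(m)) / n
            for i in range(n)]

# Hypothetical example: 3 units, 2 indicators, equal weights.
rsr = rank_sum_ratio([[1, 10], [2, 20], [3, 30]])
```

Indicators for which lower values are better would need their ranks reversed before being combined; that step is omitted here.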

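Objective: To study the methodology for evaluating the functions of rural community health services. Methods: Cluster analysis was combined with discriminant analysis; sample clustering was combined with the rank-sum ratio method; and factor analysis was combined with discriminant analysis. Results: The aim of establishing classification standards for new items of unknown class, and classifying them accurately, was achieved. Conclusion: Combining methods preserves the strengths of each while offsetting the weaknesses of the other, so that the results fit reality well while the efficiency of the evaluation improves and the reproducibility and stability of its results are ensured.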

When a large number of genes are significant in correlating microarray gene expression data with patient prognosis, clustering of significant genes may be effective not only for further dimension reduction but also for identifying co-regulated genes that belong to the same molecular pathway related to disease biology and aggressiveness. Moreover, a reduced feature, such as the average expression across samples for a cluster of significant genes, can play an important role in reducing variance in prediction analysis. We propose a simple procedure to select gene clusters that have strong marginal association with survival outcome from a large pool of candidate hierarchical clusters of significant genes. Selected gene clusters can have better predictive capability than the other gene clusters and singleton genes. Application of such clustering to the data set from a clinical study for patients with multiple myeloma and associated microarrays is given.

The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and to organize them efficiently. Clustering techniques are widely recognized as useful exploratory tools for gene expression data analysis. Genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes belonging to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still have limits, particularly in estimating the number of clusters and interpreting the hierarchical dendrogram, which may significantly influence the output of the analysis. We propose here a multi-level SOM-based clustering algorithm named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of estimating the number of clusters. To test the validity of the proposed clustering algorithm, we first tested it on supervised training data sets. Results were evaluated by computing the number of misclassified samples. We then used Multi-SOM to analyze macrophage gene expression data generated in vitro from the blood of a single individual infected with 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.

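Objective: To study the effect of a high-fat diet on the gene expression profile of the mouse femur. Methods: Four-week-old male C57BL/6 mice weighing 13-14 g were fed a normal diet for 4 d and then randomly assigned by body weight to a control group (basal diet) or a high-fat diet group (19.5% lard), with 8 mice per group. After 12 weeks of feeding, the mice were killed and the femurs quickly isolated. For each group, 4 femur samples of 50 mg each were taken, and RNA was extracted and pooled in equal amounts. Changes in the femoral gene expression profile were obtained with the Affymetrix MOE430A mouse gene expression array, and cluster analysis was performed with the DAVID online analysis tool. Results: Long-term high-fat feeding produced significant differences in femoral gene expression in C57BL/6 mice, mainly involving the following functions: cation channels, signal transduction and transcriptional regulation, bone mineralization, regulation of phosphorus metabolism, and collagen synthesis. Conclusion: Long-term intake of a high-fat diet alters the expression of many bone-metabolism-related genes in mouse bone tissue, reducing bone formation.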

To identify unreliable data in a dataset of motility parameters from a pilot study, acquired by a veterinarian experienced in boar semen handling but not in operating a computer-assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted, incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml, and analyzed in a CASA system. After standardization of the data, Chernoff faces were drawn for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatments or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained evaluations of the first two samples in each treatment, each from a different boar. With the exception of one individual measurement, all measurements in cluster 1 were the same as those showing abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator's inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.

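Guan Peng (关鹏), Quan Yu (全宇), He Miao (何苗), Zhou Baosen (周宝森). Chinese Journal of Public Health (《中国公共卫生》), 2006, 22(10): 1264-1265
Objective: To explore two-step cluster analysis and its application to the diagnostic analysis of pathological images. Methods: Two-step cluster analysis was applied to 51 feature parameters of normal cervical cells and of cervical cells with low-grade and high-grade squamous intraepithelial lesions: (1) the samples were pre-clustered into small sub-clusters; (2) the pre-clustered sub-clusters were then clustered stepwise. Clustering used the log-likelihood distance, the appropriate number of clusters was determined automatically by the Bayesian information criterion, and the importance of each indicator was measured. Results: Classification accuracy for normal cervical cells and for cells with low-grade and high-grade squamous intraepithelial lesions was 98.0%, 96.1%, and 100%, respectively. Conclusion: This clustering method achieves high classification accuracy, and the measure of indicator importance has practical value for guiding the analysis of pathological images.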

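Objective: To cluster pertussis incidence data from 25 provincial-level administrative regions of China using time-series feature extraction, to identify the different incidence patterns across regions from the clustering results, and to provide a scientific basis for unified planning of pertussis prevention and control in China. Methods: Nine global features were extracted from the pertussis time series of the 25 provincial-level regions; principal component analysis converted the 9 indicators into a feature matrix of 3 principal components, which was used for hierarchical clustering. The optimal number of clusters was chosen to divide the pertussis time series into incidence patterns. Results: The optimal number of clusters was 3, corresponding to three pertussis incidence patterns: non-periodic, seasonal, without trend (9 regions); non-periodic, seasonal, with trend (10 regions); and periodic, seasonal, with trend (6 regions). Conclusion: Hierarchical clustering on extracted time-series features groups similar patterns closely together and accurately delineates the pertussis incidence patterns of the 25 provincial-level regions; the clustering results can provide a theoretical basis for the relevant departments to formulate province-specific pertussis control measures.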

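Objective: To analyze the faunal distribution of rodents in Qinghai Province and make a preliminary study of their geographic distribution pattern. Methods: Through a literature search, 45 rodent species belonging to 2 orders and 9 families were recorded for Qinghai Province. Taking each physical-geographic unit of the province as a basic unit, the units were clustered with Ward's method. Results: The rodents of Qinghai Province fall into two major groups: the Qinghai Qiangtang Plateau and the Golog-Yushu Plateau first merged into one group at a distance coefficient of 0.13, and the rodents of the mountains north of Qinghai Lake, the Huangnan mountains, the Huangshui River valley, the Qaidam Basin, and the Qinghai Qilian Mountains merged into another group at a distance coefficient of 0.21. Conclusion: The clustering results reflect the mutual interpenetration of the rodents of each zoogeographic unit and their geographic environment.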

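Principal component analysis of trace elements in several wild medicinal herbs of the Asteraceae (菊科)   Total citations: 1, self-citations: 0, citations by others: 1
Trace elements were analyzed in nine medicinal herbs of the Asteraceae: Choulingdan (臭灵丹), Zijingzelan (紫茎泽兰), Lazicao (辣子草), Aiye (艾叶), Yexiahua (叶下花), Niubang (牛蒡), Peilan (佩兰), Qianliguang (千里光), and Tianmingjing (天名精). Through principal component analysis, three principal component equations were selected and the influence of each trace element within the principal components was identified. Two-dimensional clustering on the principal component values and the composite evaluation values revealed how closely the herbs are related. The results show that the herbs with anti-HIV activity, Choulingdan (Ti50 67.3), Lazicao (Ti50 22.3), Aiye (Ti50 20.3), and Zijingzelan (Ti50 6.92), clustered into one class, while Yexiahua (Ti50 16.7) formed a class of its own. This provides a reference for screening Chinese medicinal herbs for anti-HIV activity.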

Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to define subgroups. Longitudinal gene expression profiles might provide more information on disease progression than is captured by baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel clustering method in the Bayesian setting based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of genes over time, while clustering is jointly conducted based on the regression coefficients obtained from all genes. To account for the correlations among genes and alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. The Dirichlet process prior distribution is utilized for the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.

Although population differences in gene expression have been established, the impact on differential gene expression studies in large populations is not well understood. We describe the effect of self-reported race on a gene expression study of lung function in asthma. We generated gene expression profiles for 254 young adults (205 non-Hispanic whites and 49 African Americans) with asthma on whom concurrent total RNA derived from peripheral blood CD4+ lymphocytes and lung function measurements were obtained. We identified four principal components that explained 62% of the variance in gene expression. The dominant principal component, which explained 29% of the total variance in gene expression, was strongly associated with self-identified race (P < 10^-16). The impact of these racial differences was observed when we performed differential gene expression analysis of lung function. Using multivariate linear models, we tested whether gene expression was associated with a quantitative measure of lung function: pre-bronchodilator forced expiratory volume in one second (FEV1). Though unadjusted linear models of FEV1 identified several genes strongly correlated with lung function, these correlations were due to racial differences in the distribution of both FEV1 and gene expression, and were no longer statistically significant following adjustment for self-identified race. These results suggest that self-identified race is a critical confounding covariate in epidemiologic studies of gene expression and that, similar to genetic studies, careful consideration of self-identified race in gene expression profiling studies is needed to avoid spurious association.
