
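Application of a Fuzzy Clustering Method Based on the Pseudo F-Statistic to Gene Expression Data Analysis
Cite this article: Yi Dong, Zhang Yanqi, Wang Wenchang, Zhang Wei, Yang Mengsu, Huang Minghui, Fang Zhijun. Application of a Fuzzy Clustering Method Based on the Pseudo F-Statistic to Gene Expression Data Analysis[J]. Chinese Journal of Health Statistics, 2002, 19(3): 146-150
Authors: Yi Dong  Zhang Yanqi  Wang Wenchang  Zhang Wei  Yang Mengsu  Huang Minghui  Fang Zhijun
Affiliations: 1. Department of Health Statistics, Third Military Medical University, 400038
2. Applied Research Centre for Genomics Technology, City University of Hong Kong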
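Abstract:
Objective: To propose a method for classifying gene expression patterns by analysing gene chip (microarray) data. Methods: The data were first clustered with the Fuzzy Clustering Method (FCM). The pseudo F-statistic (PFS) was then used as a criterion function to determine the optimal number of clusters. Results: The method was applied to simulated data and to serum-response gene expression data from human fibroblasts, and its results were clearly better than those of the K-means method. Conclusion: The method determines the cluster structure of the data without any prior knowledge of the clusters or information about their composition. In practice, it proved to be a powerful tool for revealing the intrinsic patterns underlying changes in gene expression.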

Keywords: Gene   Fuzzy clustering method   Pseudo F-statistic   Data analysis
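The clustering step in the Methods above can be sketched with the standard fuzzy c-means iteration: alternate a membership-weighted centre update with a distance-based membership update until memberships stop changing. This is a minimal pure-Python sketch, not the authors' code. The fuzzifier `m = 2`, the tolerance, the random initialisation, and the function names `fcm` and `hard_labels` are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means (standard Bezdek iteration) on a list of numeric tuples.

    Returns (centers, u) where u[i][j] is the membership of point i in cluster j.
    m (fuzzifier), max_iter, tol and seed are illustrative defaults, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a centre: split full membership among those centres.
                hits = [j for j in range(c) if dists[j] == 0.0]
                new_row = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** exponent for k in range(c))
                           for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centers, u


def hard_labels(u):
    """Defuzzify: assign each point to its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```

Because every point keeps a graded membership, a point lying between two clusters shows up as a row with no dominant value. That is the kind of overlap the abstract says K-means cannot express.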

Gene Expression Clustering By Pseudo F-statistics Method
Yi Dong, Michael Yang, et al. Gene Expression Clustering By Pseudo F-statistics Method[J]. Chinese Journal of Health Statistics, 2002, 19(3): 146-150
Authors: Yi Dong, Michael Yang, et al.
Abstract:
Objective: In this paper, we consider the problem of determining the structure of clustered data without prior knowledge of the clusters or any other information about their composition. Methods: The Fuzzy Clustering Method (FCM) was applied first. Because FCM assigns each point a graded (fuzzy) membership in every cluster, it is suitable for clustering points that may overlap with one another. The pseudo F-statistic (PFS) was then used as a criterion function to determine the optimal number of clusters. In the present study, this combined procedure is called the PFS fuzzy clustering method. Results: Examples are given, including gene expression in HL60 cells in response to ajoene. They show that this approach performs much better than the K-means method, which often fails to identify overlapping points or requires the number of clusters to be chosen arbitrarily. Conclusion: The PFS fuzzy clustering method is effective for finding the optimal number of clusters.
Keywords: Gene expression   Fuzzy clustering   PFS criterion
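The cluster-number criterion described in the abstract can be sketched as follows, assuming the usual pseudo F (Calinski-Harabasz) definition PF = [B / (k - 1)] / [W / (n - k)]. Here B is the between-cluster sum of squares, W is the within-cluster sum of squares, n is the number of points, and k is the number of clusters. This sketch applies the statistic to hard labels, for example FCM memberships assigned to their largest cluster. The paper may use a fuzzy-weighted variant, and the function names `pseudo_f` and `best_number_of_clusters` are hypothetical.

```python
import math


def pseudo_f(points, labels):
    """Pseudo F statistic (Calinski-Harabasz form) for a hard partition.

    PF = [B / (k - 1)] / [W / (n - k)], with B the between-cluster and W the
    within-cluster sum of squared Euclidean distances.
    """
    n, dim = len(points), len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("pseudo F needs 1 < number of clusters < number of points")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in groups.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * math.dist(centre, overall) ** 2
        within += sum(math.dist(p, centre) ** 2 for p in members)
    if within == 0.0:
        return math.inf  # every cluster has zero spread
    return (between / (k - 1)) / (within / (n - k))


def best_number_of_clusters(points, partitions):
    """partitions maps each candidate k to hard labels; return the k with the largest pseudo F."""
    return max(partitions, key=lambda k: pseudo_f(points, partitions[k]))
```

In the workflow the abstract outlines, you would run the fuzzy clustering once for each candidate k, defuzzify each result, and pass the resulting label lists to `best_number_of_clusters`. A large B relative to W, after adjusting for degrees of freedom, favours partitions whose clusters are compact and well separated.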
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.