
Classification of leukemia gene expression samples based on a particle swarm optimization algorithm
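Citation: LIU Ya-jie, GAO Lian, ZHOU Jie, YAO Rui-han, ZHU Ling, WANG Xiao-yan. Classification of leukemia gene expression samples based on particle swarm optimization algorithm[J]. Biomedical Engineering and Clinical Medicine, 2020(1): 75-80.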
Authors: LIU Ya-jie, GAO Lian, ZHOU Jie, YAO Rui-han, ZHU Ling, WANG Xiao-yan
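Affiliations: The Third Affiliated Hospital of Kunming Medical University (Yunnan Cancer Hospital); School of Information Science and Engineering, Yunnan University; School of Information Engineering, Yunnan Agricultural University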
Funding: Doctoral Research Start-up Fund of Yunnan Cancer Hospital (BSJJ201513).
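Abstract: Objective To classify leukemia tumor samples using microarray gene expression data from molecular biology together with an intelligent optimization algorithm. Methods A particle swarm optimization (PSO) algorithm was used to train and test the classification model. Seventy-two leukemia gene expression samples, each containing 7129 genes, were selected; from these, three datasets containing 50, 100, and 200 feature genes were extracted, and 10 classification tests were run under each gene-count condition. A classification model based on the K-means algorithm was built to verify the classification performance of the PSO algorithm under the same conditions. Machine learning metrics (accuracy, precision, recall, and F1 score), together with box plots and heat maps, were used for analysis and comparison. Results The test data used for PSO classification contained 20 acute lymphoblastic leukemia (ALL) samples and 14 acute myeloid leukemia (AML) samples. The mean classification accuracy over the 10 runs was about 90% under every gene-count condition. The accuracy of the PSO algorithm was not stable: across the 10 tests, the mean and the best accuracy differed markedly. The recall of the ALL subtype was markedly higher than that of the AML subtype, approaching 100% in every condition, whereas the precision of the AML subtype was markedly higher than that of the ALL subtype, also approaching 100%; the F1 scores offered little basis for comparison. The K-means algorithm behaved like the PSO algorithm, with classification performance decreasing as the number of genes increased. With 200 genes, K-means performed poorly: both its classification stability and its accuracy dropped sharply and fell below those of the PSO algorithm under the same conditions. With 100 genes, the recall of ALL was 100%, higher than that of AML, and the precision of AML was 100%, higher than that of ALL. With 200 genes, the mean recall and F1 score of ALL were higher than those of AML and the mean precision of AML was higher than that of ALL, whereas the best-run values of these metrics differed little. Across the different feature-gene counts for the same leukemia samples, the PSO algorithm achieved high classification accuracy but insufficient stability, and overall it outperformed the K-means algorithm. Conclusion The PSO algorithm can be applied to the classification of leukemia gene expression samples.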
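The abstract does not say how PSO is turned into a classifier. The sketch below assumes one common formulation, not necessarily the authors' own: each particle encodes one prototype vector per class, a sample is assigned to the class of its nearest prototype, and a particle's fitness is its training misclassification rate plus a small distance term. The names `pso_train`, `fitness` and `nearest_prototype`, the parameter values (inertia `w=0.72`, `c1=c2=1.49`, 30 particles, 100 iterations) and the toy data are all illustrative assumptions.

```python
import numpy as np


def nearest_prototype(X, protos):
    """Assign each row of X to the class whose prototype is closest (squared Euclidean)."""
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def fitness(flat, X, y, n_classes):
    """Lower is better: training error rate plus a small term pulling
    each class prototype toward that class's samples (mainly a tie-breaker)."""
    protos = flat.reshape(n_classes, X.shape[1])
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    err = np.mean(d.argmin(axis=1) != y)
    return err + 1e-4 * np.mean(d[np.arange(len(y)), y])


def pso_train(X, y, n_classes=2, n_particles=30, iters=100,
              w=0.72, c1=1.49, c2=1.49, seed=0):
    """Hypothetical PSO trainer: returns one prototype per class."""
    rng = np.random.default_rng(seed)
    dim = n_classes * X.shape[1]
    # Initialise particles uniformly inside the bounding box of the data.
    lo = np.tile(X.min(axis=0), n_classes)
    hi = np.tile(X.max(axis=0), n_classes)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p, X, y, n_classes) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia-weight velocity update: keep momentum, move toward the
        # particle's own best position and toward the swarm's best position.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([fitness(p, X, y, n_classes) for p in pos])
        improved = f < pbest_f
        pbest[improved] = pos[improved]
        pbest_f[improved] = f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest.reshape(n_classes, X.shape[1])


if __name__ == "__main__":
    # Synthetic two-class data, NOT the leukemia dataset.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (40, 5)), rng.normal(3.0, 1.0, (40, 5))])
    y = np.array([0] * 40 + [1] * 40)
    accs = [np.mean(nearest_prototype(X, pso_train(X, y, seed=s)) == y) for s in range(10)]
    print(f"training accuracy over 10 runs: mean {np.mean(accs):.3f}, best {np.max(accs):.3f}")
```

Because the swarm starts from random positions, repeating training with different `seed` values and recording each run's accuracy yields a mean and a best value; a gap between the two is one way the run-to-run instability described in the abstract can appear.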

Keywords: classification of gene expression samples; leukemia; particle swarm optimization algorithm; K-means algorithm

Classification of leukemia gene expression data based on particle swarm optimization algorithm
LIU Ya-jie,GAO Lian,ZHOU Jie,YAO Rui-han,ZHU Ling,WANG Xiao-yan.Classification of leukemia gene expression data based on particle swarm optimization algorithm[J].Biomedical Engineering and Clinical Medicine,2020(1):75-80.
Authors: LIU Ya-jie, GAO Lian, ZHOU Jie, YAO Rui-han, ZHU Ling, WANG Xiao-yan
Institution: The Third Affiliated Hospital of Kunming Medical University (Yunnan Cancer Hospital), Kunming 650118, Yunnan, China; School of Information Science and Engineering, Yunnan University, Kunming 650091, Yunnan, China; School of Information Engineering, Yunnan Agricultural University, Kunming 650201, Yunnan, China
Abstract: Objective To classify leukemia tumor samples using microarray gene expression data from molecular biology together with an intelligent optimization algorithm. Methods A particle swarm optimization (PSO) algorithm was used to train and test the classification model. Seventy-two leukemia gene expression samples, each containing 7129 genes, were selected; from these, three datasets containing 50, 100, and 200 characteristic genes were extracted, and 10 classification tests were performed under each gene-count condition. A classification model based on K-means was established to verify the performance of the PSO algorithm under the same conditions. Machine learning metrics (accuracy, precision, recall, and F1 score), together with box plots and heat maps, were used for analysis and comparison. Results The test data used for PSO classification contained 20 acute lymphoblastic leukemia (ALL) and 14 acute myelocytic leukemia (AML) samples. The mean classification accuracy over the 10 tests was about 90%. The accuracy of the PSO algorithm was unstable: across the 10 tests, the mean and the best accuracy differed significantly. The recall of ALL was significantly higher than that of AML, approaching 100%, whereas the precision of AML was significantly higher than that of ALL, also approaching 100%; the F1 values were not comparable. The K-means algorithm behaved similarly to the PSO algorithm, with classification performance decreasing as the number of genes increased. With 200 genes, the K-means algorithm gave poor results: its classification stability and accuracy dropped significantly and were lower than those of the PSO algorithm. With 100 genes, the recall of ALL was 100%, higher than that of AML, and the precision of AML was 100%, higher than that of ALL. With 200 genes, the mean recall and F1 value of ALL were higher than those of AML and the mean precision of AML was higher than that of ALL, while the best-run values of these metrics differed little. Across different numbers of characteristic genes for the same leukemia samples, the PSO algorithm achieved high classification accuracy but insufficient stability, and overall it outperformed the K-means algorithm. Conclusion The PSO algorithm can be used to classify leukemia gene expression samples.
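The pattern reported for both algorithms (ALL recall near 100% together with AML precision near 100%) follows directly from the per-class definitions: if a classifier's only errors are AML samples labelled as ALL, no ALL sample is missed and every sample predicted as AML really is AML. A minimal sketch of those definitions follows; the function name `per_class_metrics` and the error pattern in the example (3 AML samples mislabelled as ALL) are hypothetical, and only the 20 ALL / 14 AML test-set sizes come from the abstract.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly predicted c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly predicted c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # c samples that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, metrics


# Hypothetical predictions: all 20 ALL correct, 3 of the 14 AML mislabelled as ALL.
y_true = ["ALL"] * 20 + ["AML"] * 14
y_pred = ["ALL"] * 20 + ["ALL"] * 3 + ["AML"] * 11
acc, m = per_class_metrics(y_true, y_pred, ["ALL", "AML"])
print(f"accuracy {acc:.3f}")
for label, vals in m.items():
    print(label, {k: round(v, 3) for k, v in vals.items()})
```

In this hypothetical case ALL recall and AML precision are both exactly 1.0, while ALL precision (20/23) and AML recall (11/14) absorb all of the error, which is why the two subtypes look strong on different metrics.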
Keywords: classification of gene expression samples; leukemia; particle swarm optimization (PSO); K-means
This article is indexed in CNKI, VIP, and other databases.