
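A preprocessing model for DIA data based on convolutional neural network
Cite this article: CHEN Chong, ZHENG Haoran. A preprocessing model for DIA data based on convolutional neural network[J]. Beijing Biomedical Engineering, 2020(1): 56-61.
Authors: CHEN Chong, ZHENG Haoran
Affiliation: School of Computer Science and Technology, University of Science and Technology of China
Funding: National Key Basic Research Program of China (2017YFA0505502); Natural Science Foundation of Anhui Province (1508085MF128)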
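Abstract:
Objective: Data-independent acquisition (DIA) is a data acquisition mode widely used for high-throughput proteomic analysis. In untargeted analysis of DIA data, the positions at which peptides appear in the data cannot be predicted, so every peak in the spectra has to be analyzed. The spectra, however, contain a large number of noise peaks that seriously degrade the efficiency and accuracy of downstream protein identification and quantification. Preprocessing to remove noise peaks is therefore an important first step in untargeted DIA analysis. To make full use of the peak information of peptides extracted from DIA data in both the first stage of mass spectrometry (MS1) and the second stage of mass spectrometry (MS2), we propose the mass spectrometry convolutional neural network (MSCNN) model.
Methods: Unlike traditional methods, we first propose a sample extraction workflow suited to the MSCNN network structure and then train MSCNN on these samples, so that the model exploits the features of peptides in MS1 and MS2 as fully as possible. Finally, the model is evaluated on a test set.
Results: Compared with the traditional algorithm, MSCNN filters about 11.2% more noise peaks while handling true peaks with roughly the same effectiveness.
Conclusions: The proposed MSCNN model removes noise peaks from DIA data more effectively.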

Keywords: proteomics  convolutional neural network  mass spectrometry  preprocessing  correlation
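The abstract does not say how samples are laid out for MSCNN. The sketch below illustrates one possible layout under our own assumptions: each candidate peak becomes a fixed-size 2D matrix whose first row is the precursor's MS1 extracted-ion chromatogram (XIC) and whose remaining rows are the most intense MS2 fragment XICs, all cut to a retention-time window centred on the peak apex and scaled to a maximum of 1. The function `extract_sample` and its parameters (`apex`, `half_width`, `n_fragments`) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def extract_sample(
    ms1_xic: Sequence[float],
    ms2_xics: Sequence[Sequence[float]],
    apex: int,
    half_width: int = 5,   # hypothetical window size (scans on each side)
    n_fragments: int = 4,  # hypothetical number of fragment rows
) -> List[List[float]]:
    """Hypothetical sample layout: rows are traces, columns are scans.

    Row 0 is the MS1 precursor trace; the next rows are the MS2 fragment
    traces with the largest summed intensity inside the window. Every row
    covers scans [apex - half_width, apex + half_width], is zero-padded
    past the edges of the run, and is scaled so its maximum is 1.
    """
    width = 2 * half_width + 1

    def window(trace: Sequence[float]) -> List[float]:
        # Cut the trace to the window around the apex, padding with zeros.
        return [
            float(trace[i]) if 0 <= i < len(trace) else 0.0
            for i in range(apex - half_width, apex + half_width + 1)
        ]

    def normalize(row: List[float]) -> List[float]:
        # Scale to a maximum of 1; an all-zero row is returned unchanged.
        peak = max(row)
        return [v / peak for v in row] if peak > 0 else row

    # Rank fragment traces by their total intensity inside the window.
    fragments = sorted((window(t) for t in ms2_xics), key=sum, reverse=True)
    rows = [window(ms1_xic)] + fragments[:n_fragments]
    # Pad with zero rows so every sample has the same shape.
    while len(rows) < n_fragments + 1:
        rows.append([0.0] * width)
    return [normalize(r) for r in rows]


if __name__ == "__main__":
    ms1 = [0, 1, 4, 9, 4, 1, 0]
    ms2 = [[0, 0, 2, 4, 2, 0, 0], [0, 1, 1, 1, 1, 1, 0]]
    for row in extract_sample(ms1, ms2, apex=3, half_width=2, n_fragments=3):
        print([round(v, 2) for v in row])
```

A fixed-shape matrix like this could be fed to a 2D convolutional layer so that kernels see precursor and fragment elution profiles side by side; this is our reading of "making full use of MS1 and MS2 features", not a detail stated in the abstract.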

A preprocessing model for DIA data based on convolutional neural network
CHEN Chong, ZHENG Haoran. A preprocessing model for DIA data based on convolutional neural network[J]. Beijing Biomedical Engineering, 2020(1): 56-61.
Authors: CHEN Chong, ZHENG Haoran
Institution: Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027
Abstract:
Objective: Data-independent acquisition (DIA) is a data acquisition method commonly used for high-throughput proteomics analysis. In untargeted analysis of DIA data, all peaks in the spectra need to be analyzed because it is impossible to predict where peptides will appear in the data. However, the spectra contain a large number of noise peaks, which seriously affect the efficiency and accuracy of subsequent protein identification and quantification. Preprocessing to remove noise peaks is therefore a critical step in untargeted analysis of DIA data. To make full use of the features of peptides extracted from DIA data in MS1 (first stage of mass spectrometry) and MS2 (second stage of mass spectrometry), we propose MSCNN (mass spectrometry convolutional neural network), a model based on convolutional neural networks.
Methods: Unlike traditional methods, this paper first proposes a sample extraction process suited to the MSCNN network structure and then uses the samples to train MSCNN, which makes the best use of the features of peptides in MS1 and MS2. Finally, the model is evaluated on a test set.
Results: Compared with the traditional algorithm, the number of noise peaks filtered by MSCNN increases by about 11.2%, while true peaks are handled with substantially the same effectiveness.
Conclusions: The proposed MSCNN model removes noise peaks from DIA data more effectively.
Keywords: proteomics  convolutional neural network  mass spectrometry  preprocessing  correlation
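The abstract compares MSCNN with a "traditional algorithm" without naming it. Because correlation is listed as a keyword, one plausible kind of baseline, assumed here purely for illustration, is a co-elution filter: a candidate peak is kept when enough MS2 fragment XICs correlate with the precursor MS1 XIC, measured by the Pearson correlation coefficient. The functions `pearson` and `keep_peak`, and the `threshold` and `min_fragments` defaults, are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length traces; 0.0 if either is flat."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("traces must be non-empty and of equal length")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant trace carries no elution-shape information
    return cov / (sx * sy)


def keep_peak(
    ms1_xic: Sequence[float],
    ms2_xics: Sequence[Sequence[float]],
    threshold: float = 0.8,  # hypothetical correlation cut-off
    min_fragments: int = 2,  # hypothetical minimum number of co-eluting fragments
) -> bool:
    """Hypothetical baseline: True keeps the candidate as a true peak,
    False discards it as noise."""
    n_good = sum(1 for frag in ms2_xics if pearson(ms1_xic, frag) >= threshold)
    return n_good >= min_fragments


if __name__ == "__main__":
    precursor = [1, 4, 9, 4, 1]
    co_eluting = [0, 2, 4, 2, 0]
    flat = [1, 1, 1, 1, 1]
    print(round(pearson(precursor, co_eluting), 3))  # 0.987
    print(keep_peak(precursor, [co_eluting, co_eluting]))  # True
    print(keep_peak(precursor, [co_eluting, flat]))  # False
```

A hand-set threshold like this looks only at pairwise trace shapes, whereas a learned model can weigh MS1 and MS2 evidence jointly; that contrast is one possible reason for the gain the abstract reports, offered here as an interpretation rather than a finding of the paper.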
This article is indexed in CNKI, VIP (Weipu) and other databases.