Similar articles
18 similar articles found
1.
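Handling sample imbalance in gene function prediction   Total citations: 3, self-citations: 1, citations by others: 3
Classification by machine learning is an important approach to gene function prediction. However, many prediction datasets contain too few positive samples, which degrades prediction performance. To address this problem, this study experimentally compares several common imbalanced-data classification methods combined with the support vector machine (SVM) algorithm, including voting ensemble classifiers and shifting the decision boundary. On this basis, an ensemble strategy that corrects the vote by weighting is proposed to improve prediction performance. The experiments show that the voting ensemble method, which combines capped sampling of majority-class samples with the ensemble idea, outperforms the boundary-shifting method, and that the weighted-correction ensemble built on the voting ensemble gives the best and most stable results of all the methods compared.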
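The voting ensemble described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract uses SVM base classifiers and does not state how the votes are weighted, so the nearest-centroid base learner, the balanced-accuracy weights, and all function names below are assumptions made for illustration.

```python
import random

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_member(pos, neg_subset):
    """Stand-in base learner (nearest centroid); the abstract uses SVM instead."""
    return centroid(pos), centroid(neg_subset)

def member_predict(member, x):
    """Return 1 (positive) if x is closer to the positive centroid, else 0."""
    c_pos, c_neg = member
    return 1 if sq_dist(x, c_pos) < sq_dist(x, c_neg) else 0

def balanced_accuracy(member, pos, neg):
    """Mean of the true-positive rate and the true-negative rate."""
    tpr = sum(member_predict(member, x) for x in pos) / len(pos)
    tnr = sum(1 - member_predict(member, x) for x in neg) / len(neg)
    return (tpr + tnr) / 2

def train_ensemble(pos, neg, n_members=5, seed=0):
    """Train each member on all positives plus a capped sample of negatives."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))  # cap the majority class at the minority size
    ensemble = []
    for _ in range(n_members):
        member = train_member(pos, rng.sample(neg, k))
        # Hypothetical weighting scheme: balanced accuracy on the training data.
        ensemble.append((member, balanced_accuracy(member, pos, neg)))
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote: positive if the positive votes carry over half the weight."""
    total = sum(w for _, w in ensemble)
    vote = sum(w * member_predict(m, x) for m, w in ensemble)
    return 1 if vote * 2 > total else 0
```

With a few positive points and many negative points, `train_ensemble(pos, neg)` returns weighted members and `ensemble_predict(ensemble, x)` returns 0 or 1. In practice the weights would be estimated on held-out data rather than on the training set.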

2.
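With the sequencing of the human genome complete, roughly 30,000 functional genes have been identified among its 10^9 bases of sequence. The sequences of these genes are known, and research into their functions is in full swing. For a protein-coding gene to carry out its function in an organism, it must go through DNA replication, transcription, and translation, that is, the process of gene expression. To understand a gene's function, one can therefore use transgenesis or induced mutation, observe the phenotype of the gene-deficient form, and infer the function from it; alternatively, one can intervene at the DNA, RNA, or protein level to suppress expression and study the function through the phenotypic changes that follow. Transgenesis and the induction of specific mutations are inefficient; by comparison, advances in biotechnology have made the latter approach much simpler. As a result, many methods for inhibiting one or several specific genes at the DNA, RNA, and protein levels have appeared in life science research, and these methods are increasingly applied to the treatment of diseases involving gene overexpression.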

3.
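Application of bioinformatics in protein structure and function prediction   Total citations: 15, self-citations: 1, citations by others: 14
Bioinformatics is an emerging frontier discipline formed where modern life science intersects with information science, computer science, mathematics, statistics, physics, chemistry, and other fields. With the completion of the Human Genome Project, predicting protein structure and function with bioinformatics techniques will become an important task of the post-genomic era. This article introduces the main protein structure and sequence database resources, proposes on that basis an efficient method for integrating them, briefly describes the basic methods of protein structure and function prediction together with their progress and improvements, and looks ahead to the prospects for protein prediction technology.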

4.
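Gene prediction is an important research direction in bioinformatics. Building on a study of local gene features, and addressing the shortcomings of existing genetic algorithms in predicting 5' exons, this paper draws on biological immune mechanisms to construct an immune genetic algorithm for predicting the 5' exons of genes. The algorithm bases its selection mechanism on the powerful diversity-maintenance mechanism of T-cell development, which improves its solution performance. Experimental results show that the algorithm improves gene prediction accuracy and offers one possible approach to gene prediction research.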

5.
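Techniques and methods for studying gene function   Total citations: 1, self-citations: 0, citations by others: 1
As genome projects advance, more and more physiologically significant genes have been successfully cloned, and the study of gene function is becoming increasingly important. The main current methods for studying gene function include gene transduction, antisense technology, ribozymes, gene recombination, and chromosome transfer techniques.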

6.
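Analysis of the human HCA56 gene using bioinformatics methods   Total citations: 8, self-citations: 2, citations by others: 8
Objective: To analyze the gene and protein sequences of HCA56, a newly cloned human ligatin-like gene, using bioinformatics, and to explore the role of bioinformatics in the study of novel genes. Methods: Based on the human genome database, electronic PCR and the SAGE database were used for chromosomal localization and tissue expression profiling of HCA56; the BLAST program was used for gene structure analysis and similar-sequence searches; ORF finder and Gene Runner 3.05 were used for sequence prediction and functional analysis of the protein encoded by HCA56. Results: The full-length HCA56 cDNA is 2021 bp and maps to chromosome 1q31-q32. Its longest open reading frame is 1755 bp and encodes 584 amino acids; the encoded protein contains a leucine zipper and a putative conserved RNA-binding sequence. Similarity searches found that an HCA56 gene fragment is highly similar (99%) to a human ligatin gene fragment. SAGE results show that HCA56 is expressed in a variety of tissues. Conclusion: Bioinformatics is an effective approach to the study of novel genes; HCA56 is very likely the full-length coding gene of ligatin.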
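The open-reading-frame step in this abstract can be sketched as follows. This is a simplified illustration, not the ORF finder or Gene Runner algorithm: it assumes the forward strand only, ATG as the sole start codon, and the three standard stop codons. If the reported 1755 bp ORF length includes the stop codon, then 1755 / 3 = 585 codons minus one stop codon gives the 584 amino acids stated above; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Find the longest ATG...stop ORF in the three forward reading frames.

    Returns (start, end, length_nt), where end is exclusive and the length
    includes the stop codon, or None if the sequence contains no complete ORF.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[2]:
                    best = (start, i + 3, length)
                start = None  # look for the next ORF in this frame
    return best

def encoded_residues(orf_length_nt):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    return orf_length_nt // 3 - 1
```

For example, `longest_orf("CCATGAAATTTGGGTAACC")` finds the 15-nucleotide ORF `ATGAAATTTGGGTAA`, which encodes four residues. A fuller version would also scan the reverse complement.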

7.
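Objective: To analyze the gene and protein sequences of the newly cloned GP1 using bioinformatics methods and to infer its role in the initiation and progression of gastric cancer. Methods: Based on the human genome database, the Megablast tool and the SAGE database were used for chromosomal localization and tissue expression profiling of GP1; the BLAST program was used for gene structure analysis and similar-sequence searches; ORF finder was used to predict the sequence of the encoded protein; and software including ProtParam and TMPRED was used to analyze the physicochemical properties, subcellular localization, and domains of the encoded protein. Results: The GP1 gene is 1362 bp in full length; its longest open reading frame is 801 bp and encodes 267 amino acids. Similarity searches found that a GP1 gene fragment is highly similar (100%) to the human AF083246 gene fragment. The GP1 protein contains a domain with low homology to eukaryotic initiation factor 5C (eIF5C). SAGE results show that GP1 is expressed in a variety of tissues, and RT-PCR verified the tissue expression profile. Conclusion: Bioinformatic analysis indicates that GP1 is a gastric cancer-related gene that may play a role in the initiation and progression of gastric cancer.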

8.
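Objective: To search the miRBase database for the sequence features of miR-31 in multiple species. Methods: Three online tools, TargetScan, PicTar, and miRecords, were used to predict miR-31 target genes; the literature and miRecords were searched for experimentally confirmed miR-31 target genes; functional enrichment analysis and signaling pathway enrichment analysis were performed on the target genes used. Results: The enriched biological processes and signaling pathways of the miR-31 target genes are mostly associated with the development of various diseases, such as tumors and heart disease. Conclusion: Many biological functions of miR-31 remain unconfirmed; the results predicted by bioinformatics methods can provide direction and ideas for follow-up experimental studies.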

9.
10.
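Gene prediction is an important research direction in bioinformatics. Building on a study of local gene features, and addressing the shortcomings of existing genetic algorithms in predicting 5' exons, this paper draws on biological immune mechanisms to construct an immune genetic algorithm for predicting the 5' exons of genes. The algorithm bases its selection mechanism on the powerful diversity-maintenance mechanism of T-cell development, which improves its solution performance. Experimental results show that the algorithm improves gene prediction accuracy and offers one possible approach to gene prediction research.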

11.
This article addresses the role of probabilistic prediction in the systems organization of behavioral acts. The systems mechanisms predicting the required results of the behavioral acts of living beings in stable and changing conditions are discussed. It is suggested that in all these types of behavior, the parameters of results satisfying the leading needs, which constitute the aims of behavior, are strictly predicted. The author believes that probabilistic prediction relates only to the means, actions, accompanying emotional states, and possible ways of achieving results, i.e., not to the parameters of the required results but to the means of achieving them.

12.

Background

The responsible genes have not yet been identified for many genetically mapped disease loci. Physically interacting proteins tend to be involved in the same cellular process, and mutations in their genes may lead to similar disease phenotypes.

Objective

To investigate whether protein–protein interactions can predict genes for genetically heterogeneous diseases.

Methods

72 940 protein–protein interactions between 10 894 human proteins were used to search 432 loci for candidate disease genes representing 383 genetically heterogeneous hereditary diseases. For each disease, the protein interaction partners of its known causative genes were compared with the disease associated loci lacking identified causative genes. Interaction partners located within such loci were considered candidate disease gene predictions. Prediction accuracy was tested using a benchmark set of known disease genes.

Results

Almost 300 candidate disease gene predictions were made. Some of these have since been confirmed. On average, 10% or more are expected to be genuine disease genes, representing a 10‐fold enrichment compared with positional information only. Examples of interesting candidates are AKAP6 for arrhythmogenic right ventricular dysplasia 3 and SYN3 for familial partial epilepsy with variable foci.

Conclusions

Exploiting protein–protein interactions can greatly increase the likelihood of finding positional candidate disease genes. When applied on a large scale they can lead to novel candidate gene predictions.
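The candidate selection rule in the Methods of this abstract (interaction partners of known causative genes that fall inside an unresolved locus for the same disease) can be sketched as below. The data layout, the function name, and the gene and disease identifiers in the usage note are hypothetical placeholders; the study's actual pipeline and benchmark are not described in enough detail here to reproduce.

```python
def predict_candidates(interactions, known_genes, locus_genes):
    """Propose candidate disease genes from protein-protein interactions.

    interactions: iterable of (protein_a, protein_b) pairs, treated as undirected
    known_genes:  dict mapping disease -> set of known causative genes
    locus_genes:  dict mapping disease -> {locus: set of genes in that locus},
                  restricted to loci without an identified causative gene
    Returns a sorted list of (disease, locus, candidate, interacts_with).
    """
    # Build an undirected adjacency map of interaction partners.
    partners = {}
    for a, b in interactions:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)

    predictions = []
    for disease, causative in known_genes.items():
        for locus, genes in locus_genes.get(disease, {}).items():
            for known in causative:
                # A candidate is a partner of a known gene located in the locus.
                for cand in partners.get(known, set()) & genes:
                    if cand not in causative:
                        predictions.append((disease, locus, cand, known))
    return sorted(predictions)
```

Called with the interactions `[("GENE_A", "GENE_X"), ("GENE_B", "GENE_Y")]`, the known gene `GENE_A` for `disease_1`, and a locus containing `GENE_X` and `GENE_Q`, the function proposes only `GENE_X`, because `GENE_Q` has no interaction with a known causative gene.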

13.
Risk assessment has high prognostic value in patients with colorectal cancer (CRC), and the use of proper models is an effective approach frequently used to evaluate cancer progression for further treatment plans. Alterations in metabolism are confirmed to be a significant feature of tumor cells and have been an intense focus in disease research. Here, we mined the genes that were differentially expressed in CRC tissues compared to paired normal samples from a public database and then constructed a novel assessment system for the prognosis of patients based on the value of a risk score considering the expression status of metabolism-related genes after screening. The score successfully stratified patients by risk and was externally verified in our study. Moreover, we built a nomogram combining the score and clinical parameters to predict patient survival using a visual method. The results suggested that the risk score was well fit and could provide assistance for the individual treatment of CRC patients in the clinic.

14.
The identification of genomic loci associated with human genetic syndromes has been significantly facilitated through the generation of high density SNP arrays. However, optimal selection of candidate genes from within such loci is still a tedious labor‐intensive bottleneck. Syndrome to Gene (S2G) is based on novel algorithms which allow an efficient search for candidate genes in a genomic locus, using known genes whose defects cause phenotypically similar syndromes. S2G ( http://fohs.bgu.ac.il/s2g/index.html ) includes two components. The first is a phenotype Online Mendelian Inheritance in Man (OMIM)‐based search engine that alleviates many of the problems in the existing OMIM search engine (negation phrases, overlapping terms, etc.). The second component is a gene prioritizing engine that uses a novel algorithm to integrate information from 18 databases. When the detailed phenotype of a syndrome is entered into the web‐based software, S2G offers a complete improved search of the OMIM database for similar syndromes. The software then prioritizes a list of genes from within a genomic locus, based on their association with genes whose defects are known to underlie similar clinical syndromes. We demonstrate that in all 30 cases of novel disease genes identified in the past year, the disease gene was within the top 20% of candidate genes predicted by S2G, and in most cases within the top 10%. Thus, S2G provides clinicians with an efficient tool for diagnosis and researchers with a candidate gene prediction tool based on phenotypic data and a wide range of gene data resources. S2G can also serve in studies of polygenic diseases, and in finding interacting molecules for any gene of choice. Hum Mutat 30:1–8, 2010. © 2010 Wiley‐Liss, Inc.

15.
16.
Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation in humans. The number of SNPs identified in the human genome is growing rapidly, but attaining experimental knowledge about the possible disease association of variants is laborious and time-consuming. Several computational methods have been developed for the classification of SNPs according to their predicted pathogenicity. In this study, we have evaluated the performance of nine widely used pathogenicity prediction methods available on the Internet. The evaluated methods were MutPred, nsSNPAnalyzer, Panther, PhD-SNP, PolyPhen, PolyPhen2, SIFT, SNAP, and SNPs&GO. The methods were tested with a set of over 40,000 pathogenic and neutral variants. We also assessed whether the type of original or substituting amino acid residue, the structural class of the protein, or the structural environment of the amino acid substitution, had an effect on the prediction performance. The performances of the programs ranged from poor (MCC 0.19) to reasonably good (MCC 0.65), and the results from the programs correlated poorly. The overall best performing methods in this study were SNPs&GO and MutPred, with accuracies reaching 0.82 and 0.81, respectively.
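For readers unfamiliar with the two measures quoted in this abstract, the Matthews correlation coefficient (MCC) and accuracy can be computed from a binary confusion matrix as sketched below. The formulas are the standard definitions; the function names and the convention of returning 0.0 when the MCC denominator is zero are choices made for this sketch, and the counts in the usage note are made-up numbers, not data from the study.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Ranges from -1 (total disagreement) through 0 (chance level) to +1.
    Returns 0.0 when the denominator is zero (a convention of this sketch).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

With 40 true positives, 40 true negatives, 10 false positives, and 10 false negatives, `accuracy` gives 0.8 while `mcc` gives 0.6. This illustrates why the two measures are reported separately: MCC accounts for all four cells of the confusion matrix and is less flattering than accuracy on the same predictions.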

17.
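Objective: Protein phosphorylation is the process by which a kinase catalyzes the transfer of a phosphate group to amino acid residues at specific sites on a substrate protein, and it is an important mechanism in the study of protein activity and function. Most of the thousands of phosphorylation sites identified so far lack kinase information. This study therefore proposes a PU-learning-based kinase prediction algorithm that iteratively labels phosphorylation sites and can accurately predict the kinase that catalyzes a given phosphopeptide. Methods: The algorithm uses PU-learning as its framework. It first applies maximum entropy variance to select the optimal threshold automatically for each kind of kinase, and thereby extracts the potential phosphorylation sites on each phosphopeptide. It then determines the kinase corresponding to each phosphorylation site by statistical analysis. Finally, its prediction performance on the Phospho.ELM database is assessed by five-fold cross-validation and compared with existing algorithms. Results: In cross-validation, the specificity and sensitivity of the algorithm exceed those of the best existing algorithm by up to 4% and 10%, respectively, on individual datasets, and its prediction accuracy on Phospho.ELM data reaches 79.52%. Conclusion: The PU-learning-based kinase prediction algorithm performs significantly better than existing algorithms and can accurately predict phosphopeptides in the Phospho.ELM database whose kinase information is unknown, which gives it strong guiding value for phosphorylation experiments.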

18.
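Objective: To predict the structure and function of Schistosoma mansoni hexokinase (SmHK) using bioinformatics techniques, providing information for further functional studies. Methods: The full-length cDNA and amino acid sequences of SmHK and of HK from other species were obtained from GenBank. Online bioinformatics sites such as NCBI and Expasy and the VectorNTI software package were used to analyze and predict the conserved functional domains and motifs, physicochemical parameters, subcellular localization, hydrophilicity, linear B-cell epitopes, secondary structure and topology, and tertiary structure (by modeling) of the amino acid sequences obtained. Results: SmHK encodes 451 amino acid residues with a theoretical molecular weight of 50446.01 Da. It has complete HK-1 and HK-2 conserved functional domains, and the sites related to structure and function are highly conserved. It shares 30% homology with the host (human, mouse) and is evolutionarily close to humans and other vertebrates. It has multiple potential antigenic epitopes, multiple phosphorylation sites, and one transmembrane structure. Tertiary-structure modeling shows a cleft between the two domains of the protein; the glucose and ATP binding sites and the ATP catalytic region lie in or around this cleft, and the transmembrane region forms an anion channel together with the basic amino acids at its two ends. Conclusion: The sites of SmHK related to structure and function are highly conserved, and the protein is evolutionarily close to that of the host. The protein is inferred to anchor to the outer mitochondrial membrane through its transmembrane region, with its main functional sites located in or around the cleft. The multiple phosphorylation sites suggest that it takes part in regulating a variety of cellular functions and plays an important role in regulating energy metabolism. It is a potential vaccine candidate and drug target.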
