Similar articles
1.
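CRISPR/Cas systems, whose function was characterized only in recent years, are RNA-based defenses that protect bacteria and archaea against invading viruses and plasmids. In prokaryotes that carry a CRISPR/Cas immune system, short repeated sequences have been found in the genome, together with small RNAs derived from the CRISPR locus that guide the Cas9 protein to recognize and cleave invading nucleic acid sequences. The CRISPR/Cas9 system has since rapidly transformed genetic engineering, allowing researchers to modify the genomes of many organisms with relative ease; in the immune system, editing immune cells with it can help alleviate disease.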
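
The guided recognition step described above can be illustrated by scanning a DNA sequence for candidate Cas9 target sites. This is a minimal sketch under an assumption the abstract does not state: the widely used Streptococcus pyogenes Cas9 (SpCas9) convention of a 20-nt protospacer matched by the guide RNA, immediately followed by a 5'-NGG-3' protospacer-adjacent motif (PAM). `find_cas9_sites` is a hypothetical helper, and only the forward strand is scanned.

```python
# Minimal sketch: forward-strand scan for SpCas9-style target sites.
# Assumptions (not from the abstract): 20-nt protospacer, NGG PAM directly 3' of it.
from typing import List, Tuple

def find_cas9_sites(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for every NGG PAM on the forward strand."""
    seq = seq.upper()
    sites = []
    for start in range(len(seq) - spacer_len - 3 + 1):
        pam = seq[start + spacer_len : start + spacer_len + 3]
        if pam[1:] == "GG":  # N = any base, followed by GG
            sites.append((start, seq[start : start + spacer_len], pam))
    return sites

if __name__ == "__main__":
    # One site, starting at index 1, with PAM "AGG".
    print(find_cas9_sites("C" + "A" * 20 + "AGG"))
```

A real guide-design tool would also scan the reverse complement and score off-target matches; both are omitted here.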

2.
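Objective: Noncoding RNA-protein interactions (ncRPI) are biologically important, and predicting them has become one of the main approaches to studying the functions of noncoding RNAs (ncRNA) and proteins. Methods: Features were extracted from ncRNA and protein sequence information, the raw data were preprocessed with a convolutional autoencoder, and three machine learning models, LightGBM (LBM), random forest (RF) and extreme gradient boosting (XGB), were trained to predict ncRNA-protein interactions. Results: In 5-fold cross-validation on the RPI369 and RPI488 datasets, all three models achieved relatively high prediction accuracy on both datasets: 0.757 (LBM), 0.791 (RF) and 0.791 (XGB) on RPI369, and 0.918 (LBM), 0.908 (RF) and 0.918 (XGB) on RPI488. The models also achieved high AUC (area under the curve) values on the larger RPI1807, RPI2241 and RPI13254 datasets: 0.99 for all three models on RPI1807, a minimum of 0.87 on RPI2241 and a minimum of 0.81 on RPI13254, all indicating good predictive performance. Conclusion: Machine learning methods can predict whether an ncRNA and a protein interact.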
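
The abstract does not list its exact sequence features, so the sketch below uses normalized k-mer composition, a common stand-in, to turn an ncRNA and a protein into one feature vector, plus a shuffled 5-fold split of the kind used for cross-validation. The convolutional autoencoder step and the three classifiers are omitted; `kmer_composition` and `kfold_indices` are hypothetical helpers.

```python
# Minimal sketch: k-mer composition features and 5-fold split indices.
# Assumption: k-mer frequencies stand in for the study's unspecified features.
import itertools
import random
from typing import List

def kmer_composition(seq: str, alphabet: str, k: int) -> List[float]:
    """Frequencies of all len(alphabet)**k k-mers, in lexicographic order of the alphabet."""
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        km = seq[i : i + k]
        if km in index:  # skip k-mers containing letters outside the alphabet
            counts[index[km]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]

def kfold_indices(n: int, n_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into n_folds near-equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

if __name__ == "__main__":
    rna = kmer_composition("ACGUACGU", "ACGU", 3)                   # 64-dim
    prot = kmer_composition("MKTAYIAK", "ACDEFGHIKLMNPQRSTVWY", 1)  # 20-dim
    print(len(rna + prot))  # 84 features for one ncRNA-protein pair
    for fold in kfold_indices(10):
        print(sorted(fold))
```

Each fold serves once as the test set while the other four train the model; every sample is tested exactly once.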

3.
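Sequence classification methods are widely used for bioinformatics problems such as identifying transcriptional regulatory elements and predicting protein structure. This study designs a new classification method based on sequence features and applies it to RNA splicing regulatory elements. The method extracts sequence features from known splicing elements, builds a scoring algorithm from them, and uses the scores to predict the splicing-regulatory function of uncharacterized elements. As an application, known exonic splicing enhancer and silencer (ESE and ESS) octamers were used as experimental data to compare the predictions of this method with those of several commonly used methods. Across three computational validation experiments the average prediction accuracy was 93%, and the transparent structure of the predictor aids biological interpretation. The study provides a new method for analyzing biological sequence data and a new bioinformatic route to studying gene regulation.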
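
The abstract does not specify the scoring algorithm, so this sketch substitutes a standard, equally transparent technique: a log-odds position weight matrix (PWM) built from known octamers and used to score candidates. The training octamers below are hypothetical, and a uniform background and a pseudocount of 1 are assumptions.

```python
# Minimal sketch: log-odds PWM scoring of octamers (a generic stand-in,
# not the authors' algorithm).
import math
from typing import Dict, List, Optional

def build_pwm(sites: List[str], alphabet: str = "ACGT", pseudocount: float = 1.0,
              background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Per-position log2(observed frequency / background) for equal-length sites."""
    bg = background or {a: 1.0 / len(alphabet) for a in alphabet}
    pwm = []
    for pos in range(len(sites[0])):
        counts = {a: pseudocount for a in alphabet}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        pwm.append({a: math.log2(counts[a] / total / bg[a]) for a in alphabet})
    return pwm

def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds; higher means closer to the training elements."""
    return sum(col[base] for col, base in zip(pwm, seq))

if __name__ == "__main__":
    known = ["GAAGAAGA", "GAAGAGGA", "GGAGAAGA"]  # hypothetical enhancer-like octamers
    pwm = build_pwm(known)
    print(score(pwm, "GAAGAAGA") > score(pwm, "CTTCTTCT"))  # True
```

Because every position contributes a readable log-odds term, this kind of score keeps the interpretability the abstract emphasizes.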

4.
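HMG-CoA reductase is the rate-limiting enzyme of mevalonate biosynthesis. Phylogenetic analysis shows that two classes of HMG-CoA reductase exist: class I occurs mainly in eukaryotes and some archaea, and class II mainly in bacteria and a few archaea. Statins inhibit HMG-CoA reductase activity and are effective competitive inhibitors of the enzyme. This article describes the catalytic mechanism, phylogenetic features and classification of HMG-CoA reductase, and compares the catalytic properties of the two classes.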

5.
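Objective: To build a liver cancer diagnostic prediction model that meets clinical needs and enables non-invasive detection of liver cancer. Methods: An ILD3000 electronic nose made by a German manufacturer was used to collect breath data from healthy subjects and liver cancer patients. Features were extracted from the resulting breath time series, including statistical features such as the maximum, minimum, mean, standard deviation and sum of each series. Feature dimensionality reduction algorithms were combined with machine learning classifiers to classify the breath features into two classes, healthy subjects and patients with primary liver cancer. Results: After model selection and parameter tuning, a support vector machine with a linear kernel gave the best binary classification result on the breath data, 92.3%. Conclusion: With breath data from healthy subjects and liver cancer patients as samples, machine learning models can make diagnostic predictions for liver cancer, and on these data the linear-kernel support vector machine had the best classification performance.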
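
The five statistical features named above can be computed per sensor channel and concatenated into one vector per breath sample. This is a minimal sketch assuming each electronic-nose channel is a list of floats; the abstract does not say whether the sample or population standard deviation was used, so the sample version is an assumption.

```python
# Minimal sketch of the per-series features: max, min, mean, SD, sum.
# Assumptions: one list of floats per sensor channel; sample SD.
import statistics
from typing import Dict, List

def series_features(series: List[float]) -> Dict[str, float]:
    return {
        "max": max(series),
        "min": min(series),
        "mean": statistics.fmean(series),
        "std": statistics.stdev(series),  # sample SD; needs at least 2 points
        "sum": sum(series),
    }

def breath_feature_vector(channels: List[List[float]]) -> List[float]:
    """Concatenate the five features of every sensor channel into one vector."""
    vec: List[float] = []
    for ch in channels:
        f = series_features(ch)
        vec.extend([f["max"], f["min"], f["mean"], f["std"], f["sum"]])
    return vec

if __name__ == "__main__":
    print(breath_feature_vector([[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]]))
```

Vectors like these would then pass through dimensionality reduction and a classifier such as scikit-learn's `SVC(kernel="linear")`; that pairing is illustrative, not the study's reported code.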

6.
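Classifying membrane proteins by amino acid composition alone ignores correlations between residues along the sequence, and the conventional support vector machine (SVM), when used as the classifier, leaves unclassifiable regions ("blind spots") in multi-class problems. To address both issues, this study computes the amino acid composition, dipeptide composition and six amino acid correlation coefficients of each protein sequence and combines these three feature types into the feature vector of a membrane protein sequence; a fuzzy SVM is then used as the classifier, which removes the blind-spot problem of the conventional SVM in multi-class recognition. Test results show that with the same input features the fuzzy SVM outperforms the conventional SVM, and that with the same classifier the combination of amino acid composition, dipeptide composition and correlation coefficients outperforms any one or two of these feature types alone. The strategy combining the joint features with the fuzzy SVM reached an overall prediction accuracy of 97% on an independent test set, outperforming several existing classification strategies, and is among the most effective membrane protein classification methods currently available.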
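
Two of the three feature groups above have standard definitions and can be sketched directly: amino acid composition (20 values) and dipeptide composition (400 values). The six correlation coefficients are not reproduced because the abstract does not define them; skipping non-standard letters is an assumption.

```python
# Minimal sketch: 20-dim amino acid composition + 400-dim dipeptide composition.
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def aa_composition(seq: str) -> List[float]:
    """Fraction of each standard residue; non-standard letters are ignored."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(seq: str) -> List[float]:
    """Fraction of each adjacent residue pair; pairs with non-standard letters are skipped."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]

def composition_features(seq: str) -> List[float]:
    """420-dim vector: amino acid composition followed by dipeptide composition."""
    return aa_composition(seq) + dipeptide_composition(seq)

if __name__ == "__main__":
    v = composition_features("ACDA")
    print(len(v), v[0])  # 420 0.5
```

The dipeptide block is what captures the neighbouring-residue information that composition alone discards.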

7.
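Protein-protein interactions are a major research topic in proteomics. This study used support vector machine (SVM) learning, building feature vectors that combine the physicochemical properties of amino acids with sequence information, to predict interactions among yeast proteins selected from the DIP database. On experimental data for 34,000 pairs of yeast proteins the prediction accuracy reached 83.72%, compared with 75.86% for the method based on amino acid physicochemical properties alone and 79.63% for the method based on sequence information alone. Besides improving accuracy, introducing a discrete information measure function (FDOD) reduced the dimensionality of the feature vectors, which shortened SVM training time and sped up interaction prediction.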

8.
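This study uses statistical modeling to predict protein transmembrane helix segments accurately and efficiently. Based on the biological characteristics of transmembrane protein sequences, a new segmented training algorithm for hidden Markov models is proposed that models and predicts features such as the boundaries of transmembrane helix segments and helix orientation. Compared with the standard training algorithm, it has lower time complexity and higher prediction accuracy. In 10-fold cross-validation on protein sequences containing 160 transmembrane helices, the algorithm achieved a prediction accuracy of 96.98% and a correct-location accuracy of 91.25%, higher than other prediction methods on this dataset, which supports the soundness and effectiveness of the algorithm.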
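
Once an HMM is trained, helix segments are read off with Viterbi decoding. The sketch below shows only that standard decoding step on a toy two-state model (M = membrane helix, N = non-membrane) over a two-letter hydrophobic/polar alphabet. It is not the segmented training algorithm proposed in the study; all probabilities and the residue grouping are hypothetical.

```python
# Minimal sketch: log-space Viterbi decoding for a toy two-state HMM.
# All parameters below are hypothetical illustrations.
import math
from typing import Dict, List

STATES = ["M", "N"]
START = {"M": 0.5, "N": 0.5}
TRANS = {"M": {"M": 0.9, "N": 0.1}, "N": {"M": 0.1, "N": 0.9}}
EMIT = {"M": {"h": 0.8, "p": 0.2}, "N": {"h": 0.3, "p": 0.7}}
HYDROPHOBIC = set("AILMFVWC")  # hypothetical residue grouping

def to_symbols(protein: str) -> str:
    return "".join("h" if a in HYDROPHOBIC else "p" for a in protein.upper())

def viterbi(obs: str) -> str:
    """Most probable state path; log probabilities avoid numerical underflow."""
    v = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: v[-1][r] + math.log(TRANS[r][s]))
            row[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            ptr[s] = prev
        v.append(row)
        back.append(ptr)
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):  # trace the best predecessors backwards
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

if __name__ == "__main__":
    print(viterbi("pppp" + "h" * 20 + "pppp"))  # NNNN, 20 x M, NNNN
```

Real topology models add more states (loop sides, helix caps) so that orientation can be decoded, as the abstract describes.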

9.
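Objective: Predicting protein secondary structure is the basis for predicting three-dimensional structure, so improving secondary structure prediction accuracy is important. Methods: A BP (back-propagation) neural network was built whose input combines amino acid hydrophobicity with the position-specific scoring matrix (PSSM), which carries evolutionary information. Data came from the CB513 protein dataset; after removing sequences with fewer than 30 residues or containing X or B, 492 protein sequences remained. Prediction accuracy was estimated by 4-fold cross-validation, and the results were compared with those of a neural network using only the PSSM as input. Results: The network combining hydrophobicity with evolutionary information markedly improved the prediction accuracy for α-helices, to nearly 79%, with a sensitivity of 79% and a specificity of 91%. The overall secondary structure prediction accuracy reached 75.96%. Conclusion: A BP network built this way can improve secondary structure prediction accuracy, especially for α-helices.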
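
Per-residue networks like this one usually see a sliding window of rows centred on the residue being predicted. This sketch builds such inputs by concatenating, for each window position, the 20 PSSM scores and one hydrophobicity value. The window size of 13 and zero padding beyond the chain ends are assumptions, and `window_inputs` is a hypothetical helper.

```python
# Minimal sketch: sliding-window inputs of PSSM rows plus hydrophobicity.
# Assumptions: odd window (default 13), zero padding past chain ends.
from typing import List, Sequence

def window_inputs(pssm: Sequence[Sequence[float]], hydro: Sequence[float],
                  window: int = 13) -> List[List[float]]:
    if len(pssm) != len(hydro):
        raise ValueError("pssm and hydro need one entry per residue")
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred")
    half = window // 2
    width = (len(pssm[0]) if pssm else 20) + 1  # PSSM columns + 1 hydrophobicity value
    rows = []
    for i in range(len(pssm)):
        x: List[float] = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(pssm):
                x.extend(pssm[j])
                x.append(hydro[j])
            else:
                x.extend([0.0] * width)  # padding outside the chain
        rows.append(x)
    return rows

if __name__ == "__main__":
    pssm = [[float(i)] * 20 for i in range(3)]
    rows = window_inputs(pssm, [1.0, 2.0, 3.0], window=3)
    print(len(rows), len(rows[0]))  # 3 63
```

Each row is one training example; the network's three outputs would correspond to helix, strand and coil.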

10.
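The blood glucose concentration time series of diabetic patients is time-varying, nonlinear and non-stationary. To improve glucose prediction accuracy, a short-term glucose prediction model is proposed that combines complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and the extreme learning machine (ELM). First, CEEMDAN decomposes a patient's glucose concentration series into intrinsic mode functions (IMFs) in different frequency bands and a residual component, reducing the non-stationarity of the series. Next, a separate ELM is built for each IMF and for the residual, and the predictions of all ELMs are fused to obtain the predicted future glucose concentration, improving accuracy. On this basis, hypoglycemia warnings are issued. The model was tested on data from 56 patients collected at the Department of Endocrinology, Henan Provincial People's Hospital. Compared with the ELM and EMD-ELM models, the CEEMDAN-ELM short-term model still achieved high accuracy at a 45-min prediction horizon (RMSE = 0.2051, MAPE = 2.1164%), and the false-alarm and missed-alarm rates of the hypoglycemia warning were 0.97% and 7.55%, respectively. A longer prediction horizon gives physicians and patients enough time to control glucose concentration and can improve diabetes treatment.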
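
The "one ELM per component, then fuse" step can be sketched as follows, assuming the CEEMDAN decomposition has already produced the IMFs and residual (not shown), that each component is forecast one step ahead from its last `n_lags` values, and that fusion is a plain sum. An ELM draws its hidden-layer weights at random and solves only the output weights in closed form; the small ridge term, hidden size, lag count and tanh activation are illustrative assumptions, and the study's 45-min multi-step horizon is not reproduced.

```python
# Minimal sketch: per-component ELM forecasts fused by summation.
# Assumptions: components already decomposed; ridge-regularized output solve.
import numpy as np

class ELM:
    """Extreme learning machine: random fixed hidden layer, least-squares output weights."""

    def __init__(self, n_hidden: int = 50, alpha: float = 1e-6, seed: int = 0):
        self.n_hidden = n_hidden
        self.alpha = alpha  # small ridge term for numerical stability (assumption)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Only the output weights are solved for, in closed form.
        self.beta = np.linalg.solve(H.T @ H + self.alpha * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta

def lagged(series: np.ndarray, n_lags: int):
    """Rows of n_lags consecutive values, each paired with the value that follows."""
    X = np.stack([series[i : i + n_lags] for i in range(len(series) - n_lags)])
    return X, series[n_lags:]

def forecast_next(components, n_lags: int = 4, n_hidden: int = 50) -> float:
    """Fit one ELM per component (IMFs + residual) and sum their one-step forecasts."""
    total = 0.0
    for comp in components:
        comp = np.asarray(comp, dtype=float)
        X, y = lagged(comp, n_lags)
        model = ELM(n_hidden).fit(X, y)
        total += float(model.predict(comp[-n_lags:][None, :])[0])
    return total

if __name__ == "__main__":
    t = np.arange(201)
    parts = [np.sin(0.1 * t), 0.1 * np.cos(0.05 * t)]  # stand-ins for decomposed components
    print(forecast_next([p[:200] for p in parts]), parts[0][200] + parts[1][200])
```

Modelling each component separately lets every ELM fit a smoother, narrower-band signal than the raw glucose series.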

11.
Background: We performed this study to develop machine learning models that predict, immediately after return of spontaneous circulation (ROSC), the 1-year neurological outcome of out-of-hospital cardiac arrest (OHCA) patients who achieved ROSC. Methods: We performed a retrospective analysis of an OHCA survivor registry. Patients aged ≥ 18 years were included. Participants registered between March 31, 2013 and December 31, 2018 were divided into a development dataset (80% of the total) and an internal validation dataset (20% of the total), and those registered between January 1, 2019 and December 31, 2019 were assigned to an external validation dataset. Four machine learning methods (random forest, support vector machine, ElasticNet and extreme gradient boosting) were used to build prediction models on the development dataset, and an ensemble technique was used to build the final prediction model. Prediction performance in the internal and external validation datasets was described by accuracy, area under the receiver operating characteristic curve, area under the precision-recall curve, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Furthermore, we built multivariable logistic regression models on the development dataset and compared their prediction performance with that of the ensemble models. The primary outcome was an unfavorable 1-year neurological outcome. Results: A total of 1,207 patients were included in the study. Among them, 631, 139, and 153 were assigned to the development, internal validation and external validation datasets, respectively.
Prediction performance metrics for the ensemble model in the internal validation dataset were: accuracy, 0.9620 (95% confidence interval [CI], 0.9352–0.9889); area under the receiver operating characteristic curve, 0.9800 (95% CI, 0.9612–0.9988); area under the precision-recall curve, 0.9950 (95% CI, 0.9860–1.0000); sensitivity, 0.9594 (95% CI, 0.9245–0.9943); specificity, 0.9714 (95% CI, 0.9162–1.0000); PPV, 0.9916 (95% CI, 0.9752–1.0000); NPV, 0.8718 (95% CI, 0.7669–0.9767). In the external validation dataset they were: accuracy, 0.8509 (95% CI, 0.7825–0.9192); area under the receiver operating characteristic curve, 0.9301 (95% CI, 0.8845–0.9756); area under the precision-recall curve, 0.9476 (95% CI, 0.9087–0.9867); sensitivity, 0.9595 (95% CI, 0.9145–1.0000); specificity, 0.6500 (95% CI, 0.5022–0.7978); PPV, 0.8353 (95% CI, 0.7564–0.9142); NPV, 0.8966 (95% CI, 0.7857–1.0000). In both the internal and the external validation datasets, all prediction metrics except NPV were higher for the ensemble models than for the logistic regression models. Conclusion: Using four machine learning methods, we built an ensemble model that predicts an unfavorable 1-year neurological outcome in OHCA survivors. Its prediction performance was higher than that of the multivariable logistic regression model, although performance decreased slightly in the external validation dataset.
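
The reported metrics all derive from a 2×2 confusion matrix, with the unfavorable outcome as the positive class. The abstract does not name its ensemble technique, so the sketch assumes soft voting (averaging the four models' predicted probabilities, threshold 0.5); the probabilities in the example are hypothetical.

```python
# Minimal sketch: soft-voting ensemble plus confusion-matrix metrics.
# Assumption: soft voting stands in for the unspecified ensemble technique.
from typing import Dict, List, Sequence

def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average each patient's probability across models, then threshold."""
    n_models = len(prob_lists)
    avg = [sum(p) / n_models for p in zip(*prob_lists)]
    return [int(p >= threshold) for p in avg]

def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")

def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Label 1 = unfavorable 1-year neurological outcome (positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }

if __name__ == "__main__":
    probs = [  # four hypothetical models x four patients
        [0.9, 0.4, 0.2, 0.1],
        [0.8, 0.6, 0.3, 0.2],
        [0.7, 0.3, 0.1, 0.4],
        [0.6, 0.3, 0.2, 0.3],
    ]
    pred = soft_vote(probs)  # [1, 0, 0, 0]
    print(binary_metrics([1, 1, 0, 0], pred))
```

PPV and NPV depend on outcome prevalence, which is one reason they can shift between internal and external validation sets even when sensitivity is stable.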

12.
To characterize genome-scale RNA expression of Helicobacter pylori in response to acid stress, we used a microarray membrane containing 1,534 open reading frames (ORFs) from strain 26695. Total RNA of H. pylori grown at pH 7.2 and pH 5.5 was extracted, reverse transcribed into cDNA, and labeled with biotin. Each microarray membrane was hybridized with cDNA probes from the same strain under the two pH conditions and developed by a catalyzed reporter deposition method. Expression of all ORFs was measured by densitometry. Of the 1,534 ORFs, 53 were highly expressed (≥30% of the rRNA control by densitometry ratio), and 445 were stably expressed (<30% of the rRNA control) under both pH conditions without significant variation. A total of 80 ORFs showed significantly increased expression at low pH, whereas expression of 4 ORFs was suppressed under acidic conditions. The remaining 952 ORFs were not detectable under either pH condition. These data were highly reproducible and comparable to those obtained by RNA slot blotting. Our results suggest that microarrays can be used to monitor prokaryotic gene expression on a genomic scale.

13.
Bacteriophages are viruses that specifically infect and lyse prokaryotic cells and therefore might be used as biocontrol agents. However, genomic information is needed to predict and understand a phage's characteristics so that bacteriophages can be used efficiently and safely as biocontrol agents against bacterial pathogens. In this study, the complete genome sequence of a novel enterobacteriophage, phiKP26, was determined by pyrosequencing. Genomic analysis of phiKP26 revealed a genome size of 47,285 bp with an overall G + C content of 44.3%. The 78 open reading frames (ORFs) in the phiKP26 genome were grouped into replication, DNA packaging, morphogenesis and cell lysis modules, and no genes related to virulence or lysogeny were found.
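
The two basic genome statistics above, G + C content and ORFs, can be illustrated with a naive scan. This is a minimal sketch, not the annotation pipeline used for phiKP26: it scans only the forward strand, accepts only ATG starts and the three standard stop codons, and uses a hypothetical minimum length of 30 codons.

```python
# Minimal sketch: G + C content and a naive forward-strand ORF scan.
# Assumptions: ATG start only, standard stops, hypothetical 30-codon minimum.
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

def gc_content(seq: str) -> float:
    """Fraction of G and C bases (multiply by 100 for a percentage)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Forward-strand ORFs as half-open (start, end) indices, ATG through stop.

    min_codons counts codons from ATG up to, but excluding, the stop codon.
    Within a frame, scanning resumes after each stop codon, so nested
    in-frame ATGs are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i : i + 3] == "ATG":
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j : j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume after the stop codon
                        break
                else:
                    break  # no in-frame stop codon downstream
            i += 3
    return orfs

if __name__ == "__main__":
    print(gc_content("GGCCAATT"))                  # 0.5
    print(find_orfs("CATGAAATGA", min_codons=1))   # [(1, 10)]
```

Dedicated gene finders additionally handle alternative start codons, the reverse strand and overlapping genes, which matters for compact phage genomes.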

14.
Prediction of kidney transplant outcome is an important and clinically relevant problem. Although several prediction models have been proposed based on large, national collections of data, their utility at the local level (where local data distributions may differ from national data) remains unclear. We conducted a comparative analysis that modeled the outcome data of transplant recipients in the national US Renal Data System (USRDS) against a representative local transplant dataset from the University of Utah Health Sciences Center, a regional transplant center. The performance of an identical set of prediction models was evaluated on both national and local data to assess how well national models reflect local outcomes. Compared with the USRDS dataset, several key characteristics of the local dataset differed significantly (e.g., a much higher local graft survival rate, a much higher local percentage of white donors and recipients, and a much higher proportion of living donors). This was reflected in statistically significant differences in model performance. The areas under the receiver operating characteristic curve (AUCs) of the models predicting 1-, 3-, 5-, 7-, and 10-year graft survival on the USRDS data were 0.59, 0.63, 0.76, 0.91, and 0.97, respectively. In the local dataset, by contrast, these values were 0.54, 0.58, 0.58, 0.61, and 0.70, respectively. Prediction models trained on national USRDS data performed better on the national dataset than on the local data. This may reflect differences in data characteristics between the two datasets, suggesting that wholesale adoption of a prediction model developed on a large national dataset to guide local clinical practice should be done with caution.
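
The AUC values compared above have a simple probabilistic reading: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This minimal sketch computes it by direct pair counting (quadratic time, fine for illustration; rank-based formulas scale better). Applying it separately to a national and a local test set gives the two columns of numbers the abstract compares.

```python
# Minimal sketch: ROC AUC by pairwise (Mann-Whitney) counting.
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
    print(roc_auc([0.5, 0.5], [1, 0]))                  # 0.5
```

An AUC of 0.5 means the model ranks cases no better than chance, which puts the local 1-year value of 0.54 in perspective.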

15.
Two contrasting proposals have been put forward to account for the evolutionary relationships among prokaryotes. The currently widely accepted three-domain proposal by Woese et al. (Proc. Natl. Acad. Sci. USA (1990) 87: 4576-4579) calls for the division of prokaryotes into two primary groups or domains, termed archaebacteria (Archaea) and eubacteria (Bacteria), both of which are suggested to have originated independently from a universal ancestor. However, this proposal, which is based primarily on genes involved in information transfer processes, is inconsistent with the ultrastructural characteristics of prokaryotes as well as with many gene phylogenies, and it explains neither how the structural and molecular differences between these groups arose nor how other prokaryotic taxa are related to, or evolved from, the common ancestor. It also postulates that the last common ancestor of all organisms was a hypothetical entity lacking a cell membrane, which contradicts the basic requirement for a cell membrane to define and separate all forms of life from the surrounding environment. A second, alternative proposal for the evolutionary relationships among prokaryotes has emerged from extensive analyses of numerous conserved inserts and deletions found in various proteins (Gupta, R. S., Microbiol. Mol. Biol. Rev. (1998) 62: 1435-1491; FEMS Microbiol. Rev. (2000) 24: in press). This proposal points to a specific relationship between archaebacteria and Gram-positive bacteria, both of which are prokaryotes bounded by a single cell membrane (monoderm prokaryotes). Gram-negative bacteria, which are bounded by two different membranes (diderm prokaryotes), are indicated to form a structurally and phylogenetically distinct taxon that originated from Gram-positive bacteria.
This proposal postulates that the earliest prokaryote was a Gram-positive bacterium from which both archaebacteria and diderm prokaryotes evolved by normal evolutionary mechanisms in response to the strong selection pressure exerted by antibiotics produced by certain groups of Gram-positive bacteria. It accounts for both the molecular and the structural differences among the main groups of prokaryotes by known evolutionary mechanisms, without invoking any hypothetical process or entity, and is therefore a closer representation of the natural relationships among prokaryotes than the proposal for two distinct domains. Based on this new proposal, the branching order of the different prokaryotic taxa from the common ancestor can be deduced as follows: Gram-positive bacteria (Low G + C) (↔ Archaebacteria) → Gram-positive bacteria (High G + C) (↔ Archaebacteria) → Deinococcus-Thermus → Green nonsulfur bacteria → Cyanobacteria → Spirochetes → Chlamydia-Cytophaga-Green sulfur bacteria → Proteobacteria-1 (ε, δ) → Proteobacteria-2 (α) → Proteobacteria-3 (β) → Proteobacteria-4 (γ). A surprising but very important aspect of the relationship deduced here is that the main eubacterial phyla are related to each other linearly rather than in a tree-like manner, suggesting that the major evolutionary changes within prokaryotes (bacteria) have occurred in a directional manner.

16.
Beet curly top Iran virus (BCTIV) was previously reported as a distinct curtovirus in Iran. The complete nucleotide sequences of three BCTIV isolates, one each from central, southern, and southeastern Iran, were determined to be 2844, 2844, and 2845 nt long, respectively. BCTIV shared the highest nucleotide sequence identity (52.3%) with Spinach curly top virus (SpCTV) and the lowest (46.6%) with Horseradish curly top virus (HrCTV). The BCTIV genome comprises three virion-sense (V1, V2, and V3) and two complementary-sense (C1 and C2) ORFs; ORFs C3 and C4 were not found in the BCTIV genome. In comparisons of the nucleotide sequence identity of individual genes, the three virion-sense ORFs were 72.7–79.9% related to the corresponding ORFs of curtoviruses, whereas no significant relationship was found between the C1 and C2 ORFs of BCTIV and those of curtoviruses. These two ORFs were, however, only distantly related to those of mastreviruses. Like the latter viruses, BCTIV has two intergenic regions in its genome. The BCTIV large intergenic region included a sequence capable of forming a stem-loop structure and a novel nonanucleotide (TAAGATT/CC) with a unique nick site. Phylogenetic analysis using the deduced amino acid sequences of individual ORFs revealed that the V2 and V3 ORFs are monophyletic and that the V1 ORF is classified with the related ORF of curtoviruses, whereas the two complementary-sense ORFs group with those of mastreviruses. Computer-based prediction suggested that BCTIV has a chimeric genome, which may have arisen by a recombination event involving curto- and mastrevirus ancestors. Percent nucleotide sequence identities of the coat protein gene among ten BCTIV isolates, collected from a wide range of geographical regions in Iran, ranged from 87.1% to 99.9%, with the isolates distributed between two subgroups. Based on its biological and molecular properties, BCTIV is proposed as a new member of the genus Curtovirus.
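
Percent identity figures like those above are computed over an alignment, and the result depends on how gap columns are counted. The abstract does not state its convention, so this minimal sketch makes an explicit assumption: columns where both sequences have a gap are ignored, and a gap opposite a base counts as a mismatch. Other conventions (for example, dividing by the shorter ungapped length) give different numbers.

```python
# Minimal sketch: percent identity of two pre-aligned sequences.
# Assumed convention: skip gap-gap columns; gap vs. base counts as mismatch.
def percent_identity(a: str, b: str) -> float:
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in cols)
    return 100.0 * matches / len(cols)

if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))  # 75.0
    print(percent_identity("AC-T", "ACGT"))  # 75.0
```

Pairwise identities computed this way over coat protein gene alignments are what place isolates into subgroups.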

18.
Proteome-scale studies of protein three-dimensional structures should provide valuable information both for investigating basic biology and for developing therapeutics. Critical to these endeavors is the expression of recombinant proteins. We selected Caenorhabditis elegans as the model organism for a structural proteomics initiative because of the high quality of its genome sequence and the availability of its ORFeome (its protein-encoding open reading frames, ORFs) in a flexible recombinational cloning format. We developed a robotic pipeline for recombinant protein expression that applies Gateway cloning/expression technology and uses a stepwise automation strategy on an integrated robotic platform. Using the pipeline, we carried out heterologous protein expression experiments on 10,167 C. elegans ORFs. With one expression vector and one Escherichia coli strain, protein expression was observed for 4854 ORFs, of which 1536 were soluble. Bioinformatic analysis of the data indicates that protein hydrophobicity is a key factor determining whether an ORF yields a soluble expression product. This is the largest number of genes investigated in any organism in a protein expression effort to date. The pipeline described here is applicable to high-throughput expression of recombinant proteins from other species, both prokaryotic and eukaryotic, provided that ORFeome resources become available.
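
The abstract identifies hydrophobicity as a key determinant of soluble expression but does not say how it was measured. A common summary statistic is the GRAVY score: the mean Kyte-Doolittle hydropathy over the protein sequence, where higher (positive) values indicate a more hydrophobic protein. This sketch uses GRAVY as an assumed stand-in; skipping non-standard letters is also an assumption.

```python
# Minimal sketch: GRAVY (mean Kyte-Doolittle hydropathy) of a protein sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(protein: str) -> float:
    """Mean hydropathy over standard residues; non-standard letters are skipped."""
    vals = [KYTE_DOOLITTLE[a] for a in protein.upper() if a in KYTE_DOOLITTLE]
    if not vals:
        raise ValueError("no standard amino acids in sequence")
    return sum(vals) / len(vals)

if __name__ == "__main__":
    for name, seq in [("hydrophobic", "ILVFAL"), ("charged", "KDERKE")]:
        print(name, round(gravy(seq), 2))
```

Ranking translated ORFs by such a score is one way to test whether hydrophobicity separates soluble from insoluble products.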

20.
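Objective: To construct a prokaryotic expression vector for a recombinant human lymphotoxin-α deletion mutant (rhLTαΔN27), express it in Escherichia coli, and establish a purification process for rhLTαΔN27. Methods: Total RNA was extracted from Jurkat cells, and the rhLTαΔN27 gene was amplified by RT-PCR, inserted into the prokaryotic expression vector pET-23b and transformed into E. coli BL21(DE3); rhLTαΔN27 expression was induced with IPTG. Inclusion bodies were washed and refolded, then purified on DEAE Sepharose FF and Phenyl Sepharose FF. Results: rhLTαΔN27 was expressed as inclusion bodies, accounting for more than 30% of total bacterial protein. After purification, rhLTαΔN27 was 99% pure with a specific activity above 8 × 10⁷ U/mg. The relative molecular mass (Mr), isoelectric point, N-terminal sequence and other physicochemical properties of the purified sample all matched the expected values. Conclusion: An rhLTαΔN27 expression vector was constructed and successfully expressed in E. coli, and a purification process for rhLTαΔN27 was established.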

Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP No. 09084417