Similar articles
 20 similar articles found (search time: 15 ms)
1.
A recent paper introduced the approach of using nonlinear system identification as a means for automatically classifying protein sequences into their structure/function families. The particular technique utilized, known as parallel cascade identification (PCI), could train classifiers on a very limited set of exemplars from the protein families to be distinguished and still achieve impressively good two-way classifications. For the nonlinear system classifiers to have numerical inputs, each amino acid in the protein was mapped into a corresponding hydrophobicity value, and the resulting hydrophobicity profile was used in place of the primary amino acid sequence. While the ensuing classification accuracy was gratifying, the use of (Rose scale) hydrophobicity values had some disadvantages. These included representing multiple amino acids by the same value, weighting some amino acids more heavily than others, and covering a narrow numerical range, resulting in a poor input for system identification. This paper introduces binary and multilevel sequence codes to represent amino acids, for use in protein classification. The new binary and multilevel sequences, which are still able to encode information such as hydrophobicity, polarity, and charge, avoid the above disadvantages and increase classification accuracy. Indeed, over a much larger test set than in the original study, parallel cascade models using numerical profiles constructed with the new codes achieved slightly higher two-way classification rates than did hidden Markov models (HMMs) using the primary amino acid sequences, and combining PCI and HMM approaches increased accuracy. © 2000 Biomedical Engineering Society. PAC00: 8714Ee, 8715Cc, 3620Fz, 8715Aa

2.
We consider the representation and identification of nonlinear systems through the use of parallel cascades of alternating dynamic linear and static nonlinear elements. Building on the work of Palm and others, we show that any discrete-time finite-memory nonlinear system having a finite-order Volterra series representation can be exactly represented by a finite number of parallel LN cascade paths. Each LN path consists of a dynamic linear system followed by a static nonlinearity (which can be a polynomial). In particular, we provide an upper bound for the number of parallel LN paths required to represent exactly a discrete-time finite-memory Volterra functional of a given order. Next, we show how to obtain a parallel cascade representation of a nonlinear system from a single input-output record. The input is not required to be Gaussian or white, nor to have special autocorrelation properties. Next, our parallel cascade identification is applied to measure accurately the kernels of nonlinear systems (even those with lengthy memory), and to discover the significant terms to include in a nonlinear difference equation model for a system. In addition, the kernel estimation is used as a means of studying individual signals to distinguish deterministic from random behaviour, in an alternative to the use of chaotic dynamics. Finally, an alternate kernel estimation scheme is presented.
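
The structure described in this abstract (parallel LN paths, each a dynamic linear system followed by a static polynomial nonlinearity, with the path outputs summed) can be sketched as follows. This is only an illustration of the model's forward evaluation, not the identification procedure from the paper. It assumes each linear element is a finite impulse response filter, and the function names, impulse responses, and polynomial coefficients are all hypothetical.

```python
def fir_filter(h, x):
    """Dynamic linear element (assumed FIR): y[n] = sum_k h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:  # causal: samples before the record count as zero
                acc += hk * x[n - k]
        y.append(acc)
    return y


def polynomial(coeffs, u):
    """Static nonlinearity: coeffs[0] + coeffs[1]*u + coeffs[2]*u**2 + ..."""
    return sum(a * u ** i for i, a in enumerate(coeffs))


def parallel_cascade(paths, x):
    """Sum the outputs of all LN paths; each path is (impulse_response, poly_coeffs)."""
    out = [0.0] * len(x)
    for h, coeffs in paths:
        u = fir_filter(h, x)  # L stage
        for n, un in enumerate(u):
            out[n] += polynomial(coeffs, un)  # N stage, added to the running sum
    return out


# Two hypothetical paths: a linear path and a squaring path.
paths = [
    ([1.0, 0.5], [0.0, 1.0]),        # u -> u
    ([1.0, -1.0], [0.0, 0.0, 1.0]),  # u -> u**2
]
x = [1.0, 2.0, 0.0]
y = parallel_cascade(paths, x)
```

In an identification setting the impulse responses and polynomial coefficients would be fitted from an input-output record; here they are fixed by hand purely to show how the path outputs combine.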

3.
Parallel cascade identification (PCI) is a method for approximating the behavior of a nonlinear system, from input/output training data, by constructing a parallel array of cascaded dynamic linear and static nonlinear elements. PCI has previously been shown to provide an effective means for classifying protein sequences into structure/function families. In the present study, PCI is used to distinguish proteins that bind adenosine triphosphate or guanosine triphosphate molecules from those that do not. Classification accuracies of 87.1% using the hydrophobicity scale of Rose et al. (Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985) and 88.8% using Korenberg's SARAH1 scale are obtained, as measured by tenfold cross-validation testing. Nearest-neighbor and K-nearest-neighbor (KNN) classifiers are constructed, and the resulting accuracy is, respectively, 88.0% and 90.8% on the SARAH1–encoded test data set, as measured by the above testing protocol. Significantly improved classification accuracy is achieved by combining PCI and KNN classifiers using quadratic discriminant analysis: accuracy rises from 87.9% (PCI) and 87.4% (KNN) to 96.5% for the combination, as measured by twofold cross-validation testing on the SARAH1–encoded test data set. © 2003 Biomedical Engineering Society. PAC2003: 8714Ee, 8715Cc, 8715Aa

4.
The mitochondrial genome displays a highly plastic architecture in the green algal division comprising the classes Prasinophyceae, Trebouxiophyceae, Ulvophyceae, and Chlorophyceae (Chlorophyta). The compact mitochondrial DNAs (mtDNAs) of Nephroselmis (Prasinophyceae) and Prototheca (Trebouxiophyceae) encode about 60 genes and have been ascribed an ‘ancestral’ pattern of evolution, whereas those of chlorophycean green algae are much more reduced in gene content and size. Although the mtDNA of the early-diverging ulvophyte Pseudendoclonium contains 57 conserved genes, it differs from ‘ancestral’ chlorophyte mtDNAs by its unusually large size (96 kb) and long intergenic spacers. To gain insights into the evolutionary trends of mtDNA in the Ulvophyceae, we have determined the complete mtDNA sequence of Oltmannsiellopsis viridis, an ulvophyte belonging to a distinct, early-diverging lineage. This 56,761 bp genome harbours 54 conserved genes, numerous repeated sequences, and only three introns. From our comparative analyses with Pseudendoclonium mtDNA, we infer that the mitochondrial genome of the last common ancestor of the two ulvophytes closely resembled that of the trebouxiophyte Prototheca in terms of gene content and gene density. Our results also provide strong evidence for the intracellular, interorganellar transfer of a group I intron and for two distinct events of intercellular, horizontal DNA transfer. Electronic Supplementary Material: Supplementary material is available for this article and is accessible for authorized users.

5.
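In this experiment, DNA immunization was used: repeat-sequence genes of different lengths from the Plasmodium circumsporozoite protein (CSP), obtained by PCR, were cloned into the pcDNA3 plasmid and injected intramuscularly into mice. Immunological assays showed that CSP repeat sequences of different lengths can, on their own, induce comprehensive cellular and humoral immunity in the host, which offers a new approach for future malaria vaccine development.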

6.
DNA sequencing reveals that the genomes of the human, gorilla and chimpanzee share more than 98% homology. Comparative chromosome painting and gene mapping have demonstrated that only a few rearrangements of a putative ancestral mammalian genome occurred during great ape and human evolution. However, interspecies representational difference analysis (RDA) between human and gorilla revealed gorilla-specific DNA sequences. Cloning and sequencing of these gorilla-specific DNA sequences indicate that they are repetitive elements. Gorilla-specific DNA sequences were mapped by fluorescence in-situ hybridization (FISH) to the subcentromeric/centromeric regions of three pairs of gorilla submetacentric chromosomes. These sequences could represent either ancient sequences that were lost in other species, such as human and orang-utan, or, more likely, recent sequences which evolved or originated specifically in the gorilla genome.

7.
Parallel Cascade Identification (PCI) has been successfully applied to build dynamic nonlinear systems that address diverse challenges in the field of bioinformatics. PCI may be used to identify either single-input single-output (SISO) or multi-input single-output (MISO) models. Although SISO PCI models have typically sufficed, it has been suggested that MISO PCI systems could also be used to form bioinformatics classifiers, and indeed they were successfully applied in one study. This paper reports on the first systematic comparison of MISO and SISO PCI classifiers. Motivation for using the MISO structure is given. The construction of MISO parallel cascade models is also briefly reviewed. In order to compare the accuracy of SISO and MISO PCI classifiers, genetic algorithms are applied to optimize the model architecture on a number of equivalent single-input and multi-input biological training datasets. Through evaluation of both model structures on independent test datasets, we establish that MISO PCI is capable of building classifiers of equal accuracy to those resulting from SISO PCI models. Moreover, we discuss and illustrate the benefits of the MISO approach, including significant reduction in training and testing times, and the ability to adjust automatically the weighting of individual inputs according to information content.

8.
Despite their common function, centromeric DNA sequences are not conserved between organisms. Most centromeres of animals and plants so far investigated have now been shown to consist of large blocks of tandemly repeated satellite sequences that are embedded in recombination-deficient heterochromatic regions. This central domain of satellite sequences that is postulated to mediate spindle attachment is surrounded by pericentromeric sequences incorporating various classes of repetitive sequences often including retroelements. The centromeric satellite DNA sequences are amongst the most rapidly evolving sequences and pose some fundamental problems of maintaining function. In this overview, we will discuss work on centromeric repetitive sequences in Arabidopsis thaliana and its relatives, and highlight some of the common features that are emerging when analysing closely related species.

9.
Summary: The complete 94,192 bp sequence of the mitochondrial genome from race s of Podospora anserina is presented (1 kb = 10^3 base pairs). Three regions unique to race A are also presented, bringing the size of this genome to 100,314 bp. Race s contains 31 group I introns (33 in race A) and 2 group II introns (3 in race A). Analysis shows that the group I introns can be categorized according to families both with regard to secondary structure and their open reading frames. All identified genes are transcribed from the same strand. Except for the lack of ATPase 9, the Podospora genome contains the same genes as its fungal counterparts, N. crassa and A. nidulans. About 20% of the genome has not yet been identified. DNA sequence studies of several excision-amplification plasmids demonstrate a common feature to be the presence of short repeated sequences at both termini with a prevalence of GGCGCAAGCTC.

10.
Exposure to environmental toxicants and stressors, radiation, pharmaceutical drugs, inflammation, cellular respiration, and routine DNA metabolism all lead to the production of cytotoxic DNA strand breaks. Akin to splintered wood, DNA breaks are not “clean.” Rather, DNA breaks typically lack DNA 5′‐phosphate and 3′‐hydroxyl moieties required for DNA synthesis and DNA ligation. Failure to resolve damage at DNA ends can lead to abnormal DNA replication and repair, and is associated with genomic instability, mutagenesis, neurological disease, ageing and carcinogenesis. An array of chemically heterogeneous DNA termini arises from spontaneously generated DNA single‐strand and double‐strand breaks (SSBs and DSBs), and also from normal and/or inappropriate DNA metabolism by DNA polymerases, DNA ligases and topoisomerases. As a front line of defense to these genotoxic insults, eukaryotic cells have accrued an arsenal of enzymatic first responders that bind and protect damaged DNA termini, and enzymatically tailor DNA ends for DNA repair synthesis and ligation. These nucleic acid transactions employ direct damage reversal enzymes including Aprataxin (APTX), Polynucleotide kinase phosphatase (PNK), the tyrosyl DNA phosphodiesterases (TDP1 and TDP2), the Ku70/80 complex and DNA polymerase β (POLβ). Nucleolytic processing enzymes such as the MRE11/RAD50/NBS1/CtIP complex, Flap endonuclease (FEN1) and the apurinic endonucleases (APE1 and APE2) also act in the chemical “cleansing” of DNA breaks to prevent genomic instability and disease, and promote progression of DNA‐ and RNA‐DNA damage response (DDR and RDDR) pathways. Here, we provide an overview of cellular first responders dedicated to the detection and repair of abnormal DNA termini. Environ. Mol. Mutagen. 56:1–21, 2015. © 2014 Wiley Periodicals, Inc.

11.
12.
A DNA library derived from the B chromosome of Podisma kanoi was obtained by chromosome microdissection. A total of 153 DNA clones were isolated from the microdissected DNA library. Twenty of them were sequenced. A comparison of B chromosome DNA sequences with sequences of other species from the DDBJ/GenBank/EMBL database was performed. Different patterns of signals were observed after FISH with labeled cloned DNA fragments. FISH signals with cloned DNA fragments painted either whole Bs or their different regions. Some clones also gave signals in pericentromeric regions of A chromosomes. Other cloned DNA fragments gave only background-like signals on A and B chromosomes. Comparative FISH analysis of B chromosomes in Podisma kanoi and P. sapporensis with DNA probes derived from the Bs of these species revealed homologous DNA that was confined within pericentromeric and telomeric regions of the B chromosome in P. kanoi. In contrast to the B chromosomes in P. sapporensis containing large regions enriched with rDNA, only a small cluster of rDNA was detected in one of the examined B chromosomes in P. kanoi. The data strongly suggest an independent origin of B chromosomes in two closely related Podisma species.

13.
14.
Summary: Two minicircular DNAs of 1.2 kb (K1) and 1.4 kb (K2) were found in mitochondria of fertile lupin (Lupinus albus). The plasmid-like DNA, K1, was cloned, labelled and hybridized with mitochondrial DNA from three different species of lupin. We have found no evidence for integrated copies of K1 in any of the mitochondrial genomes probed in this study. No sequence homology between plasmids K1 and K2, and no homology of either with chloroplast DNA, has been detected. The K1 DNA is two-fold more abundant than the K2 DNA and about seven-fold more abundant than a unique segment of the mtDNA. The entire nucleotide sequence of the K1 DNA has been determined. This sequence exhibits a 340 base pair region with highly organized repeats. The sequence of K1 shows no substantial homology with sequences of other mitochondrial plasmids of higher plants.

15.
Ultrastructural analysis has been carried out on three Leishmania isolates which are proven causal agents of human cutaneous leishmaniasis: L. tropica major, L. aethiopica and an unidentified species, Leishmania SP48. No significant differences in submicroscopic morphology have been found in thin-sectioned organisms from the three isolates. Extensive plate cristae have been observed within the mitochondria, and connections between the rim of the kinetoplast nucleoid and the inner mitochondrial membrane noted. Kinetoplast DNA (kDNA) has been isolated from these isolates and from L. tarentolae and examined by protein monolayer spreading and darkfield electron microscopy. The basic molecular arrangement of isolated kDNA in the form of 5 μm networks of 0.28-0.3 μm mini-circles with long looped DNA in the interior and at the periphery of networks is similar in all isolates. Minor differences between L. aethiopica and SP48 compared with L. tropica major have been observed. The kDNAs of L. aethiopica and SP48 are identical morphologically. Buoyant density analysis has shown that kDNA from L. aethiopica and SP48 have identical values and these are different from the value for L. tropica major. The finding of similar buoyant densities for kDNA from L. tropica major and L. tarentolae also implies a sequence homology by this criterion, which is refuted by the results given in the following paper. The results given in this and the following paper (Arnot, D.E. and Barker, D.C. (1981) Mol. Biochem. Parasitol. 3, 47–56) indicate that the unknown Leishmania SP48 is very closely related to, if not identical with, L. aethiopica. This finding is consistent with the clinical and ecological facts known for the organism SP48.

16.
Summary: In our previous study of chloroplast (Cp) DNA replication in Chlamydomonas reinhardtii, one D-loop site with its flanking regions was cloned and sequenced. The D-loop site mapped by electron microscopy (EM) overlaps with an open reading frame (ORF) potentially coding for a polypeptide of 136 amino acids. In this report, the corresponding D-loop isolated from another species of Chlamydomonas was sequenced. An ORF was also detected. Sequence comparison indicated that most conserved sequences between these two cloned origins are located within the ORF. Amino acid sequences of these two ORFs are highly conserved. The corresponding sequence for this ORF in the tobacco Cp genome was located by Southern blotting analysis. Since the complete sequences of Cp DNAs from a liverwort and from tobacco have recently been determined in two Japanese laboratories, it has been possible for us to show, by sequence comparison, that this ORF encodes a protein homologous to the Cp ribosomal protein (r-protein) L16.

17.
18.
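Objective: To set out the definition and significance of generalized complexity, to compare the complexity of DNA sequences from four vertebrates, and to test the correctness and reasonableness of the complexity definition we proposed. Methods: Simple sequences (fixed point, period 2, unequal-probability period 4, chaotic, and random sequences) were used as computational examples, and the historically famous unresolved "Bach/monkey at the keyboard" question was used as the criterion for testing the correctness of the generalized complexity definition. Human, bull, mouse, and hen were used as application examples, with the repetition complexity GCR(2) as the main measure and the ordering information I = log 4 - H1 and the redundancy R = 1 - H1/log 4 as reference measures; GCR(2), I, and R were computed from their defining formulas. Results: In the simple computational examples, completely random and completely regular sequences both gave GCR(2) = 0, GCC(2) = 0, and GCT(2) = 0, showing that neither complete regularity nor complete randomness is complex. This fully meets the requirement of the "Bach/monkey" question and confirms the correctness of our generalized complexity definition. Among the DNA sequences of the four vertebrates, the values of GCR(2), I, and R showed no order-of-magnitude differences between species, whereas the quantities GCR(2), I, and R differed from one another by orders of magnitude, i.e., GCR(2) ■ I (bit) and GCR(2) ■ R (%); moreover, the human values of GCR(2), I, and R were far larger than those of the other three vertebrates. The results indicate that vertebrate evolution is essentially at the same level, but that the complexity, ordering information, and reserve storage redundancy of human DNA sequences are all higher than those of the other vertebrates. The evolution of vertebrate DNA sequences shares a common feature: in the entropy-reducing course of evolution, the emphasis falls not on the ordering of base composition (I) but on the growing complexity of base correlation, GCR(2). All these conclusions are consistent with the laws of biomolecular evolution in bioinformatics, indicating that the generalized complexity definition we proposed is reasonable.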
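
The two reference measures named in this abstract, I = log 4 - H1 and R = 1 - H1/log 4, can be computed with a short sketch like the one below. It rests on two assumptions the abstract does not spell out: that H1 is the first-order Shannon entropy of the single-base frequencies, and that logarithms are taken to base 2 (suggested by I being reported in bits). GCR(2) is not computed, because its defining formula is not given here; the function names are hypothetical.

```python
import math
from collections import Counter


def composition_entropy(seq):
    """Assumed H1: Shannon entropy (bits) of the single-base frequencies in seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def ordering_information(seq):
    """I = log 4 - H1, in bits (log base 2 is an assumption)."""
    return math.log2(4) - composition_entropy(seq)


def redundancy(seq):
    """R = 1 - H1 / log 4, a fraction between 0 and 1."""
    return 1 - composition_entropy(seq) / math.log2(4)


uniform = "ACGT" * 5      # all four bases equally frequent
single = "A" * 20         # only one base present
```

Under these assumptions, a sequence with equal base frequencies gives I = 0 and R = 0, while a sequence made of a single base gives the maximum values I = 2 bits and R = 1.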

19.
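A preliminary study on detecting and identifying common clinical pathogenic fungi with gene chip technology   Total citations: 4 (self-citations: 0; citations by others: 4)
Objective: To identify common clinical pathogenic fungi rapidly, simply, and with high throughput, a molecular biology method based on gene chip technology was established. Methods: Using the internal transcribed spacer 2 (ITS-2) between the 5.8S rDNA and 28S rDNA as the target, a series of oligonucleotide probes was designed and synthesized for the common clinical pathogenic fungi to be tested and used to make an oligonucleotide chip. DNA from the fungi under test was amplified and labeled with universal primers and then hybridized to the chip; the hybridization patterns were analyzed and summarized to obtain a set of typical species-specific hybridization patterns. A sample strain is hybridized to the chip, and its species is determined by comparing the result with the typical patterns. Results: The specificity, reproducibility, and sensitivity of the chip were evaluated with standard strains of pathogenic fungi covering 20 species in 8 genera. The gene chip method established in this study effectively discriminated the 20 common clinical pathogenic fungi, with good specificity, good reproducibility (signal-to-noise ratio CV < 10%), and a sensitivity of 15 pg/ml of fungal DNA. The chip was then tried on 84 pathogenic fungal strains isolated from clinical samples, and its identifications agreed with those of conventional identification methods. Conclusion: This technique enables stable, specific, high-throughput identification of common clinical pathogenic fungi and lays a foundation for further detection research.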

20.
Prognostic biomarkers for GIST are under investigation. The aim of this study was to assess whether exon 11 mutations, Ki67, and p16INK4A are predictors of prognosis in GIST. Consecutive GIST cases (n = 84) had their specimens evaluated for exon 11 mutations and expression of Ki67 and p16INK4A. Surgical cases were categorized according to NIH and Miettinen's classification, and survival was analyzed from hospital database. GISTs were predominately gastric (45%) and with spindle cell morphology (74%). The risk category was very low or low in 28%, intermediate in 23%, and high in 49%. Exon 11 mutation was identified in 29 (48%) out of 60 cases studied. There were 12 point mutations, 10 deletions, 4 duplications, and 3 double mutations. A third of GISTs had either high Ki67 index (>3%) or negativity for p16INK4A. In multivariate analysis, independent predictors of mortality were Ki67 > 3% (HR = 7.3; P = 0.036) and high mitotic index (HR = 10.4; P = 0.043). There was no association between exon 11 mutations and survival. This study suggests that Ki67 > 3% is an independent predictor of poor prognosis in patients with GIST. Exon 11 mutations and negativity for p16INK4A need further studies to address the prognostic value.
