Similar Articles
1.
A recent paper introduced the approach of using nonlinear system identification as a means for automatically classifying protein sequences into their structure/function families. The particular technique utilized, known as parallel cascade identification (PCI), could train classifiers on a very limited set of exemplars from the protein families to be distinguished and still achieve impressively good two-way classifications. For the nonlinear system classifiers to have numerical inputs, each amino acid in the protein was mapped into a corresponding hydrophobicity value, and the resulting hydrophobicity profile was used in place of the primary amino acid sequence. While the ensuing classification accuracy was gratifying, the use of (Rose scale) hydrophobicity values had some disadvantages. These included representing multiple amino acids by the same value, weighting some amino acids more heavily than others, and covering a narrow numerical range, resulting in a poor input for system identification. This paper introduces binary and multilevel sequence codes to represent amino acids, for use in protein classification. The new binary and multilevel sequences, which are still able to encode information such as hydrophobicity, polarity, and charge, avoid the above disadvantages and increase classification accuracy. Indeed, over a much larger test set than in the original study, parallel cascade models using numerical profiles constructed with the new codes achieved slightly higher two-way classification rates than did hidden Markov models (HMMs) using the primary amino acid sequences, and combining PCI and HMM approaches increased accuracy. © 2000 Biomedical Engineering Society. PAC00: 8714Ee, 8715Cc, 3620Fz, 8715Aa
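The disadvantages listed above (several residues sharing one value, unequal weighting, a narrow numerical range) can be made concrete with a small encoder. The sketch below is not the published SARAH1 code: it assumes a hypothetical 5-element code in which every residue gets a distinct pattern with exactly two nonzero entries (+1 or -1), so no two residues collide and every residue carries the same weight. The residue-to-pattern assignment and the names `build_hypothetical_code` and `encode` are illustrative only.

```python
from itertools import combinations

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_hypothetical_code(length=5):
    """Hypothetical multilevel code (NOT SARAH1): all patterns with exactly two
    nonzero entries, first the +1 patterns, then the -1 patterns.
    C(5, 2) = 10 positions x 2 signs = 20 distinct patterns, one per residue.
    The residue-to-pattern assignment is arbitrary."""
    patterns = []
    for sign in (1, -1):
        for i, j in combinations(range(length), 2):
            vec = [0] * length
            vec[i] = vec[j] = sign
            patterns.append(tuple(vec))
    return dict(zip(AMINO_ACIDS, patterns))


def encode(sequence, code):
    """Concatenate per-residue code vectors into one numerical profile.
    Raises KeyError for a character that is not a standard residue."""
    profile = []
    for residue in sequence.upper():
        profile.extend(code[residue])
    return profile


code = build_hypothetical_code()
profile = encode("MKV", code)  # 3 residues x 5 values = 15 numbers
```

Because each code vector has the same number of nonzero entries, no residue dominates the profile merely through a larger value; whether a particular assignment also preserves hydrophobicity, polarity or charge information, as the paper's codes are said to, depends on how the patterns are chosen.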

2.
Parallel cascade identification (PCI) is a method for approximating the behavior of a nonlinear system from input/output training data by constructing a parallel array of cascaded dynamic linear and static nonlinear elements. PCI has previously been shown to provide an effective means of classifying protein sequences into structure/function families. In the present study, PCI is used to distinguish proteins that bind adenosine triphosphate or guanosine triphosphate molecules from those that do not. Classification accuracies of 87.1% using the hydrophobicity scale of Rose et al. (Hydrophobicity of amino acid residues in globular proteins. Science 229:834–838, 1985) and 88.8% using Korenberg's SARAH1 scale are obtained, as measured by tenfold cross-validation testing. Nearest-neighbor and K-nearest-neighbor (KNN) classifiers are constructed, and the resulting accuracies are 88.0% and 90.8%, respectively, on the SARAH1-encoded test data set, as measured by the same testing protocol. Significantly improved classification accuracy is achieved by combining PCI and KNN classifiers using quadratic discriminant analysis: accuracy rises from 87.9% (PCI) and 87.4% (KNN) to 96.5% for the combination, as measured by twofold cross-validation testing on the SARAH1-encoded test data set. © 2003 Biomedical Engineering Society. PAC2003: 8714Ee, 8715Cc, 8715Aa
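The parallel cascade structure described above can be sketched in a few lines. This is a simplified illustration, not Korenberg's full procedure: it assumes each cascade's dynamic linear element is an FIR filter whose impulse response is taken from the first-order cross-correlation between the input and the current residual, and that its static nonlinearity is a polynomial fitted by least squares. The memory length, polynomial degree and number of cascades are arbitrary assumptions, and `fit_pci` and `predict_pci` are hypothetical names.

```python
import numpy as np


def linear_element(x, h):
    """Dynamic linear element as a causal FIR filter:
    u(n) = sum_j h(j) * x(n - j), with x treated as zero before n = 0."""
    return np.convolve(x, h)[: len(x)]


def fit_pci(x, y, memory=10, degree=3, n_cascades=5):
    """Simplified parallel cascade sketch: each cascade is an FIR filter
    followed by a static polynomial; cascades are fitted one at a time
    to whatever part of y the previous cascades left unexplained."""
    x = np.asarray(x, dtype=float)
    residual = np.asarray(y, dtype=float).copy()
    n = len(x)
    cascades = []
    for _ in range(n_cascades):
        # Impulse response from the first-order cross-correlation
        # h(j) = mean over n of residual(n) * x(n - j), j = 0..memory-1.
        h = np.array([np.dot(residual[j:], x[: n - j]) / (n - j) for j in range(memory)])
        u = linear_element(x, h)
        # Static nonlinearity: least-squares polynomial mapping u to the residual.
        coeffs = np.polyfit(u, residual, degree)
        # Remove this cascade's contribution before fitting the next one.
        residual = residual - np.polyval(coeffs, u)
        cascades.append((h, coeffs))
    return cascades


def predict_pci(x, cascades):
    """Model output = sum of the outputs of all parallel cascades."""
    x = np.asarray(x, dtype=float)
    return sum(np.polyval(coeffs, linear_element(x, h)) for h, coeffs in cascades)
```

Since the zero polynomial is always a feasible least-squares solution, each added cascade cannot increase the mean-square residual. For a two-way protein classifier one could, as an assumption not stated in this abstract, train on the numerical profiles with target outputs such as -1 for one family and +1 for the other, then threshold the averaged model output on a test profile.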

3.
More than 30% of Duchenne and Becker muscular dystrophy (DMD/BMD) patients have no gross DNA rearrangements such as deletions or duplications. The large size of the coding sequence of the dystrophin gene (11 kilobases) complicates systematic identification of point mutations. Recently reported approaches based on genomic DNA or mRNA show that chemical cleavage of mismatches is an effective but time-consuming and technically demanding method for identifying point mutations in the human dystrophin gene. We have used a fast and convenient system consisting of PCR amplification of genomic DNA, non-isotopic SSCP analysis, and direct sequencing of PCR products to detect mutations in exon 13 and adjacent intron sequences. Sixty-eight DMD patients without detectable deletions or duplications were analysed, resulting in the identification of one point mutation in the coding sequence and two polymorphisms in the 5' flanking intron. The C-to-T change of the first nucleotide in the third triplet creates a stop codon and appears to be the cause of the functional deficiency of the gene product in this patient.
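The abstract does not state which codon was affected, but the reasoning that a C-to-T change at the first position of a codon can create a stop codon is easy to check by enumeration. The sketch below uses the standard stop codons TAA, TAG and TGA; the helper name `c_to_t_at_first_position` is hypothetical.

```python
from itertools import product

# Standard-code stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_to_t_at_first_position(codon):
    """Return the codon after a C->T transition at its first nucleotide."""
    codon = codon.upper()
    if codon[0] != "C":
        raise ValueError("first nucleotide is not C")
    return "T" + codon[1:]


# Every codon starting with C that becomes a stop codon after the C->T change.
nonsense_prone = sorted(
    "C" + rest
    for rest in ("".join(p) for p in product("ACGT", repeat=2))
    if c_to_t_at_first_position("C" + rest) in STOP_CODONS
)
```

Only three codons qualify: CAA and CAG (glutamine) become TAA and TAG, and CGA (arginine) becomes TGA.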

4.
S100A6 has been implicated in a variety of biological functions as well as in tumorigenesis. In this study, we investigated the expression status of S100A6 in relation to the clinicopathological features and prognosis of patients with gastric cancer and further explored a possible association of its expression with epigenetic regulation. S100A6 expression was markedly increased in 67.5% of gastric cancer tissues compared with matched noncancerous tissues. Statistical analysis demonstrated a clear correlation between high S100A6 expression and various clinicopathological features, such as depth of wall invasion, positive lymph node involvement, liver metastasis, vascular invasion, and tumor-node-metastasis stage (P < 0.05 in all cases), and revealed that S100A6 is an independent prognostic predictor (P = 0.026) significantly related to poor prognosis (P = 0.0004). Further exploration found an inverse relationship between S100A6 expression and the methylation status of the seventh and eighth CpG sites in the promoter/first exon and of the second to fifth sites in the second exon/second intron. In addition, the level of histone H3 acetylation was significantly higher in S100A6-expressing cancer cells. After 5-azacytidine or trichostatin A treatment, S100A6 expression clearly increased in S100A6 low-expressing cells. In conclusion, our results suggest that S100A6 plays an important role in the progression of gastric cancer, affecting patient prognosis, and is up-regulated through epigenetic mechanisms.

S100A6, also known as calcyclin, is a low-molecular-weight acidic protein containing two EF-hand calcium-binding motifs [1,2]. It was discovered on the basis of its cell cycle-dependent expression and is preferentially expressed in the G1 phase of the cell cycle after mitogenic stimulation [3,4]. S100A6 is a member of the S100 family, whose members are localized to the cytoplasm and/or nucleus in a wide range of cell types [5]. S100A6 may interact with binding or target proteins, thereby regulating the dynamics of cytoskeletal constituents, cell growth and differentiation, and calcium homeostasis [5–10]. Subsequent studies showed that S100A6 may also be involved in the regulation of cancer progression [11]. Deregulated S100A6 expression during malignant transformation has so far been described in human pancreatic cancer, malignant thyroid neoplasms, malignant melanoma, breast cancer, hepatocellular carcinoma, lung cancer, prostate cancer, and colorectal carcinoma [12–21]. S100A6 overexpression has been linked with metastasis in colon cancer [14,15], and S100A6 is a well-established marker of melanoma, in which its level correlates with tumor invasiveness and poor prognosis. Although high expression of S100A6 has been reported in gastric cancer, its correlation with patient prognosis and clinicopathological features has not been fully investigated [22]. Like other S100 proteins, S100A6 may promote cancer progression through specific roles in cell survival and apoptotic pathways [6]; however, the exact mechanism is unclear.

In this study, we performed a detailed analysis of S100A6 expression in primary gastric cancer, matched metastatic lymph nodes, and liver metastatic nodules. We then analyzed the relationship between S100A6 overexpression and both clinicopathological features and patient prognosis. To gain insight into how S100A6 expression is regulated in gastric cancer, we examined DNA methylation and histone modifications along the S100A6 gene, which previous reports have indicated may affect S100A6 expression in cancer cell lines [23,24].

5.
Spontaneous thioguanine-resistant mutants were derived from populations of finite-life-span, diploid human fibroblasts by means of a fluctuation analysis. cDNA was prepared from mutant HPRT mRNA and amplified by the polymerase chain reaction, and the sequence of the product was analyzed. Exon deletions, which very likely arose from mutations in the intron splice site consensus sequences, were found in 10 of the 37 mutants examined (27% of the total). Among the 28 mutations in the coding sequence, base pair substitutions predominated (89%). With the exception of one base pair involved in a tandem mutation, all base pair substitutions resulted in alterations in the predicted amino acid sequence of the protein. In addition, there were three frameshift mutations, consisting of the deletion of one or two base pairs. Although mutations occurred throughout the coding sequence, 50% (14/28) were found in the 5′ portion of exon 3.

6.
A 438-basepair intron 1 sequence adjacent to exon 2 in the human major histocompatibility complex DQA1 gene defined 16 allelic variants in 69 individuals from diverse ethnic backgrounds. In contrast, the most variable coding region, spanned by the 247-basepair exon 2, defined 11 allelic variants. Our phylogenetic tree of human intron 1 sequences, derived with the bootstrap algorithm, reflects the same relative allelic relationships as the reported DQA1 exon 2 tree [Gyllensten and Erlich, Hum Immunol 36:1–10, 1989]. Thus, the 3′ region of DQA1 intron 1 and exon 2 have cosegregated since the divergence of the human races. Comparison of human alleles with a Rhesus monkey DQA1 first-intron sequence found only 10 nucleotide substitutions unique to Rhesus, with the other 428 positions (98%) found in at least one human allele. This high degree of homology reflects the evolutionary stability of intron sequences, since these two species diverged over 20 million years ago. Because more intron 1 alleles exist than exon 2 alleles, these polymorphic introns can be used to improve tissue typing for transplantation, paternity testing, and forensics, and to derive more complete phylogenetic trees. These results suggest that introns represent a previously underutilized polymorphic resource. © 1994 Wiley-Liss, Inc.

7.
The Zygnematales (Charophyta) contain a group-I intron (subgroup IC1) within their nuclear-encoded small subunit ribosomal DNA (SSU rDNA) coding region. This intron, which is inserted after position 1506 (relative to the SSU rDNA of Escherichia coli), is proposed to have been vertically inherited since the origin of the Zygnematales approximately 350–400 million years ago. Primary and secondary structure analyses were carried out to model group-I intron evolution in the Zygnematales. Secondary structure analyses support genetic data regarding sequence conservation within regions known to be functionally important for in vitro self-splicing of group-I introns. Comparisons of zygnematalean group-I intron secondary structures also provided new insights into sequences that may have important roles in in vivo RNA splicing. Sequence analyses showed that sequence divergence rates and the nucleotide compositions of introns and coding regions within any one taxon varied widely, suggesting that the 1506 group-I introns and the rDNA coding regions in the Zygnematales evolve independently.

8.

Background

In the spectrum of molecular alterations found in hepatocellular carcinoma (HCC), somatic mutations in the WNT/β-catenin pathway and the p53/cell cycle control pathway are among the most frequent. It has been suggested that mutations in these two pathways occur in a mutually exclusive manner, and they are used as molecular classifiers in proposed HCC classification schemes.

Case presentation

Here, we report the case of a treatment-naïve mixed hepatocellular/cholangiocellular carcinoma (HCC/CCC) with morphological and genetic intratumor heterogeneity. Within the predominant part of the tumor with hepatocellular differentiation, a p.D32V mutation in exon 3 of the CTNNB1 gene occurred concomitantly with a TP53 intron 7/exon 8 splice site mutation.

Conclusion

Intratumor heterogeneity challenges the concept of CTNNB1 and TP53 gene mutations being mutually exclusive molecular classifiers in HCC, which has implications for HCC classification approaches.

9.
10.
Summary

Recently, the nucleotide sequences of three mitochondrial plasmids associated with senescence of Podospora anserina were determined (Cummings et al. 1985). One of these sequences, corresponding to the plasmid termed senDNA, contains three class I introns, all within a protein-coding sequence equivalent to the mammalian URF1 gene. Here, we present primary and secondary structure analyses for two of these introns, as well as a partial analysis of the third, which extends beyond the determined DNA sequence. In both primary and secondary structure, the closest known relative of intron 1 is the self-splicing intron in the large ribosomal RNA gene of Tetrahymena. One secondary structure domain at the periphery of the intron 1 and Tetrahymena models is also present in intron 2. The latter intron is the longest known class I intron and contains remnants of two protein-coding sequences, one of which is split by the other. Evolutionary processes that might be responsible for the unusual structure of introns 1 and 2 are discussed.

11.
Summary

Two unexpectedly small mitochondrial (mt) genomes of Coprinus cinereus, P and S, were compared with the H and J genomes we have described previously. H and J are 42 kb in size and differ in having alternative 1.23 kb insertions in or adjacent to the co-1 gene. P and S DNAs lacked both insertions and had an identical 4.4 kb deletion between the co-1 and L-RNA genes. P DNA contained a 700 bp insertion and S DNA a 300 bp deletion within the sequence coding for the L-RNA gene. This was shown by Southern blot analysis using probes containing the 5′ or the 3′ exon sequences of the L-RNA gene of Neurospora crassa. These hybridisations also showed that the L-RNA gene and the co-1 gene in the C. cinereus mt genome are oppositely orientated and must be transcribed from different DNA strands. No DNA homology was detected using probes containing intron sequences from the L-RNA genes of Saccharomyces cerevisiae or N. crassa. There was no evidence of respiratory deficiency in P and S strains, and transfer of nuclei by dikaryon formation made it possible to recombine H nuclei with P and S mitochondria, S nuclei with H and P mitochondria, and P nuclei with H mitochondria with no apparent detrimental effect on growth. We conclude that P and S mtDNAs represent naturally occurring variants of the C. cinereus mt genome.

12.
The restriction site mutation (RSM) assay was used to study the mutational sensitivities of three target regions of the murine p53 gene. The non-coding intron 6 target region was compared with the coding regions exon 4 and exon 5 with respect to their relative sensitivity to the induction of mutations by 1,2-dimethylhydrazine (DMH). Our results demonstrated that the majority of induced mutations detected were in the intron 6 region. A total of 15 enzyme-resistant restriction sites were detected in DMH-treated mice: nine in the intron 6 region, four in the exon 4 region, and two in the exon 5 region. The elevated sensitivity of the intron 6 region was exemplified by our detection of spontaneous mutations in this region; two resistant restriction sites were detected in untreated animals. No spontaneous mutations were detected in either of the exon sequences studied here, nor have any been detected in exon targets in our previous in vivo RSM analyses. The mutations induced by DMH were mostly GC

13.
Parallel Cascade Identification (PCI) has been successfully applied to build dynamic nonlinear systems that address diverse challenges in the field of bioinformatics. PCI may be used to identify either single-input single-output (SISO) or multi-input single-output (MISO) models. Although SISO PCI models have typically sufficed, it has been suggested that MISO PCI systems could also be used to form bioinformatics classifiers, and indeed they were successfully applied in one study. This paper reports on the first systematic comparison of MISO and SISO PCI classifiers. Motivation for using the MISO structure is given. The construction of MISO parallel cascade models is also briefly reviewed. In order to compare the accuracy of SISO and MISO PCI classifiers, genetic algorithms are applied to optimize the model architecture on a number of equivalent single-input and multi-input biological training datasets. Through evaluation of both model structures on independent test datasets, we establish that MISO PCI is capable of building classifiers of equal accuracy to those resulting from SISO PCI models. Moreover, we discuss and illustrate the benefits of the MISO approach, including significant reduction in training and testing times, and the ability to adjust automatically the weighting of individual inputs according to information content.
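The difference between equivalent single-input and multi-input training data can be pictured as two layouts of the same per-residue codes. The sketch assumes each residue is encoded as a fixed-length vector of k values: the SISO layout concatenates the vectors into one sequence of length k·L, while the MISO layout keeps k parallel input channels of length L. This is one plausible construction, not necessarily the one used in the paper, and the names `to_siso` and `to_miso` are hypothetical.

```python
def to_siso(profile_vectors):
    """Single-input layout: concatenate per-residue code vectors into one
    input sequence of length k * L."""
    return [value for vec in profile_vectors for value in vec]


def to_miso(profile_vectors):
    """Multi-input layout: k input channels, each of length L, where channel i
    holds the i-th code element of every residue."""
    if not profile_vectors:
        return []
    width = len(profile_vectors[0])
    if any(len(vec) != width for vec in profile_vectors):
        raise ValueError("all residue codes must have the same length")
    return [list(channel) for channel in zip(*profile_vectors)]


# Hypothetical 3-element codes for a 2-residue protein.
residues = [(1, 1, 0), (0, -1, -1)]
siso = to_siso(residues)  # one channel of length 6
miso = to_miso(residues)  # three channels of length 2
```

Both layouts carry exactly the same numbers; what differs is whether a model sees them as one long signal or as several aligned signals whose contributions it can weight separately.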

14.
Summary

Earlier, we reported that the ND1 mitochondrial gene of Podospora anserina is mosaic, containing at least three class I introns. We have now completed the sequence of the ND1 gene and have determined that it contains four class I introns of 1,820, 2,631, 2,256 and 2,597 bp, with the entire gene complex containing 10,505 bp, only 1,101 of which are exon sequences. Introns 1 and 3 appear to be related in that their open reading frames (ORFs) exhibit extensive amino acid sequence similarity and, like the URFN sequence from Neurospora crassa, have multiple sequence repetitions. Introns 2 and 4 are similar in that both appear to be mosaic introns. Whereas intron 2 has many short ORFs, intron 4 has two, of 391 and 262 aa, respectively. The first ORF has some patchwork sequence similarity with one of the intron 2 ORFs, but the second ORF is strikingly similar to the single intron ORF in the ND1 gene of N. crassa. Just upstream of the sequences necessary to form the central core of the P. anserina intron 4 secondary structure, there is a 17 bp sequence that is an exact replica of the exon sequence abutting the 5′ flank of the 1,118 bp N. crassa ND1 intron. Secondary structure analysis suggests that the 2,597 bp intron 4 can fold as an entity, but a similar structure can be constructed from just a 1,130 bp portion by utilizing the 17 bp element as an alternate splice site. Detailed structural analysis suggests that intron 4 (as well as the single ND1 intron from N. crassa) can adopt helical configurations that bring the downstream open reading frame into juxtaposition with the exon sequences.

15.
Glutathione S-transferases (GSTs) are soluble dimeric proteins involved in the metabolism, detoxification, and excretion from the cell of a large number of endogenous and exogenous compounds, such as insecticides. In the current study, field specimens of Anopheles stephensi Liston, Anopheles fluviatilis James, and Anopheles culicifacies Giles were collected from Sistan and Baluchistan province in Iran and subjected to the World Health Organization susceptibility test. Only An. stephensi was resistant to 4% DDT. DNA extraction and rDNA-ITS2 polymerase chain reaction (PCR) for correct species identification, followed by amplification of the GSTe2 gene, including exons I and II and the full sequence of intron I, identified a 500-bp fragment in all three species. These fragments were purified and sequenced from both ends. Comparison of the GSTe2 coding sequence among these species and with Anopheles gambiae Giles showed 82 to 86% similarity at the nucleic acid level and identified nucleotide polymorphisms within the An. culicifacies and An. stephensi populations. Species-specific differences were detected in intron I of the GSTe2 gene. This is in concordance with previous studies and confirms the conserved nature of the intron sequence of the GSTe2 gene within each species, which is probably useful as a molecular marker for species identification. Phylogenetic analysis based on rDNA-ITS2 and on the coding (exons I and II) and noncoding sequences of GSTe2 showed the systematic relatedness of the Iranian malaria vectors and the possibility of using these sequences both to differentiate Anopheles species and to define their evolutionary relationship with the only available GSTe2 sequence, that of An. gambiae. These data may be useful for implementing and evaluating malaria control programs with respect to population genetics and molecular resistance.

16.
HLA class-II proteins are cell-surface molecules that present antigens to T cells, and their expressional regulation is crucial to the immune reaction. Sequence variation in the regulatory region can directly affect the gene expression level. We cloned and sequenced a 4.7-kb region containing the regulatory region, exon 1, and part of intron 1 of both the HLA-DPA1 and DPB1 genes in 25 variable sequences from southern Chinese ethnic groups and obtained a high-density map of 162 single nucleotide polymorphisms (SNPs): seven in 5′-flanking regions, four in 5′-untranslated regions, and four in the coding regions. Comparing these data with the SNPs in the NCBI dbSNP database showed that 145 SNPs (89.5%) were novel. In addition, eight insertion-deletion polymorphisms (INDELs) were discovered within the 4.7-kb region. These high-resolution maps can be used as a resource of markers for association studies of complex diseases, assessment of individuals' predisposition to diseases, and tailoring of therapies, as well as research markers for population genetics and evolution.

17.
Surprisingly, half of all severe haemophilia A patients have no mutation in the promoter, coding sequences, or normal RNA processing signals of the factor VIII gene. Instead, they manifest a unique mRNA defect that prevents amplification of the message across the boundary between exons 22 and 23. This locates the defect to internal regions of intron 22. Novel sequences 3' to exon 22 were isolated from the 9 available patients with this abnormality by combining RACE and vectorette amplifications on trace amounts of mRNA. This showed that exons 1–22 of the factor VIII mRNA had become part of a hybrid message containing new multi-exonic sequences expressed in normal cells. The novel sequences were not located in a YAC covering the whole factor VIII gene. Southern blots from patients probed with the novel sequences and with clones covering intron 22 showed no obvious abnormalities. This suggested inversions involving repeated sequences in intron 22. Screening of 3 YAC libraries with the novel sequences located them at least 200 kb telomeric (5') to factor VIII, and pulsed-field gel analysis detected abnormal bands in patients. This demonstrates that the mutations in these patients are inversions of long DNA regions, possibly involving the repeated sequences, and occurring at the surprising rate of approximately 4 × 10⁻⁶ per gene per gamete per generation.

18.
Thymic Fas-ligand (FasL) cDNA and hepatic FasL genomic sequences were obtained from a 2-month-old LW pig. From these nucleotide sequences, the amino acid sequence was deduced and compared with FasL sequences from various animals. This comparison reveals that porcine FasL is closer to that of human, macaque and cat, and differs more from those of mouse and rat. The extracellular domains of porcine and human FasL proteins appear to be functionally compatible. The complete genomic DNA sequence of porcine FasL was also compared with its human counterpart. Exons showed 80-89% nucleotide homology between pig and human, while introns showed 64-69% nucleotide homology. Sequence comparison by Harr plot analysis revealed many stretches within introns with identical sequences, suggesting that these sites may have unidentified common functions. One potential extra exon between exons 2 and 3 was located within porcine intron 2. This potential exon has no counterpart in human FasL intron 2. Whether this extra exon can be expressed and could cause additional immunological responses remains to be investigated. For future xenotransplantation, it is important to compare porcine and human genomic sequences and to investigate their system compatibilities.

19.
Summary

The Aspergillus nidulans 3-phosphoglycerate kinase gene (PGK) has been isolated from a phage genomic library, using the equivalent yeast gene as a hybridization probe. The location of the PGK gene within the cloned DNA has been physically mapped. The DNA sequence of a small region of the putative PGK gene has been determined and found to code for amino acids corresponding to the N-terminal end of the PGK protein. In contrast to the yeast PGK gene, the Aspergillus gene contains a 57 base pair intron occurring between the coding sequences for amino acids 22 and 23. A DNA fragment encompassing the PGK gene was shown to hybridize to a 1,700-base poly(A) mRNA, sufficient to encode the PGK polypeptide.

20.
Summary

The complete genomic sequence of the inducible Chlorella kessleri H+/hexose cotransporter (HUP1) has been obtained from two overlapping clones isolated from a λgt10 library. The HUP1 gene is interrupted by 14 introns, the first of which is located in the 5′-untranslated part of the gene. The average intron length is 220 bp, yielding a very regular intron/exon pattern in the gene. The codon usage in this gene is strongly biased, with a clear preference for C and a strong suppression of A. A consensus sequence for a putative algal polyadenylation sequence is shown and compared with other algal cDNA sequences.
