Similar literature
Found 20 similar articles; search took 31 ms
1.
2.
Non‐protein‐coding RNAs have increasingly been shown to be an important class of regulatory RNAs with significant roles in the regulation of gene expression. The long noncoding RNA (lncRNA) gene family presently comprises a large number of noncoding RNA (ncRNA) loci, almost equaling the number of protein‐coding genes. Nevertheless, the biological roles and mechanisms of the majority of lncRNAs are poorly understood, with the exception of a very few well‐studied candidates. The availability of genome‐scale variation datasets and the increasing number of variant loci from genome‐wide association studies that fall in lncRNA loci have motivated us to understand the patterns of genomic variation in lncRNA loci, their potential functional correlates, and selection in populations. In the present study, we have performed a comprehensive analysis of genomic variations in lncRNA loci. We analyzed the patterns and distributions of genomic variations with respect to potential functional domains in lncRNAs. The analysis reveals a distinct distribution of variations in subclasses of long ncRNAs and in potential functional domains of lncRNAs. We further examined signals of selection and allele frequencies in this prioritized set of lncRNAs. To the best of our knowledge, this is the first comprehensive large‐scale analysis of genetic variations in long ncRNAs.

3.
4.
5.
Genome data are increasingly important in the computational identification of novel regulatory non-coding RNAs (ncRNAs). However, most ncRNA gene-finders are either specialized to well-characterized ncRNA gene families or require comparisons of closely related genomes. We developed a method for de novo screening for ncRNA genes with a nucleotide composition that stands out against the background genome based on a partial sum process. We compared the performance when assuming independent and first-order Markov-dependent nucleotides, respectively, and used Karlin-Altschul and Karlin-Dembo statistics to evaluate the significance of hits. We hypothesized that a first-order Markov-dependent process might have better power to detect ncRNA genes since nearest-neighbor models have been shown to be successful in predicting RNA structures. A model based on a first-order partial sum process (analyzing overlapping dinucleotides) had better sensitivity and specificity than a zeroth-order model when applied to the AT-rich genome of the amoeba Dictyostelium discoideum. In this genome, we detected 94% of previously known ncRNA genes (at this sensitivity, the false positive rate was estimated to be 25% in a simulated background). The predictions were further refined by clustering candidate genes according to sequence similarity and/or searching for an ncRNA-associated upstream element. We experimentally verified six out of 10 tested ncRNA gene predictions. We conclude that higher-order models, in combination with other information, are useful for identification of novel ncRNA gene families in single-genome analysis of D. discoideum. Our generalizable approach extends the range of genomic data that can be searched for novel ncRNA genes using well-grounded statistical methods.
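
The first-order partial sum idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dinucleotide_log_odds` and `best_segment`, the pseudocount, and the toy training sequences are all assumptions made for illustration, and the Karlin-Altschul / Karlin-Dembo significance step is omitted. Each overlapping dinucleotide receives a log-odds score (candidate model versus background), and a running sum that resets at zero locates the maximal-scoring segment.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def transition_probs(seqs, pseudo=1.0):
    """First-order Markov transition probabilities P(b | a) with a pseudocount."""
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a + b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a + b] + pseudo for b in ALPHABET)
        for b in ALPHABET:
            probs[a + b] = (counts[a + b] + pseudo) / total
    return probs

def dinucleotide_log_odds(foreground, background):
    """Log-odds score per dinucleotide: log P_fg(b|a) - log P_bg(b|a)."""
    fg = transition_probs(foreground)
    bg = transition_probs(background)
    return {k: math.log(fg[k] / bg[k]) for k in fg}

def best_segment(seq, scores):
    """Return (score, start, end) of the maximal-scoring segment.

    The partial sum is reset whenever it drops to zero or below,
    so only locally high-scoring stretches accumulate a large value.
    """
    best = (0.0, 0, 0)
    running, start = 0.0, 0
    for i in range(len(seq) - 1):
        running += scores.get(seq[i:i + 2], 0.0)
        if running <= 0:
            running, start = 0.0, i + 1
        elif running > best[0]:
            best = (running, start, i + 2)  # end index is exclusive
    return best

# Hypothetical toy data: a GC-rich "ncRNA-like" model against an AT-rich background.
scores = dinucleotide_log_odds(["GCGCGCGCGC"], ["ATATATATAT", "AATTAATTAA"])
score, start, end = best_segment("ATATATGCGCGCGCATATAT", scores)
print(round(score, 2), start, end)
```

In a real screen, the score of each segment would then be compared against an extreme-value distribution to obtain a significance estimate, which is the role the abstract assigns to Karlin-Altschul and Karlin-Dembo statistics.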

6.
Homozygous deletions or loss of heterozygosity (LOH) at human chromosome band 3p12 are consistent features of lung and other malignancies, suggesting the presence of a tumor suppressor gene(s) (TSG) at this location. Only one gene has been cloned thus far from the overlapping region deleted in lung and breast cancer cell lines U2020, NCI H2198, and HCC38. It is DUTT1 (Deleted in U Twenty Twenty), also known as ROBO1, FLJ21882, and SAX3, according to HUGO. DUTT1, the human ortholog of the fly gene ROBO, has homology with NCAM proteins. Extensive analyses of DUTT1 in lung cancer have not revealed any mutations, suggesting that another gene(s) at this location could be of importance in lung cancer initiation and progression. Here, we report the discovery of a new, small, homozygous deletion in the small cell lung cancer (SCLC) cell line GLC20, nested in the overlapping, critical region. The deletion was delineated using several polymorphic markers and three overlapping P1 phage clones. Fiber-FISH experiments revealed the deletion was approximately 130 kb. Comparative genomic sequence analysis uncovered short sequence elements highly conserved among mammalian genomes and the chicken genome. The discovery of two EST clusters within the deleted region led to the isolation of two noncoding RNA (ncRNA) genes. These were subsequently found differentially expressed in various tumors when compared to their normal tissues. The ncRNA and other highly conserved sequence elements in the deleted region may represent miRNA targets of importance in cancer initiation or progression.

7.
8.
9.
Identification and characterization of functional elements in the noncoding regions of genomes is an elusive and time-consuming activity whose output does not keep up with the pace of genome sequencing. Hundreds of bacterial genomes lie unexploited in terms of noncoding sequence analysis, although they may conceal a wide diversity of novel RNA genes, riboswitches, or other regulatory elements. We describe a strategy that exploits the entirety of available bacterial genomes to classify all noncoding elements of a selected reference species in a single pass. This method clusters noncoding elements based on their profile of presence among species. Most noncoding RNAs (ncRNAs) display specific signatures that enable their grouping in distinct clusters, away from sequence conservation noise and other elements such as promoters. We submitted 24 ncRNA candidates from Staphylococcus aureus to experimental validation and confirmed the presence of seven novel small RNAs or riboswitches. Besides offering a powerful method for de novo ncRNA identification, the analysis of phylogenetic profiles opens a new path toward the identification of functional relationships between co-evolving coding and noncoding elements.
In all living organisms, the genome regions located between protein-coding sequences are home to a wide diversity of functional elements that include noncoding RNA (ncRNA) genes, DNA regulatory elements, untranslated regions (UTRs) of genes, transposable and self-replicating elements, and a variety of other transcribed or nontranscribed functional sequences. As these elements are often key players in gene regulation and thus in the global cell interaction network, their systematic identification and characterization has become a major challenge in biology.
Computational protocols developed to collect and characterize noncoding elements in genomic sequences rely, to a large extent, on comparative genomics. The most common strategies involve, first, collecting sequences under selective pressure and, second, analyzing the aligned sequences using various classifiers that exploit criteria such as nucleotide composition, folding potential, fold conservation, or covariation between distant positions (Rivas and Eddy 2001; Washietl et al. 2005; Pedersen et al. 2006; Torarinsson et al. 2006). In general, such classifiers are designed to detect structured RNAs among noncoding elements with no further distinction between regulatory elements, repeats, or artifacts produced by sequence comparison algorithms.
Comparative genomics entails a significant amount of expert intervention, especially in obtaining the right genome set to optimize the specificity and sensitivity of RNA detection. Although a number of studies have successfully identified ncRNAs in several animal (Missal et al. 2005; Washietl et al. 2007) and microbial genomes (Altuvia 2007), the pace at which such studies are performed and published lags far behind the rate of genome sequence output. Most of the complete genomes sequenced to date have escaped the scrutiny of RNA experts, and their potential for novel RNA functions lies unexplored. Recently, Livny et al. (2008) introduced an automated procedure, SIPHT, which combines conserved sequence detection and the presence of adjacent Rho-independent terminators. The procedure is sufficiently automated to be applied to all available bacterial genomes; however, the requirement for a terminator motif introduces a bias against the significant fraction of RNA elements that are not followed by a detectable terminator.
Nucleic acid phylogenetic profiling (NAPP) addresses both the issue of high-throughput noncoding sequence identification and that of their functional characterization. Phylogenetic profiling (Ragan and Gaasterland 1998; Pellegrini et al. 1999) posits that genes belonging to the same metabolic or regulatory pathway tend to occur concomitantly in a given set of organisms. By clustering proteins based on their occurrence profile in many species, one therefore obtains clusters of functionally related proteins, which enables functional assignments to uncharacterized sequences. While this principle has been successfully applied to protein annotation (Pellegrini et al. 1999; Enault et al. 2003; Srinivasan et al. 2005), to date, no application to ncRNA annotation has emerged. An obstacle to building nucleic acid profiles over a large evolutionary time frame is certainly the lower sensitivity of DNA sequence similarity searches compared to searches at the amino acid sequence level. However, in spite of this limitation, Janky and van Helden (2008) were able to obtain clear phylogenetic profiles for promoter DNA elements, by focusing their analysis on upstream sequences of orthologous genes. In this study, we apply phylogenetic profiling for the first time to the complete noncoding DNA in bacterial genomes, with the objective of classifying all conserved noncoding elements in a single computing process.
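
The principle of phylogenetic profiling described above, grouping elements by their pattern of presence across genomes, can be sketched as follows. This is only an illustration under stated assumptions: the element names, the binary profiles, the Jaccard similarity measure, the 0.75 threshold, and the single-linkage grouping are all hypothetical choices, not the clustering procedure used by NAPP.

```python
def jaccard(p, q):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

def cluster_profiles(profiles, threshold=0.75):
    """Single-linkage grouping of elements with similar profiles.

    `profiles` maps an element name to a tuple of 0/1 flags,
    one flag per genome (1 = a homolog was detected in that genome).
    """
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical profiles over six genomes.
profiles = {
    "rna_a":  (1, 1, 0, 0, 1, 0),
    "rna_b":  (1, 1, 0, 0, 1, 0),
    "prom_c": (0, 0, 1, 1, 0, 1),
    "rna_d":  (1, 1, 0, 0, 1, 1),
}
print(cluster_profiles(profiles))
```

Elements that co-occur in the same genomes end up in one group, which is the property the text relies on to separate ncRNAs from promoters and conservation noise and to suggest functional links between co-evolving elements.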

10.
11.
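It is now widely accepted in the academic community that gene dysregulation (aberrant expression) is an important cause (trigger) of tumor initiation and progression, and non-coding RNA (ncRNA) plays an extremely important regulatory role in this process [1-2]. Small RNAs (including microRNA, miRNA, miR) and long non-coding RNAs (lncRNA) both belong to the ncRNAs [3-4].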

12.
13.
14.
15.
R F Pacha, R F Allison, P Ahlquist. Virology, 1990, 174(2): 436-443
Cowpea chlorotic mottle virus (CCMV) is a positive-strand RNA virus that infects dicotyledonous plants. The genome comprises three capped RNAs: RNA1 (3.2 kb), RNA2 (2.9 kb), and RNA3 (2.1 kb). cis-Acting sequences required for amplification in vivo were explored for RNA3, which does not contribute trans-acting factors to viral RNA replication. Using a CCMV cDNA expression system, deletions throughout RNA3 were constructed and tested for successful replication in barley protoplasts coinoculated with RNAs 1 and 2. As previously found for RNA3 of the related brome mosaic virus (BMV) (R. French and P. Ahlquist, 1987, J. Virol. 61, 1457-1465), either of the two coding regions can be individually deleted without blocking RNA3 amplification. However, in striking contrast to BMV, the entire intercistronic noncoding region separating these genes is also dispensable for CCMV RNA3 amplification. Moreover, although simultaneous deletions of the 3a and coat protein genes were deleterious for BMV RNA3 accumulation, CCMV RNA3 derivatives bearing larger deletions encompassing the 3a gene, intercistronic region, and coat protein gene amplify to high levels. Thus, unlike BMV RNA3, cis-acting sequences required for CCMV RNA3 amplification map solely in the 5' and 3' noncoding regions. Normal levels of CCMV RNA3 accumulation require over 125 but no more than 220 bases from the 3' noncoding region, and no more than the first 89 bases of the 238-base-long 5' noncoding region.

16.
17.
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
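
The two-state model mentioned in this abstract can be illustrated with a heavily simplified sketch. This is not phastCons: a real phylo-HMM scores each alignment column with a phylogenetic substitution model and fits its parameters by maximum likelihood, whereas the sketch below assumes fixed, made-up parameters (`p_stay`, `match_cons`, `match_neutral`) and reduces each column to "all species identical" or "not identical". It only shows how Viterbi decoding over a conserved state (C) and a non-conserved state (N) segments an alignment.

```python
import math

def viterbi_conserved(columns, p_stay=0.9, match_cons=0.9, match_neutral=0.4):
    """Label each alignment column 'C' (conserved) or 'N' (non-conserved).

    `columns` is a list of strings, one string per alignment column,
    holding the base observed in each species. All parameters are
    illustrative assumptions, not fitted values.
    """
    states = ("C", "N")
    match_prob = {"C": match_cons, "N": match_neutral}

    def emit(state, col):
        # Probability that a column is fully identical across species, or not.
        identical = len(set(col)) == 1
        p = match_prob[state] if identical else 1.0 - match_prob[state]
        return math.log(p)

    def trans(a, b):
        return math.log(p_stay if a == b else 1.0 - p_stay)

    # Initialisation with a uniform prior over the two states.
    score = {s: math.log(0.5) + emit(s, columns[0]) for s in states}
    back = []
    for col in columns[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + trans(r, s))
            new_score[s] = score[prev] + trans(prev, s) + emit(s, col)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back the most probable state path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

# Hypothetical three-species alignment: ten identical columns, then ten variable ones.
columns = ["AAA"] * 10 + ["ACG", "ATC", "AGT", "CAT", "GTA",
                          "TCA", "ACT", "GAC", "CGT", "TAG"]
print(viterbi_conserved(columns))
```

Runs of `C` in the decoded path play the role of predicted conserved elements; the self-transition probability controls how long such runs tend to be, which is why calibration of the model matters for the reported genome coverage.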

18.
19.
20.