Similar articles
 20 similar articles found (search time: 15 ms)
1.
Transposon-free regions in mammalian genomes   Total citations: 6 (self-citations: 2, citations by others: 4)
Despite the presence of over 3 million transposons separated on average by approximately 500 bp, the human and mouse genomes each contain almost 1000 transposon-free regions (TFRs) over 10 kb in length. The majority of human TFRs correlate with orthologous TFRs in the mouse, despite the fact that most transposons are lineage specific. Many human TFRs also overlap with orthologous TFRs in the marsupial opossum, indicating that these regions have remained refractory to transposon insertion for long evolutionary periods. Over 90% of the bases covered by TFRs are noncoding, much of which is not highly conserved. Most TFRs are not associated with unusual nucleotide composition, but are significantly associated with genes encoding developmental regulators, suggesting that they represent extended regions of regulatory information that are largely unable to tolerate insertions, a conclusion difficult to reconcile with current conceptions of gene regulation.
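The definition of a TFR used in this abstract (a stretch over 10 kb with no annotated transposon) can be illustrated with a short interval scan. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the function name `transposon_free_regions`, the half-open coordinate convention, and the toy coordinates are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome (assumed convention)


def transposon_free_regions(
    transposons: List[Interval], chrom_length: int, min_length: int = 10_000
) -> List[Interval]:
    """Return gaps between annotated transposons that are longer than min_length."""
    regions: List[Interval] = []
    cursor = 0  # end of the right-most transposon seen so far
    for start, end in sorted(transposons):
        if start - cursor > min_length:
            regions.append((cursor, start))
        cursor = max(cursor, end)  # tolerates overlapping or nested annotations
    # Check the stretch between the last transposon and the chromosome end.
    if chrom_length - cursor > min_length:
        regions.append((cursor, chrom_length))
    return regions


# Toy annotation (made-up coordinates, not real data).
example = [(500, 800), (12_000, 12_300), (12_200, 12_900), (30_000, 30_400)]
print(transposon_free_regions(example, chrom_length=35_000))
# → [(800, 12000), (12900, 30000)]
```

Tracking the right-most end seen so far, rather than only the previous interval's end, keeps the scan correct when annotations overlap.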

2.
Whole-genome sequence assembly for mammalian genomes: Arachne 2   Total citations: 13 (self-citations: 5, citations by others: 13)
We previously described the whole-genome assembly program Arachne, presenting assemblies of simulated data for small to mid-sized genomes. Here we describe algorithmic adaptations to the program, allowing for assembly of mammalian-size genomes, and also improving the assembly of smaller genomes. Three principal changes were simultaneously made and applied to the assembly of the mouse genome, during a six-month period of development: (1) Supercontigs (scaffolds) were iteratively broken and rejoined using several criteria, yielding a 64-fold increase in length (N50), and apparent elimination of all global misjoins; (2) gaps between contigs in supercontigs were filled (partially or completely) by insertion of reads, as suggested by pairing within the supercontig, increasing the N50 contig length by 50%; (3) memory usage was reduced fourfold. The outcome of this mouse assembly and its analysis are described in (Mouse Genome Sequencing Consortium 2002).
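The N50 statistic quoted in this abstract is the length L such that contigs (or supercontigs) of length at least L together cover at least half of the total assembled length. The following is a minimal sketch of that standard definition; the function name `n50` and the toy lengths are illustrative and are not taken from Arachne.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one length")


print(n50([10, 20, 30, 40]))  # → 30
```

With lengths 10, 20, 30 and 40 the total is 100; the two longest pieces (40 + 30) are the first to reach 50, so the N50 is 30.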

3.
Comparative sequence analyses on a collection of carefully chosen mammalian genomes could facilitate identification of functional elements within the human genome and allow quantification of evolutionary constraint at the single nucleotide level. High-resolution quantification would be informative for determining the distribution of important positions within functional elements and for evaluating the relative importance of nucleotide sites that carry single nucleotide polymorphisms (SNPs). Because the level of resolution in comparative sequence analyses is a direct function of sequence diversity, we propose that the information content of a candidate mammalian genome be defined as the sequence divergence it would add relative to already-sequenced genomes. We show that reliable estimates of genomic sequence divergence can be obtained from small genomic regions. On the basis of a multiple sequence alignment of approximately 1.4 megabases each from eight mammals, we generate such estimates for five unsequenced mammals. Estimates of the neutral divergence in these data suggest that a small number of diverse mammalian genomes in addition to human, mouse, and rat would allow single nucleotide resolution in comparative sequence analyses.

4.
5.
6.
The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ~2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.
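The idea of scanning ORFs with nine-codon windows for unusually low synonymous rates can be sketched as a simple sliding-window average. This is a simplified illustration only: the abstract does not describe the statistical test used, so the per-codon rate input, the mean-based criterion, the `threshold` value, and the function name `low_rate_windows` are all assumptions made here for the example.

```python
from typing import List, Tuple


def low_rate_windows(
    rates: List[float], window: int = 9, threshold: float = 0.25
) -> List[Tuple[int, float]]:
    """Return (start_codon, mean_rate) for every window whose mean rate is below threshold.

    `rates` holds one hypothetical relative synonymous-rate estimate per codon.
    """
    hits: List[Tuple[int, float]] = []
    for start in range(len(rates) - window + 1):
        mean_rate = sum(rates[start:start + window]) / window
        if mean_rate < threshold:
            hits.append((start, mean_rate))
    return hits


# Toy ORF: nine codons with no synonymous change, flanked by codons evolving at rate 1.
toy_rates = [1.0] * 5 + [0.0] * 9 + [1.0] * 5
print([start for start, _ in low_rate_windows(toy_rates)])
# → [3, 4, 5, 6, 7]
```

Overlapping hits such as these would normally be merged into one candidate region; that step is omitted to keep the sketch short.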

7.
Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.

8.
9.
Despite the availability of dozens of animal genome sequences, two key questions remain unanswered: First, what fraction of any species' genome confers biological function, and second, are apparent differences in organismal complexity reflected in an objective measure of genomic complexity? Here, we address both questions by applying, across the mammalian phylogeny, an evolutionary model that estimates the amount of functional DNA that is shared between two species' genomes. Our main findings are, first, that as the divergence between mammalian species increases, the predicted amount of pairwise shared functional sequence drops off dramatically. We show by simulations that this is not an artifact of the method, but rather indicates that functional (and mostly noncoding) sequence is turning over at a very high rate. We estimate that between 200 and 300 Mb (∼6.5%–10%) of the human genome is under functional constraint, which includes five to eight times as many constrained noncoding bases as bases that code for protein. In contrast, in D. melanogaster we estimate only 56–66 Mb to be constrained, implying a ratio of noncoding to coding constrained bases of about 2. This suggests that, rather than genome size or protein-coding gene complement, it is the number of functional bases that might best mirror our naïve preconceptions of organismal complexity.
What fraction of a genome confers biological function, as opposed to the remaining proportion that has had no biological effect and thus has not been subject to selection? While the complement of (functional) protein-coding sequence has been estimated in many organisms (e.g., 1.06% of the human genome; Church et al. 2009), it has been more challenging to identify functional sequence that fails to encode protein (Mouse Genome Sequencing Consortium 2002).
Even the simpler task of estimating the size of this fraction, or more precisely, the genomic fraction that is under evolutionary constraint and is thereby inferred to confer function to the organism, has proven particularly contentious (Chiaromonte et al. 2003; Pheasant and Mattick 2007).
Methods to detect constraint do so by comparing genomic sequence and therefore show greatest power to identify “shared” constrained sequence, and lower power to reveal sequence whose function is “lineage-specific.” Analyzing species at various divergences thus offers an opportunity to investigate the dynamics of genome evolution: Is the functional fraction largely shared and evolving slowly by accumulating a low rate of point mutations, or does, instead, rapid sequence turnover of lineage-specific functional sequence play an important role? While protein-coding genes appear to evolve predominantly in the first mode, it is readily apparent that lineage-specific sequence occurs abundantly in most genomes. Instances where functional sequence has been gained, and erstwhile functional sequence has been lost, have been identified in mammals (Dermitzakis and Clark 2002; Smith et al. 2004; Odom et al. 2007; Kunarso et al. 2010), flies (Ludwig et al. 2000; Bergman and Kreitman 2001; Moses et al. 2006), and yeast (Borneman et al. 2007). Although convincing, these examples represent a very small fraction of the functional complement of each genome, and argue neither for nor against the ubiquity of functional sequence turnover.
A second key question is whether the genomes of different species contain different amounts of functional sequence, and whether this measure is related to organismal complexity. For example, it is clear that both the genome size and the number of genes present in a genome fail to reflect at least naïve preconceptions of organismal complexity (Gregory 2005; Ponting 2008).
While varying proportions of nonfunctional (“junk”) DNA, often in the form of transposed repetitive elements (TEs), may explain the large variation in genome size across species, the relatively stable number of protein-coding genes suggests the possibility that our naïve notion of complexity is fundamentally incorrect, and that many species are in fact of comparable complexity, in a sense yet to be defined. Alternatively, it may be that many of the apparent differences in complexity between species are encoded by a varying amount of noncoding regulatory sequence, regulating a fairly stable core of protein-coding genes.
Addressing these two questions requires accurate estimates of the amount of functional, yet noncoding, sequence in genomes from across the metazoan subkingdom. Several groups have developed comparative genomic methods to estimate this quantity. For example, an early estimate of the genomic fraction of human constrained sequence was obtained from alignments of human and mouse genome assemblies, and suggested that approximately αsel = 5% of the human genome has been subject to selective constraint (Chiaromonte et al. 2003). (Here, we adopt from Chiaromonte et al. the symbol αsel as the estimated fraction of a genome that has been subject to selective constraint and thus may be considered functional. In addition, we define g as the full extent of the euchromatic sequence of a genome, and gsel = g × αsel as the amount of sequence that has been subject to purifying selection.) This estimate of αsel was obtained by contrasting nucleotide conservation inside and outside of ancestral repeats (ARs, TEs whose insertion predates the species' last common ancestor) while taking account of the known regional variation in nucleotide substitution rates. Subsequently, other substitution-based approaches, taking advantage of multiple genome sequence alignments, yielded similar results (Margulies et al. 2003; Cooper et al. 2005; Siepel et al. 2005).
All such estimates of αsel have shown a strong dependence on the parameterization of the underlying neutral substitution model, and as neutral substitutions are difficult to model (Clark 2006), the resulting estimates have wide confidence intervals. For example, the initial approach by Chiaromonte et al. (2003) indicated αsel as being between 2.3% and 7.9% of the human genome, depending on which values of model parameters were chosen. The attendant uncertainty in the final estimates makes it difficult to use this or similar methods to quantify lineage-specific constrained sequence.
More recently, three analyses have estimated αsel by taking advantage of the 1% of the human genome that has been scrutinized within the pilot phase of the ENCODE project (The ENCODE Project Consortium 2007). These yielded higher αsel estimates of between 5% and 12% (Asthana et al. 2007; Garber et al. 2009; Parker et al. 2009) with the spread of αsel values being again dependent upon the values of model parameters that were chosen. With one algorithm, constraint was identified within 45% of ARs (Parker et al. 2009). Estimates of αsel in ENCODE regions may also be upwardly biased, since only some of ENCODE's regions were randomly selected, while others were chosen because of their functional content.
For invertebrates, estimates of αsel have also been imprecise, in the main because their small genomes often contain only a meager amount of neutrally evolving sequence on which to tune a neutral model (Peterson et al. 2009). Estimates of αsel for Drosophila range between ∼40% and 70% (Andolfatto 2005; Siepel et al. 2005; Halligan and Keightley 2006; Keith et al. 2008), while one study indicated that 18%–37% of the Caenorhabditis elegans genome is under selective constraint (Siepel et al. 2005).
As alluded to above, methods for inferring quantities of functional DNA rest upon the hypothesis that in functional sequence most nucleotide changes are detrimental, causing such changes to be purged from the species' populations, which results in evolutionarily conserved sequence. Methods for quantifying constrained sequence typically contrast interspecies levels of sequence conservation within a sequence of interest and within matched putatively neutrally evolved sequence, typically ARs. While the deletion of conserved sequence identified in this manner does not always result in an overt phenotype (Ahituv et al. 2007; Visel et al. 2009), it has been shown that selection, rather than mutational cold-spots, is responsible for the low rate of mutation accumulation (Drake et al. 2006). The outlined approach has been further criticized for overlooking sequence that is lineage-specific or that exhibits only weak conservation (Dermitzakis and Clark 2002), for tacitly assuming, rather than demonstrating, the neutrality of ARs, and for overlooking sequence that has evolved by positive, rather than negative, selection (Pheasant and Mattick 2007).
Here, we estimate the quantities of functional DNA that are shared between species pairs at various divergences. This allows us to investigate the dependence of this quantity on species divergence, thus partially addressing lineage specificity. An earlier study using the same method demonstrated that ARs are predominantly neutrally evolving (Lunter et al. 2006), thereby addressing the second concern, and the present study confirms these findings. By continuing to overlook potentially positively selected sequence, our estimates of the amount of functional sequence are expected to remain slightly conservative.
The approach presented here (based on the neutral indel model; Lunter et al. 2006) uses indel mutations, rather than single-nucleotide substitutions, to estimate αsel.
Although indel events occur approximately eightfold less often than substitution mutations (Lunter 2007; Cartwright 2009), their impact upon functional sequence may well be more profound than that exerted by single-nucleotide substitutions. Indels may induce, for example, frame shifts in coding regions and secondary structure changes in RNAs, suggesting that stronger purifying selection may often act upon them. This will compensate for their lower mutation rate when indels are exploited in approaches to detecting evolutionary constraint. In contrast to many substitution-based methods that require fitting an explicit background model to neutrally evolving sequence, the present method has a single free parameter (the indel rate) which can be trained from the full data, without the requirement of first identifying the neutral fraction.
Here, we estimate αsel values for diverse mammalian species and for birds, teleost fish, and fruit flies. We show that the neutral indel model estimates gsel for closely related pairs as being up to threefold higher than for more distantly related species, a result that is a feature of the data rather than being an inherent bias of the method. This suggests a substantial rate of “turnover” of otherwise constrained sequence. Finally, we show that, despite their comparable protein-coding gene complement, vertebrate (mammalian or avian) genomes harbor substantially more functional sequence than invertebrate (Drosophila and C. elegans) genomes, as a result of a larger complement of functional noncoding sequence.
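The relation gsel = g × αsel defined in this entry can be checked against the figures quoted in its abstract. The sketch below is a back-of-the-envelope illustration rather than part of the authors' analysis: the helper names are hypothetical, and pairing the low estimate with the low ratio (and high with high) is an assumption made only for this example.

```python
def genome_size_mb(gsel_mb: float, alpha_sel: float) -> float:
    """Invert gsel = g * alpha_sel to recover the genome extent g in Mb."""
    return gsel_mb / alpha_sel


def constrained_coding_mb(gsel_mb: float, noncoding_to_coding: float) -> float:
    """Split gsel into coding and noncoding parts given their ratio; return the coding part."""
    return gsel_mb / (1.0 + noncoding_to_coding)


# Human figures from the abstract: 200-300 Mb constrained, about 6.5%-10% of the genome,
# with five to eight times as many constrained noncoding bases as coding bases.
for gsel, alpha, ratio in [(200.0, 0.065, 5.0), (300.0, 0.10, 8.0)]:
    g = genome_size_mb(gsel, alpha)
    coding = constrained_coding_mb(gsel, ratio)
    print(f"gsel={gsel:.0f} Mb  g≈{g:.0f} Mb  constrained coding≈{coding:.1f} Mb")
```

Under this pairing, both ends of the range imply a genome extent of roughly 3 Gb and about 33 Mb of constrained coding sequence, which is in line with the 1.06% protein-coding fraction cited in the text.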

10.
11.
12.
13.
Genome instability is a hallmark of most human cancers and is exacerbated following replication stress. However, the effects that drugs/xenobiotics have in promoting genome instability including chromosomal structural rearrangements in normal cells are not currently assessed in the genetic toxicology battery. Here, we show that drug-induced replication stress leads to increased genome instability in vitro using proliferating primary human cells as well as in vivo in rat bone marrow (BM) and duodenum (DD). p53-binding protein 1 (53BP1, biomarker of DNA damage repair) nuclear bodies were increased in a dose-dependent manner in normal proliferating human mammary epithelial fibroblasts following treatment with compounds traditionally classified as either genotoxic (hydralazine) or nongenotoxic (low-dose aphidicolin, duvelisib, idelalisib, and amiodarone). Comparatively, no increases in 53BP1 nuclear bodies were observed in nonproliferating cells. Negative control compounds (mannitol, alosteron, diclofenac, and zonisamide) not associated with cancer risk did not induce 53BP1 nuclear bodies in any cell type. Finally, we studied the in vivo genomic consequences of drug-induced replication stress in rats treated with 10 mg/kg of cyclophosphamide for up to 14 days followed by polymerase chain reaction-free whole genome sequencing (30X coverage) of BM and DD cells. Cyclophosphamide induced chromosomal structural rearrangements at an average of 90 genes, including 40 interchromosomal/intrachromosomal translocations, within 2 days of treatment. Collectively, these data demonstrate that this drug-induced genome instability test (DiGIT) can reveal potential adverse effects of drugs not otherwise informed by standard genetic toxicology testing batteries. These efforts are aligned with the Food and Drug Administration's (FDA's) predictive toxicology roadmap initiative.

14.
Although analysis of genome rearrangements was pioneered by Dobzhansky and Sturtevant 65 years ago, we still know very little about the rearrangement events that produced the existing varieties of genomic architectures. The genomic sequences of human and mouse provide evidence for a larger number of rearrangements than previously thought and shed some light on previously unknown features of mammalian evolution. In particular, they reveal that a large number of microrearrangements is required to explain the differences in draft human and mouse sequences. Here we describe a new algorithm for constructing synteny blocks, study arrangements of synteny blocks in human and mouse, derive a most parsimonious human-mouse rearrangement scenario, and provide evidence that intrachromosomal rearrangements are more frequent than interchromosomal rearrangements. Our analysis is based on the human-mouse breakpoint graph, which reveals related breakpoints and allows one to find a most parsimonious scenario. Because these graphs provide important insights into rearrangement scenarios, we introduce a new visualization tool that allows one to view breakpoint graphs superimposed with genomic dot-plots.
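The notion of a breakpoint that underlies this analysis can be illustrated on a single chromosome written as a signed permutation of synteny blocks. This is a deliberately simplified sketch, not the breakpoint-graph algorithm of the paper: it only counts breakpoints relative to the identity order, and the function name `count_breakpoints` and the toy block order are hypothetical.

```python
from typing import Sequence


def count_breakpoints(signed_blocks: Sequence[int]) -> int:
    """Count breakpoints of a signed permutation relative to the order 1, 2, ..., n.

    Blocks are numbered 1..n; a negative sign marks an inverted block.
    """
    n = len(signed_blocks)
    if sorted(abs(b) for b in signed_blocks) != list(range(1, n + 1)):
        raise ValueError("input must be a signed permutation of 1..n")
    # Frame the chromosome with 0 and n + 1 so that its two ends are also checked.
    extended = [0, *signed_blocks, n + 1]
    # Neighbours (a, b) are conserved exactly when b - a == 1, e.g. (2, 3) or (-3, -2).
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


# Blocks 2 and 3 have been inverted together relative to the reference order.
print(count_breakpoints([1, -3, -2, 4]))  # → 2
```

Because a single reversal can remove at most two breakpoints, half the breakpoint count gives a lower bound on the number of reversals in any scenario; the breakpoint graph refines this bound by also tracking how breakpoints are linked into cycles.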

15.
16.
17.
18.
19.
20.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号