Similar articles
1.
A method is described for performing global alignment of noncoding DNA sequences based on an evolutionary model parameterized by the frequency distribution of lengths of insertion/deletion events (indels) and their rate relative to nucleotide substitutions. A stochastic hill-climbing algorithm is used to search for the most probable alignment between a pair of sequences or three sequences of known phylogenetic relationship. The performance of the procedure, parameterized according to the empirical distribution of indel lengths in noncoding DNA of Drosophila species, is investigated by simulation. We show that there is excellent agreement between true and estimated alignments over a wide range of sequence divergences, and that the method outperforms other available alignment methods.

2.
It is presently accepted that, in mammals, due to the greater number of cell divisions in the male germline than in the female germline, nucleotide substitutions occur more frequently in males. The data on mutation bias in insertions and deletions (indels) are contradictory, with some studies indicating no sex bias and others indicating either female or male bias. The sequenced rat and mouse genomes provide a unique opportunity to investigate a potential sex bias for different types of mutations. Indeed, mutation rates can be accurately estimated from a large number of orthologous loci in organisms similar in generation time and in the number of germline cell divisions. Here we compare the mutation rates between chromosome X and autosomes for likely neutral sites in eutherian ancestral interspersed repetitive elements present at orthologous locations in the rat and mouse genomes. We find that small indels are male biased: the male-to-female mutation rate ratio (alpha) for indels in rodents is approximately 2. Similarly, our whole-genome analysis in rodents indicates an approximately twofold excess of nucleotide substitutions originating in males over that in females. This is the same as the male-to-female ratio of the number of germline cell divisions in rat and mouse. Thus, this is consistent with nucleotide substitutions and small indels occurring primarily during DNA replication.
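As a rough illustration of how a male-to-female ratio can be derived from an X-versus-autosome comparison, the sketch below inverts the commonly used relation X/A = (2/3)(2 + alpha)/(1 + alpha). The relation, the function name, and the input values are assumptions for illustration; the abstract does not state which estimator the authors applied.

```python
def alpha_from_x_to_autosome_ratio(x_over_a: float) -> float:
    """Estimate alpha = mu_male / mu_female from the X/autosome rate ratio.

    Assumption: the X chromosome spends 2/3 of its time in females and 1/3
    in males, while autosomes spend 1/2 in each sex, so that
        X/A = (2/3) * (2 + alpha) / (1 + alpha).
    Solving for alpha gives alpha = (4 - 3R) / (3R - 2) with R = X/A.
    """
    # For alpha >= 0 the ratio R must lie in (2/3, 4/3].
    if not (2.0 / 3.0 < x_over_a <= 4.0 / 3.0):
        raise ValueError("X/A ratio must lie in (2/3, 4/3] for a valid alpha")
    return (4.0 - 3.0 * x_over_a) / (3.0 * x_over_a - 2.0)


if __name__ == "__main__":
    # Hypothetical input: an X/A ratio of 8/9 corresponds to alpha = 2.
    print(alpha_from_x_to_autosome_ratio(8.0 / 9.0))
```

Under this relation, equal X and autosomal rates (X/A = 1) imply no sex bias (alpha = 1), and lower X/A ratios imply stronger male bias.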

3.
A distinctive feature of the avian genome is the large heterogeneity in the size of chromosomes, which are usually classified into a small number of macrochromosomes and numerous microchromosomes. These chromosome classes show characteristic differences in a number of interrelated features that could potentially affect the rate of sequence evolution, such as GC content, gene density, and recombination rate. We studied the effects of these factors by analyzing patterns of nucleotide substitution in two sets of chicken-turkey sequence alignments. First, in a set of 67 orthologous introns, divergence was significantly higher in microchromosomes (chromosomes 11-38; 11.7% divergence) than in both macrochromosomes (chromosomes 1-5; 9.9% divergence; P = 0.016) and intermediate-sized chromosomes (chromosomes 6-10; 9.5% divergence; P = 0.026). At least part of this difference was due to the higher incidence of CpG sites on microchromosomes. Second, using 155 orthologous coding sequences we noted a similar pattern, in which synonymous substitution rates on microchromosomes (13.1%) were significantly higher than rates on macrochromosomes (10.3%; P = 0.024). Broadly assuming neutrality of introns and synonymous sites, or that constraints on such sequences do not differ between chromosomal classes, these observations imply that microchromosomal genes are exposed to more germ line mutations than those on other chromosomes. We also find that dN/dS ratios for genes located on microchromosomes (average, 0.094) are significantly lower than those of genes on macrochromosomes (average, 0.185; P = 0.025), suggesting that the proteins of genes on microchromosomes are under greater evolutionary constraint.

4.
The analysis of conservation between the human and mouse genomes resulted in the identification of a large number of conserved nongenic sequences (CNGs). The functional significance of this nongenic conservation remains unknown, however. The availability of the sequence of a third mammalian genome, the dog, allows for a large-scale analysis of evolutionary attributes of CNGs in mammals. We have aligned 1638 previously identified CNGs and 976 conserved exons (CODs) from human chromosome 21 (Hsa21) with their orthologous sequences in mouse and dog. Attributes of selective constraint, such as sequence conservation, clustering, and direction of substitutions, were compared between CNGs and CODs, showing a clear distinction between the two classes. We subsequently performed a chromosome-wide analysis of CNGs by correlating selective constraint metrics with their position on the chromosome and their distance from genes. We found that CNGs appear to be randomly arranged in intergenic regions, with no bias toward being closer to or farther from genes. Moreover, conservation and clustering of substitutions of CNGs appear to be completely independent of their distance from genes. These results suggest that the majority of CNGs are not typical of previously described regulatory elements in terms of their location. We propose models for a global role of CNGs in genome function and regulation, through long-distance cis or trans chromosomal interactions.

5.
A complex of high-molecular-mass proteins (PfRhopH) of the human malaria parasite Plasmodium falciparum induces host protective immunity and therefore is a candidate for vaccine development. Understanding the level of polymorphism and the evolutionary processes is important for advancements in both vaccine design and knowledge of the evolution of cell invasion in this parasite. In the present study, we sequenced the entire open reading frames of seven genes encoding the proteins of the PfRhopH complex (rhoph2, rhoph3, and five rhoph1/clag gene paralogs). We found that four rhoph1/clag genes (clag2, 3.1, 3.2, and 8) were highly polymorphic. Amino acid substitutions and indels are predominantly clustered around amino acid positions 1000-1200 of these four rhoph1/clag genes. An excess of nonsynonymous substitutions over synonymous substitutions was detected for clag8 and 9, indicating positive selection. The McDonald-Kreitman test with a Plasmodium reichenowi orthologous sequence also supports positive selection on clag8. Based on the ratio of interspecific genetic distance to intraspecific distance, the time to the most recent common ancestor of the clag2 and 8 polymorphisms was estimated to be 1.89 and 0.87 million years ago, respectively, assuming divergence of P. falciparum and P. reichenowi 6 million years ago. In addition to a copy number polymorphism, gene conversion events were detected for the rhoph1/clag genes on chromosome 3, which likely play a role in increasing the diversity of each locus. Our results indicate that a high diversity of the PfRhopH1/Clag multigene family is maintained by diversifying selection forces over a considerably long period.

6.
Despite being the second most frequent type of polymorphism in the genome, diallelic insertion-deletion polymorphisms (indels) have received far less attention in the study of sequence variation. In this report, we describe an approach that can detect indels in the heterozygous state and can comprehensively identify indels in the target sequence. Using this approach, we identified 2393 indels in a set of 330 candidate genes, i.e. an average of seven indels per gene, with about two indels per gene being common (minor allele frequency ≥ 0.1). We compared the population genetic characteristics of indels with those of substitutions in these data. Our data supported the findings that deletions occur more frequently than insertions in the human genome. 5'-UTR and coding regions of the genes showed a significantly lower diversity for indels compared with other regions, suggesting differences in effects of selection on indels and substitutions. Sequence diversity and pairwise linkage disequilibrium (LD) findings of the different populations were similar to earlier results and included a greater skew towards low-frequency variants and a faster rate of LD decay in the African-descent population compared with the non-African populations. Within populations, the allele frequency spectra and LD-decay profiles for indels were similar to those for substitutions. Overall, the findings suggest that, although the mechanisms giving rise to indels may be different from those causing substitutions, the evolutionary histories of indels and substitutions are similar, and that indels can play a valuable role in association studies and marker selection strategies.

7.
Comparison of splice sites in mammals and chicken (total citations: 5; self-citations: 2; citations by others: 3)
We have carried out an initial analysis of the dynamics of the recent evolution of splice-site sequences on a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents and within rodents are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been a rare occurrence in recent evolutionary times. Switching between GT-AG and the noncanonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites.

8.
Non-coding DNA comprises approximately 80% of the euchromatic portion of the Drosophila melanogaster genome. Non-coding sequences are known to contain functionally important elements controlling gene expression, but the proportion of sites that are selectively constrained is still largely unknown. We have compared the complete D. melanogaster and Drosophila simulans genome sequences to estimate mean selective constraint (the fraction of mutations that are eliminated by selection) in coding and non-coding DNA by standardizing to substitution rates in putatively unconstrained sequences. We show that constraint is positively correlated with intronic and intergenic sequence length and is generally remarkably strong in non-coding DNA, implying that more than half of all point mutations in the Drosophila genome are deleterious. This fraction is also likely to be an underestimate if many substitutions in non-coding DNA are adaptively driven to fixation. We also show that substitutions in long introns and intergenic sequences are clustered, such that there is an excess of substitutions <8 bp apart and a deficit farther apart. These results suggest that there are blocks of constrained nucleotides, presumably involved in gene expression control, that are concentrated in long non-coding sequences. Furthermore, we infer that there is more than three times as much functional non-coding DNA as protein-coding DNA in the Drosophila genome. Most deleterious mutations therefore occur in non-coding DNA, and these may make an important contribution to a wide variety of evolutionary processes.

9.
Nucleotide insertion and deletion (indel) events, together with substitutions, represent the major mutational processes of gene evolution. Through the alignment of 8148 orthologous genes from human, mouse, and rat, we have identified 1743 indel events within rodent protein-coding sequences. Using human as an out-group, we reconstructed the mutational event underlying each of these indels. Overall, we found an excess of deletions over insertions, particularly for the rat lineage (70% excess). Sequence slippage accounts for at least 52% of insertions and 38% of deletions. We have also evaluated the selective tolerance of identifiable protein structures to indels. Transmembrane domains are the least tolerant, and low complexity regions the most tolerant. Mapping of indels onto known protein structures demonstrated that structural cores are markedly less tolerant to indels than are loop regions. There is a specific enrichment of CpG dinucleotides in close proximity to insertion events, and both insertions and deletions are more common in higher G+C content sequences.

10.
We present an analysis of rates and patterns of microevolutionary phenomena that have shaped the human, mouse, and rat genomes since their last common ancestor. We find evidence for a shift in the mutational spectrum between the mouse and rat lineages, with the net effect being a relative increase in GC content in the rat genome. Our estimate for the neutral point substitution rate separating the two rodents is 0.196 substitutions per site, and 0.65 substitutions per site for the tree relating all three mammals. Small insertions and deletions of 1-10 bp in length ("microindels") occur at approximately 5% of the point substitution rate. Inferred regional correlations in evolutionary rates between lineages and between types of sites support the idea that rates of evolution are influenced by local genomic or cell biological context. No substantial correlations between rates of point substitutions and rates of microindels are found, however, implying that the influences that affect these processes are distinct. Finally, we have identified those regions in the human genome that are evolving slowly, which are likely to include functional elements important to human biology. At least 5% of the human genome is under substantial constraint, most of which is noncoding.

11.
Distribution and intensity of constraint in mammalian genomic sequence (total citations: 8; self-citations: 5; citations by others: 8)
Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the results of such an analysis on an alignment of sequences from 29 mammalian species. The alignment captures approximately 3.9 neutral substitutions per site and spans approximately 1.9 Mbp of the human genome. We identify constrained elements from 3 bp to over 1 kbp in length, covering approximately 5.5% of the human locus. Our estimate for the total amount of nonexonic constraint experienced by this locus is roughly twice that for exonic constraint. Constrained elements tend to cluster, and we identify large constrained regions that correspond well with known functional elements. While constraint density inversely correlates with mobile element density, we also show the presence of unambiguously constrained elements overlapping mammalian ancestral repeats. In addition, we describe a number of elements in this region that have undergone intense purifying selection throughout mammalian evolution, and we show that these important elements are more numerous than previously thought. These results were obtained with Genomic Evolutionary Rate Profiling (GERP), a statistically rigorous and biologically transparent framework for constrained element identification. GERP identifies regions at high resolution that exhibit nucleotide substitution deficits, and measures these deficits as "rejected substitutions". Rejected substitutions reflect the intensity of past purifying selection and are used to rank and characterize constrained elements. We anticipate that GERP and the types of analyses it facilitates will provide further insights and improved annotation for the human genome as mammalian genome sequence data become richer.
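The notion of a substitution deficit can be made concrete with a small sketch. The code below is not the GERP implementation; it only assumes that each alignment column has an expected (neutral) and an observed substitution count, and treats the difference as that column's "rejected substitutions". The function names and the observed values are hypothetical; the expected value of 3.9 reuses the neutral substitutions per site quoted in the abstract.

```python
from typing import List, Sequence


def rejected_substitutions(expected: Sequence[float],
                           observed: Sequence[float]) -> List[float]:
    """Per-column deficit: expected neutral substitutions minus observed ones."""
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return [e - o for e, o in zip(expected, observed)]


def element_score(expected: Sequence[float],
                  observed: Sequence[float]) -> float:
    """Sum of per-column deficits over a candidate constrained element."""
    return sum(rejected_substitutions(expected, observed))


if __name__ == "__main__":
    # Hypothetical three-column element; 3.9 is the neutral expectation
    # per site reported for the alignment, the observed counts are made up.
    expected = [3.9, 3.9, 3.9]
    observed = [0.5, 1.0, 4.2]
    print(rejected_substitutions(expected, observed))
    print(element_score(expected, observed))
```

In this simplified picture, a column that changes faster than the neutral expectation contributes a negative value, so an element scores highly only when the deficit is sustained across its columns.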

12.
Lineage-specific gene loss, to a large extent, accounts for the differences in gene repertoires between genomes, particularly among eukaryotes. We derived a parsimonious scenario of gene losses for eukaryotic orthologous groups (KOGs) from seven complete eukaryotic genomes. The scenario involves substantial gene loss in fungi, nematodes, and insects. Based on this evolutionary scenario and estimates of the divergence times between major eukaryotic phyla, we introduce a numerical measure, the propensity for gene loss (PGL). We explore the connection among the propensity of a gene to be lost in evolution (PGL value), protein sequence divergence, the effect of gene knockout on fitness, the number of protein-protein interactions, and expression level for the genes in KOGs. Significant correlations between PGL and each of these variables were detected. Genes that have a lower propensity to be lost in eukaryotic evolution accumulate fewer substitutions in their protein sequences, tend to be essential for the organism's viability, tend to be highly expressed, and have many interaction partners. The dependence between PGL and gene dispensability and interactivity is much stronger than that for sequence evolution rate. Thus, the propensity of a gene to be lost during evolution seems to be a direct reflection of its biological importance.

13.
Liu HJ, Lee LH, Hsu HW, Kuo LC, Liao MH. Virology 2003, 314(1): 336-349
Nucleotide sequences of the S-class genome segments of 17 field isolates and vaccine strains of avian reovirus (ARV) isolated over a 23-year period from different hosts, pathotypes, and geographic locations were examined and analyzed to define phylogenetic profiles and evolutionary mechanisms. The S1 genome segment showed noticeably higher divergence than the other S-class genes. The sigma C-encoding gene has evolved into six distinct lineages. In contrast, the other S-class genes showed less divergence than the sigma C-encoding gene and have each evolved into two to three major distinct lineages. Comparative sequence analysis provided evidence indicating extensive sequence divergence between ARV and other orthoreoviruses. The evolutionary trees of each gene were distinct, suggesting that these genes evolve in an independent manner. Furthermore, variable topologies were the result of frequent genetic reassortment among multiple cocirculating lineages. Results showed that genetic diversity correlated more closely with date of isolation and geographic site than with host species and pathotype. This is the first evidence demonstrating genetic variability among circulating ARVs through a combination of evolutionary mechanisms involving multiple cocirculating lineages and genetic reassortment. The evolutionary rates and patterns of base substitutions were examined. The evolutionary rate for the sigma C-encoding gene and sigma C protein was higher than for the other S-class genes and for other families of viruses. With the exception of the sigma C-encoding gene, in which nonsynonymous substitutions predominate over synonymous ones, the evolutionary process of the other S-class genes can be explained by the neutral theory of molecular evolution. Results revealed that synonymous substitutions predominate over nonsynonymous ones in the S-class genes, even though genetic diversity and substitution rates vary among the viruses.

14.
Molecular evolution of eastern equine encephalomyelitis virus in North America (total citations: 15; self-citations: 0; citations by others: 15)
Weaver SC, Scott TW, Rico-Hesse R. Virology 1991, 182(2): 774-784
We examined the rate and spatial pattern of eastern equine encephalomyelitis virus (EEEV) evolution in North America using primer-extension sequencing of viral RNA. Nucleotide sequences of the entire 26 S structural gene region of four EEEV strains revealed remarkable conservation between 1933 and 1985, with an estimated 0.7% divergence or 1.4 × 10^-4 nucleotide substitutions per site per year. Sequences from smaller 26 S regions of nine additional strains suggested that EEEV evolves in North America in a single lineage, with genetic exchange regularly occurring among enzootic transmission foci. In these limited 26 S genome regions, only synonymous nucleotide substitutions became fixed between 1933 and 1988, implying a high degree of conservation in protein structure. Short nucleotide sequences from a Panamanian, South American variety isolate revealed a relatively distant relationship to North American serotype viruses. This suggested genetic divergence between antigenic varieties, and independent evolution of EEEV in North and South America. Factors related to replication and epidemiology of EEEV, which may constrain its evolution in nature, are discussed. Possible mechanisms of genetic exchange among enzootic foci are also considered.
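The relationship between the quoted divergence and the yearly rate can be checked with simple arithmetic. The sketch below assumes the rate is just the fractional divergence divided by the sampling interval (1933 to 1985); the abstract does not spell out the exact calculation, so the function name and this simplification are illustrative only.

```python
def substitutions_per_site_per_year(divergence_fraction: float,
                                    start_year: int,
                                    end_year: int) -> float:
    """Divide fractional sequence divergence by the elapsed time in years."""
    elapsed_years = end_year - start_year
    if elapsed_years <= 0:
        raise ValueError("end_year must be later than start_year")
    return divergence_fraction / elapsed_years


if __name__ == "__main__":
    # 0.7% divergence accumulated between 1933 and 1985 (52 years).
    rate = substitutions_per_site_per_year(0.007, 1933, 1985)
    print(f"{rate:.2e} substitutions per site per year")
```

With the rounded 0.7% figure this gives a value slightly below the reported 1.4 × 10^-4, which is within what rounding of the divergence estimate would explain.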

15.
Origin and evolution of new exons in rodents (total citations: 9; self-citations: 0; citations by others: 9)
Gene number difference among organisms demonstrates that new gene origination is a fundamental biological process in evolution. Exon shuffling has been universally observed in the formation of new genes. Yet to be learned are the ways new exons originate and evolve, and how often new exons appear. To address these questions, we identified 2695 newly evolved exons in the mouse and rat by comparing the expressed sequences of 12,419 orthologous genes between human and mouse, using 743,856 pig ESTs as the outgroup. The new exon origination rate is about 2.71 × 10^-3 per gene per million years. These new exons have markedly accelerated rates both of nonsynonymous substitutions and of insertions/deletions (indels). A much higher proportion of new exons have Ka/Ks ratios >1 (where Ka is the nonsynonymous substitution rate and Ks is the synonymous substitution rate) than do the old exons shared by human and mouse, implying a role of positive selection in the rapid evolution. The majority of these new exons have sequences unique in the genome, suggesting that most new exons might originate through "exonization" of intronic sequences. Most of the new exons appear to be alternative exons that are expressed at low levels.
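The origination rate follows from the exon and gene counts once a time span is chosen. The sketch below divides new exons per gene by an elapsed time; the 80-million-year span is an assumption chosen here because it reproduces the reported figure, and it is not stated in the abstract. The function name is likewise hypothetical.

```python
def origination_rate_per_gene_per_myr(new_exons: int,
                                      genes: int,
                                      elapsed_myr: float) -> float:
    """New exons per gene, divided by the elapsed time in million years."""
    if genes <= 0 or elapsed_myr <= 0:
        raise ValueError("genes and elapsed_myr must be positive")
    return new_exons / genes / elapsed_myr


if __name__ == "__main__":
    # Counts are from the abstract; the 80 Myr time span is an assumption.
    rate = origination_rate_per_gene_per_myr(2695, 12419, 80.0)
    print(f"{rate:.2e} new exons per gene per million years")
```

A different assumed time span scales the result inversely, so the figure should be read together with whatever divergence time the original study used.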

16.
17.
Huang S, Yu T, Chen Z, Yuan S, Chen S, Xu A. Human Mutation 2012, 33(7): 1099-1106
Early studies have shown that single-nucleotide mutation rates increase close to insertions and deletions, but it is not fully understood how natural selection shapes genome-wide patterns of indels and their nearby single-nucleotide mutations. In this study, we find that, in primates, more single-nucleotide mutations surround small insertions than small deletions. This pattern affects <150 base pair (bp) sequences close to indels and persists under different genomic properties, such as exon/intron/intergenic contexts, repeated/nonrepeated sequences, replication timing, recombination rates, indel density, and guanine-cytosine (GC) content. We propose two different, but not mutually exclusive, hypothetical mechanisms to explain the pattern. One mechanism is that the sequence context preferring insertion formation may also favor nucleotide substitutions. Another mechanism is related to a hypothesis in which indel heterozygosity tends to increase nearby nucleotide substitution rates. It means that if insertions spend more time in heterozygotes, insertions may accumulate more surrounding single-nucleotide changes. In conclusion, we characterize a special genome-wide evolutionary pattern for indels and nearby single-nucleotide changes. This pattern may be driven by natural selection and bias primates' genome evolution and phenotypic variations.

18.
Andolfatto P. Genome Research 2007, 17(12): 1755-1762
Several recent studies have estimated that a large fraction of amino acid divergence between species of Drosophila was fixed by positive selection, using statistical approaches based on the McDonald-Kreitman test. However, little is known about associated selection coefficients of beneficial amino acid mutations. Recurrent selective sweeps associated with adaptive substitutions should leave a characteristic signature in genome variability data that contains information about the frequency and strength of selection. Here, I document a significant negative correlation between the level and the frequency of synonymous site polymorphism and the rate of protein evolution in highly recombining regions of the X chromosome of D. melanogaster. This pattern is predicted by recurrent adaptive protein evolution and suggests that adaptation is an important determinant of patterns of neutral variation genome-wide. Using a maximum likelihood approach, I estimate the product of the rate and strength of selection under a recurrent genetic hitchhiking model, lambda × 2N_e s ≈ 3 × 10^-8. Using an approach based on the McDonald-Kreitman test, I estimate that approximately 50% of divergent amino acids were driven to fixation by positive selection, implying that beneficial amino acid substitutions are of weak effect on average, on the order of 10^-5 (i.e., 2N_e s ≈ 40). Two implications of these results are that most adaptive substitutions will be difficult to detect in genome scans of selection and that population size (and genetic drift) may be an important determinant of the evolutionary dynamics of protein adaptation.
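For readers unfamiliar with McDonald-Kreitman-based estimates, the sketch below shows one widely used form of the calculation, alpha = 1 - (Ds × Pn) / (Dn × Ps), where D and P are divergence and polymorphism counts at nonsynonymous (n) and synonymous (s) sites. It is an assumption that this simple estimator resembles the approach in the abstract; the counts are hypothetical and chosen only so the result is 50%.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Fraction of nonsynonymous divergence attributed to positive selection.

    dn, ds: fixed nonsynonymous and synonymous differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(mk_alpha(dn=100, ds=200, pn=50, ps=200))
```

The estimator compares the nonsynonymous-to-synonymous ratio between species with the same ratio within a species; an excess between species is interpreted as adaptive fixation.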

19.
Ke S, Zhang XH, Chasin LA. Genome Research 2008, 18(4): 533-543
We have used comparative genomics to characterize the evolutionary behavior of predicted splicing regulatory motifs. Using base substitution rates in intronic regions as a calibrator for neutral change, we found a strong avoidance of synonymous substitutions that disrupt predicted exonic splicing enhancers or create predicted exonic splicing silencers. These results attest to the functionality of the hexameric motif set used and suggest that they are subject to purifying selection. We also found that synonymous substitutions in constitutive exons tend to create exonic splicing enhancers and to disrupt exonic splicing silencers, implying positive selection for these splicing promoting events. We present evidence that this positive selection is the result of splicing-positive events compensating for splicing-negative events as well as for mutations that weaken splice-site sequences. Such compensatory events include nonsynonymous mutations, synonymous mutations, and mutations at splice sites. Compensation was also seen from the fact that orthologous exons tend to maintain the same number of predicted splicing motifs. Our data fit a splicing compensation model of exon evolution, in which selection for splicing-positive mutations takes place to counter the effect of an ongoing splicing-negative mutational process, with the exon as a whole being conserved as a unit of splicing. In the course of this analysis, we observed that synonymous positions in general are conserved relative to intronic sequences, suggesting that messenger RNA molecules are rich in sequence information for functions beyond protein coding and splicing.

20.
To estimate the species-specific mutation rates at the DRB1 locus in humans and chimpanzees, we analyzed the nucleotide sequence of a 37.6-kb chimpanzee chromosomal segment containing the entire Patr-DRB1*0701 allele and the flanking nongenic region, and we compared it with two corresponding human sequences containing the HLA-DRB1*070101 allele, using the sequence of HLA-DRB1*04011 as an outgroup. Because the allelic pair of HLA-DRB1*070101 and Patr-DRB1*0701 shows the lowest number of substitutions between the two species, it appears that these sequences diverged close to the time of the human-chimpanzee divergence (6 million years ago). Alignment of the nucleotide sequences for the HLA-DRB1*070101 and Patr-DRB1*0701 alleles showed that they share a high degree of similarity, suggesting that the studied chromosomal segments with these sequences have not been subjected to recombination since the human-chimpanzee divergence. Comparison of the flanking 10.6 kb of nongenic sequences revealed an average of 41.5 and 83 single nucleotide substitutions in humans and chimpanzees, respectively. Thus, the species-specific nucleotide substitution rates in the flanking nongenic region were estimated to be 6.53 × 10^-10 and 1.31 × 10^-9 per site per year in humans and chimpanzees, respectively. Unexpectedly, the estimated rate in humans was twofold lower than in chimpanzees (P < 10^-3, Tajima's relative rate test) and lower than the average substitution rate in the human genome. Because the nucleotide substitution rate in nongenic regions free from selection is expected to be equal to the mutation rate, the estimated substitution rate should correspond to the species-specific mutation rate at the DRB1 locus. Our results strongly suggest that the mutation rate at the DRB1 locus differs among species.
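The quoted rates can be reproduced from the numbers in the abstract. The sketch below assumes the rate is simply the substitution count divided by the number of sites and by the divergence time (10.6 kb and 6 million years, both from the abstract); the function name is hypothetical and the original study may have applied corrections not shown here.

```python
def substitution_rate_per_site_per_year(substitutions: float,
                                        sites: int,
                                        years: float) -> float:
    """Lineage-specific substitutions divided by sites and elapsed years."""
    if sites <= 0 or years <= 0:
        raise ValueError("sites and years must be positive")
    return substitutions / (sites * years)


if __name__ == "__main__":
    sites = 10_600   # flanking nongenic region, 10.6 kb
    years = 6e6      # human-chimpanzee divergence time used in the abstract
    human = substitution_rate_per_site_per_year(41.5, sites, years)
    chimp = substitution_rate_per_site_per_year(83, sites, years)
    print(f"human: {human:.3e}, chimpanzee: {chimp:.3e}, ratio: {chimp / human:.2f}")
```

Because both lineages share the same number of sites and the same elapsed time, the twofold rate difference reduces to the twofold difference in substitution counts.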
