Similar literature
 20 similar documents found (search time: 78 ms)
1.
The combined actions of natural selection, mutation, and recombination forge the landscape of genetic variation across genomes. One frequently observed manifestation of these processes is a positive association between neutral genetic variation and local recombination rates. Two selective mechanisms and/or recombination-associated mutation (RAM) could generate this pattern, and the relative importance of these alternative possibilities remains unresolved generally. Here we quantify nucleotide differences within populations, between populations, and between species to test for genome-wide effects of selection and RAM in the partially selfing nematode Caenorhabditis briggsae. We find that nearly half of genome-wide variation in nucleotide polymorphism is explained by differences in local recombination rates. By quantifying divergence between several reproductively isolated lineages, we demonstrate that ancestral polymorphism generates a spurious signal of RAM for closely related lineages, with implications for analyses of humans and primates; RAM is, at most, a minor factor in C. briggsae. We conclude that the positive relation between nucleotide polymorphism and the rate of crossover represents the footprint of natural selection across the C. briggsae genome and demonstrate that background selection against deleterious mutations is sufficient to explain this pattern. Hill-Robertson interference also leaves a signature of more effective purifying selection in high-recombination regions of the genome. Finally, we identify an emerging contrast between widespread adaptive hitchhiking effects in species with large outcrossing populations (e.g., Drosophila) versus pervasive background selection effects on the genomes of organisms with self-fertilizing lifestyles and/or small population sizes (e.g., Caenorhabditis elegans, C. briggsae, Arabidopsis thaliana, Lycopersicon, human). 
These results illustrate how recombination, mutation, selection, and population history interact in important ways to shape molecular heterogeneity within and between genomes.
Central to the aims of evolutionary genomics is the characterization of the relative roles of positive and negative selective forces, mutation, recombination, and population history as determinants of the distribution of genetic variants across the genome. The landmark study of Begun and Aquadro (1992) laid bare the power of natural selection in combination with linkage, in particular, to shape genome patterns of genetic variation in Drosophila melanogaster. It is now clear that adaptation via positive selection plays a critical role in Drosophila genome evolution (Begun et al. 2007; Hahn 2008; Sella et al. 2009), but the generality of this result across taxa remains tenebrous, with mounting evidence for important roles of recombination-associated mutation (RAM) and purifying selection for humans and other organisms (Lercher and Hurst 2002; Cutter and Payseur 2003; Hellmann et al. 2003; Innan and Stephan 2003; Nordborg et al. 2005; Cai et al. 2009; Kaiser and Charlesworth 2009; McVicker et al. 2009). In the most sophisticated analysis of selection at linked sites for humans to date, McVicker et al. (2009) found that ongoing and ancestral background selection effects can account for genome-wide patterns of polymorphism, although this does not preclude a contributing role of recurrent genetic hitchhiking. 
Here we use patterns of nucleotide differences within populations, between isolated populations, and between species to investigate the relative importance of mutation and of selection in current and ancestral populations as drivers of genome sequence polymorphism and divergence for the nematode Caenorhabditis briggsae.
Linkage of neutral genetic variants to targets of natural selection is predicted to result in depressed levels of neutral polymorphism, and disproportionately so in regions of the genome that experience little recombination. Theory details how this effect can result from recurrent selective sweeps, due to adaptive evolution causing stronger genetic hitchhiking effects in low-recombination portions of the genome (Maynard Smith and Haigh 1974; Wiehe and Stephan 1993; Andolfatto 2001). But the action of pervasive purifying selection that drives extinct deleterious mutations and the variants linked to them (“background selection”) will generate a similar pattern (Charlesworth et al. 1993; Hudson and Kaplan 1995). Despite some unique predictions (Braverman et al. 1995; Charlesworth et al. 1995; Innan and Stephan 2003), it often is difficult to distinguish these two models of selection at linked sites, and both may operate simultaneously (Kim and Stephan 2000). Moreover, should the process of mutation be tied to recombination (i.e., RAM), a similar disparity in polymorphism levels between high- and low-recombination regions would result (Lercher and Hurst 2002; Hellmann et al. 2003). Experiments in yeast implicate RAM as a real phenomenon (Strathern et al. 1995; Rattray et al. 2002; but see Noor 2008b). If prevalent, this selectively neutral process could undermine selective explanations for patterns of higher population variation in high-recombination regions.
For closely related species, the added complication arises of ancestral polymorphism possibly contributing to genomic heterogeneity in measures of divergence (Li 1977; Charlesworth et al. 2005). 
Specifically, selection at linked sites in the common ancestor could generate a spurious signal of RAM because genealogies will be deeper for genomic regions with more recombination, with a correspondingly elevated contribution of ancestral polymorphism to naive measures of divergence (Hellmann et al. 2003; Begun et al. 2007; Kulathinal et al. 2008; Noor 2008a; McVicker et al. 2009). Indeed, ancestral polymorphism is implicated as the source of correlations between divergence and recombination rate for human–chimpanzee comparisons (McVicker et al. 2009) and at least in part for Drosophila pseudoobscura–Drosophila persimilis (Kulathinal et al. 2008; Noor 2008a). A potential solution to this difficulty in distinguishing neutral from selective processes is to quantify divergence with respect to multiple reproductively isolated lineages that span a range of times to common ancestry. When ancestral polymorphism drives the correlation between divergence and recombination, the association will become weaker with increasing evolutionary distance, provided that the overall recombination environment is conserved. We take this approach here by quantifying sequence differences among individuals within a population, between several reproductively isolated populations, and between distantly related species (Fig. 1).
Figure 1. Relationships among populations of C. briggsae and species of Caenorhabditis.
Populations of the nematode C. briggsae are separated into phylogeographically distinct entities, in which hermaphrodites reproduce predominantly by self-fertilization (Cutter et al. 2006). Self-fertilization has caused extensive linkage disequilibrium across the genome, by reducing the effective recombination rate (Nordborg 2000), potentially resulting in more striking consequences of linked selection than in outcrossing taxa (Baudry et al. 2001). Indeed, compared to outcrossing relatives, overall levels of genetic variation in C. briggsae are roughly 10-fold lower than would be expected from the standard twofold predicted effects of selfing (Graustein et al. 2002; Cutter et al. 2009). Chromosomal regions with low crossover rates also exhibit higher gene density in C. briggsae (Hillier et al. 2007), which may further facilitate linked selection effects in the genome (Payseur and Nachman 2002), and the lack of defined centromeres for Caenorhabditis chromosomes precludes potential complicating centromeric effects on low-recombination regions (Henikoff et al. 2001). The genomic distribution of single nucleotide differences between two strains of the related species, Caenorhabditis elegans, is consistent with selection having shaped genome patterns (Cutter and Payseur 2003). However, the lack of a close outgroup prevents a strong test of selective and neutral mutational models for C. elegans, whereas C. briggsae presents itself as exceptionally tractable for understanding the forces that govern the distribution of genetic variants along chromosomes. Here we demonstrate how the C. briggsae genome reveals that natural selection provides the dominant force shaping heterogeneity in patterns of neutral polymorphism among genomic regions with disparate crossover rates and that ancestral polymorphism generates spurious signals of RAM in closely related lineages.
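The statement that a share of the variation in polymorphism is "explained by" local recombination rate corresponds to a squared correlation coefficient. The following is a minimal Python sketch of that calculation, not the authors' analysis: the function names `pearson_r` and `variance_explained` and all numbers are hypothetical, chosen only for illustration. The same function could be applied to divergence measured at increasing evolutionary distances to check whether the association weakens, as the ancestral-polymorphism argument predicts.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def variance_explained(xs, ys):
    """Share of variance in ys accounted for by a linear fit on xs (r squared)."""
    return pearson_r(xs, ys) ** 2

# Hypothetical per-region values (NOT data from the study):
# local recombination rate and nucleotide polymorphism per genomic window.
recombination = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
polymorphism = [0.0010, 0.0008, 0.0019, 0.0015, 0.0030, 0.0026]

r2 = variance_explained(recombination, polymorphism)
print(f"variance in polymorphism explained by recombination: {r2:.2f}")
```

A simple linear fit is only one way to quantify the association; it says nothing by itself about whether selection or mutation causes it.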

2.
3.
Saccharomyces cerevisiae, a well-established model for species as diverse as humans and pathogenic fungi, is more recently a model for population and quantitative genetics. S. cerevisiae is found in multiple environments, including the human body, where it is an opportunistic pathogen. To aid in the understanding of the S. cerevisiae population and quantitative genetics, as well as its emergence as an opportunistic pathogen, we sequenced, de novo assembled, and extensively manually edited and annotated the genomes of 93 S. cerevisiae strains from multiple geographic and environmental origins, including many clinical origin strains. These 93 S. cerevisiae strains, the genomes of which are near-reference quality, together with seven previously sequenced strains, constitute a novel genetic resource, the “100-genomes” strains. Our sequencing coverage, high-quality assemblies, and annotation provide unprecedented opportunities for detailed interrogation of complex genomic loci, examples of which we demonstrate. We found most phenotypic variation to be quantitative and identified population, genotype, and phenotype associations. Importantly, we identified clinical origin associations. For example, we found that an introgressed PDR5 was present exclusively in clinical origin mosaic group strains; that the mosaic group was significantly enriched for clinical origin strains; and that clinical origin strains were much more copper resistant, suggesting that copper resistance contributes to fitness in the human host. The 100-genomes strains are a novel, multipurpose resource to advance the study of S. cerevisiae population genetics, quantitative genetics, and the emergence of an opportunistic pathogen.
Research on Saccharomyces cerevisiae, the most extensively characterized model eukaryote, has historically focused on a very small number of strains, or genetic backgrounds. 
In particular, most research has focused on the laboratory strain S288c, the first eukaryotic genome to be completely sequenced, assembled, and annotated (Goffeau et al. 1996) and thus the reference S. cerevisiae genome (Engel et al. 2014). However, as with all species, there is more to S. cerevisiae than one strain. For example, array analyses (Muller and McCusker 2009b, 2011; Schacherer et al. 2009; Muller et al. 2011; Dunn et al. 2012), low coverage sequencing (Liti et al. 2009), and higher coverage sequencing (Wei et al. 2007; Doniger et al. 2008; Dowell et al. 2010; Skelly et al. 2013; Bergstrom et al. 2014) of a limited number of additional S. cerevisiae strains identified extensive sequence variation. Studies of S. cerevisiae genetic variation and its influence on phenotypic variation have been limited by the modest number of high quality, complete, assembled, and annotated genome sequences. To address these limitations, we describe here the sequencing, and subsequent de novo, high quality, and extensively manually edited assembly and annotation of the genomes of 93 S. cerevisiae strains of multiple geographic and environmental origins.
In addition to isolation from traditional, often human-associated environments (Mortimer and Johnston 1986; Mortimer and Polsinelli 1999; Sniegowski et al. 2002; Cromie et al. 2013), S. cerevisiae is isolated clinically, consistent with its being an emerging opportunistic pathogen (Murphy and Kavanagh 1999; Ponton et al. 2000; Silva et al. 2004; Enache-Angoulvant and Hennequin 2005; Munoz et al. 2005; McCusker 2006; Skovgaard 2007; Pfaller and Diekema 2010; Miceli et al. 2011; Chitasombat et al. 2012). Because a reasonable hypothesis is that human environment-associated S. cerevisiae give rise to clinical S. cerevisiae, we compare 57 nonclinical, mostly human environment-associated strains with 43 clinical strains to gain insight into the emergence of S. cerevisiae as an opportunistic pathogen.
These 93 highly accurate, assembled, and annotated genome sequences, together with the genome sequences of S288c (Goffeau et al. 1996), YJM789 (Wei et al. 2007), RM11-1a (RM11 2004), SK1 (Nishant et al. 2010), Σ1278b (Dowell et al. 2010), YPS163 (Doniger et al. 2008), and M22 (Doniger et al. 2008), constitute a novel, multipurpose genetic resource, the “100-genomes” strains. In addition to describing the sequences of the 93 genomes, we describe for the 100-genomes strains their population structure, multiple types of polymorphisms, chromosome rearrangements, aneuploidy, specific phenotypes, genotype-phenotype associations, as well as phenotypic differentiation between strains varying in population ancestry and in nonclinical vs. clinical origin.

4.
5.
6.
An important objective for inferring the evolutionary history of gene families is the determination of orthologies and paralogies. Lineage-specific paralog loss following whole-genome duplication events can cause anciently related homologs to appear in some assays as orthologs. Conserved synteny (the tendency of neighboring genes to retain their relative positions and orders on chromosomes over evolutionary time) can help resolve such errors. Several previous studies examined genome-wide syntenic conservation to infer the contents of ancestral chromosomes and provided insights into the architecture of ancestral genomes, but did not provide methods or tools applicable to the study of the evolution of individual gene families. We developed an automated system to identify conserved syntenic regions in a primary genome using as outgroup a genome that diverged from the investigated lineage before a whole-genome duplication event. The product of this automated analysis, the Synteny Database, allows a user to examine fully or partially assembled genomes. The Synteny Database is optimized for the investigation of individual gene families in multiple lineages and can detect chromosomal inversions and translocations as well as ohnologs (paralogs derived by whole-genome duplication) gone missing. To demonstrate the utility of the system, we present a case study of gene family evolution, investigating the ARNTL gene family in the genomes of Ciona intestinalis, amphioxus, zebrafish, and human.
An important objective for inferring the evolutionary history of gene families and chromosome segments is the determination of orthology and paralogy. A stepwise approach generally uses BLAST (basic local alignment search tool) (Altschul et al. 1997) to define coarse relationships among genes, followed by phylogenetic reconstruction to suggest more detailed hypotheses of descent. 
Events such as gene duplications or whole genome duplications (WGD), with associated differential gene loss, introduce noise into these analyses. Anomalies, such as lineage-specific paralog loss, can cause anciently related homologs to appear to be orthologs, thereby confusing sequence similarity with functional homology (Postlethwait 2007). Such errors can confound attempts to create nonhuman animal disease models and can obscure recent, species-specific evolutionary change among sister lineages.
Orthologs are two genes, one in each of two species, that descended from a single gene in the last common ancestor of those two species. Paralogs are a set of genes derived by duplication within a lineage, and together, a group of paralogs can be co-orthologous to their unduplicated ortholog in a related species. Ohnologs are a special subset of paralogs that result from a whole-genome duplication event (Wolfe 2000). The differential loss of genes that follows a duplication event can create ohnologs gone missing when different ohnologs are lost reciprocally in different lineages.
Understanding and distinguishing ohnologs gone missing from orthologs is a pervasive problem in vertebrate genomics due to multiple genome duplication events. Two rounds of whole-genome duplication events, called R1 and R2, likely occurred at the base of the vertebrate lineage after the divergence of non-vertebrate chordates and prior to the appearance of jawed vertebrates (Garcia-Fernàndez and Holland 1994; Spring 1997; Dehal and Boore 2005). A third duplication, called R3, likely occurred in the teleost lineage after the divergence of ray-finned and lobe-finned fishes (Amores et al. 1998; Taylor et al. 2003; Jaillon et al. 2004), but before the radiation of the teleosts. 
Additional genome duplications punctuated the evolution of other lineages, like salmonids, catostomids, goldfish, Xenopus laevis, and even a rodent (Uyeno and Smith 1972; Allendorf and Thorgaard 1984; Schmid and Steinlein 1991; Risinger and Larhammar 1993; Larhammar and Risinger 1994; Gallardo et al. 1999; David et al. 2003; Mungpakdee et al. 2008a,b). Given the pervasive nature of genome duplication in chordates and the importance of teleost fish and Xenopus laevis as model organisms, it is important to develop automated methods to identify true orthologs among groups of paralogs and to distinguish them from more ancient, nonorthologous homologs.
Figure 1 illustrates the problem of distinguishing orthologs following duplication and lineage-specific loss of a gene g and some of its neighboring genes after WGD (R1), speciation (S), and a second WGD event (R2) in one of the descendant lineages. In an idealized case, chromosomes would experience few changes in gene order or gene content, as illustrated by genes of the same color in Figure 1. The most common fate of genes created by a WGD event, however, is pseudogenization and nonfunctionalization (Li 1980; Watterson 1983). Surviving duplicates can develop new functions (Ohno 1970) or partition or lose their existing functions (Force et al. 1999; Lynch and Force 2000; Winkler et al. 2003; Postlethwait et al. 2004; Jovelin et al. 2007; Chain et al. 2008; Conant and Wolfe 2008; Jarinova et al. 2008). From the time of the duplication event to the present, duplicated genes can alter their expression patterns (Force et al. 1999), their exon structure (Altschmied et al. 2002), or their activities (Zhang et al. 2002; Zhang 2003), and such changes can alter protein–protein interactions or subsequent developmental or physiological functions.
Figure 1. Differential gene loss following whole-genome duplication creates ohnologs gone missing. 
This image shows the evolutionary history of a gene g and neighbors undergoing a whole-genome duplication event (R1), a speciation event (S), and a second WGD event (R2) that occurs in only one of the descendant lineages, S2. If no changes to the order or composition of genes on the chromosomes occur over time, most algorithms would find that g1a and g1b are co-orthologous to g1, and that g2a and g2b are co-orthologous to g2. In a more realistic evolutionary history, gene losses and chromosomal rearrangements follow the genome duplication event, including a loss of g1 from the S1 lineage and g2a and g2b from the S2 lineage. Due to these losses, orthology assignment algorithms will usually infer that g1a and g1b are co-orthologous to g2, incorrectly assigning co-orthology where there is none. Extinct genes are shown in gray.
In the case of differential gene loss and gene rearrangements in lineages S1 and S2, most reciprocal best-hit BLAST algorithms (Wall et al. 2003) would associate gene g2 with g1a and g1b, and most phylogenetic methods, due to a lack of data, would find that the most likely hypothesis of descent was that genes g2, g1a, and g1b shared their most recent common ancestor; in other words, these methods would incorrectly infer that g1a and g1b were orthologs of g2. The erroneous assignment of orthology presents a problem because it implies that the last common ancestor at time S had a single gene with a set of functions that evolved to g1 (and its subsequent duplicates, g1a and g1b) in S2 and g2 in S1, but in fact, no such gene actually existed.
To address this problem and to better infer orthologies and paralogies, we can take advantage of conserved synteny, the tendency of neighboring genes to retain their relative positions and orders on chromosomes over evolutionary time. In a WGD event, duplicated chromosomes (homeologs) initially have gene orders identical to each other and to their immediate ancestor. 
Between the time of duplication and speciation events, however, genes can be lost from one homeolog or the other (unless preserved by structures such as embedded regulatory elements) (Kikuta et al. 2007), and inversions and other chromosome rearrangements can occur independently on the two duplicated homeologs. These events occurring in the chromosomal vicinity of the gene in question give an identity to all of the genes in the neighborhood. In the example given in Figure 1, we could test the hypothesis that genes g1a and g1b are co-orthologous to gene g2 by first examining the neighbors of g1a and g1b (ensuring that a sufficient number of gene neighbors are also paralogous) and then by checking those neighboring paralogs to ensure that they are orthologous to the neighbors of g2. The conserved syntenic region defined by such genes would confirm (or in this case, reject) the co-orthology of genes g1a and g1b to g2. This approach complements the use of BLAST and phylogenetic reconstruction and provides additional evidence to infer the evolutionary history of gene families independent of sequence identities.
Several previous studies examined syntenic conservation at a genomic level to determine the nature of the ancestral chromosomes for that organism's lineage. Evidence for two rounds of genome duplication in stem vertebrates came from a whole-genome analysis of human, mouse, and fugu pufferfish using the urochordate Ciona intestinalis as an outgroup (Dehal and Boore 2005). Analysis of the Tetraodon nigroviridis (green spotted pufferfish) genome and the construction of a dense meiotic map for medaka supported earlier conclusions (Amores et al. 1998; Postlethwait et al. 1998; Woods et al. 2000; Postlethwait et al. 2002; Taylor et al. 2003; Van de Peer et al. 2003) that a third genome duplication had occurred in the teleost fish. 
Analysis of Tetraodon and medaka provided evidence for a 12-chromosome ancestral vertebrate genome by calculating conserved syntenic regions between the fish and human genomes (Jaillon et al. 2004; Naruse et al. 2004). Subsequent work reconstructed the ancestral vertebrate genome using data from human, chicken, and medaka genomes (Nakatani et al. 2007) and, in opposition to earlier work (Jaillon et al. 2004; Naruse et al. 2004; Woods et al. 2005), concluded that the osteichthyan ancestor had ∼40 chromosomes. These studies provided insights into the architecture of the ancestral genome, but were not convenient for the study of the evolution of individual gene families, because the methods used did not form individual syntenic clusters (Jaillon et al. 2004; Dehal and Boore 2005; Nakatani et al. 2007); instead, they used hand-curated data (Jaillon et al. 2004; Nakatani et al. 2007) or they downplayed portions of the genome that did not fit into the analysis (Dehal and Boore 2005).
We have developed an automated system to identify conserved syntenic regions in a primary genome using an outgroup genome that diverged from the investigated lineage before a whole-genome duplication. Our Synteny Database allows for the analysis of fully or partially assembled genomes (Bridgham et al. 2008) and is optimized for the investigation of individual gene families in multiple lineages. The Synteny Database specializes in comparing genomes that have undergone one or more whole-genome duplications; it is able to detect chromosome inversions and translocations, as well as ohnologs gone missing in the gene families investigated. To demonstrate the utility and use of the system, we present a case study of the evolution of the ARNTL gene family in the amphioxus, Ciona intestinalis, zebrafish, and human genomes.
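The neighbor-based test described for Figure 1 (check whether genes flanking a candidate are orthologous to genes flanking its putative ortholog) can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the Synteny Database algorithm: the functions `neighbors` and `synteny_support`, the fixed window size, the precomputed `orthologs` map, and every gene name other than g2 and g1a are hypothetical.

```python
def neighbors(chrom, gene, window):
    """Genes within `window` positions of `gene` on an ordered gene list."""
    i = chrom.index(gene)
    return set(chrom[max(0, i - window):i] + chrom[i + 1:i + 1 + window])

def synteny_support(chrom_a, gene_a, chrom_b, gene_b, orthologs, window=3):
    """Count neighbors of gene_a whose ortholog lies near gene_b.

    `orthologs` maps a gene on chrom_a to its assumed ortholog on chrom_b.
    A count of zero offers no synteny support for gene_a/gene_b orthology.
    """
    near_a = neighbors(chrom_a, gene_a, window)
    near_b = neighbors(chrom_b, gene_b, window)
    return sum(1 for g in near_a if orthologs.get(g) in near_b)

# Hypothetical gene orders (toy data, not from any genome).
chrom_s1 = ["p", "q", "g2", "r", "s"]           # outgroup-like lineage S1
chrom_s2_true = ["p2", "q2", "x", "r2", "u"]    # region that shares neighbors
chrom_s2_other = ["m", "n", "g1a", "o"]         # region with unrelated neighbors
orthologs = {"p2": "p", "q2": "q", "r2": "r"}   # assumed neighbor orthologies

supported = synteny_support(chrom_s2_true, "x", chrom_s1, "g2", orthologs)
unsupported = synteny_support(chrom_s2_other, "g1a", chrom_s1, "g2", orthologs)
print(supported, unsupported)
```

In this toy, the candidate flanked by orthologous neighbors receives support while g1a does not, which mirrors the case where synteny rejects a co-orthology suggested by sequence similarity alone. A real analysis would also need a threshold for how many shared neighbors count as sufficient.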

7.
Notwithstanding their biological importance, Y chromosomes remain poorly known in most species. A major obstacle to their study is the identification of Y chromosome sequences; due to its high content of repetitive DNA, in most genome projects, the Y chromosome sequence is fragmented into a large number of small, unmapped scaffolds. Identification of Y-linked genes among these fragments has yielded important insights about the origin and evolution of Y chromosomes, but the process is labor intensive, restricting studies to a small number of species. Apart from these fragmentary assemblies, in a few mammalian species, the euchromatic sequence of the Y is essentially complete, owing to painstaking BAC mapping and sequencing. Here we use female short-read sequencing and k-mer comparison to identify Y-linked sequences in two very different genomes, Drosophila virilis and human. Using this method, essentially all D. virilis scaffolds were unambiguously classified as Y-linked or not Y-linked. We found 800 new scaffolds (totaling 8.5 Mbp), and four new genes in the Y chromosome of D. virilis, including JYalpha, a gene involved in hybrid male sterility. Our results also strongly support the preponderance of gene gains over gene losses in the evolution of the Drosophila Y. In the intensively studied human genome, used here as a positive control, we recovered all previously known genes or gene families, plus a small amount (283 kb) of new, unfinished sequence. Hence, this method works in large and complex genomes and can be applied to any species with sex chromosomes.
Y chromosomes play a major role in sexual reproduction by harboring master sex-determination genes in many species and male fertility factors in most of them (Bull 1983; Carvalho et al. 2009; Kaiser and Bachtrog 2010; Ezaz and Graves 2012; Hughes and Rozen 2012). Analysis of their origin and evolution has revealed unexpected biological phenomena (Rozen et al. 2003; Carvalho and Clark 2005; Koerich et al. 2008; Lemos et al. 2008; Murtagh et al. 2012), as well as general principles of evolutionary genetics, including the role of recombination and sex-antagonistic genes (Rice 1996; Charlesworth and Charlesworth 2000; Zhou and Bachtrog 2012). However, despite their importance, little is known about Y chromosomes because in many species they are heterochromatic, being composed of highly repetitive DNA that cannot be fully assembled with current technologies (Carvalho et al. 2003; Hoskins et al. 2007). The same issues apply to W chromosomes in ZZ/ZW sex-determination systems (Bull 1983; International Chicken Genome Sequencing Consortium 2004). Mammalian Y chromosomes contain a large euchromatic portion that nonetheless is also very repetitive; in a few species (human, chimp, and macaque), its sequence is nearly complete, owing to painstaking BAC mapping and sequencing (Skaletsky et al. 2003; Hughes and Rozen 2012). These formidable achievements demanded a huge investment of time and resources and placed these Y chromosomes apart (in all other species, only fragmentary assemblies are available, at best). A similar effort successfully assembled the less repetitive portion of the D. melanogaster heterochromatin (Hoskins et al. 2007). It is telling that even in the finished human genome most heterochromatic regions remain unassembled (International Human Genome Sequencing Consortium 2004).
Although it is not possible to fully assemble heterochromatic Y chromosomes, Y-linked genes can nonetheless be assembled even if they are deeply buried within repetitive DNA, and this partial genomic data is very informative (Carvalho et al. 2000; Carvalho and Clark 2005; Koerich et al. 2008; Murtagh et al. 2012). 
In “whole genome shotgun” projects (WGS), which comprise the majority of recent genome projects, the euchromatic portion of chromosomes assemble into large and easily studied scaffolds, whereas heterochromatic regions are represented by thousands of small unmapped scaffolds (International Chicken Genome Sequencing Consortium 2004; Hoskins et al. 2007; Levy et al. 2007). Exons of heterochromatic genes and other islands of unique sequence are faithfully assembled but appear as isolated scaffolds because the repeat-laden introns and intergenic regions cannot be assembled. Further assembly fragmentation in the Y-chromosome is caused by its low coverage (compared to the autosomes) (Carvalho et al. 2003), a consequence of its hemizygosity. A major obstacle to the study of the Y chromosome is to identify among the many unmapped scaffolds those that are Y-linked. This has been done by a combination of computational methods that suggest candidates and a PCR test to confirm Y-linkage (Carvalho et al. 2000; Carvalho and Clark 2005; Koerich et al. 2008; see Chen et al. 2012 for W-linkage). The experimental verification is labor intensive when applied to hundreds of scaffolds but is necessary owing to the high rate of false positives of current computational methods. Nearly all known Drosophila Y-linked genes were identified using this approach (Carvalho et al. 2000; Carvalho and Clark 2005; Carvalho et al. 2009; Krsticevic et al. 2010). When technically feasible, Y-linked scaffolds can be identified by the preparation of separate male and female DNA libraries before WGS sequencing, as these scaffolds would contain only male reads (Krzywinski et al. 2004). 
This approach is not possible for the majority of the available genome sequences because they employed mixed-sex libraries (also, in mammals, sequencing of a single homogametic female is common practice).
Here we show that Y chromosome sequences can be identified with a simple, efficient, and inexpensive method (Y chromosome Genome Scan or YGS) (Fig. 1) suitable for all genome projects that include the heterogametic sex, and apply it to Drosophila and humans.
Figure 1. Outline of the YGS (Y chromosome Genome Scan) method. Y-linked sequences can be efficiently identified by a comparison of the assembled genome with inexpensive short-reads obtained from female DNA: The Y-linked sequences should get no match, whereas autosomal and X-linked sequences should be nearly completely matched. Efficient removal of all types of repetitive sequences is critical because they are shared between the Y chromosome and the female DNA, and was accomplished by a straight comparison of the short DNA words (k-mers) present in the assembled genome and female short-reads. We successfully applied the YGS method to two very different genomes, D. virilis and human.
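The k-mer comparison outlined in the figure caption can be illustrated with a minimal Python sketch. It is not the authors' YGS implementation: the function names `kmers` and `unmatched_fraction`, the tiny k value, the rule that treats any k-mer seen more than once in the assembly as repetitive, and the toy sequences are all assumptions made for illustration. Reverse complements and sequencing errors are ignored for brevity.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every k-length word in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def unmatched_fraction(scaffold, female_kmers, k, repeat_kmers=frozenset()):
    """Fraction of non-repetitive scaffold k-mers absent from female reads.

    Values near 1 suggest a Y-linked scaffold; values near 0 suggest an
    autosomal or X-linked one. Returns None if no informative k-mers remain.
    """
    informative = [w for w in kmers(scaffold, k) if w not in repeat_kmers]
    if not informative:
        return None
    missing = sum(1 for w in informative if w not in female_kmers)
    return missing / len(informative)

# Toy data (hypothetical sequences, far shorter than real reads or scaffolds).
K = 4
female_reads = ["ACGTACGGA", "TTGACCAGT"]
female_kmers = {w for read in female_reads for w in kmers(read, K)}

scaffolds = {"autosomal_like": "ACGTACGG", "y_like": "GGGCCCTTA"}

# Assumed repeat filter: k-mers occurring more than once in the assembly.
genome_counts = Counter(w for s in scaffolds.values() for w in kmers(s, K))
repeat_kmers = {w for w, n in genome_counts.items() if n > 1}

scores = {name: unmatched_fraction(seq, female_kmers, K, repeat_kmers)
          for name, seq in scaffolds.items()}
print(scores)
```

Set membership keeps each lookup cheap, which matters because real assemblies and read sets contain very large numbers of k-mers; a production tool would likely need a more compact k-mer representation than Python strings.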

8.
9.
Corynebacterium diphtheriae is the causative agent of diphtheria. In 2003, the complete genomic nucleotide sequence of an isolate (NCTC13129) from a large outbreak in the former Soviet Union was published, in which the presence of 13 putative pathogenicity islands (PAIs) was demonstrated. In contrast, earlier work on diphtheria mainly employed the C7(−) strain for genetic analysis; therefore, current knowledge of the molecular genetics of the bacterium is limited to that strain. However, genomic information on the NCTC13129 strain has scarcely been compared to strain C7(−). Another important C. diphtheriae strain is Park-Williams no. 8 (PW8), which has been the only major strain used in toxoid vaccine production and for which genomic information also is not available. Here, we show by comparative genomic hybridization that at least 37 regions from the reference genome, including 11 of the 13 PAIs, are considered to be absent in the C7(−) genome. Despite this, the C7(−) strain still retained signs of pathogenicity, showing a degree of adhesion to Detroit 562 cells, as well as the formation of and persistence in abscesses in animal skin comparable to that of the NCTC13129 strain. In contrast, the PW8 strain, suggested to lack 14 genomic regions, including 3 PAIs, exhibited more reduced signs of pathogenicity. These results, together with great diversity in the presence of the 37 genomic regions among various C. diphtheriae strains shown by PCR analyses, suggest great heterogeneity of this pathogen, not only in genome organization, but also in pathogenicity.
Corynebacterium diphtheriae is the causative agent of diphtheria. In 2003, the genomic nucleotide sequence of NCTC13129 (equivalent to ATCC 700971, here referred to as the reference strain), isolated in 1997 during a large outbreak in the former Soviet Union, was published, and 13 putative pathogenicity islands (PAIs) were shown to be present in its genome (9). 
The 13 PAIs have been annotated based on their unusual GC contents (9). The PAIs include tox (the genetic determinant for diphtheria toxin)-bearing corynebacteriophages, sortase genes (srtA to -E [34]), pilin genes (spaA to -G [34]), lantibiotic synthesis-related genes, and iron uptake-related genes. However, the contributions of these genes to C. diphtheriae pathogenesis have not yet been experimentally determined, except for the toxin, the minor pilins, and some of the sortases (34).In earlier research on diphtheria, the nontoxigenic strain C7(−) (equivalent to ATCC 27010)—isolated in 1949 from a diphtheria contact in California as “culture 770” and later renamed (5, 17)—has been one of the “standard” strains used for analyses of C. diphtheriae in bacteriology and pathogenicity studies (4, 5, 35, 55, 56, 60), including the molecular biology of bacteriophages and diphtheria toxin genes (22, 38, 49-51, 62). More importantly, C7(−) is, in fact, pathogenic to humans. Barksdale et al. documented two cases of laboratory personnel infected with the C7(−) strain and suffering typical clinical manifestations of diphtheria, such as sore throat and pseudomembrane formation (3). The strain is still important in molecular analysis of the bacterium (7, 24, 31, 42). However, information from the reference genome sequence has not yet been fully integrated with other research results, except the proteomics approach of Hansmeier and colleagues (24).The strain Park-Williams no. 8 (PW8), originally isolated from a very mild diphtheria case during the 1890s (46), has been widely used for toxoid vaccine production because of its great ability to secrete diphtheria toxin into the culture supernatant (46). As PW8 is effectively the only strain employed for vaccine production, its importance in public health and the vaccine industry is incomparable. 
Despite its high toxin-producing activity, the PW8 strain has been regarded as avirulent, as shown by Lampidis and Barksdale by the fact that their experience with this strain for more than 20 years did not show any detectable rise in the serum antibody titer (32).Toxigenic strains of C. diphtheriae produce a potent extracellular protein toxin, i.e., diphtheria toxin (44). The toxin is recognized as the main virulence factor of the bacterium and has been employed for toxoid vaccine with remarkable success (44). The mode of action of this toxin has been extensively studied (37, 40, 41). In contrast to research and application involving diphtheria toxin, our understanding of other factors and mechanisms underlying C. diphtheriae infections remains largely deficient. Nevertheless, several experimental systems have been constructed to clarify the mechanisms, in vitro employing HEp-2 and Detroit 562 cells (6, 26, 27, 34, 43) and in vivo using rabbits and guinea pigs (3, 18, 29, 33, 39).In the present paper, we aimed to relate the genome information of the reference strain to that of the C7(−) and PW8 strains using comparative genomic hybridization (CGH), and we demonstrate that most of the PAIs found in NCTC13129 are considered to be absent in C7(−) but present in PW8. The implications of these findings are discussed in relation to the results of in vivo and in vitro assays of pathogenicity.  相似文献   

10.
Methods for the direct detection of copy number variation (CNV) genome-wide have become effective instruments for identifying genetic risk factors for disease. The application of next-generation sequencing platforms to genetic studies promises to improve sensitivity to detect CNVs as well as inversions, indels, and SNPs. New computational approaches are needed to systematically detect these variants from genome sequence data. Existing sequence-based approaches for CNV detection are primarily based on paired-end read mapping (PEM) as reported previously by Tuzun et al. and Korbel et al. Due to limitations of the PEM approach, some classes of CNVs are difficult to ascertain, including large insertions and variants located within complex genomic regions. To overcome these limitations, we developed a method for CNV detection using read depth of coverage. Event-wise testing (EWT) is a method based on significance testing. In contrast to standard segmentation algorithms that typically operate by performing likelihood evaluation for every point in the genome, EWT works on intervals of data points, rapidly searching for specific classes of events. Overall false-positive rate is controlled by testing the significance of each possible event and adjusting for multiple testing. Deletions and duplications detected in an individual genome by EWT are examined across multiple genomes to identify polymorphism between individuals. We estimated error rates using simulations based on real data, and we applied EWT to the analysis of chromosome 1 from paired-end shotgun sequence data (30×) on five individuals. Our results suggest that analysis of read depth is an effective approach for the detection of CNVs, and it captures structural variants that are refractory to established PEM-based methods.
Structural variants (SVs) in the human genome (Iafrate et al. 2004; Sebat et al. 2004; Feuk et al. 2006a), including copy number variants (CNVs) and balanced rearrangements such as inversions and translocations, play an important role in the genetics of complex disease. Analysis of CNV in diseases such as cancer (Lucito et al. 2000; Pollack et al. 2002; Albertson and Pinkel 2003), and in developmental and neuropsychiatric disorders (Feuk et al. 2006b; Sebat et al. 2007; Kirov et al. 2008, 2009; Marshall et al. 2008; Mefford et al. 2008; Rujescu et al. 2008; Stefansson et al. 2008; Stone et al. 2008; Walsh et al. 2008; Zhang et al. 2008), has led to the identification of novel disease-causing mutations, thus contributing important new insights into the genetics of these disorders.
Our current power to detect SVs in disease studies is limited by the resolution of microarray analysis. Currently available array platforms that consist of more than 1 million probes have a lower limit of detection of ∼10–25 kb (McCarroll et al. 2008; Cooper et al. 2008). More comprehensive studies of individual genomes using sequencing-based approaches are capable of detecting CNVs <1 kb in size (Tuzun et al. 2005; Korbel et al. 2007; Bentley et al. 2008; Wang et al. 2008). Thus, new sequencing technologies promise to enable more comprehensive detection of SVs as well as indels and point mutations (Mardis 2008).
New computational methods are needed that can reliably identify SVs using next-generation sequencing platforms. To date, multiple approaches have been developed for the detection of SVs that are based on paired-end read mapping (PEM), which detects insertions and deletions by comparing the distance between mapped read pairs to the average insert size of the genomic library (Tuzun et al. 2005; Korbel et al. 2007). Advantages of this approach include the sensitivity for detecting deletions <1 kb in size, and localizing the breakpoint within the region of a small fragment. This approach also has certain limitations. In particular, PEM-based methods have poor ascertainment of SVs in complex genomic regions rich in segmental duplications and have limited ability to detect insertions larger than the average insert size of the library (Tuzun et al. 2005).
We sought to develop an alternative approach to the detection of SVs from sequence data that complements existing methods. Here we used the depth of coverage in sequence data from the Illumina Genome Analyzer to look for genomic regions that differ in copy number between individuals. This method is based on the depth of single reads and, hence, is orthogonal to methods that are based on the mapping of paired-end sequences.
To detect CNVs based on read depth (RD), we developed a pipeline consisting of three steps, as illustrated in Figure 1: (1) we estimated the coverage or RD in nonoverlapping intervals across an individual genome, (2) we implemented a novel CNV-calling algorithm to detect events, and (3) we compared data from multiple individuals to distinguish events that are polymorphic (i.e., CNVs) from those that show similarly increased or decreased copy number in all individuals in this study (i.e., monomorphic events). Here we demonstrate the feasibility of this approach and its unique advantages in comparison with other methods of SV detection.
Figure 1. Pipeline for the detection of CNVs based on analysis of read depth (RD). (A) RD was determined by counting the start position of reads in nonoverlapping windows of 100 bp. (B) Events were detected using a custom CNV-calling algorithm, event-wise testing (EWT). (C) Each event was examined in multiple genomes in order to distinguish polymorphic events (CNVs) from the majority of events that were found to show a similar copy number change in all five genomes in this study (i.e., monomorphic events).
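The first two pipeline steps can be illustrated with a short sketch. Step (A) follows the caption directly: read start positions are counted in nonoverlapping 100 bp windows. Step (B) below is only a simplified stand-in for event-wise testing, not the published EWT algorithm: it flags runs of consecutive windows whose depth z-score falls beyond a cutoff, and it applies no multiple-testing correction. The function names, the cutoff, and the minimum run length are hypothetical choices.

```python
from statistics import mean, pstdev


def window_read_depth(read_starts, region_length, window=100):
    """Count read start positions in nonoverlapping windows (0-based coordinates)."""
    n_windows = region_length // window
    depth = [0] * n_windows
    for pos in read_starts:
        idx = pos // window
        if 0 <= idx < n_windows:
            depth[idx] += 1
    return depth


def candidate_events(depth, min_windows=3, z_cut=1.5):
    """Return (start_window, end_window_exclusive, kind) for runs of outlier windows.

    Assumption: a plain z-score cutoff replaces the significance test and the
    multiple-testing adjustment that the EWT description refers to.
    """
    mu = mean(depth)
    sigma = pstdev(depth)
    if sigma == 0:
        return []  # perfectly flat depth: nothing stands out

    def label(d):
        z = (d - mu) / sigma
        if z < -z_cut:
            return "deletion"
        if z > z_cut:
            return "duplication"
        return None

    events = []
    start, current = 0, None
    # A trailing None sentinel closes any run that reaches the last window.
    for i, d in enumerate(list(depth) + [None]):
        state = label(d) if d is not None else None
        if state != current:
            if current is not None and i - start >= min_windows:
                events.append((start, i, current))
            start, current = i, state
    return events


if __name__ == "__main__":
    # Toy data: 10 reads per window, except windows 8-11, which have none.
    starts = [w * 100 + j for w in range(20) if not 8 <= w < 12 for j in range(10)]
    depth = window_read_depth(starts, region_length=2000)
    print(candidate_events(depth))
```

Step (C), comparing events across individuals, would then keep only the events whose copy-number change is not shared by every genome in the study.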

11.
《Genome research》2015,25(6):792-801
Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1–20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.
Genomic mutations drive human evolution and phenotypic diversity. Comparative genomics studies highlighted important small base-level and large-scale differences between human and chimpanzee genomes and noted a larger impact of segmental duplications compared to single nucleotide variations (SNVs) (Cheng et al. 2005). Whereas interspecies comparisons provide us with insight into long-range processes such as genetic drift and selection, the information derived from direct measurements of the de novo mutation spectrum and rates across generations is crucial for understanding mechanisms of mutation formation and inter-individual differences (Scally and Durbin 2012). While several projects have started to investigate the rates and characteristics of de novo SNVs (Kong et al. 2012; Michaelson et al. 2012; Francioli et al. 2014; Besenbacher et al. 2015), those of de novo short insertions and deletions (indels) and large structural variants (SVs) have been much less studied (Campbell and Eichler 2013).
Copy number variations (CNVs) and SVs contribute substantially to human genetic variation (Iafrate et al. 2004; Sebat et al. 2004; Tuzun et al. 2005; Korbel et al. 2007), and the phenotypic impact of CNVs may be larger than that of SNVs (Redon et al. 2006; Stranger et al. 2007; Conrad et al. 2010). The impact of novel changes in genome structure is further illustrated by their role in human genetic disease (Stankiewicz and Lupski 2010; Cooper et al. 2011). Copy number variations are widely studied and have been implicated in a variety of neurological disorders, such as autism (Sebat et al. 2007), schizophrenia (Walsh et al. 2008), and intellectual disability (Cooper et al. 2011). Recent large-scale exome sequencing studies have uncovered de novo SNVs and short indels causing various disease phenotypes, ranging from complex neurological disease to rare Mendelian disorders (Veltman and Brunner 2012).
Given the significant contribution of de novo mutations to human disease and evolution, studying genome-wide mutation rates and patterns is important for understanding mutation origins, locating hotspots, estimating disease risk, and interpreting novel disease-associated mutations. Here, we surveyed the entire spectrum of de novo indels (1–20 bp) and SVs (>20 bp) in the human population at nucleotide-resolution using whole-genome sequencing data of 250 families from the Genome of the Netherlands (GoNL) Project (Boomsma et al. 2014; Francioli et al. 2014).

12.
13.
14.
15.
16.
17.
18.
The development of high-throughput genomic technologies has impacted many areas of genetic research. While many applications of these technologies focus on the discovery of genes involved in disease from population samples, applications of genomic technologies to an individual’s genome or personal genomics have recently gained much interest. One such application is the identification of relatives from genetic data. In this application, genetic information from a set of individuals is collected in a database, and each pair of individuals is compared in order to identify genetic relatives. An inherent issue that arises in the identification of relatives is privacy. In this article, we propose a method for identifying genetic relatives without compromising privacy by taking advantage of novel cryptographic techniques customized for secure and private comparison of genetic information. We demonstrate the utility of these techniques by allowing a pair of individuals to discover whether or not they are related without compromising their genetic information or revealing it to a third party. The idea is that individuals only share enough special-purpose cryptographically protected information with each other to identify whether or not they are relatives, but not enough to expose any information about their genomes. We show in HapMap and 1000 Genomes data that our method can recover first- and second-order genetic relationships and, through simulations, show that our method can identify relationships as distant as third cousins while preserving privacy.
The field of human genetics has undergone a revolution within the past 10 yr with the advent of high-throughput genomic technologies, which can measure human genetic variation at ever-decreasing costs (Matsuzaki et al. 2004; Gunderson et al. 2005; Wheeler et al. 2008). The development of these technologies was driven by the goal to perform genome-wide association studies (GWASs), where genetic variation information is collected from tens of thousands of individuals and correlated with disease status (Risch and Merikangas 1996; Manolio et al. 2008; Hardy and Singleton 2009). These studies have linked thousands of new genes to dozens of diseases (Hindorff et al. 2009). While GWASs have been the most visible application of high-throughput genotyping technologies, other areas have been revolutionized as well. For example, these technologies have allowed researchers to ask fundamental questions about human history (Liu et al. 2006; Reich et al. 2009; Tishkoff et al. 2009), to identify genetic relationships between individuals (Stankovich et al. 2005; Pemberton et al. 2010; Kyriazopoulou-Panagiotopoulou et al. 2011), and to characterize an individual’s ancestry (Royal et al. 2010). Over the past few years, a personal genomics industry has been established that provides genetic sequencing, genotyping, and analysis services directly to consumers (Genetics and Public Policy Center 2011).
One service that is currently provided by several personal genomics companies is the identification of relatives. The idea behind this service is that individuals provide genetic samples that are genotyped and then stored in a database. Each of the samples is compared to the other samples, and any pair of individuals that appears to be genetically related is then notified of a genetic match. Unfortunately, this application requires that individuals release or share their genetic data with other individuals or organizations that they may not necessarily trust. Individual-level genetic data are extremely sensitive, because they are considered health information about an individual. Furthermore, since each individual’s genetic makeup is unique, an individual can be identified even from only a small fraction of his or her genetic data.
The genetics community has already been shaken by privacy issues with the discovery by Homer et al. (2008) showing that individuals can be identified within a pool of DNA based only on aggregate statistics about the pool (in this case the frequency of variants). This result surprised the genetics community and the National Institutes of Health (NIH), which, in an effort to make the results of NIH research available to the public, had been publicly releasing variant frequency information on GWAS disease and healthy populations. Given an individual’s DNA information, the observation of Homer et al. (2008) can be exploited to ascertain if the individual was part of any public GWAS studies and if the individual happened to be in a disease cohort. This would expose the disease status of that individual. Understandably, these observations changed the NIH policy overnight, were widely reported in the media (DNA databases shut down after identities were compromised [Editorial] 2008; Genetic privacy [Editorial] 2013), and initiated much research in the area (McGuire 2008; Jacobs et al. 2009; Sankararaman et al. 2009; Heeney et al. 2011; Kahn 2011; Knoppers et al. 2011). More recently, Gymrek et al. (2013) showed that they can reveal the identity of individuals in genetic reference data sets by combining their genetic data with small amounts of data from the individuals, such as their approximate age, and taking advantage of publicly available genetic databases and other data available on the Internet. While it is critically important to protect an individual’s privacy, restrictions on sharing genetic data severely limit the promise of high-throughput genomic technologies for personal genomics and medicine (Wang 2011).
In this article, we present a technological solution to the natural tension between privacy and the application of personal genomics technologies by capitalizing on recent breakthroughs in cryptography. We describe a framework that enables individuals who have access to their genomes to identify other individuals to whom they are related while keeping their genetic data private. In this framework, individuals release special-purpose cryptographically protected information about their genome which allows others to determine whether or not they are related to the individual. However, the released information does not contain any useful information about the individual’s genome. We demonstrate our methods by inferring relationships in several HapMap populations (The International HapMap 3 Consortium 2010) and 1000 Genomes populations (The 1000 Genomes Project Consortium 2010). Through simulations, we show that our approach can detect relationships as distant as third cousins while preserving privacy.

19.
20.
Adam Siepel 《Genome research》2009,19(11):1929-1941
Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. 
Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics.
The genome sequence of “Susie,” a female Sumatran orangutan from the Gladys Porter Zoo in Brownsville, Texas, will soon be published (Orangutan Genome Sequencing and Analysis Consortium, in prep.), bringing the total number of sequenced primate species to four (human, chimpanzee, rhesus macaque, and orangutan). Preliminary genome assemblies, with various levels of sequencing coverage, are also available for the gorilla, marmoset, bushbaby, mouse lemur, and tarsier genomes, and work is underway to sequence the gibbon and baboon genomes. Moreover, four additional primate species have been approved for sequencing by the National Human Genome Research Institute (NHGRI) (Fig. 1).
Table footnotes:
a. Only approved targets are listed. Proposals are pending for several others, including the owl monkey, Chinese rhesus macaque, pigtail macaque, and sooty mangabey. For the latest information, see http://www.genome.gov/10002154.
b. (GA) Great Apes; (LA) Lesser Apes (Gibbons); (OWM) Old World Monkeys; (NWM) New World Monkeys; (Pro) Prosimians.
c. The goal is a high-quality draft assembly in all cases except human (which is finished) and bonobo (which will be surveyed with fosmid-end sequencing).
d. (BI/MIT) Broad Institute of MIT and Harvard University; (WUGSC) Washington University Genome Sequencing Center; (BCM-HGSC) Baylor College of Medicine Human Genome Sequencing Center; (TIGR/JTC) The Institute for Genomic Research/J. Craig Venter Institute; (WTSI) Wellcome Trust Sanger Institute. All projects are NHGRI-funded except Gorilla.
e. Refinement in process.
f. With targeted BAC finishing.
g. Preliminary draft assembly available.
h. Low-coverage (2× Sanger sequencing coverage) assembly complete.
Figure 1. Phylogeny of primates, showing species for which sequencing is complete, in process, or approved but pending. Three nonprimates (the flying lemur, treeshrew, and mouse) are shown as outgroups. (Cyn. macaque) Cynomolgous macaque, (Rhe. macaque) Rhesus macaque, (Sq. monkey) Squirrel monkey. An approximate time scale, based on estimated dates of divergence from Janecka et al. (2007) (dates >25 Mya), Goodman (1999) (dates 3–25 Mya), Caswell et al. (2008) (chimpanzee/bonobo), and Morales and Melnick (1998) (rhesus/cynomolgous macaque) is shown at the bottom of the figure. Note that the estimated numbers of years before the present reflect DNA sequence divergences and represent upper bounds on speciation times. Nodes are indicated by circles to emphasize that the phylogeny represents both ancestral and extant species, as well as their evolutionary relationships. Note that the prosimians do not form a proper clade but are paraphyletic.
Among other things, these new genome sequences will help to identify the genetic basis of differences between primate species, including the genomic features that differentiate humans from other primates (Clark et al. 2003; Pollard et al. 2006b; Prabhakar et al. 2008), to identify and characterize functional sequences present in primates but not other mammals (Boffelli et al. 2003), and to catalog the genomic similarities and differences between humans and nonhuman primates widely used in biomedical research, such as the baboon and rhesus macaque (Rhesus Macaque Genome Sequencing and Analysis Consortium 2007). They will also help to clarify the molecular evolutionary context for human diseases such as AIDS, Alzheimer's, cancer, and malaria (McConkey and Varki 2000; Rhesus Macaque Genome Sequencing and Analysis Consortium 2007; Degenhardt et al. 2009). In short, these new sequence data will put within reach the grand vision of comprehensive genomic resources for primates that was first articulated nearly a decade ago (McConkey and Varki 2000; Boffelli et al. 2003; Enard and Paabo 2004; Goodman et al. 2005).
Perhaps the most informative approach available for comparative genomic analyses of multiple closely related species is to take an evolutionary and phylogenetic perspective, a technique that has been dubbed “phylogenomics” (Eisen and Fraser 2003; Murphy et al. 2004). By explicitly considering the phylogeny by which the species in question are related, phylogenomic methods not only capture the relationships among present-day genomes, but also reveal information about ancestral genomes, and about the lineages on which evolutionary changes have occurred. Moreover, phylogenomics opens up a two-way street between functional and evolutionary analyses, with evolutionary patterns providing information about the potential functions of genomic elements, and functional annotations allowing for richer and more realistic models of evolutionary dynamics. Phylogenomics has been applied widely in many groups of species, including mammals (e.g., Thomas et al. 2003; Rat Genome Sequencing Project Consortium 2004; The ENCODE Project Consortium 2007), yeasts (Cliften et al. 2003; Kellis et al. 2003), drosophilids (Clark et al. 2007; Stark et al. 2007), nematode worms (Stein et al. 2003), and various plants (Yu et al. 2002; Wang et al. 2008). It has already been used extensively within the primates (Boffelli et al. 2003; Rhesus Macaque Genome Sequencing and Analysis Consortium 2007) and is expected to be applied broadly as additional primate genomes become available.
Nevertheless, there is an important and perhaps underappreciated challenge in applying phylogenomic methods to groups of closely related species such as the primates. Most phylogenomic methods inherit from phylogenetics the assumption that there is a single “correct” species phylogeny that holds across the genomes in question, and that present-day genomes have arisen by a stochastic process that operates along the branches of this phylogeny. This modeling approach ignores variation among individuals of the same species, implicitly assuming that it is negligible relative to variation across species. Within the primates, however, this assumption does not hold. Because species divergence times are short relative to ancestral population sizes, population genetic effects become significant, and variation in local genealogies across loci can be considerable. To take one prominent example, it has been estimated that the canonical ((human chimp) gorilla) species phylogeny holds across only about two-thirds of the genome, with the two alternative tree topologies occurring about one-third of the time, due to deep coalescences of ancestral lineages (Patterson et al. 2006; Hobolth et al. 2007; Burgess and Yang 2008). Population genetic effects, of course, are not limited to the primates; they also impact comparative genomics of other groups of interest, such as the drosophilids (e.g., Pollard et al. 2006a), but my focus here will be on their implications in primate phylogenomics.
In this article, I will examine the assumptions that underlie phylogenomic analyses from a population genetic point of view, and discuss their limitations within groups of species, such as the primates, that have experienced short intervals between ancestral speciation events relative to their population sizes. These limitations potentially have important consequences for inferences of rates and patterns of mutation, of positive or negative selection, and of the locations of functional elements. After introducing some basic concepts, I will review several pioneering papers from an emerging literature on “population-aware” phylogenomics, which not only consider interspecies comparisons in a more accurate and realistic way, but also shed light on modes of speciation, ancestral populations, and selective forces within the primates. Finally, I will discuss remaining challenges and future prospects at the intersection of phylogenetics and population genetics.
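The link between short speciation intervals, large ancestral populations, and discordant local genealogies can be made concrete with the standard three-species coalescent result, which the abstract above does not state explicitly and which is brought in here as background. Under a neutral model with no recombination within a locus, if the two speciation events are separated by T generations and the ancestral diploid effective population size is N, each of the two discordant topologies has probability (1/3)·exp(−T/(2N)). The sketch below evaluates that formula; the function names and the numbers in the usage example are illustrative assumptions, not values taken from the article.

```python
import math


def topology_probabilities(interval_generations, ancestral_ne):
    """Gene-tree topology probabilities for three species under the neutral coalescent.

    interval_generations: generations between the two speciation events (T).
    ancestral_ne: diploid effective size of the ancestral population (N).
    """
    t = interval_generations / (2.0 * ancestral_ne)  # interval in coalescent units
    discordant_each = math.exp(-t) / 3.0
    return {
        "concordant": 1.0 - 2.0 * discordant_each,
        "discordant_each": discordant_each,
    }


def interval_for_concordance(p_concordant):
    """Invert the formula: coalescent-unit interval that gives a target concordance."""
    return -math.log(1.5 * (1.0 - p_concordant))


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the relationship.
    print(topology_probabilities(interval_generations=100_000, ancestral_ne=50_000))
    # Interval (in units of 2N generations) implied by two-thirds concordance.
    print(interval_for_concordance(2.0 / 3.0))
```

Under these assumptions, a concordance of about two-thirds corresponds to an interval of roughly ln 2 coalescent units, which shows how an internal branch that is short relative to 2N generations leaves a large share of the genome with a genealogy that differs from the species tree.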
