Efficient identification of Y chromosome sequences in the human and Drosophila genomes
Authors: Antonio Bernardo Carvalho, Andrew G. Clark
Affiliation: 1. Departamento de Genética, Universidade Federal do Rio de Janeiro, Caixa Postal 68011, CEP 21941-971, Rio de Janeiro, Brazil; 2. Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; 3. Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
Abstract:
Notwithstanding their biological importance, Y chromosomes remain poorly known in most species. A major obstacle to their study is the identification of Y chromosome sequences; because of its high content of repetitive DNA, the Y chromosome sequence in most genome projects is fragmented into a large number of small, unmapped scaffolds. Identification of Y-linked genes among these fragments has yielded important insights about the origin and evolution of Y chromosomes, but the process is labor intensive, restricting studies to a small number of species. Apart from these fragmentary assemblies, in a few mammalian species, the euchromatic sequence of the Y is essentially complete, owing to painstaking BAC mapping and sequencing. Here we use female short-read sequencing and k-mer comparison to identify Y-linked sequences in two very different genomes, Drosophila virilis and human. Using this method, essentially all D. virilis scaffolds were unambiguously classified as Y-linked or not Y-linked. We found 800 new scaffolds (totaling 8.5 Mbp), and four new genes in the Y chromosome of D. virilis, including JYalpha, a gene involved in hybrid male sterility. Our results also strongly support the preponderance of gene gains over gene losses in the evolution of the Drosophila Y. In the intensively studied human genome, used here as a positive control, we recovered all previously known genes or gene families, plus a small amount (283 kb) of new, unfinished sequence. Hence, this method works in large and complex genomes and can be applied to any species with sex chromosomes.

Y chromosomes play a major role in sexual reproduction by harboring master sex-determination genes in many species and male fertility factors in most of them (Bull 1983; Carvalho et al. 2009; Kaiser and Bachtrog 2010; Ezaz and Graves 2012; Hughes and Rozen 2012). Analysis of their origin and evolution has revealed unexpected biological phenomena (Rozen et al. 2003; Carvalho and Clark 2005; Koerich et al. 2008; Lemos et al. 2008; Murtagh et al. 2012), as well as general principles of evolutionary genetics, including the role of recombination and sex-antagonistic genes (Rice 1996; Charlesworth and Charlesworth 2000; Zhou and Bachtrog 2012). However, despite their importance, little is known about Y chromosomes because in many species they are heterochromatic, being composed of highly repetitive DNA that cannot be fully assembled with current technologies (Carvalho et al. 2003; Hoskins et al. 2007). The same issues apply to W chromosomes in ZZ/ZW sex-determination systems (Bull 1983; International Chicken Genome Sequencing Consortium 2004). Mammalian Y chromosomes contain a large euchromatic portion that nonetheless is also very repetitive; in a few species (human, chimp, and macaque), its sequence is nearly complete, owing to painstaking BAC mapping and sequencing (Skaletsky et al. 2003; Hughes and Rozen 2012). These formidable achievements demanded a huge investment of time and resources and set these Y chromosomes apart (in all other species, only fragmentary assemblies are available, at best). A similar effort successfully assembled the less repetitive portion of the D. melanogaster heterochromatin (Hoskins et al. 2007). It is telling that even in the finished human genome most heterochromatic regions remain unassembled (International Human Genome Sequencing Consortium 2004).

Although it is not possible to fully assemble heterochromatic Y chromosomes, Y-linked genes can nonetheless be assembled even if they are deeply buried within repetitive DNA, and these partial genomic data are very informative (Carvalho et al. 2000; Carvalho and Clark 2005; Koerich et al. 2008; Murtagh et al. 2012). In "whole genome shotgun" projects (WGS), which comprise the majority of recent genome projects, the euchromatic portions of chromosomes assemble into large and easily studied scaffolds, whereas heterochromatic regions are represented by thousands of small unmapped scaffolds (International Chicken Genome Sequencing Consortium 2004; Hoskins et al. 2007; Levy et al. 2007). Exons of heterochromatic genes and other islands of unique sequence are faithfully assembled but appear as isolated scaffolds because the repeat-laden introns and intergenic regions cannot be assembled. Further assembly fragmentation in the Y chromosome is caused by its low coverage (compared to the autosomes) (Carvalho et al. 2003), a consequence of its hemizygosity. A major obstacle to the study of the Y chromosome is identifying, among the many unmapped scaffolds, those that are Y-linked. This has been done by a combination of computational methods that suggest candidates and a PCR test to confirm Y-linkage (Carvalho et al. 2000; Carvalho and Clark 2005; Koerich et al. 2008; see Chen et al. 2012 for W-linkage). The experimental verification is labor intensive when applied to hundreds of scaffolds but is necessary owing to the high rate of false positives of current computational methods. Nearly all known Drosophila Y-linked genes were identified using this approach (Carvalho et al. 2000; Carvalho and Clark 2005; Carvalho et al. 2009; Krsticevic et al. 2010). When technically feasible, Y-linked scaffolds can be identified by the preparation of separate male and female DNA libraries before WGS sequencing, as these scaffolds would contain only male reads (Krzywinski et al. 2004). This approach is not possible for the majority of the available genome sequences because they employed mixed-sex libraries (also, in mammals, sequencing of a single homogametic female is common practice).

Here we show that Y chromosome sequences can be identified with a simple, efficient, and inexpensive method (Y chromosome Genome Scan or YGS) (Fig. 1) suitable for all genome projects that include the heterogametic sex, and apply it to Drosophila and humans.

Figure 1. Outline of the YGS (Y chromosome Genome Scan) method. Y-linked sequences can be efficiently identified by a comparison of the assembled genome with inexpensive short reads obtained from female DNA: The Y-linked sequences should get no match, whereas autosomal and X-linked sequences should be nearly completely matched. Efficient removal of all types of repetitive sequences is critical because they are shared between the Y chromosome and the female DNA; this was accomplished by a direct comparison of the short DNA words (k-mers) present in the assembled genome and in the female short reads. We successfully applied the YGS method to two very different genomes, D. virilis and human.
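
The comparison outlined in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`canonical_kmers`, `classify_scaffolds`), the default word size `k=15`, the `threshold=0.7` cutoff, and the rule that a k-mer counts as repetitive when it occurs more than once in the assembly are all assumptions made here for illustration; the text above states only that k-mers from the assembled genome are compared with k-mers from female short reads and that repeats must be excluded.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer of seq in strand-independent (canonical) form."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if VALID_BASES.issuperset(word):  # skip words containing N, etc.
            reverse_complement = word.translate(COMPLEMENT)[::-1]
            yield min(word, reverse_complement)


def classify_scaffolds(scaffolds, female_reads, k=15, threshold=0.7):
    """Label scaffolds by the fraction of single-copy k-mers absent from female reads.

    k and threshold are illustrative defaults (assumptions), not published values.
    """
    # All k-mers observed in the female short reads.
    female_kmers = set()
    for read in female_reads:
        female_kmers.update(canonical_kmers(read, k))

    # Assumed repeat filter: a k-mer seen more than once in the assembly
    # is treated as repetitive and ignored.
    genome_counts = Counter()
    for seq in scaffolds.values():
        genome_counts.update(canonical_kmers(seq, k))

    results = {}
    for name, seq in scaffolds.items():
        single_copy = [w for w in canonical_kmers(seq, k) if genome_counts[w] == 1]
        if not single_copy:
            results[name] = (None, "unclassified")
            continue
        unmatched = sum(w not in female_kmers for w in single_copy) / len(single_copy)
        label = "Y-candidate" if unmatched >= threshold else "not Y-linked"
        results[name] = (unmatched, label)
    return results


if __name__ == "__main__":
    # Toy data only: tiny made-up sequences and a small k so the example runs instantly.
    toy_scaffolds = {
        "autosomal": "AGAAGGAG",
        "y_like": "ACCACAAC",
        "repeat_a": "TTGACTGA",
        "repeat_b": "TTGACTGA",
    }
    toy_female_reads = ["AGAAGGAG", "TTGACTGA"]
    for name, result in classify_scaffolds(toy_scaffolds, toy_female_reads, k=4).items():
        print(name, result)
```

In this toy run the scaffold fully covered by female reads has no unmatched single-copy k-mers, the scaffold absent from female reads has all of them unmatched, and the two identical repeat scaffolds are left unclassified because no single-copy k-mers remain after filtering. Candidates flagged this way would still need confirmation before being called Y-linked.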