Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling
Authors: Jun Inoue, Yukuto Sato, Robert Sinclair, Katsumi Tsukamoto, Mutsumi Nishida
Abstract: Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis.

The recent rapid growth of genome data has made it possible to clarify major evolutionary events that have shaped eukaryote genomes, such as gene duplication, chromosomal rearrangement, and whole-genome duplication (WGD) (1). In particular, WGD events, known to have occurred in several major lineages of flowering plants (2), budding yeasts (3), and vertebrates (4) (Fig.
1A), are considered to have had a major impact on genomic architecture and consequently organismal features.

Fig. 1. Inferred spatiotemporal process of gene loss and persistence after TGD in teleost ancestors. (A) The estimated numbers of gene loss events in the teleost phylogeny, time-scaled tree of vertebrates (11, 41) with the timing of genome duplication events at the base of vertebrates (VGD1/2) and teleosts (TGD), and the number of extant species (26). Species used in this study are connected by solid branches. The numbers were parsimoniously inferred from the presence or absence of TGD-derived gene lineage pairs belonging to 6,892 orthogroups and mapped onto the time points of TGD (306 Mya), nodes a–g (a: 245 Mya; b: 158; c: 120; d: 105; e: 41; f: 164; g: 86) (11), and h (74 Mya) (28). On the left side of the tree, ortholog arrangements are compared between representatives (connected by bold branches in the tree) by CIRCOS (circos.ca) using orthology information for 5,655 orthogroups belonging to the 1-to-1 category (Fig. S2). (B) Definition of terms relating to WGD events. An orthogroup is a monophyletic group containing WGD-derived paralogs (gene lineages) of all focal species (Sp1) and orthologs of their sister species (Sp2), ignoring lineage-specific gene duplications (GeneA-1′ and -1″) or gene loss (GeneA-1″). (C) Approximation of the pattern of the number of gene loss and persistence events associated with TGD. The estimated number of retained paired gene lineages at nodes a to h and current teleosts (Ca, Ze, Co, Ti, Pl, Me, St, Te, and Fu) were used to compare the fit of the one-phase [$\alpha e^{-2\mu t}$ (14)] and two-phase models. (D) Region of C detailing the recent pattern of gene loss. The solid and dashed curves have been corrected upward to remove the bias expected to result from parsimony analysis.
These approximations are effectively insensitive to fluctuations in the estimated numbers of gene lineage pairs and times for the TGD event and ancestral nodes (a to h) (SI Text). The evolutionary scenario is essentially unchanged if the number of gene lineage pairs estimated without the BS 70% criterion or the divergence times estimated by nuclear gene (28)/mitochondrial genome (42) data were used. Note that the two-phase model can be roughly approximated by a double-exponential curve.

Duplicate genes generated by WGD are typically assumed to be redundant and therefore subsequently lost in a stochastic manner. Comparative genome studies have suggested that 90% of duplicate genes were rapidly lost (5) by a neutral process (6) after WGD in budding yeast, but 20–30% of them were retained in human (7) even after several hundred million years. However, few genome-wide studies have addressed the temporal pattern of gene loss or persistence after WGD with reference to a reliable timescale (but see refs. 6 and 8). Such examination is indispensable for understanding when duplicate genes were lost and, consequently, genome structures were reshaped, during vertebrate diversification after the WGD (Fig. 1).

To examine the detailed process of duplicate gene loss after WGD, one needs to estimate the number (proportion) of remaining duplicates in extant and ancestral species. For this purpose, both (i) reliably time-calibrated phylogenetic trees of species and (ii) well-annotated genomes are required. These two requirements have been met for several vertebrate lineages, including some teleost fishes. Given this, the next step should be to accurately estimate orthology and paralogy relationships of all of the genes that experienced WGD. For the analysis of gene orthology and paralogy, a homology search- or synteny-based approach has usually been used (9).
In addition to the homology search-based approach (e.g., COGs and OrthoDB), a phylogenetic tree-based approach has also been introduced (e.g., Ensembl and PhylomeDB) (9). Recent developments of tree search algorithms and increased computing power allow a sophisticated tree-based approach, comparing each gene tree with the species tree. Such an approach is indispensable for the effective analysis of gene orthology and paralogy across many species, providing us with a powerful opportunity to investigate genome evolution after WGD.

Here, we aim to investigate the gene loss/persistence pattern using genome-wide data, focusing on what is known as the teleost genome duplication (TGD). TGD is estimated to have occurred in an ancestor of teleosts (Fig. 1A) but after the divergence of tetrapods and teleosts (10). Thus, it is a relatively recent WGD shared by a large vertebrate group, i.e., the Teleostei. For teleosts, reliably time-calibrated phylogenies, including phylogenetic position and timing of the TGD event, are available (e.g., ref. 11). In addition, well-annotated whole-genome data from at least nine phylogenetically representative teleost species (cave fish, zebrafish, cod, tilapia, platyfish, medaka, stickleback, Tetraodon, and fugu) are now available from Ensembl (12). In the present study, we inferred the timing of rapid genome reshaping through gene loss after TGD by estimating the temporal and genomic positional (spatiotemporal) loss/persistence pattern of TGD-derived gene lineage pairs (Fig. 1B) over the past several hundred million years, using accurate tree-based orthology estimation (Fig. S1) and a reliable time-calibrated teleost tree. We investigated the mechanism of rapid gene loss after TGD by fitting a newly developed model for the observed temporal pattern of gene loss. This new model is necessary because standard models, based upon random and independent loss of duplicate genes, fail to fit our data.
Our model analysis explicitly includes both the possibility of the loss of multiple genes in single events, and also the known phylogeny of the relevant species. The significance of the inclusion of events that result in the loss of multiple genes is that it reproduces the two phases of loss. The inclusion of known phylogeny allows us to correct for the bias associated with parsimony analysis.
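
As a rough numerical illustration of why a single exponential struggles to describe both phases, the sketch below calibrates the one-phase form $\alpha e^{-2\mu t}$ to the early rapid loss and extrapolates it to the full 306 My since TGD, then contrasts it with a double-exponential curve of the kind the Fig. 1 caption mentions as a rough approximation of the two-phase model. It is not the model fitted in this study, which includes multi-gene loss events and a phylogeny-based correction for parsimony bias. The function names are hypothetical, and all parameter values are assumptions chosen for illustration: $\alpha = 1$, a retained fraction of 0.25 at 60 My (a stand-in for "more than 70–80%" lost), and the fast/slow weights and rates of the double exponential.

```python
import math


def one_phase_retained(t, mu, alpha=1.0):
    """Fraction of duplicate pairs retained after t My under alpha * exp(-2 * mu * t)."""
    return alpha * math.exp(-2.0 * mu * t)


def mu_from_observation(t, retained, alpha=1.0):
    """Solve alpha * exp(-2 * mu * t) = retained for the per-gene loss rate mu."""
    return -math.log(retained / alpha) / (2.0 * t)


def double_exponential_retained(t, fast_weight, mu_fast, mu_slow):
    """Mixture of a fast and a slow exponential; equals 1 at t = 0."""
    fast = fast_weight * math.exp(-2.0 * mu_fast * t)
    slow = (1.0 - fast_weight) * math.exp(-2.0 * mu_slow * t)
    return fast + slow


# Times taken from the text: TGD at 306 Mya, rapid loss within roughly the first 60 My.
T_EARLY = 60.0
T_TOTAL = 306.0

# ASSUMPTION: 25% of pairs retained at 60 My (illustrative value, not a reported estimate).
RETAINED_EARLY = 0.25

# One-phase rate calibrated to the early phase only.
MU_ONE_PHASE = mu_from_observation(T_EARLY, RETAINED_EARLY)

# ASSUMPTION: hypothetical double-exponential parameters, not fitted to any data.
FAST_WEIGHT, MU_FAST, MU_SLOW = 0.8, 0.05, 0.001

for t in (0.0, T_EARLY, T_TOTAL):
    one = one_phase_retained(t, MU_ONE_PHASE)
    two = double_exponential_retained(t, FAST_WEIGHT, MU_FAST, MU_SLOW)
    print(f"t = {t:5.0f} My  one-phase = {one:.4f}  double-exponential = {two:.4f}")
```

Under these assumptions, the one-phase curve calibrated to the early phase keeps decaying at the same rate and leaves almost no duplicate pairs after 306 My, whereas a curve with a separate slow component can combine rapid early loss with slow and steady loss afterwards.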
Keywords: orthologous gene; bony vertebrates; post-WGD genome evolution