Similar articles
1.
Teleost fishes comprise one-half of all vertebrate species and possess a duplicated genome. This whole-genome duplication (WGD) occurred on the teleost stem lineage in an ancient common ancestor of all living teleosts and is hypothesized as a trigger of their exceptional evolutionary radiation. Genomic and phylogenetic data indicate that WGD occurred in the Mesozoic after the divergence of teleosts from their closest living relatives but before the origin of the extant teleost groups. However, these approaches cannot pinpoint WGD among the many extinct groups that populate this 50- to 100-million-y lineage, preventing tests of the evolutionary effects of WGD. We infer patterns of genome size evolution in fossil stem-group teleosts using high-resolution synchrotron X-ray tomography to measure the bone cell volumes, which correlate with genome size in living species. Our findings indicate that WGD occurred very early on the teleost stem lineage and that all extinct stem-group teleosts known so far possessed duplicated genomes. WGD therefore predates both the origin of proposed key innovations of the teleost skeleton and the onset of substantial morphological diversification in the clade. Moreover, the early occurrence of WGD allowed considerable time for postduplication reorganization prior to the origin of the teleost crown group. This suggests at most an indirect link between WGD and evolutionary success, with broad implications for the relationship between genomic architecture and large-scale evolutionary patterns in the vertebrate Tree of Life.

Whole-genome duplication (WGD) has occurred independently in multiple lineages of plants, fungi, and animals (1–3). This represents a major change to genomic architecture, with hypothesized impacts on evolutionary diversification (4, 5) caused by the origin of new gene functions from duplicate copies, expanding the genetic toolbox available for evolutionary “tinkering” (6). However, despite its mechanistic plausibility, this hypothesis is so far supported by only limited and contradictory empirical evidence (7–10). Teleost fishes, comprising more than one-half of modern vertebrates, are a key example, with their spectacular variety of form and kind (ranging from eels to seahorses) often viewed as prima facie evidence for the role of WGD in triggering evolutionary diversification (6, 11). Teleosts also show an incredible diversity of genome biology, demonstrating particularly high rates of evolution of protein-coding genes (12) and noncoding elements (13), a broad range of genome sizes including the smallest known in vertebrates (14), and multiple polyploid lineages (15).

The genome of all living teleosts derives from an ancient WGD event that occurred before the last common ancestor of modern species (16). Additional duplication events occurred more recently in several teleost subgroups (9, 17) but are not generally proposed as drivers of diversification (9).
Studies of the role of WGD in contributing to teleost diversity so far have analyzed the distribution of species richness among extant lineages and morphometric data for fossil phenotypes, with potentially conflicting results: extant teleosts have high rates of lineage diversification compared to other ray-finned fishes (7), but early fossil members of the teleost crown group do not show increased rates of morphological evolution (18).

Molecular phylogenetic studies indicate that WGD occurred on the teleost stem lineage: after the divergence of teleosts from their extant sister taxon (Holostei) but before the most recent common ancestor of all living teleosts (19, 20). However, these bounds encompass a large phylogenetic diversity of extinct groups that diverged during an interval of 50 to 100 million y, from the initial origin of the teleost total group by the Triassic (21), up to the first appearance of crown-group teleosts in the Late Jurassic (18, 22). Molecular-clock estimates provide only broad constraints on the precise timing of duplication [316 to 226 Ma (23); ∼310 Ma (24)] and offer no information on its phylogenetic position on the teleost stem lineage. The imprecision of these estimates and the sometimes-considerable incongruence of molecular clocks with the teleost fossil record call into question the reliability of these inferences in the absence of further evidence.

Patterns of genome-size evolution on the teleost stem lineage could provide alternative and independent evidence on the timing and phylogenetic position of the teleost WGD. However, stem lineages, by definition, comprise entirely extinct species that are known only from fossils, for which genomic data are absent. Nevertheless, some information about vertebrate genome size is preserved within fossil bone (25–27). Living organisms show a positive correlation between cell size and genome size (28–30), such that the volumes of bone cell spaces (osteocyte lacunae) allow estimates of genome size.
This relationship has been demonstrated in ray-finned fishes, including teleosts, and is predictive for large-scale variation in genome size (31). The precision of this approach is sufficient for inferring the large change (presumably, doubling) of genome size involved in WGD (31). Here, we use this relationship to trace the evolution of genome size in extinct ray-finned fishes using osteocyte lacuna volumes as a proxy for genome size. Our sample includes a broad range of stem- and crown-group teleosts, providing information on patterns of teleost genome-size evolution during the deep evolutionary history of the teleost total group.

Three-dimensional measurement of fossil bone cell spaces with μm-scale diameters presents considerable technical challenges. We used propagation phase contrast synchrotron radiation X-ray microcomputed tomography (PPC-SRµCT) to address this, collecting standardized measurements of osteocyte lacuna volumes for 61 fossil ray-finned fish species ranging from 2.5 to 252 million y in age (SI Appendix, section I). This fossil evidence is complemented by data from a previous study including 34 modern ray-finned fish species with known genome sizes (31). Our fossil sample includes all major groups of stem-group teleosts, members of both living and extinct lineages within the teleost crown group, and several nonteleost ray-finned fishes. This sample allows us to estimate relative genome sizes in extinct groups, providing information on the absolute timing and specific phylogenetic position of the teleost WGD as well as the timescale of postduplication reductions in genome size (24). Both statistical analysis and qualitative observations demonstrate the effectiveness of lacuna size for inferring large evolutionary increases in genome size: known polyploid lineages such as catostomids and salmonids, which underwent additional rounds of WGD, both show large osteocyte lacuna volumes compared to their close relatives (31).
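The proxy idea described above can be sketched as a log-log calibration: fit lacuna volume against genome size in species where both are known, then invert the fit for a fossil where only the volume is measured. This is only an illustrative sketch. The power-law form, the function names, and the calibration numbers are assumptions made for the example and are not taken from the study; the data are synthetic.

```python
import math

def fit_log_log(genome_sizes, lacuna_volumes):
    """Ordinary least squares for log10(volume) = intercept + slope * log10(genome size)."""
    xs = [math.log10(g) for g in genome_sizes]
    ys = [math.log10(v) for v in lacuna_volumes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_genome_size(lacuna_volume, slope, intercept):
    """Invert the fitted relation to estimate genome size from a measured volume."""
    return 10 ** ((math.log10(lacuna_volume) - intercept) / slope)

# Synthetic calibration set generated from an ASSUMED law V = 100 * G**0.5
# (hypothetical units; not measurements from the paper).
genomes = [0.5, 1.0, 2.0, 4.0]
volumes = [100.0 * g ** 0.5 for g in genomes]
slope, intercept = fit_log_log(genomes, volumes)

# Under the assumed law, doubling genome size multiplies volume by 2**slope.
volume_ratio_for_doubling = 2 ** slope
```

With real data the fit would carry scatter, so an approach like this can only resolve large shifts such as a doubling, which matches the limitation the text states.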

2.
We compared whole-exome sequencing (WES) and whole-genome sequencing (WGS) in six unrelated individuals. In the regions targeted by WES capture (81.5% of the consensus coding genome), the mean numbers of single-nucleotide variants (SNVs) and small insertions/deletions (indels) detected per sample were 84,192 and 13,325, respectively, for WES, and 84,968 and 12,702, respectively, for WGS. For both SNVs and indels, the distributions of coverage depth, genotype quality, and minor read ratio were more uniform for WGS than for WES. After filtering, a mean of 74,398 (95.3%) high-quality (HQ) SNVs and 9,033 (70.6%) HQ indels were called by both platforms. A mean of 105 coding HQ SNVs and 32 indels was identified exclusively by WES whereas 692 HQ SNVs and 105 indels were identified exclusively by WGS. We Sanger-sequenced a random selection of these exclusive variants. For SNVs, the proportion of false-positive variants was higher for WES (78%) than for WGS (17%). The estimated mean number of real coding SNVs (656 variants, ∼3% of all coding HQ SNVs) identified by WGS and missed by WES was greater than the number of SNVs identified by WES and missed by WGS (26 variants). For indels, the proportions of false-positive variants were similar for WES (44%) and WGS (46%). Finally, WES was not reliable for the detection of copy-number variations, almost all of which extended beyond the targeted regions. Although currently more expensive, WGS is more powerful than WES for detecting potential disease-causing mutations within WES regions, particularly those due to SNVs.

Whole-exome sequencing (WES) is routinely used and is gradually being optimized for the detection of rare and common genetic variants in humans (1–8). However, whole-genome sequencing (WGS) is becoming increasingly attractive as an alternative, due to its broader coverage and decreasing cost (9–11). It remains difficult to interpret variants lying outside the protein-coding regions of the genome.
Diagnostic and research laboratories, whether public or private, therefore tend to search first for coding variants, most of which can be detected by WES. Such variants can also be detected by WGS, and several studies have compared WES and WGS for different types of variations and/or in different contexts (9, 11–16), but none of them did so comprehensively. Here, we compared WES and WGS, in terms of detection rates and quality, for single-nucleotide variants (SNVs), small insertions/deletions (indels), and copy-number variants (CNVs) within the regions of the human genome covered by WES, using the most recent next-generation sequencing (NGS) technologies. We aimed to identify the most efficient and reliable approach for identifying these variants in coding regions of the genome, to define the optimal analytical filters for decreasing the frequency of false-positive variants, and to characterize the genes that were hard to sequence by either approach or poorly covered by WES kits.
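The platform comparison described above amounts to set operations on variant calls, followed by a correction for the false-positive rate measured by validation. The sketch below illustrates that bookkeeping only. The variant tuples, function names, and numbers are hypothetical, and the correction shown is a simple product that is not claimed to be the authors' estimator.

```python
def compare_callsets(wes_calls, wgs_calls):
    """Split two collections of variant keys into shared and platform-exclusive sets.

    Each variant is keyed as (chromosome, position, ref allele, alt allele).
    """
    wes, wgs = set(wes_calls), set(wgs_calls)
    return {
        "shared": wes & wgs,
        "wes_only": wes - wgs,
        "wgs_only": wgs - wes,
    }

def expected_true_calls(n_exclusive, false_positive_rate):
    """Expected number of real variants among exclusive calls, given a validated FP rate."""
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must lie in [0, 1]")
    return n_exclusive * (1.0 - false_positive_rate)

# Toy call sets (hypothetical variants, for illustration only).
wes_calls = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
wgs_calls = [("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")]
result = compare_callsets(wes_calls, wgs_calls)
```

Keying variants on all four fields avoids counting two different alternate alleles at the same position as one shared call.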

3.
An approximation to the ∼4-Mbp basic genome shared by 32 strains of Escherichia coli representing six evolutionary groups has been derived and analyzed computationally. A multiple alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ∼90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single base-pair mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome pairs have one or two recombinant transfers of length ∼40–115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome pairs (0.4–1% SNPs) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kilobase pairs. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. 
Most recombinant transfers seem likely to be due to generalized transduction by coevolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.

The increasing availability of complete genome sequences of many different bacterial and archaeal species, as well as metagenomic sequencing of mixed populations from natural environments, has stimulated theoretical and computational approaches to understand mechanisms of speciation and how prokaryotic species should be defined (1–8). Much genome analysis and comparison has been at the level of gene content, identifying core genomes (the set of genes found in most or all genomes in a group) and the continually expanding pan-genome. Population genomics of Escherichia coli has been particularly well studied because of its long history in laboratory research and because many pathogenic strains have been isolated and completely sequenced (9–14). Proposed models of how related groups or species form and evolve include isolation by ecological niche (7–9, 11, 15), decreased homologous recombination as divergence between isolated populations increases (2–4, 8, 14, 16), and coevolving phage and bacterial populations (6).

E. coli genomes are highly variable, containing an array of phage-related mobile elements integrated at many different sites (17), random insertions of multiple transposable elements (18), and idiosyncratic genome rearrangements that include inversions, translocations, duplications, and deletions. Although E. coli grows by binary cell division, genetic exchange by homologous recombination has come to be recognized as a significant factor in adaptation and genome evolution (9, 10, 19).
Of particular interest has been the relative contribution to genome variability of random mutations (single base-pair differences referred to as SNPs) and replacement of genome regions by homologous recombination with fragments imported from other genomes (here referred to as recombinant transfers or transferred regions). Estimates of the rate, extent, and average lengths of recombinant transfers in the core genome vary widely, as do methods for detecting transferred regions and assessing their impact on phylogenetic relationships (12–14, 20, 21).

In a previous comparison of complete genome sequences of the K-12 reference strain MG1655 and the reconstructed genome of the B strain of Delbrück and Luria referred to here as B-DL, we observed that SNPs are not randomly distributed among 3,620 perfectly matched pairs of coding sequences but rather have two distinct regimes: sharply decreasing numbers of genes having 0, 1, 2, or 3 SNPs, and an abrupt transition to a much broader exponential distribution in which decreasing numbers of genes contain increasing numbers of SNPs from 4 to 102 SNPs per gene (22). Genes in the two regimes of the distribution are interspersed in clusters of variable lengths throughout what we referred to as the basic genome, namely, the ∼4 Mbp shared by the two genomes after eliminating mobile elements. We speculated that genes having 0 to 3 SNPs may primarily have been inherited clonally from the last common ancestor, whereas genes comprising the exponential tail may primarily have been acquired by horizontal transfer from diverged members of the population.

The current study was undertaken to extend these observations to a diverse set of 32 completely sequenced E. coli genomes and to analyze how SNP distributions in the basic genome change as a function of evolutionary divergence between the 496 pairs of strains in this set. We have taken a simpler approach than those of Touchon et al. (13), Didelot et al. (14), and McNally et al. (21), who previously analyzed multiple alignments of complete genomes of E. coli strains. The appreciably larger basic genome derived here is not restricted to protein-coding sequences and retains positional information.
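Two points from the text above lend themselves to a small sketch: 32 genomes yield 496 unordered pairs, and regions of an aligned pair can be labeled by their local SNP density. The windowing scheme, the threshold value, and the function names below are assumptions made for illustration. The study's actual criteria for separating clonal from recombinant regions are not given in this excerpt.

```python
from itertools import combinations

def count_pairs(n_genomes):
    """Number of unordered genome pairs, enumerated explicitly."""
    return sum(1 for _ in combinations(range(n_genomes), 2))

def window_snp_fractions(seq_a, seq_b, window):
    """Fraction of mismatched, non-gap positions in consecutive non-overlapping windows."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    fractions = []
    for start in range(0, len(seq_a) - window + 1, window):
        columns = [
            (a, b)
            for a, b in zip(seq_a[start:start + window], seq_b[start:start + window])
            if a != "-" and b != "-"
        ]
        mismatches = sum(1 for a, b in columns if a != b)
        fractions.append(mismatches / len(columns) if columns else 0.0)
    return fractions

def label_windows(fractions, clonal_max=0.001):
    """Label a window 'clonal' when its SNP fraction is at most clonal_max (assumed cutoff)."""
    return ["clonal" if f <= clonal_max else "recombinant" for f in fractions]

pairs = count_pairs(32)
toy_fractions = window_snp_fractions("AAAAAAAAAA", "AAAAAAATTA", 5)
toy_labels = label_windows(toy_fractions)
```

A fixed cutoff is the crudest possible classifier. It is used here only to make the clonal versus recombinant distinction concrete.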

4.
5.
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold’s purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5′ and 3′ untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
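The limitation of windowed local folding mentioned above can be made concrete with a small reachability check: a base pair can only be predicted if both positions fall inside one window and lie within the allowed pairing distance. The sketch below is not part of LinearTurboFold. The function names are hypothetical, and the settings used in the example (a 3,000-nt window advanced by 300 nt, a 500-nt pairing limit) are the ones the Fig. 1 caption later in this entry attributes to an earlier study.

```python
def windows(genome_length, window_size, step):
    """Yield half-open (start, end) windows; the final window is clipped to the genome end."""
    if step <= 0 or window_size <= 0:
        raise ValueError("window_size and step must be positive")
    start = 0
    while True:
        end = min(start + window_size, genome_length)
        yield start, end
        if end == genome_length:
            break
        start += step

def pair_reachable(i, j, genome_length, window_size, step, max_pair_distance):
    """True if positions i and j (0-based) could be paired under the windowing scheme."""
    lo, hi = min(i, j), max(i, j)
    if hi - lo > max_pair_distance:
        return False
    return any(s <= lo and hi < e for s, e in windows(genome_length, window_size, step))

# A pair spanning about 29,800 nt of a 30,000-nt genome under local settings.
local_long = pair_reachable(100, 29900, 30000, 3000, 300, 500)
# A short-range pair under the same local settings.
local_short = pair_reachable(100, 400, 30000, 3000, 300, 500)
# The same long pair when the whole genome is one window with no distance limit.
global_long = pair_reachable(100, 29900, 30000, 30000, 30000, 30000)
```

Under these assumptions the end-to-end pair is excluded by construction, independent of how good the folding model is.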

RNA plays important roles in many cellular processes (1, 2). To maintain their functions, secondary structures of RNA homologs are conserved across evolution (3–5). These conserved structures provide critical targets for diagnostics and treatments. Thus, there is a need for developing fast and accurate computational methods to identify structurally conserved regions.

Commonly, conserved structures involve compensatory base pair changes, where two positions in primary sequences mutate across evolution and still conserve a base pair; for instance, an AU or a CG pair replaces a GC pair in homologous sequences. These compensatory changes provide strong evidence for evolutionarily conserved structures (6–10). Meanwhile, they make it harder to align sequences when structures are unknown. Initially, the process of determining a conserved structure, termed comparative sequence analysis, was manual and required substantial insight to identify the conserved structure. A notable early achievement was the determination of the conserved transfer RNA (tRNA) secondary structure (11). Comparative analysis was also demonstrated to be 97% accurate compared to crystal structures for ribosomal RNAs, where the models were refined carefully over time (12).

To automate comparative analysis, three distinct algorithmic approaches were developed (13, 14). The first, “joint fold-and-align” method, seeks to simultaneously predict structures and a structural alignment for two or more sequences. This was first proposed by Sankoff (15) using a dynamic programming algorithm. The major limitation of this approach is that the algorithm runs in $O(n^{3k})$ time for k sequences of average length n. Several software packages provide implementations of the Sankoff algorithm (16–21) that use simplifications to reduce runtime. The second, “align-then-fold” approach, is to input a sequence alignment and predict the conserved structure that can be identified across sequences in the alignment.
This was described by Waterman (22) and was subsequently refined and popularized by RNAalifold (23). The third, “fold-then-align” approach, is to predict plausible structures for the sequences and then align the structures to determine the sequence alignment and the optimal conserved structures. This was described by Waterman (24) and implemented in RNAforester (25) and MARNA (26) (SI Appendix, Fig. S1).

As an alternative, TurboFold II (27), an extension of TurboFold (28), provides a more computationally efficient method to align and fold sequences. Taking multiple unaligned sequences as input, TurboFold II iteratively refines alignments and structure predictions so that they conform more closely to each other and converge on conserved structures. TurboFold II is significantly more accurate than other methods (16, 18, 23, 29, 30) when tested on RNA families with known structures and alignments.

However, the cubic runtime and quadratic memory usage of TurboFold II prevent it from scaling to longer sequences such as full-length severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes, which contain ∼30,000 nucleotides; in fact, no joint-align-and-fold methods can scale to these genomes, which are the longest among RNA viruses. As a (not very principled) workaround, most existing efforts for modeling SARS-CoV-2 structures (31–36) resort to local folding methods (37, 38) with sliding windows plus a limited pairing distance, abandoning all long-range interactions, and only consider one SARS-CoV-2 genome (Fig. 1 B and C), ignoring signals available in multiple homologous sequences. To address this challenge, we designed a linearized version of TurboFold II, LinearTurboFold (Fig. 1A), which is a global homologous folding algorithm that scales linearly with sequence length.
This linear runtime makes it, to our knowledge, the first joint-fold-and-align algorithm to scale to full-length coronavirus genomes without any constraints on window size or pairing distance, taking about 13 h to analyze a group of 25 SARS-CoV homologs. It also leads to significant improvement on secondary structure prediction accuracy as well as an alignment accuracy comparable to or higher than all benchmarks.

Fig. 1. (A) The LinearTurboFold framework. Like TurboFold II, LinearTurboFold takes multiple unaligned homologous sequences as input and outputs a secondary structure for each sequence and a multiple-sequence alignment (MSA). But unlike TurboFold II, LinearTurboFold employs two linearizations to ensure linear runtime: a linearized alignment computation (module 1) to predict posterior coincidence probabilities (red squares) for all pairs of sequences (first four sections in Methods) and a linearized partition function computation (module 2) to estimate base-pairing probabilities (yellow triangles) for all the sequences (Methods, Extrinsic Information Calculation and Methods, LinearPartition for Base Pairing Probabilities Estimation with Extrinsic Information). These two modules take advantage of information from each other and iteratively refine predictions (SI Appendix, Fig. S2). After several iterations, module 3 generates the final multiple-sequence alignments (Methods, MSA Generation and Secondary Structure Prediction), and module 4 predicts secondary structures. Module 5 can stochastically sample structures. (B and C) Prior studies (31–36) [except for the purely experimental work by Ziv et al. (39)] used local folding methods with limited window size and maximum pairing distance. B shows the local folding of the SARS-CoV-2 genome by Huston et al. (32), which used a window of 3,000 nt that was advanced 300 nt. It also limited the distance between nucleotides that can form a base pair to 500 nt.
Some studies also used homologous sequences to identify conserved structures (32–36), but they predicted only structures for one genome and utilized sequence alignments to identify mutations. By contrast, LinearTurboFold is a global folding method without any limitations on sequence length or pairing distance, and it jointly folds and aligns homologs to obtain conserved structures. Consequently, LinearTurboFold can capture long-range interactions even across the whole genome (the long arc in B and Fig. 3).

Over a group of 25 SARS-CoV-2 and SARS-related homologous genomes, LinearTurboFold predictions are close to the canonical structures (40) and structures modeled with the aid of experimental data (32–34) for several well-studied regions. Due to global rather than local folding, LinearTurboFold discovers a long-range interaction involving 5′ and 3′ untranslated regions (UTRs) (∼29,800 nt apart), which is consistent with recent purely experimental work (39) and yet is out of reach for local folding methods used by existing studies (Fig. 1 B and C). In short, our in silico method of folding multiple homologs can achieve results similar to, and sometimes more accurate than, those of experimentally guided models for one genome. Moreover, LinearTurboFold identifies conserved structures supported by compensatory mutations, which are potential targets for small-molecule drugs (41) and antisense oligonucleotides (ASOs) (36). We further identify regions that are 1) sequence-level conserved; 2) at least 15 nt long; and 3) accessible (i.e., likely to be completely unpaired) as potential targets for ASOs (42), small interfering RNA (siRNA) (43), CRISPR-Cas13 guide RNA (gRNA) (44), and RT-PCR primers (45). LinearTurboFold is a general technique that can also be applied to other RNA viruses (e.g., influenza, Ebola, HIV, Zika) and full-length genome studies.
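The three target criteria listed above (sequence-level conserved, at least 15 nt long, accessible) can be sketched as a scan for sufficiently long runs of positions that pass both per-position tests. This is an illustrative reading rather than the authors' procedure. The per-position inputs, the 0.9 probability cutoff used as a stand-in for "likely to be completely unpaired", and the function name are all assumptions.

```python
def target_regions(conserved, unpaired_prob, min_length=15, min_unpaired=0.9):
    """Return half-open (start, end) runs where every position is conserved and likely unpaired.

    conserved     -- list of bools, True where the alignment column is conserved
    unpaired_prob -- list of floats, probability that the position is unpaired
    """
    if len(conserved) != len(unpaired_prob):
        raise ValueError("inputs must have the same length")
    regions = []
    run_start = None
    # Iterate one step past the end so that a run reaching the last position is closed.
    for pos in range(len(conserved) + 1):
        ok = (
            pos < len(conserved)
            and conserved[pos]
            and unpaired_prob[pos] >= min_unpaired
        )
        if ok and run_start is None:
            run_start = pos
        elif not ok and run_start is not None:
            if pos - run_start >= min_length:
                regions.append((run_start, pos))
            run_start = None
    return regions

# Toy input: a 20-nt conserved run, one variable column, then a 10-nt conserved run.
toy_conserved = [True] * 20 + [False] + [True] * 10
toy_unpaired = [0.95] * 31
toy_regions = target_regions(toy_conserved, toy_unpaired)
```

Requiring each position to pass independently is a simplification. A joint probability that the whole stretch is unpaired would be stricter.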

6.
Microchromosomes, once considered unimportant shreds of the chicken genome, are gene-rich elements with a high GC content and few transposable elements. Their origin has been debated for decades. We used cytological and whole-genome sequence comparisons, and chromosome conformation capture, to trace their origin and fate in genomes of reptiles, birds, and mammals. We find that microchromosomes as well as macrochromosomes are highly conserved across birds and share synteny with single small chromosomes of the chordate amphioxus, attesting to their origin as elements of an ancient animal genome. Turtles and squamates (snakes and lizards) share different subsets of ancestral microchromosomes, having independently lost microchromosomes by fusion with other microchromosomes or macrochromosomes. Patterns of fusions were quite different in different lineages. Cytological observations show that microchromosomes in all lineages are spatially separated into a central compartment at interphase and during mitosis and meiosis. This reflects higher interaction between microchromosomes than with macrochromosomes, as observed by chromosome conformation capture, and suggests some functional coherence. In highly rearranged genomes fused microchromosomes retain most ancestral characteristics, but these may erode over evolutionary time; surprisingly, de novo microchromosomes have rapidly adopted high interaction. Some chromosomes of early-branching monotreme mammals align to several bird microchromosomes, suggesting multiple microchromosome fusions in a mammalian ancestor. Subsequently, multiple rearrangements fueled the extraordinary karyotypic diversity of therian mammals. Thus, microchromosomes, far from being aberrant genetic elements, represent fundamental building blocks of amniote chromosomes, and it is mammals, rather than reptiles and birds, that are atypical.

Classic cytological studies described mammalian chromosomes of a size easily visible under the microscope. Bird and reptile karyotypes were strikingly different, with a size discontinuity between macrochromosomes, with sizes (3 to 6 µm) in the range of mammalian chromosomes, and microchromosomes (<0.5 µm), which looked more like specks of dust (e.g., refs. 1–3). These microchromosomes stained oddly and occupied a central position at mitosis (4).

An early view of microchromosomes as inconstant heterochromatic elements (5), or even not chromosomes at all, was thoroughly debunked (1, 6–8). Like macrochromosomes, they possess a centromere and telomeres at each end (with extralong subtelomeric repeats) (9) and segregate regularly at mitosis. Microchromosomes are GC-rich and gene-dense with a low content of repetitive sequence (10) and have high rates of recombination. They replicate early and are hyperacetylated compared to macrochromosomes, suggesting they are highly transcribed.

At the cytological level, most birds have extremely conserved karyotypes, including 9 pairs of macrochromosomes and 30 to 32 pairs of microchromosomes (3), defined by relative sizes where there is no abrupt size discontinuity. Chromosome constitutions of birds are listed in ref. 11. Although there are some spectacular exceptions, especially in the highly rearranged falcon and the parrot genomes, even distantly related birds such as chicken and emu share nine macrochromosome pairs identified by banding patterns, chromosome painting, and gene mapping (8, 12–15).

Microchromosomes are too small to distinguish morphologically, let alone by G-band patterns, and pairing them is mostly guesswork. However, their number is usually constant and even, as expected for paired autosomes in diploids.
Cytological examination, using specific DNA probes, suggests conservation of microchromosomes across 22 avian species (16), and comparative gene mapping and whole-genome analysis attest to considerable conservation among distantly related bird groups (17, 18). A few bird species have more microchromosomes, but chromosome painting reveals their recent origin from fission (14). Genome sequencing of many bird species now provides unprecedented detail sufficient to compare microchromosomes across avian species (19).

Fewer comparative studies of microchromosome conservation have been done in reptiles, but their genome structures are similar to those of birds, many with an abrupt distinction between a few macrochromosomes and many microchromosomes (reviewed in refs. 20–22). However, turtles and snakes have fewer microchromosomes than birds. There is G-band and chromosome painting homology between the macrochromosomes of birds and turtles (23), and a close relationship between the chromosomes of birds and squamates (snakes and lizards) was noted early (24). Gene mapping and sequence comparisons reveal many homologous synteny blocks (8, 25), and sequence comparisons show that several microchromosomes are conserved at the sequence level (26). Lizard karyotypes are more variable; some species have clearly demarcated macro- and microchromosomes, whereas others show no clear distinction.

There are exceptional reptile and bird clades in which no abrupt size difference defines microchromosomes, and the size range of microchromosomes can also vary between clades. For example, eagle and parrot genomes have few microchromosomes (27) and crocodilians have five very large macrochromosomes and few chromosomes in the microchromosome size range (28).

The origin of microchromosomes has been debated for decades.
Initially they were thought to represent some sort of breakdown product of “normal” mammalian-like macrochromosomes that existed in amniote, even tetrapod, ancestors (29), and this view is still expressed (e.g., ref. 30). The alternative view is that at least some of them represent the small chromosomes of a vertebrate ancestor 400 Ma, retained intact by several vertebrate clades (8, 26, 31). Similarities with the small chromosomes of amphioxus (the lancelet, an early branching chordate) now suggest a much earlier origin (32), dating back to at least 684 My since they last shared a common ancestor with vertebrates.With the availability of several chromosome-scale assemblies of bird and reptile genomes (10, 19), it is now possible to trace the origin and fate of microchromosomes in birds, reptiles, and mammals. We compared the genomes of 7 birds and 10 reptiles with chromosome-level assemblies, as well as three mammals and an amphioxus (Fig. 1). These comparisons provide evidence that, indeed, microchromosomes represent a set of highly conserved ancient animal chromosomes, whereas macrochromosomes, which are considered “normal” because of their ubiquity in mammals, have undergone multiple lineage-specific rearrangements, especially in mammals. We gather evidence that microchromosomes retain a high frequency of interchromosome interaction inside the nucleus and regularly locate together at interphase and division, suggesting retention of an ancestral functional coherence between a set of small ancestral chromosomes.Open in a separate windowFig. 1.Phylogenetic relationships of reptiles, birds, mammals, and amphioxus genome assemblies compared in this study. Cytological chromosome numbers (n) are shown, along with the number of assembled macrochromosomes and microchromosomes (their percentage of the anchored genome) and genome size. Species names and full common names are given in SI Appendix, Table S1; in the text they are referred to by abbreviated common names.  

7.
Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis.

The recent rapid growth of genome data has made it possible to clarify major evolutionary events that have shaped eukaryote genomes, such as gene duplication, chromosomal rearrangement, and whole-genome duplication (WGD) (1). In particular, WGD events, known to have occurred in several major lineages of flowering plants (2), budding yeasts (3), and vertebrates (4) (Fig. 1A), are considered to have had a major impact on genomic architecture and consequently organismal features.

Fig. 1. Inferred spatiotemporal process of gene loss and persistence after TGD in teleost ancestors. (A) The estimated numbers of gene loss events in the teleost phylogeny, time-scaled tree of vertebrates (11, 41) with the timing of genome duplication events at the base of vertebrates (VGD1/2) and teleosts (TGD), and the number of extant species (26). Species used in this study are connected by solid branches. The numbers were parsimoniously inferred from the presence or absence of TGD-derived gene lineage pairs belonging to 6,892 orthogroups and mapped onto the time points of TGD (306 Mya), nodes a–g (a: 245 Mya; b: 158; c: 120; d: 105; e: 41; f: 164; g: 86) (11), and h (74 Mya) (28). On the left side of the tree, ortholog arrangements are compared between representatives (connected by bold branches in the tree) by CIRCOS (circos.ca) using orthology information for 5,655 orthogroups belonging to the 1-to-1 category (Fig. S2). (B) Definition of terms relating to WGD events. An orthogroup is a monophyletic group containing WGD-derived paralogs (gene lineages) of all focal species (Sp1) and orthologs of their sister species (Sp2), ignoring lineage-specific gene duplications (GeneA-1′ and -1″) or gene loss (GeneA-1″). (C) Approximation of the pattern of the number of gene loss and persistence events associated with TGD. The estimated number of retained paired gene lineages at nodes a to h and current teleosts (Ca, Ze, Co, Ti, Pl, Me, St, Te, and Fu) were used to compare the fit of the one-phase [αe^(−2μt) (14)] and two-phase models. (D) Region of C detailing the recent pattern of gene loss. The solid and dashed curves have been corrected upward to remove the bias expected to result from parsimony analysis. 
These approximations are effectively insensitive to fluctuations in the estimated numbers of gene lineage pairs and times for the TGD event and ancestral nodes (a to h) (SI Text). The evolutionary scenario is essentially unchanged if the number of gene lineage pairs estimated without the BS 70% criterion or the divergence times estimated by nuclear gene (28)/mitochondrial genome (42) data were used. Note that the two-phase model can be roughly approximated by a double-exponential curve.

Duplicate genes generated by WGD are typically assumed to be redundant and therefore subsequently lost in a stochastic manner. Comparative genome studies have suggested that 90% of duplicate genes were rapidly lost (5) by a neutral process (6) after WGD in budding yeast, but 20–30% of them were retained in human (7) even after several hundred million years. However, few genome-wide studies have addressed the temporal pattern of gene loss or persistence after WGD with reference to a reliable timescale (but see refs. 6 and 8). Such examination is indispensable for understanding when duplicate genes were lost and, consequently, genome structures were reshaped, during vertebrate diversification after the WGD (Fig. 1).

To examine the detailed process of duplicate gene loss after WGD, one needs to estimate the number (proportion) of remaining duplicates in extant and ancestral species. For this purpose, both (i) reliably time-calibrated phylogenetic trees of species and (ii) well-annotated genomes are required. These two requirements have been met for several vertebrate lineages, including some teleost fishes. Given this, the next step should be to accurately estimate orthology and paralogy relationships of all of the genes that experienced WGD. For the analysis of gene orthology and paralogy, a homology search- or synteny-based approach has usually been used (9). 
In addition to the homology search-based approach (e.g., COGs and OrthoDB), a phylogenetic tree-based approach has also been introduced (e.g., Ensembl and PhylomeDB) (9). Recent developments of tree search algorithms and increased computing power allow a sophisticated tree-based approach, comparing each gene tree with the species tree. Such an approach is indispensable for the effective analysis of gene orthology and paralogy across many species, providing us with a powerful opportunity to investigate genome evolution after WGD.

Here, we aim to investigate the gene loss/persistence pattern using genome-wide data, focusing on what is known as the teleost genome duplication (TGD). TGD is estimated to have occurred in an ancestor of teleosts (Fig. 1A) but after the divergence of tetrapods and teleosts (10). Thus, it is a relatively recent WGD shared by a large vertebrate group, i.e., the Teleostei. For teleosts, reliably time-calibrated phylogenies, including phylogenetic position and timing of the TGD event, are available (e.g., ref. 11). In addition, well-annotated whole-genome data from at least nine phylogenetically representative teleost species (cave fish, zebrafish, cod, tilapia, platyfish, medaka, stickleback, Tetraodon, and fugu) are now available from Ensembl (12). In the present study, we inferred the timing of rapid genome reshaping through gene loss after TGD by estimating the temporal and genomic positional (spatiotemporal) loss/persistence pattern of TGD-derived gene lineage pairs (Fig. 1B) over the past several hundred million years, using accurate tree-based orthology estimation (Fig. S1) and a reliable time-calibrated teleost tree. We investigated the mechanism of rapid gene loss after TGD by fitting a newly developed model for the observed temporal pattern of gene loss. This new model is necessary because standard models, based upon random and independent loss of duplicate genes, fail to fit our data. 
Our model analysis explicitly includes both the possibility of the loss of multiple genes in single events, and also the known phylogeny of the relevant species. The significance of the inclusion of events that result in the loss of multiple genes is that it reproduces the two phases of loss. The inclusion of known phylogeny allows us to correct for the bias associated with parsimony analysis.
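
The contrast between a one-phase and a two-phase loss pattern can be illustrated numerically. The sketch below is only an illustration, not the authors' model: it uses the one-phase form αe^(−2μt) quoted in the figure legend and stands in for the two-phase model with a plain double-exponential curve, which the legend describes as a rough approximation. The function names and all parameter values (`weight_fast`, `rate_fast`, `rate_slow`) are hypothetical and chosen only so that most duplicates are lost within the first 60 My.

```python
import math


def one_phase(t, alpha, mu):
    """Retained duplicate pairs under independent loss: alpha * exp(-2*mu*t)."""
    return alpha * math.exp(-2.0 * mu * t)


def double_exponential(t, weight_fast, rate_fast, rate_slow):
    """Rough stand-in for a two-phase pattern: a fast-decaying component
    plus a slow-decaying component (an assumed form, not the paper's model)."""
    fast = weight_fast * math.exp(-rate_fast * t)
    slow = (1.0 - weight_fast) * math.exp(-rate_slow * t)
    return fast + slow


if __name__ == "__main__":
    # Hypothetical parameters (rates per My), for illustration only.
    weight_fast, rate_fast, rate_slow = 0.75, 0.1, 0.002
    for t in (0, 30, 60, 150, 310):
        retained = double_exponential(t, weight_fast, rate_fast, rate_slow)
        print(f"t = {t:3d} My: retained fraction = {retained:.3f}")
```

With these assumed values the fast component is essentially exhausted by 60 My, after which the curve declines slowly, which is the qualitative shape a single exponential cannot reproduce.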

8.
Most genetic changes have negligible reversion rates. As most mutations that confer resistance to an adverse condition (e.g., drug treatment) also confer a growth defect in its absence, it is challenging for cells to genetically adapt to transient environmental changes. Here, we identify a set of rapidly reversible drug-resistance mutations in Schizosaccharomyces pombe that are caused by microhomology-mediated tandem duplication (MTD) and reversion back to the wild-type sequence. Using 10,000× coverage whole-genome sequencing, we identify nearly 6,000 subclonal MTDs in a single clonal population and determine, using machine learning, how MTD frequency is encoded in the genome. We find that sequences with the highest-predicted MTD rates tend to generate insertions that maintain the correct reading frame, suggesting that MTD formation has shaped the evolution of coding sequences. Our study reveals a common mechanism of reversible genetic variation that is beneficial for adaptation to environmental fluctuations and facilitates evolutionary divergence.

Different mechanisms of adaptation have different timescales. Epigenetic changes are often rapid and reversible, while most genetic changes have nearly negligible rates of reversion (1). This poses a challenge for genetic adaptation to transient conditions such as drug treatment; mutations that confer drug resistance are often deleterious in the absence of drug, and second-site suppressor mutations are required to restore fitness (2, 3). Preexisting tandem repeats (satellite DNA) undergo frequent expansion and contraction (4–6). While repeats are rare inside of most coding sequences and functional elements, there is some evidence for conserved repetitive regions that undergo expansion and contraction to regulate protein functions or expression (6–8). RNA interference– or chromatin-based epigenetic states have been associated with transient drug resistance in fungi (9) and cancer cells (10, 11), and transient resistant states have been characterized by differences in organelle state, growth rate, and gene expression in budding yeast (12, 13). In bacteria and in fungi, copy-number gain and subsequent loss can result in reversible drug resistance (14–18). However, all genetic systems developed so far for studying unstable genotypes rely on reporter genes and thus investigate only one genetic locus and only one type of genetic change.

Unbiased, next-generation sequencing-based approaches could give a more global view, allowing us to understand the rules that govern unstable genotypes at a genome-wide scale. However, genetic changes with high rates of reversion tend to remain subclonal (19–21), and it is challenging to distinguish most types of low-frequency mutations from sequencing errors (22), especially in complex genomes with large amounts of repetitive DNA or de novo duplicated genes. 
Thus, fast-growing organisms with relatively small and simple genomes are particularly well suited for determining whether transient mutations exist, for the genome-wide characterization of such mutations, and for identification of the underlying mechanisms.

9.
10.
11.
Canonical Wnt signaling plays critical roles in development and tissue renewal by regulating β-catenin target genes. Recent evidence showed that β-catenin–independent Wnt signaling is also required for faithful execution of mitosis. However, the targets and specific functions of mitotic Wnt signaling still remain uncharacterized. Using phosphoproteomics, we identified that Wnt signaling regulates the microtubule depolymerase KIF2A during mitosis. We found that Dishevelled recruits KIF2A via its N-terminal and motor domains, which is further promoted upon LRP6 signalosome formation during cell division. We show that Wnt signaling modulates KIF2A interaction with PLK1, which is critical for KIF2A localization at the spindle. Accordingly, inhibition of basal Wnt signaling leads to chromosome misalignment in somatic cells and pluripotent stem cells. We propose that Wnt signaling monitors KIF2A activity at the spindle poles during mitosis to ensure timely chromosome alignment. Our findings highlight a function of Wnt signaling during cell division, which could have important implications for genome maintenance, notably in stem cells.

The canonical Wnt signaling pathway plays essential roles in embryonic development and tissue homeostasis (1, 2). In particular, Wnt signaling governs stem cell maintenance and proliferation in many tissues, and its misregulation is a common cause of tumor initiation (3, 4).

Wnt ligands bind Frizzled (FZD) receptors and the coreceptors low-density lipoprotein receptor-related proteins 5 and 6 (LRP5/6) (5). The activated receptor complexes cluster on Dishevelled (DVL) platforms and are internalized via caveolin into endosomes termed LRP6 signalosomes, which triggers sequential phosphorylation of LRP6 by GSK3β and CK1γ (6–10). LRP6 signalosomes recruit the β-catenin destruction complex, which contains the scaffold proteins AXIN1 and adenomatous polyposis coli, the kinases CK1α and GSK3β, and the E3 ubiquitin ligase β-TrCP (11). This recruitment inhibits GSK3β and releases β-TrCP, which leads to β-catenin stabilization and nuclear translocation in an IFT-A/KIF3A–dependent manner (12–16). LRP6 signalosomes mature into multivesicular bodies, sequestering the Wnt receptors together with GSK3β, thereby maintaining long-term activation of the Wnt pathway and promoting macropinocytosis (14, 17–21). In contrast to Wnt ligands, the Wnt inhibitor Dickkopf-related protein 1 (DKK1) induces the clathrin-dependent internalization and turnover of LRP5/6 and thereby abrogates canonical Wnt signaling (22).

LRP6 signalosome formation peaks in mitosis (23, 24). On the one hand, the LRP6 competence to respond to Wnt ligands is promoted during G2/M by a priming phosphorylation at its intracellular domain by CDK14/16 and CCNY/CCNYL1 (24, 25). On the other hand, CDK1 phosphorylates and recruits B-cell CLL/lymphoma 9 (BCL9) to the mitotic LRP6 signalosomes (23). 
BCL9 protects the signalosome from clathrin-dependent turnover, thereby sustaining basal Wnt activity at the onset of mitosis.

Mitotic Wnt signaling not only modulates β-catenin (24); increasing evidence suggests that it also promotes a complex posttranslational program during mitosis (26). For instance, we have shown that mitotic Wnt signaling promotes stabilization of proteins (Wnt/STOP), which is required for cell growth and ensures chromosome segregation in somatic and embryonic cells (23, 26–31). Particularly, basal Wnt/STOP activity maintains proper microtubule plus-end polymerization rates during mitosis, and its misregulation leads to whole chromosome missegregation (31, 32). Furthermore, mitotic Wnt signaling controls the orientation of the spindle (33) and promotes asymmetric division in stem cells through components of the LRP6 signalosome (34). Accordingly, several Wnt components functionally associate with centrosomes, kinetochores, and the spindle during mitosis (25, 33, 35, 36). Consequently, both aberrant up-regulation and down-regulation of Wnt signaling have been associated with chromosome instability (CIN) (31, 32, 35, 37), which is a hallmark of cancer (38). Despite the importance of these processes for tissue renewal and genome maintenance, the targets and specific functions of mitotic Wnt signaling remain largely uncharacterized.

Kinesin family member 2A (KIF2A) is a member of the kinesin-13 group (KIF2A,B,C) of minus-end microtubule depolymerases (39–41). KIF2A is essential for the scaling of the spindle during early development (42) and plays critical roles in neurogenesis by modulating both cilium disassembly and neuronal wiring (43–47). In dividing cells, KIF2A was thought to be required for the assembly of a bipolar spindle due to a small interfering RNA (siRNA) off-target effect (48, 49). 
Current evidence supports a role of KIF2A in microtubule depolymerization at the spindle poles, which can generate pulling forces on attached kinetochores, thereby ensuring the congression, alignment, and segregation of chromosomes (50–56). Genetic depletion of KIF2A in mouse leads to neonatal lethality and to severe brain malformations, including microcephaly (43, 44, 57). KIF2A recruitment to microtubules is tightly coordinated by several protein kinases (45, 47, 50–52, 58–60). For instance, phosphorylation of KIF2A at several sites by Polo-like kinase 1 (PLK1) stimulates its recruitment to and activity at the spindle (45, 58, 61). On the other hand, Aurora kinase A and B inhibit KIF2A activity and restrict its subcellular localization during mitosis (50, 58, 60).

Here, we show that mitotic Wnt signaling promotes chromosome congression and alignment in prometaphase by recruiting KIF2A to the spindle in both somatic cells and pluripotent stem cells. We found that KIF2A is recruited by the LRP6 signalosome during mitosis. Mechanistically, we identified that KIF2A clusters with DVL via the N-terminal and motor domains of the depolymerase. We show that Wnt signaling controls KIF2A interaction with PLK1, which is critical for KIF2A localization at the spindle poles. We propose that basal Wnt signaling ensures timely chromosome congression and alignment prior to cell division by modulating the spindle minus-end depolymerization dynamics through KIF2A.

12.
The CRISPR (clustered regularly interspaced short palindromic repeat)/Cas (CRISPR-associated) system has emerged as a powerful tool for targeted gene editing in many organisms, including plants. However, all of the reported studies in plants focused on either transient systems or the first generation after the CRISPR/Cas system was stably transformed into plants. In this study we examined several plant generations with seven genes at 12 different target sites to determine the patterns, efficiency, specificity, and heritability of CRISPR/Cas-induced gene mutations or corrections in Arabidopsis. The proportion of plants bearing any mutations (chimeric, heterozygous, biallelic, or homozygous) was 71.2% at T1, 58.3% at T2, and 79.4% at T3 generations. CRISPR/Cas-induced mutations were predominantly 1-bp insertions and short deletions. Gene modifications detected in T1 plants occurred mostly in somatic cells, and consequently there were no T1 plants that were homozygous for a gene modification event. In contrast, ∼22% of T2 plants were found to be homozygous for a modified gene. All homozygotes were stable to the next generation, without any new modifications at the target sites. There was no indication of any off-target mutations by examining the target sites and sequences highly homologous to the target sites and by in-depth whole-genome sequencing. Together our results show that the CRISPR/Cas system is a useful tool for generating versatile and heritable modifications specifically at target genes in plants.

Genome engineering tools are important for plant functional genomics research and plant biotechnology. The CRISPR (clustered regularly interspaced short palindromic repeat)/Cas (CRISPR-associated) system has been successfully used for efficient genome editing in human cell lines, zebrafish, and mouse (1–3) and recently applied to gene modification in plants (4–10). 
In this system a short RNA molecule guides the associated endonuclease Cas9 to generate double strand breaks (DSBs) in the target genomic DNA, which lead to sequence mutations as a result of error-prone nonhomologous end-joining (NHEJ) DNA damage repair or to gene correction or replacement as a result of homology-dependent recombination (HR) (11). It was shown that engineered CRISPR/Cas caused mutations in target genes or corrections in transgenes in transient expression assays in plant protoplasts and tobacco leaves (10). Importantly, stable expression of the CRISPR/Cas in transgenic Arabidopsis, tobacco, and rice plants led to mutations (mostly indels) in target genes and correction of a transgene (4–9). However, it was not known whether the gene mutations and corrections occurred in somatic cells only or whether some of the mutations and corrections happened in germ-line cells and thus may be heritable. Additionally, it is unclear how specific the CRISPR/Cas is in plants. Previous studies in human cell lines indicated a high frequency of off-target effect of CRISPR/Cas-induced mutagenesis (12, 13) but a lower off-target effect in mice and zebrafish (14, 15). Here we show that the CRISPR/Cas-induced transgene correction or mutations in endogenous plant genes and transgenes detected in Arabidopsis T1 plants occurred mostly in somatic cells. However, some of the gene modifications were transmitted through the germ line and were heritable in Arabidopsis T2 and T3 plants following the classic Mendelian model. Mutations caused during DSB repair were predominantly 1-bp insertions and short deletions. Furthermore, our deep sequencing and analysis did not detect any off-targets in multiple CRISPR/Cas transgenic Arabidopsis lines, indicating that the mutagenesis effect of CRISPR/Cas is highly specific in plants.

13.
14.
We present the complete genomic sequence of the essential symbiont Polynucleobacter necessarius (Betaproteobacteria), which is a valuable case study for several reasons. First, it is hosted by a ciliated protist, Euplotes; bacterial symbionts of ciliates are still poorly known because of a lack of extensive molecular data. Second, the single species P. necessarius contains both symbiotic and free-living strains, allowing for a comparison between closely related organisms with different ecologies. Third, free-living P. necessarius strains are exceptional by themselves because of their small genome size, reduced metabolic flexibility, and high worldwide abundance in freshwater systems. We provide a comparative analysis of P. necessarius metabolism and explore the peculiar features of a genome reduction that occurred on an already streamlined genome. We compare this unusual system with current hypotheses for genome erosion in symbionts and free-living bacteria, propose modifications to the presently accepted model, and discuss the potential consequences of translesion DNA polymerase loss.

Symbiosis, defined as a close relationship between organisms belonging to different species (1), is a ubiquitous, diverse, and important mechanism in ecology and evolution (e.g., refs. 2–4). In extreme cases, through the establishment of symbiotic relationships, quite unrelated lineages can functionally combine their genomes and generate advantageous emergent features or initiate parasite/host arms races. Ciliates, common unicellular protists of the phylum Ciliophora, are extraordinary receptacles for prokaryotic ecto- and endosymbionts (5, 6) that provide varied examples of biodiversity and ecological roles (6). Nevertheless, most of these symbionts are understudied, partially owing to the scarcity of available molecular data and the absence of sequenced genomes. 
Yet, thanks to their various biologies and the ease of sampling and cultivating their protist hosts, they are excellent potential models for symbioses between bacteria and heterotrophic eukaryotes. Until recently this field was dominated by studies on endosymbionts of invertebrates, especially insects (e.g., ref. 7), although unicellular systems like amoebas (e.g., refs. 8 and 9) have been shown to be suitable models.

Polynucleobacter necessarius was first described as a cytoplasmic endosymbiont of the ciliate Euplotes aediculatus (10, 11). Further surveys detected its presence in a monophyletic group of fresh and brackish water Euplotes species (12, 13). All of the investigated strains of these species die soon after being cured of the endosymbiont (10, 12, 13). In the few cases in which P. necessarius is not present, a different and rarer bacterium apparently supplies the same function (12, 14). No attempt to grow symbiotic P. necessarius outside their hosts has yet been successful (15), strongly suggesting that the relationship is obligate for both partners, in contrast to most other known prokaryote/ciliate symbioses (6).

Thus, the findings of many environmental 16S rRNA gene sequences similar to that of the symbiotic P. necessarius (16) but belonging to free-living freshwater bacteria came as a surprise. These free-living strains, which have been isolated and cultivated (17), are ubiquitous and abundant in the plankton of lentic environments (17, 18). They are smaller and do not show the most prominent morphological feature of the symbiotic form: the presence of multiple nucleoids, each containing one copy of the genome (10, 11). It is clear that free-living and endosymbiotic P. necessarius are not different life stages of the same organism (15). Nevertheless, these strikingly different bacteria, occupying separate ecological niches, exhibit >99% 16S rRNA gene sequence identity, and phylogenetic analyses fail to separate them into two distinct groups (15). 
Rather, several lines of evidence point to multiple, recent origins of symbiotic strains from the free-living bacterial pool (14, 15).

Thus, the Euplotes–Polynucleobacter symbiosis provides a promising system for the study of changes promoting or caused by the shift to an intracellular lifestyle. The remarkably small (2.16 Mbp) genome of the free-living strain QLW-P1DMWA-1 has been sequenced and studied, especially for features that would explain the success of this lineage in freshwater systems worldwide (19, 20). Phylogenies based on the 16S rRNA gene (13, 14) and multiple-gene analyses (19, 21, 22) consistently cluster Polynucleobacter with bacteria of the family Burkholderiaceae (Betaproteobacteria), either in a basal position or as the sister group of Ralstonia and Cupriavidus.

Here we provide the complete genomic sequence of a symbiotic P. necessarius harbored in the cytoplasm of E. aediculatus and present a comparative analysis of the two sequenced Polynucleobacter genomes, addressing the possible biological basis of the Euplotes–Polynucleobacter symbiosis. We also provide insights into the evolution of the unique two-step genome reduction in this bacterial species: the first step involving streamlining in a free-living ancestor and the second a more recent period of genome erosion confined to the symbiotic lineage.

15.
16.
17.
Many tailed bacteriophages assemble ejection proteins and a portal–tail complex at a unique vertex of the capsid. The ejection proteins form a transenvelope channel extending the portal–tail channel for the delivery of genomic DNA in cell infection. Here, we report the structure of the mature bacteriophage T7, including the ejection proteins, as well as the structures of the full and empty T7 particles in complex with their cell receptor lipopolysaccharide. Our near–atomic-resolution reconstruction shows that the ejection proteins in the mature T7 assemble into a core, which comprises a fourfold gene product 16 (gp16) ring, an eightfold gp15 ring, and a putative eightfold gp14 ring. The gp15 and gp16 are mainly composed of helix bundles, and gp16 harbors a lytic transglycosylase domain for degrading the bacterial peptidoglycan layer. When interacting with the lipopolysaccharide, the T7 tail nozzle opens. Six copies of gp14 anchor to the tail nozzle, extending the nozzle across the lipopolysaccharide lipid bilayer. The structures of gp15 and gp16 in the mature T7 suggest that they should undergo remarkable conformational changes to form the transenvelope channel. Hydrophobic α-helices were observed in gp16 but not in gp15, suggesting that gp15 forms the channel in the hydrophilic periplasm and gp16 forms the channel in the cytoplasmic membrane.

Many double-stranded DNA (dsDNA) viruses, including tailed bacteriophages and herpesviruses, have a portal attached to a unique pentameric vertex of their icosahedral capsid shell (1–3). The portal is a dodecameric channel for viral DNA packaging and ejection. The tailed bacteriophages and herpesviruses encapsidate DNA in the capsid shell through the portal channel (4–10), and the last packaged DNA is held by tunnel loops (or β-hairpins for herpesviruses) in the portal (11–16). The last packaged DNA in most of the tailed bacteriophages and herpesviruses is the first to be ejected during the genome delivery (17). In tailed bacteriophages, the portal connects to a tail, which serves to recognize host cell receptors and deliver the genome into the cytoplasm (18). Podoviridae phages that infect gram-negative bacteria initiate infection through a specific interaction of their receptor-binding protein with the receptor lipopolysaccharide (LPS) on the host cell surface. The phages in Podoviridae have a noncontractile tail that is too short to span the gram-negative bacteria envelope that comprises the outer membrane, the cytoplasmic membrane, and the peptidoglycan layer in the hydrophilic periplasm in between (19). After adsorption, a signal is transmitted for the release of internal ejection proteins to form a channel that extends the tail across the cell envelope and that allows for subsequent genome ejection into the infected cell (20–23). In many previous studies, structural analyses have been performed at resolutions of 9 to 40 Å on this highly coordinated dynamic infection process (21–26). These studies have provided insights on structural changes of phage particles that accompany the infection steps before and after the genome ejection. However, these studies did not resolve structures of the internal ejection proteins. 
Furthermore, the relatively low resolutions cannot clarify the dynamic genome ejection process orchestrated by the ejection proteins, portal, and tail.

Escherichia coli bacteriophage T7, a member of the Podoviridae family, has been used as a model for understanding the DNA packaging and delivery mechanisms that are common to tailed phages and related dsDNA viruses (10, 21, 27–33). T7 has an icosahedral capsid shell formed by gene product 10 (gp10). The 12-fold portal (gp8) shares a very similar topology with those in other phages and herpesviruses (14–16, 30, 34). The tail comprises a 12-fold adaptor protein gp11 assembly, a sixfold nozzle protein gp12 assembly, and six subunits of trimeric tail fiber gp17 (21, 30). These tail fibers are responsible for bacterial receptor recognition and adsorption (21, 33). On top of the portal within the capsid shell is a hollow cylinder-shaped core structure (10, 28) formed by the ejection proteins (core proteins) gp14, gp15, and gp16, which have been suggested to form a transenvelope channel for the genome delivery into the infected cell (20, 35, 36). The gp16 harbors lytic transglycosylase (LTase) activity, which allows for penetration into the bacterial peptidoglycan layer (37).

In this study, we present the structure of the mature bacteriophage T7 with internal core proteins at near-atomic resolution and the structures of the full and empty T7 particles in complex with their cell receptor at subnanometer and near-atomic resolutions, respectively. Our reconstruction reveals that the core in the mature T7 is formed by a fourfold gp16 ring, an eightfold gp15 ring, and a putative eightfold gp14 ring. The putative gp14 structures mediate the core–portal interaction. The gp15 and gp16 are mainly composed of helix bundles, and gp16 harbors a LTase domain. When the T7 phage interacts with the LPS, the tail nozzle opens. Six copies of gp14 anchor to the sixfold tail channel, extending the tail across the LPS lipid bilayer. 
A conformational change in the portal then triggers the genome ejection. Our structures reveal the structural changes of the phage genome-delivery molecular machines after the genome delivery.

18.
The spindle assembly checkpoint (SAC) is a conserved signaling pathway that monitors faithful chromosome segregation during mitosis. As a core component of SAC, the evolutionarily conserved kinase monopolar spindle 1 (Mps1) has been implicated in regulating chromosome alignment, but the underlying molecular mechanism remains unclear. Our molecular delineation of Mps1 activity in SAC led to discovery of a previously unidentified structural determinant underlying Mps1 function at the kinetochores. Here, we show that Mps1 contains an internal region for kinetochore localization (IRK) adjacent to the tetratricopeptide repeat domain. Importantly, the IRK region determines the kinetochore localization of inactive Mps1, and an accumulation of inactive Mps1 perturbs accurate chromosome alignment and mitotic progression. Mechanistically, the IRK region binds to the nuclear division cycle 80 complex (Ndc80C), and accumulation of inactive Mps1 at the kinetochores prevents a dynamic interaction between Ndc80C and spindle microtubules (MTs), resulting in an aberrant kinetochore attachment. Thus, our results present a previously undefined mechanism by which Mps1 functions in chromosome alignment by orchestrating Ndc80C–MT interactions and highlight the importance of the precise spatiotemporal regulation of Mps1 kinase activity and kinetochore localization in accurate mitotic progression.

Faithful distribution of the duplicated genome into two daughter cells during mitosis depends on proper kinetochore–microtubule (MT) attachments. Defects in kinetochore–MT attachments result in chromosome missegregation, causing aneuploidy, a hallmark of cancer (1, 2). To ensure accurate chromosome segregation, cells use the spindle assembly checkpoint (SAC) to monitor kinetochore biorientation and to control the metaphase-to-anaphase transition. Cells enter anaphase only after the SAC is satisfied, requiring that all kinetochores be attached to MTs and be properly bioriented (3, 4). 
The core components of SAC signaling include mitotic arrest deficient-like 1 (Mad1), Mad2, Mad3/BubR1 (budding uninhibited by benzimidazole-related 1), Bub1, Bub3, monopolar spindle 1 (Mps1), and aurora B. The full SAC function requires the correct centromere/kinetochore localization of all SAC proteins (5).

Among the SAC components, Mps1 was identified originally in budding yeast as a gene required for duplication of the spindle pole body (6). Subsequently, Mps1 orthologs were found in various species, from fungi to mammals. The stringent requirement of Mps1 for SAC activity is conserved in evolution (6–13). Human Mps1 kinase (also known as “TTK”) is expressed in a cell-cycle–dependent manner and has highest expression levels and activity during mitosis. Its localization is also dynamic (8, 14). Although the molecular mechanism remains unclear, Mps1 is required to recruit Mad1 and Mad2 to unattached kinetochores, supporting its essential role in SAC activity (15–18). It also is clear that aurora B kinase activity and the outer-layer kinetochore protein nuclear division cycle 80 (Ndc80)/Hec1 are required for Mps1 localization to kinetochores, as evidenced by recent work, including ours (17, 19–24). How Mps1 activates the SAC is now becoming clear. Mps1 recruits Bub1/Bub3 and BubR1/Bub3 to kinetochores through phosphorylation of KNL1, the kinetochore receptor protein of Bub1 and BubR1 (25–30).

Despite much progress in understanding Mps1 functions, it remains unclear how Mps1 is involved in regulating chromosome alignment. In budding yeast mitosis, Mps1 regulates mitotic chromosome alignment by promoting kinetochore biorientation independently of Ipl1 (aurora B in humans) (31), but in budding yeast meiosis Mps1 must collaborate with Ipl1 to mediate meiotic kinetochore biorientation (32).
In humans, Mps1 regulates chromosomal alignment by modulating aurora B kinase activity (33), but recent chemical biology studies show that Mps1 kinase activity is important for proper chromosome alignment and segregation, independently of aurora B (22, 34–36). Therefore whether Mps1 regulates chromosome alignment through modulation of aurora B kinase activity is still under debate (37).

In this study, we reexamined the function of human Mps1 in chromosome alignment. We found that chromosomal alignment is largely achieved in Mps1 knockdown cells, provided that cells are arrested in metaphase in the presence of MG132, a proteasome inhibitor. However, disrupting Mps1 activity via small molecule inhibitors perturbs chromosomal alignment, even in the presence of MG132. This chromosome misalignment is caused by the abnormal accumulation of inactive Mps1 in the kinetochore and the subsequent failure of correct kinetochore–MT attachments. Further, we demonstrate that inactive Mps1 does not depend on the previously reported tetratricopeptide repeat (TPR) domain for localizing to kinetochores, and we identify a previously unidentified region adjacent to the C terminus of the TPR domain that is responsible for localizing inactive Mps1 to kinetochores. Thus, our work highlights that Mps1 kinase activity is necessary in regulating chromosome alignment and that it must be tightly regulated in space and time to ensure proper localization of Mps1 at kinetochores.

19.
Retroviruses package a dimeric genome comprising two copies of the viral RNA. Each RNA contains all of the genetic information for viral replication. Packaging a dimeric genome allows the recovery of genetic information from damaged RNA genomes during DNA synthesis and promotes frequent recombination to increase diversity in the viral population. Therefore, the strategy of packaging dimeric RNA affects viral replication and viral evolution. Although its biological importance is appreciated, very little is known about the genome dimerization process. HIV-1 RNA genomes dimerize before packaging into virions, and RNA interacts with the viral structural protein Gag in the cytoplasm. Thus, it is often hypothesized that RNAs dimerize in the cytoplasm and the RNA–Gag complex is transported to the plasma membrane for virus assembly. In this report, we tagged HIV-1 RNAs with fluorescent proteins, via interactions of RNA-binding proteins and motifs in the RNA genomes, and studied their behavior at the plasma membrane by using total internal reflection fluorescence microscopy. We showed that HIV-1 RNAs dimerize not in the cytoplasm but on the plasma membrane. Dynamic interactions occur among HIV-1 RNAs, and stabilization of the RNA dimer requires Gag protein. Dimerization often occurs at an early stage of the virus assembly process. Furthermore, the dimerization process is probably mediated by the interactions of two RNA–Gag complexes, rather than two RNAs. These findings advance the current understanding of HIV-1 assembly and reveal important insights into viral replication mechanisms.

All viruses must encapsidate their genomes into virions to ensure that their genetic information is transferred to the new target cells. In most, if not all, retroviruses, the virion RNA genomes are dimeric, although each RNA encodes all of the genetic information required for replication.
Most HIV-1 particles contain two copies of the genome (1), indicating that RNA encapsidation is a highly regulated process. This regulation is achieved by recognizing a dimeric RNA, and not by packaging a certain mass of viral genome (2).

Our previous studies showed that HIV-1 RNA dimerization is a critical step in viral RNA genome packaging and virus assembly and that the two copies of copackaged RNA genomes are dimerized before encapsidation (1, 3, 4). The dimerization initiation signal (DIS), a 6-nt palindromic sequence located at the 5′ UTR of the HIV-1 RNA genome (5), most likely initiates the interaction between two HIV-1 RNA genomes (3, 4). When two HIV-1 RNAs contain similar sequences including the same DIS, they are copackaged efficiently at a rate similar to that predicted from random distribution (1, 2). In contrast, when two HIV-1 RNAs contain discordant palindromic sequences that cannot form perfect base pairing, they are not copackaged efficiently into the same viral particle (1, 2). The ability of RNA genomes from different HIV-1 variants to dimerize has important biological consequences. For example, inefficient copackaging is known to be a major barrier for intersubtype HIV-1 recombination (4). Although DIS plays a key role in RNA dimerization, virion RNAs isolated from mutants with DIS deletions remained dimeric, suggesting that other cis-acting element(s) are also involved in the dimerization (6).

Despite the importance of RNA dimerization for HIV-1 replication, many aspects of this process are unknown, including the location at which dimerization occurs. Previously, we showed that RNA dimerization leading to HIV-1 genome packaging occurs after viral RNA is exported from the nucleus (7). The viral protein Gag is known to have chaperone activity (8). Additionally, biochemical experiments showed that HIV-1 Gag can interact with viral RNAs in the cytoplasm (9, 10).
Thus, it is often hypothesized that two copies of HIV-1 RNAs dimerize in the cytoplasm and that this dimeric RNA is complexed with Gag and travels to the plasma membrane (7, 11–15), the major site of virus assembly. The assembly of HIV-1 RNA and Gag was demonstrated in an elegant study using total internal reflection fluorescence (TIRF) microscopy (14), which illuminates a shallow volume near the glass/cell interface and is ideal for studying events near the plasma membrane (16). However, it was difficult to address the monomeric/dimeric state of the viral RNA in this previous study because the RNA was labeled with a single type of fluorescent protein.

In the present study, we sought to delineate the location at which HIV-1 RNA dimerization occurs, which leads to genome encapsidation, and whether Gag is required for RNA dimerization. We used a previously described method to label HIV-1 RNA with fluorescent proteins through interactions of sequence-specific RNA-binding proteins. We engineered HIV-1 genomes to contain RNA stem-loops that are recognized by the Escherichia coli BglG protein or the bacteriophage MS2 coat protein; because these sequences are located in the pol gene, they are present only in full-length unspliced HIV-1 RNAs. When introduced into human cells, these constructs express full-length RNAs that can serve as templates for the translation of Gag proteins and as genomes in the viral particles. Most (>90%) of the particles contain RNA genomes, indicating that the full-length viral RNAs derived from these constructs are efficiently packaged. Furthermore, RNAs derived from different constructs can dimerize and copackage at a rate close to random distribution (1), consistent with the genetic analyses from recombination studies (4, 17, 18). By using this method, we were able to detect HIV-1 RNA with single-RNA-molecule sensitivity (1) and tracked HIV-1 RNA movement in the cytoplasm by using live-cell imaging (19).
In this report, we tagged two HIV-1 RNAs and Gag, each with a different fluorescent protein, and studied the RNA:RNA and RNA:Gag interactions on the plasma membrane. We found that HIV-1 RNA dimerizes on the plasma membrane and that Gag protein is required for stabilization of the dimer.
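
The "random distribution" benchmark for copackaging mentioned above can be illustrated with a small sketch. It assumes a simple binomial model that the abstract does not spell out: every particle packages exactly two RNAs, and the two partners are drawn independently from a large pool in which RNA A makes up a fraction `p_a`. The function name and parameter are hypothetical, chosen only for illustration.

```python
def random_copackaging(p_a):
    """Expected dimer fractions under random pairing (assumed binomial model).

    p_a is the fraction of RNA A in the pool of packageable RNAs;
    the remainder, 1 - p_a, is RNA B. Both names are illustrative.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must lie between 0 and 1")
    p_b = 1.0 - p_a
    return {
        "AA": p_a ** 2,       # both copies are RNA A
        "BB": p_b ** 2,       # both copies are RNA B
        "AB": 2 * p_a * p_b,  # one copy of each (heterodimer)
    }


# With equal expression of the two RNAs, half of all particles
# are expected to carry one copy of each.
print(random_copackaging(0.5))
```

Under this model, an observed heterodimer fraction close to `2 * p_a * p_b` is what "copackaged at a rate close to random distribution" would mean, while a markedly lower fraction would indicate a pairing preference.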

20.
Plants produce an array of specialized metabolites, including chemicals that are important as medicines, flavors, fragrances, pigments and insecticides. The vast majority of this metabolic diversity is untapped. Here we take a systematic approach toward dissecting genetic components of plant specialized metabolism. Focusing on the terpenes, the largest class of plant natural products, we investigate the basis of terpene diversity through analysis of multiple sequenced plant genomes. The primary drivers of terpene diversification are terpenoid synthase (TS) “signature” enzymes (which generate scaffold diversity), and cytochromes P450 (CYPs), which modify and further diversify these scaffolds, so paving the way for further downstream modifications. Our systematic search of sequenced plant genomes for all TS and CYP genes reveals that distinct TS/CYP gene pairs are found together far more commonly than would be expected by chance, and that certain TS/CYP pairings predominate, providing signals for key events that are likely to have shaped terpene diversity. We recover TS/CYP gene pairs for previously characterized terpene metabolic gene clusters and demonstrate new functional pairing of TSs and CYPs within previously uncharacterized clusters. Unexpectedly, we find evidence for different mechanisms of pathway assembly in eudicots and monocots; in the former, microsyntenic blocks of TS/CYP gene pairs duplicate and provide templates for the evolution of new pathways, whereas in the latter, new pathways arise by mixing and matching of individual TS and CYP genes through dynamic genome rearrangements. This is, to our knowledge, the first documented observation of the unique pattern of TS and CYP assembly in eudicots and monocots.

Plants produce a rich and diverse array of specialized metabolites (1, 2).
These compounds have important ecological functions, providing protection against pests, diseases, UV-B damage and other environmental stresses, and serve as attractants for pollinators and seed dispersal agents. They are exploited by humans as pharmaceuticals and agrochemicals and in a wide variety of other industrial applications. Metabolic diversification in higher plants is likely to have been driven by the need to adapt and survive in different ecological niches (3, 4). Although a considerable proportion of the genes in higher plant genomes are predicted to encode enzymes with roles in metabolism (∼20% in Arabidopsis thaliana; ref. 5), most of these are as yet uncharacterized. The availability of a growing number of sequenced plant genomes now makes it possible to exploit knowledge extracted from multiple diverse species to take a more holistic approach toward understanding mechanisms of metabolic diversification in plants (1, 2).

The terpenes are the largest class of plant-derived natural products, with over 40,000 structures reported to date (6–8). As such they provide an excellent entrée for investigation of mechanisms of metabolic diversification. Terpenes range from simple flavor and fragrance compounds such as limonene and cymene to complex triterpenes, and have numerous potential applications across the food and beverage, pharmaceutical, cosmetic and agriculture industries. They include taxol (one of the most widely prescribed anticancer drugs) and artemisinin (the most potent antimalarial compound). This major class of compounds represents tremendous chemical diversity of which only a relatively small fraction has so far been accessed and used by industry (9). This is because the biosynthetic pathways for the vast majority of these compounds are unknown due to the challenges associated with mining large and complex genomes and establishing the function of genes implicated in specialized metabolism.
Many of these genes are divergent members of multigene families, making the delineation of new metabolic pathways extremely difficult (10–13).

The primary drivers of terpene diversification are the terpenoid synthase (TS) “signature” enzymes (which generate scaffold diversity), and the cytochrome P450-dependent monooxygenases (CYPs), which modify and further diversify these scaffolds, also paving the way for subsequent downstream modifications (10–15). TSs are defined as the related superfamily of biosynthetic enzymes involved in construction of the basic backbone structure of terpene natural products (16). As such, this includes the trans-isoprenyl diphosphate synthases and squalene synthases (SSs) that form the basic linear chains, as well as terpene synthases (TPSs) and triterpene cyclases (TTCs) that cyclize and rearrange these (16). Our knowledge of how the genes for terpene biosynthetic pathways are organized in plant genomes is limited, because the genomes of plants that produce some of the best characterized terpenoids (e.g., artemisinin and taxol) have not yet been sequenced. However, in a number of cases the genes for terpene biosynthetic pathways have been shown to be organized as metabolic gene clusters (14, 17). These include two diterpene clusters from Oryza sativa (rice) [the momilactone and phytocassane clusters (18, 19)], three triterpene clusters [the thalianol and marneral clusters from A. thaliana (20, 21), the avenacin cluster from Avena strigosa (oat) (22, 23)], and clusters for steroidal glycoalkaloids and other terpenes in the Solanaceae (24, 25). Potential new clusters implicated in terpene synthesis have also been reported in A. thaliana (20, 26–28) and cucumber (29). The available evidence indicates that the characterized clusters have arisen within recent evolutionary history by gene duplication, acquisition of new function and genome reorganization, and that they are not products of horizontal gene transfer from microbes (reviewed in refs.
14, 17, and 30). Clustering has also been shown for other classes of plant natural products and is likely to facilitate coinheritance of beneficial gene combinations and also regulation at the level of chromatin (14, 17, 30–32).

TSs and CYPs are the core components of terpene biosynthetic pathways and together are responsible for the generation of a vast array of diverse terpene structures (10–13, 15, 33). Here we have selected these two enzyme superfamilies as markers to investigate the foundations of terpene synthesis and evolution across 17 sequenced plant genomes. Our analyses shed light on the roots of terpene biosynthesis and diversification in plants. They also reveal that different genomic mechanisms of pathway assembly predominate in eudicots and monocots.
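
The claim that TS/CYP gene pairs occur together "far more commonly than would be expected by chance" can be made concrete with a label-shuffling test. The sketch below is not the authors' method, which this abstract does not describe. It assumes a single chromosome represented as an ordered list of gene labels and treats a TS and a CYP as "paired" when their positions differ by at most `window` genes. The toy gene order and all function names are hypothetical.

```python
import random


def count_pairs(labels, window=1):
    """Count TS/CYP pairs whose positions in the gene order differ by <= window."""
    n = 0
    for i, a in enumerate(labels):
        last = min(i + window, len(labels) - 1)
        for j in range(i + 1, last + 1):
            if {a, labels[j]} == {"TS", "CYP"}:
                n += 1
    return n


def permutation_p_value(labels, window=1, n_perm=2000, seed=0):
    """Estimate how often shuffled gene orders pair TS and CYP at least as often."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = count_pairs(labels, window)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_pairs(shuffled, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy chromosome (made-up data): three adjacent TS/CYP pairs among 30 other genes.
toy = (["TS", "CYP"] + ["other"] * 10
       + ["CYP", "TS"] + ["other"] * 10
       + ["TS", "CYP"] + ["other"] * 10)
observed, p = permutation_p_value(toy)
print(observed, p)
```

Shuffling labels keeps the number of TS and CYP genes fixed and randomizes only their positions, so a small estimated p-value indicates more pairing than the gene counts alone would produce.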
