Similar literature
 20 similar articles found (search time: 31 ms)
1.
Several plants are known to have acquired a single mitochondrial gene by horizontal gene transfer (HGT), but whether these or any other plants have acquired many foreign genes is entirely unclear. To address this question, we focused on Amborella trichopoda, because it was already known to possess one horizontally acquired gene and because it was found in preliminary analyses to contain several more. We comprehensively sequenced the mitochondrial protein gene set of Amborella, sequenced a variable number of mitochondrial genes from 28 other diverse land plants, and conducted phylogenetic analyses of these sequences plus those already available, including the five sequenced mitochondrial genomes of angiosperms. Results indicate that Amborella has acquired one or more copies of 20 of its 31 known mitochondrial protein genes from other land plants, for a total of 26 foreign genes, whereas no evidence for HGT was found in the five sequenced genomes. Most of the Amborella transfers are from other angiosperms (especially eudicots), whereas others are from nonangiosperms, including six striking cases of transfer from (at least three different) moss donors. Most of the transferred genes are intact, consistent with functionality and/or recency of transfer. Amborella mtDNA has sustained proportionately more HGT than any other eukaryotic, or perhaps even prokaryotic, genome yet examined.

2.
Microsporidia of the genus Encephalitozoon are widespread pathogens of animals that harbor the smallest known nuclear genomes. Complete sequences from Encephalitozoon intestinalis (2.3 Mbp) and Encephalitozoon cuniculi (2.9 Mbp) revealed massive gene losses and reduction of intergenic regions as factors leading to their drastically reduced genome size. However, microsporidian genomes also have gained genes through horizontal gene transfers (HGT), a process that could allow the parasites to exploit their hosts more fully. Here, we describe the complete sequences of two intermediate-sized genomes (2.5 Mbp), from Encephalitozoon hellem and Encephalitozoon romaleae. Overall, the E. hellem and E. romaleae genomes are strikingly similar to those of Encephalitozoon cuniculi and Encephalitozoon intestinalis in both form and content. However, in addition to the expected expansions and contractions of known gene families in subtelomeric regions, both species also were found to harbor a number of protein-coding genes that are not found in any other microsporidian. All these genes are functionally related to the metabolism of folate and purines but appear to have originated by several independent HGT events from different eukaryotic and prokaryotic donors. Surprisingly, the genes are all intact in E. hellem, but in E. romaleae those involved in de novo synthesis of folate are all pseudogenes. Overall, these data suggest that a recent common ancestor of E. hellem and E. romaleae assembled a complete metabolic pathway from multiple independent HGT events and that one descendant already is dispensing with much of this new functionality, highlighting the transient nature of transferred genes.

3.
4.
It has been suggested that horizontal gene transfer (HGT) is the "essence of phylogeny." In contrast, much data suggest that this is an exaggeration resulting in part from a reliance on inadequate methods to identify HGT events. In addition, the assumption that HGT is a ubiquitous influence throughout evolution is questionable. Instead, rampant global HGT is likely to have been relevant only to primitive genomes. In modern organisms we suggest that both the range and frequencies of HGT are constrained most often by selective barriers. As a consequence those HGT events that do occur most often have little influence on genome phylogeny. Although HGT does occur with important evolutionary consequences, classical Darwinian lineages seem to be the dominant mode of evolution for modern organisms.

5.
Horizontal gene transfer (HGT) can radically alter the genomes of microorganisms, providing the capacity to adapt to new lifestyles, environments, and hosts. However, the extent of HGT between eukaryotes is unclear. Using whole-genome, gene-by-gene phylogenetic analysis we demonstrate an extensive pattern of cross-kingdom HGT between fungi and oomycetes. Comparative genomics, including the de novo genome sequence of Hyphochytrium catenoides, a free-living sister of the oomycetes, shows that these transfers largely converge within the radiation of oomycetes that colonize plant tissues. The repertoire of HGTs includes a large number of putatively secreted proteins; for example, 7.6% of the secreted proteome of the sudden oak death parasite Phytophthora ramorum has been acquired from fungi by HGT. Transfers include gene products with the capacity to break down plant cell walls and acquire sugars, nucleic acids, nitrogen, and phosphate sources from the environment. Predicted HGTs also include proteins implicated in resisting plant defense mechanisms and effector proteins for attacking plant cells. These data are consistent with the hypothesis that some oomycetes became successful plant parasites by multiple acquisitions of genes from fungi.

6.
Horizontal gene transfer (HGT) is one of the most dominant forces molding prokaryotic gene repertoires. These repertoires can be as small as ≈200 genes in intracellular organisms or as large as ≈9,000 genes in large, free-living bacteria. In this article we ask what the impact of HGT from phylogenetically distant sources is, relative to the size of the gene repertoire. Using different approaches for HGT detection and focusing on both cumulative and recent evolutionary histories, we find a surprising pattern of nonlinear enrichment of long-distance transfers in large genomes. Moreover, we find a strong positive correlation between the sizes of the donor and recipient genomes. Our results also show that distant horizontal transfers are biased toward those functional groups that are enriched in large genomes, showing that the trends in functional gene content and the impact of distant transfers are interdependent. These results highlight the intimate relationship between environmental and genomic complexity in microbes and suggest that an ecological, as opposed to phylogenetic, signal in gene content gains relative importance in large-genomed bacteria.

7.
8.
The best-known outcome of horizontal gene transfer (HGT) is the introduction of novel genes, but other outcomes have been described. When a transferred gene has a homolog in the recipient genome, the native gene may be functionally replaced (and subsequently lost) or partially overwritten by gene conversion with transiently present foreign DNA. Here we report the discovery, in two lineages of plant mitochondrial genes, of novel gene combinations that arose by conversion between coresident native and foreign homologs. These lineages have undergone intricate conversion between native and foreign copies, with conversion occurring repeatedly and differentially over the course of speciation, leading to radiations of mosaic genes involved in respiration and intron splicing. Based on these findings, we develop a model (the duplicative HGT and differential gene conversion model) that integrates HGT and ongoing gene conversion in the context of speciation. Finally, we show that one of these HGT-driven gene-conversional radiations followed two additional types of conversional chimerism, namely, intramitochondrial retroprocessing and interorganellar gene conversion across the 2-billion-year divide between mitochondria and chloroplasts. These findings expand our appreciation of HGT and gene conversion as creative evolutionary forces, establish plant mitochondria as a premier system for studying the evolutionary dynamics of HGT and its genetic reverberations, and recommend careful examination of bacterial and other genomes for similar, likely overlooked phenomena.

9.
The analysis of completely sequenced genomes uncovers an astonishing variability between species in terms of gene content and order. During genome history, genes are frequently rearranged, duplicated, lost, or transferred horizontally between genomes. These events appear to be stochastic, yet they are under selective constraints resulting from the functional interactions between genes. These genomic constraints form the basis for a variety of techniques that employ systematic genome comparisons to predict functional associations among genes. The most powerful techniques to date are based on conserved gene neighborhood, gene fusion events, and common phylogenetic distributions of gene families. Here we show that these techniques, if integrated quantitatively and applied to a sufficiently large number of genomes, have reached a resolution which allows the characterization of function at a higher level than that of the individual gene: global modularity becomes detectable in a functional protein network. In Escherichia coli, the predicted modules can be benchmarked by comparison to known metabolic pathways. We found as many as 74% of the known metabolic enzymes clustering together in modules, with an average pathway specificity of at least 84%. The modules extend beyond metabolism, and have led to hundreds of reliable functional predictions both at the protein and pathway level. The results indicate that modularity in protein networks is intrinsically encoded in present-day genomes.
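
One of the signals named in this abstract, the common phylogenetic distribution of gene families, can be illustrated with a small sketch. The code below is not the authors' method: it assumes a simple Jaccard similarity between presence/absence profiles, and the family names, genome identifiers, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of genome identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurring_pairs(profiles, threshold=0.8):
    """Return (family1, family2, score) for families with similar profiles.

    `profiles` maps a gene-family name to the set of genomes it occurs in.
    The threshold is an illustrative assumption, not a published cutoff.
    """
    pairs = []
    for (f1, p1), (f2, p2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(p1, p2)
        if score >= threshold:
            pairs.append((f1, f2, score))
    return pairs


# Hypothetical toy data: famA and famB always occur together.
profiles = {
    "famA": {"g1", "g2", "g3", "g4"},
    "famB": {"g1", "g2", "g3", "g4"},
    "famC": {"g1", "g5"},
}
print(co_occurring_pairs(profiles))
```

Families that are gained and lost together across many genomes score close to 1, which is the kind of evidence that can then be combined with gene-neighborhood and gene-fusion signals.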

10.
The amount of lateral gene transfer (LGT) that has occurred in microbial evolution is heavily debated. Efforts to quantify LGT through gene-tree comparisons have delivered estimates that between 2% and 60% of all prokaryotic genes have been affected by LGT, the 30-fold discrepancy reflecting differences among gene samples studied and uncertainties inherent in phylogenetic reconstruction. Here we present a simple method that is independent of gene-tree comparisons to estimate the LGT rate among sequenced prokaryotic genomes. If little or no LGT has occurred during evolution, ancestral genome sizes would become unrealistically large, whereas too much LGT would render them far too small. We determine the amount of LGT that is necessary and sufficient to bring the distribution of inferred ancestral genome sizes into agreement with that observed among modern microbes. Rather than testing for phylogenetic congruence or lack thereof across genes, we assume that all gene trees are compatible; hence, our method delivers very conservative lower-bound estimates of the average LGT rate. The results indicate that among 57,670 gene families distributed across 190 sequenced genomes, at least two-thirds, and probably all, have been affected by LGT at some time in their evolutionary past. A component of common ancestry nonetheless remains detectable in gene distribution patterns. We estimate the minimum lower bound for the average LGT rate across all genes as 1.1 LGT events per gene family and gene family lifespan, and this minimum rate increases sharply when genes present in only a few genomes are excluded from the analysis.

11.
Multicellular animals use a three-part molecular toolkit to mediate phospho-tyrosine signaling: Tyrosine kinases (TyrK), protein tyrosine phosphatases (PTP), and Src Homology 2 (SH2) domains function, respectively, as “writers,” “erasers,” and “readers” of phospho-tyrosine modifications. How did this system of three components evolve, given their interdependent function? Here, we examine the usage of these components in 41 eukaryotic genomes, including the newly sequenced genome of the choanoflagellate, Monosiga brevicollis, the closest known unicellular relative of metazoans. This analysis indicates that SH2 and PTP domains likely evolved earliest: a handful of these domains are found in premetazoan eukaryotes lacking tyrosine kinases, most likely to deal with limited tyrosine phosphorylation cross-catalyzed by promiscuous Ser/Thr kinases. Modern TyrK proteins, however, are only observed in two lineages, metazoans and choanoflagellates. These two lineages show a dramatic coexpansion of all three domain families. Concurrent expansion of the three domain families is consistent with a stepwise evolutionary model in which preexisting SH2 and PTP domains were of limited utility until the appearance of the TyrK domain in the last common ancestor of metazoans and choanoflagellates. The emergence of the full three-component signaling system, with its dramatically increased encoding potential, may have contributed to the advent of metazoan multicellularity.

12.
Most modern speech recognition uses probabilistic models to interpret a sequence of sounds. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids. To increase word accuracy in speech recognition, language models are used to capture the information that certain word combinations are more likely than others, thus improving detection based on context. However, to date, these context techniques have not been applied to protein domain discovery. Here we show that the application of statistical language modeling methods can significantly enhance domain recognition in protein sequences. As an example, we discover an unannotated Tf_Otx Pfam domain on the cone rod homeobox protein, which suggests a possible mechanism for how the V242M mutation on this protein causes cone-rod dystrophy.

13.
We have analyzed conserved domains in t-SNAREs [soluble N-ethylmaleimide-sensitive factor (NSF) attachment protein (SNAP) receptors in the target membrane], proteins that are believed to be involved in the fusion of transport vesicles with their target membrane. By using a sensitive computer method, the generalized profile method, we were able to identify a new homology domain that is common to the two protein families previously identified to act as t-SNAREs, the syntaxin and SNAP-25 (synaptosome-associated protein of 25 kDa) families, which therefore constitute a new superfamily. This homology domain of approximately 60 amino acids is predicted to form a coiled-coil structure. The significance of this homology domain could be demonstrated by a partial suppression of the coiled-coil properties of the domain profile. In proteins belonging to the syntaxin family, a single homology domain is located near the transmembrane domain, whereas the members of the SNAP-25 family possess two homology domains. This domain was also identified in several proteins that have been implicated in vesicular transport but do not belong to any of the t-SNARE protein families. Several new yeast, nematode, and mammalian proteins were identified that belong to the new superfamily. The evolutionary conservation of the SNARE coiled-coil homology domain suggests that this domain has a similar function in different membrane fusion proteins.

14.
Information derived from metagenome sequences through deep-learning techniques has significantly improved the accuracy of template free protein structure modeling. However, most of the deep learning–based modeling studies are based on blind sequence database searches and suffer from low efficiency in computational resource utilization and model construction, especially when the sequence library becomes prohibitively large. We proposed a MetaSource model built on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil, and Fermentor) to decode the inherent linkage of microbial niches with protein homologous families. Large-scale protein family folding experiments on 8,700 unknown Pfam families showed that a microbiome targeted approach with multiple sequence alignment constructed from individual MetaSource biomes requires more than threefold less computer memory and CPU (central processing unit) time but generates contact-map and three-dimensional structure models with a significantly higher accuracy, compared with that using combined metagenome datasets. These results demonstrate an avenue to bridge the gap between the rapidly increasing metagenome databases and the limited computing resources for efficient genome-wide database mining, which provides a useful bluebook to guide future microbiome sequence database and modeling development for high-accuracy protein structure and function prediction.

Given the rapid growth in the number of protein sequences, computer-based approaches play an increasingly important role in protein structure determination and structure-based function annotation (1, 2). Two types of strategies have been widely considered for protein three-dimensional (3D) structure prediction (2). The first is template-based modeling, which constructs structural models using solved structures as templates and whose success depends on the availability of homologous templates in the Protein Data Bank (PDB). The second is template-free modeling (FM, or ab initio modeling), which is dedicated to modeling the “Hard” proteins that do not have close homologous structures in the PDB. Due to the lack of reliable physics-based force fields, the most efficient FM methods, including Rosetta (3), QUARK (4), and I-TASSER (Iterative Threading ASSEmbly Refinement) (5), and most recently AlphaFold (6) and trRosetta (7), rely on a priori spatial restraints derived, usually through deep neural-network learning (8, 9), from the coevolution information based on multiple sequence alignments (MSAs) of homologous sequences (10). Hence, to model the 3D structure of the “Hard” proteins, a sufficient number of homologous sequences is critical to ensure the accuracy of deep machine-learning models and the quality of subsequent 3D structure constructions (11).

Considerable effort has recently been devoted to using metagenome sequence data to enhance MSA and FM model construction. For example, Ovchinnikov et al. used the Integrated Microbial Genomes database to generate contact-map predictions and create high-confidence models for 614 Pfam protein families that lack homologous structures in the PDB (12). Using UniRef20 (13), Michel et al. combined contact-map prediction with the CNS (Crystallography & NMR System) folding method (14) to model protein structure for 558 Pfam families of unknown structure with an estimated 90% specificity. Most recently, Wang et al. examined the usefulness of the Tara Oceans microbial genomes and found that the microbiome genomes can provide additional help on high-quality MSA construction and protein structure and function modeling (15). This result demonstrated a significant role of the microbiome sequences, which represent one of the largest reservoirs of microbial species on this planet, in FM structural folding and structure-based function annotations.

Despite the success of metagenome-assisted 3D structure modeling, there are still thousands of Pfam families whose structure cannot be modeled with satisfactory confidence. One critical reason is that, despite the rapid accumulation of sequences, the current sequence databases are far from complete, and very few homologous sequences are available for many of the FM targets. On the other hand, the metagenome sequence databases have become extremely large (e.g., the Joint Genome Institute database contains more than 60 billion microbial genes and keeps growing, with at least 20,000 new sequences added per day) (16, 17), which makes a thorough and balanced database search increasingly slow and difficult. In a recent study, Zhang et al. showed that, using current data mining tools, the quality of MSAs from metagenome libraries is not always proportional to the effective number of homologous sequences (Neff; see SI Appendix, Eq. S1), partly due to the complexity of the sequence family relations and the bias of sequence database searches (10). The recent CASP experiments also witnessed various examples where the folding simulations for FM targets were negatively impacted by contact/distance predictions derived from biased MSAs built on large metagenome datasets, despite high Neff values (18, 19). Therefore, balanced sequence mining with accurate MSA construction is of critical importance for improving the efficiency of sequence database searching and the subsequent 3D structure modeling.

In this work, we hypothesize that there exists an inherent evolutionary linkage between microbial niches (biomes) and protein families, such that a targeted approach built on linked biome families can be used to improve both the efficiency and the accuracy of MSA construction and protein structure prediction. To examine the hypothesis, we collected a model library of 4.25 billion microbiome sequences from the EBI metagenomic database (MGnify database) (20) that cover four major biomes (Gut, Lake, Soil, and Fermentor). The “marginal effect” analyses showed profoundly different effects of specific biomes on supplementing homologous sequences for different Pfam families. A machine-learning model named MetaSource was then developed to predict the source biome of target proteins, which can significantly improve the accuracy of contact-map and 3D structure models while using more than threefold less computer memory and CPU time. These results validate the important biome-sequence–Pfam associations, which can lead the way toward better efficiency and effectiveness of the microbiome-based targeted approach to protein structure and function predictions.
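
The text refers to the effective number of homologous sequences (Neff) but the defining equation sits in the SI Appendix, which is not reproduced here. As a rough illustration only, the sketch below assumes one commonly used form, in which each aligned sequence is down-weighted by the number of sequences (itself included) that share at least 80% identity with it. The 0.8 cutoff, the lack of any length normalization, and the toy alignment are assumptions and may differ from Eq. S1.

```python
def identity(a, b):
    """Fraction of aligned columns in which two equal-length rows agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def neff(msa, cutoff=0.8):
    """Sum of per-sequence weights 1 / (number of near-identical rows).

    Assumed definition for illustration; not taken from the source paper.
    """
    total = 0.0
    for s in msa:
        # Count rows at or above the identity cutoff, including s itself.
        neighbors = sum(identity(s, t) >= cutoff for t in msa)
        total += 1.0 / neighbors
    return total


# Hypothetical toy alignment: three near-duplicates and one distinct row.
msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY"]
print(neff(msa))
```

Under this weighting the three near-identical rows count as roughly one sequence, so redundant hits from a very large database add little to Neff even though they inflate the raw alignment depth.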

15.
F-box proteins are substrate-recognition components of the Skp1-Rbx1-Cul1-F-box protein (SCF) ubiquitin ligases. In plants, F-box genes form one of the largest multigene superfamilies and control many important biological functions. However, it is unclear how and why plants have acquired a large number of F-box genes. Here we identified 692, 337, and 779 F-box genes in Arabidopsis, poplar and rice, respectively, and studied their phylogenetic relationships and evolutionary patterns. We found that the plant F-box superfamily can be divided into 42 families, each of which has a distinct domain organization. We also estimated the number of ancestral genes for each family and identified highly conservative versus divergent families. In conservative families, there has been little or no change in the number of genes since the divergence between eudicots and monocots ≈145 million years ago. In divergent families, however, the numbers have increased dramatically during the same period. In two cases, the numbers of genes in extant species are >100 times greater than that in the most recent common ancestor (MRCA) of the three species. Proteins encoded by highly conservative genes always have the same domain organization, suggesting that they interact with the same or similar substrates. In contrast, proteins of rapidly duplicating genes sometimes have quite different domain structures, mainly caused by unusually frequent shifts of exon-intron boundaries and/or frameshift mutations. Our results indicate that different F-box families, or different clusters of the same family, have experienced dramatically different modes of sequence divergence, apparently having resulted in adaptive changes in function.

16.
The human intestine is an important location for horizontal gene transfer (HGT) due to the presence of a densely populated community of microorganisms which are essential to the health of the human superorganism. HGT in this niche has the potential to influence the evolution of members of this microbial community and to mediate the spread of antibiotic resistance genes from commensal organisms to potential pathogens. Recent culture-independent techniques and metagenomic studies have provided an insight into the distribution of mobile genetic elements (MGEs) and the extent of HGT in the human gastrointestinal tract. In this mini-review, we explore the current knowledge of mobile genetic elements in the gastrointestinal tract, the progress of research into the distribution of antibiotic resistance genes in the gut and the potential role of MGEs in the spread of antibiotic resistance. In the face of reduced treatment options for many clinical infections, understanding environmental and commensal antibiotic resistance and spread is critical to the future development of meaningful and long-lasting antimicrobial therapies.

17.
Aminoacyl-tRNA synthetases (aaRSs) are ancient and evolutionarily conserved enzymes catalyzing the formation of aminoacyl-tRNAs, which are used as substrates for ribosomal protein biosynthesis. In addition to full-length aaRS genes, genomes of many organisms are sprinkled with truncated genes encoding single-domain aaRS-like proteins, which often have relinquished their canonical role in genetic code translation. We have identified the genes for putative seryl-tRNA synthetase homologs widespread in bacterial genomes and characterized three of them biochemically and structurally. The proteins encoded are homologous to the catalytic domain of highly diverged, atypical seryl-tRNA synthetases (aSerRSs) found only in methanogenic archaea and are deprived of the tRNA-binding domain. Remarkably, in comparison to SerRSs, aSerRS homologs display different and relaxed amino acid specificity. aSerRS homologs lack canonical tRNA aminoacylating activity and instead transfer activated amino acid to the phosphopantetheine prosthetic group of putative carrier proteins, whose genes were identified in the genomic surroundings of aSerRS homologs. Detailed kinetic analysis confirmed that aSerRS homologs aminoacylate these carrier proteins efficiently and specifically. Accordingly, aSerRS homologs were renamed amino acid:[carrier protein] ligases (AMP forming). The enzymatic activity of aSerRS homologs is reminiscent of adenylation domains in nonribosomal peptide synthesis, and thus they represent an intriguing link between programmable ribosomal protein biosynthesis and template-independent nonribosomal peptide synthesis.

18.
PAirwise Sequence Comparison (PASC) is a tool that uses genome sequence similarity to help with virus classification. The PASC tool at NCBI uses two methods: local alignment based on BLAST and global alignment based on the Needleman-Wunsch algorithm. It works for complete genomes of viruses of several families/groups, and for the family Filoviridae, it currently includes 52 complete genomes available in GenBank. It has been shown that the BLAST-based alignment approach works better for filoviruses, and it is therefore recommended for establishing taxon demarcation criteria. When more genome sequences with high divergence become available, these demarcations will most likely become more precise. The tool can compare new genome sequences of filoviruses with the ones already in the database, and propose their taxonomic classification.
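
The global-alignment method mentioned in this abstract can be sketched in a few lines. The code below is a textbook Needleman-Wunsch implementation with a linear gap penalty, followed by a percent-identity calculation over the alignment. It is not PASC's implementation: the scoring values (match 1, mismatch -1, gap -1) and the function names are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical non-gap columns as a percentage of alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


s, x, y = needleman_wunsch("ACGT", "AGT")
print(s, x, y, percent_identity(x, y))
```

Pairwise identities computed this way for every genome pair give a distribution whose gaps can suggest demarcation thresholds. Whole viral genomes would need a more memory-efficient aligner than this quadratic-space sketch.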

19.
The determination of core genes in viral and bacterial genomes is crucial for a better understanding of their relatedness and for their classification. CoreGenes5.0 is an updated user-friendly web-based software tool for the identification of core genes in and data mining of viral and bacterial genomes. This tool has been useful in the resolution of several issues arising in the taxonomic analysis of bacteriophages and has incorporated many suggestions from researchers in that community. The webserver displays results in a format that is easy to understand and allows for automated batch processing, without the need for any user-installed bioinformatics software. CoreGenes5.0 uses group protein clustering of genomes with one of three algorithm options to output a table of core genes from the input genomes. Previously annotated “unknown genes” may be identified with homologues in the output. The updated version of CoreGenes is able to handle more genomes, is faster, and is more robust, providing easier analysis of custom or proprietary datasets. CoreGenes5.0 is accessible at coregenes.org, migrating from a previous site.

20.
Viruses, far from being just parasites affecting hosts’ fitness, are major players in any microbial ecosystem. In spite of their broad abundance, viruses, in particular bacteriophages, remain largely unknown since only about 20% of sequences obtained from viral community DNA surveys could be annotated by comparison with public databases. In order to shed some light on this genetic dark matter we expanded the search for orthologous groups as potential markers for viral taxonomy from bacteriophages to include eukaryotic viruses, establishing a set of 31,150 ViPhOGs (Eukaryotic Viruses and Phages Orthologous Groups). To do this, we examined the non-redundant viral diversity stored in public databases, predicted proteins in genomes lacking such information, and used all annotated and predicted proteins to identify potential protein domains. The clustering of domains and unannotated regions into orthologous groups was done using cogSoft. Finally, we employed a random forest implementation to classify genomes into their taxonomy and found that the presence or absence of ViPhOGs is significantly associated with their taxonomy. Furthermore, we established a set of 1457 ViPhOGs that, given their importance for the classification, could be considered as markers or signatures for the different taxonomic groups defined by the ICTV at the order, family, and genus levels.
