Recombinant transfer in the basic genome of Escherichia coli
Authors:Purushottam D Dixit  Tin Yau Pang  F William Studier  Sergei Maslov
Institution:Biological, Environmental and Climate Sciences Department, Brookhaven National Laboratory, Upton, NY, 11973
Abstract:An approximation to the ∼4-Mbp basic genome shared by 32 strains of Escherichia coli representing six evolutionary groups has been derived and analyzed computationally. A multiple alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ∼90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single base-pair mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome pairs have one or two recombinant transfers of length ∼40–115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome pairs (0.4–1% SNPs) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kilobase pairs. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. 
Most recombinant transfers seem likely to be due to generalized transduction by coevolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.
The increasing availability of complete genome sequences of many different bacterial and archaeal species, as well as metagenomic sequencing of mixed populations from natural environments, has stimulated theoretical and computational approaches to understand mechanisms of speciation and how prokaryotic species should be defined (1–8). Much genome analysis and comparison has been at the level of gene content, identifying core genomes (the set of genes found in most or all genomes in a group) and the continually expanding pan-genome. Population genomics of Escherichia coli has been particularly well studied because of its long history in laboratory research and because many pathogenic strains have been isolated and completely sequenced (9–14). Proposed models of how related groups or species form and evolve include isolation by ecological niche (7–9, 11, 15), decreased homologous recombination as divergence between isolated populations increases (2–4, 8, 14, 16), and coevolving phage and bacterial populations (6).
E. coli genomes are highly variable, containing an array of phage-related mobile elements integrated at many different sites (17), random insertions of multiple transposable elements (18), and idiosyncratic genome rearrangements that include inversions, translocations, duplications, and deletions. Although E. coli grows by binary cell division, genetic exchange by homologous recombination has come to be recognized as a significant factor in adaptation and genome evolution (9, 10, 19).
Of particular interest has been the relative contribution to genome variability of random mutations (single base-pair differences referred to as SNPs) and replacement of genome regions by homologous recombination with fragments imported from other genomes (here referred to as recombinant transfers or transferred regions). Estimates of the rate, extent, and average lengths of recombinant transfers in the core genome vary widely, as do methods for detecting transferred regions and assessing their impact on phylogenetic relationships (12–14, 20, 21).
In a previous comparison of complete genome sequences of the K-12 reference strain MG1655 and the reconstructed genome of the B strain of Delbrück and Luria, referred to here as B-DL, we observed that SNPs are not randomly distributed among 3,620 perfectly matched pairs of coding sequences but rather have two distinct regimes: sharply decreasing numbers of genes having 0, 1, 2, or 3 SNPs, and an abrupt transition to a much broader exponential distribution in which decreasing numbers of genes contain increasing numbers of SNPs from 4 to 102 SNPs per gene (22). Genes in the two regimes of the distribution are interspersed in clusters of variable lengths throughout what we referred to as the basic genome, namely, the ∼4 Mbp shared by the two genomes after eliminating mobile elements. We speculated that genes having 0 to 3 SNPs may primarily have been inherited clonally from the last common ancestor, whereas genes comprising the exponential tail may primarily have been acquired by horizontal transfer from diverged members of the population.
The current study was undertaken to extend these observations to a diverse set of 32 completely sequenced E. coli genomes and to analyze how SNP distributions in the basic genome change as a function of evolutionary divergence between the 496 pairs of strains in this set. We have taken a simpler approach than those of Touchon et al. (13), Didelot et al. (14), and McNally et al. (21), who previously analyzed multiple alignments of complete genomes of E. coli strains. The appreciably larger basic genome derived here is not restricted to protein-coding sequences and retains positional information.
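The two-regime picture described above (genes with 0 to 3 SNPs versus an exponential tail of 4 or more) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' pipeline or their computational model: the function names, the toy sequences, and the hard cutoff of 3 SNPs per gene are hypothetical choices that follow the description of the MG1655/B-DL comparison. The sketch also shows why 32 genomes yield 496 pairwise comparisons.

```python
from itertools import combinations


def count_snps(seq_a: str, seq_b: str) -> int:
    """Count single-base differences between two aligned sequences of equal length.

    Alignment gaps ('-') are skipped, so only base substitutions are counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "-" and b != "-"
    )


def classify_gene(snp_count: int, max_clonal_snps: int = 3) -> str:
    """Label a gene by SNP count.

    Assumption: a hard cutoff at 3 SNPs per gene, mirroring the 0-3 versus
    4-or-more split described in the text. A real analysis would model the
    two distributions rather than apply a fixed threshold.
    """
    return "clonal-like" if snp_count <= max_clonal_snps else "transfer-like"


def all_pairs(strains: list[str]) -> list[tuple[str, str]]:
    """Return every unordered pair of strains."""
    return list(combinations(strains, 2))


if __name__ == "__main__":
    # Hypothetical placeholder names; 32 genomes give 32 * 31 / 2 = 496 pairs.
    strains = [f"strain_{i}" for i in range(32)]
    print(len(all_pairs(strains)))

    # Toy aligned gene pair with two substitutions.
    n = count_snps("ACGTACGT", "ACGAACGA")
    print(n, classify_gene(n))
```

A fixed threshold is the simplest possible stand-in for the observed transition between regimes; it is useful for seeing the mosaic of interspersed low-SNP and high-SNP genes but would misclassify genes near the boundary.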
Keywords:E. coli evolution  basic genome  core genome  recombinant transfer  generalized transduction