Abstract: | The cyanobacterial phylum encompasses oxygenic photosynthetic prokaryotes of a great breadth of morphologies and ecologies; they play key roles in global carbon and nitrogen cycles. The chloroplasts of all photosynthetic eukaryotes can trace their ancestry to cyanobacteria. Cyanobacteria also attract considerable interest as platforms for “green” biotechnology and biofuels. To explore the molecular basis of their different phenotypes and biochemical capabilities, we sequenced the genomes of 54 phylogenetically and phenotypically diverse cyanobacterial strains. Comparison of cyanobacterial genomes reveals the molecular basis for many aspects of cyanobacterial ecophysiological diversity, as well as the convergence of complex morphologies without the acquisition of novel proteins. This phylum-wide study highlights the benefits of diversity-driven genome sequencing, identifying more than 21,000 cyanobacterial proteins with no detectable similarity to known proteins, and foregrounds the diversity of light-harvesting proteins and gene clusters for secondary metabolite biosynthesis. Additionally, our results provide insight into the distribution of genes of cyanobacterial origin in eukaryotic nuclear genomes. Moreover, this study doubles both the amount and the phylogenetic diversity of cyanobacterial genome sequence data. Given the exponentially growing number of sequenced genomes, this diversity-driven study demonstrates the perspective gained by comparing disparate yet related genomes in a phylum-wide context and the insights that are gained from it.
The Cyanobacteria are one of the most diverse and widely distributed phyla of bacteria. Among photosynthetic prokaryotes, they uniquely have the ability to perform oxygenic photosynthesis; they are considered to be the progenitor of the chloroplast, the photosynthetic organelle found in eukaryotes.
Cyanobacteria contribute greatly to global primary production, fixing a substantial amount of biologically available carbon, especially in nutrient-limited environmental niches, from oligotrophic marine surfaces to desert crusts (1, 2). In addition, cyanobacteria are key contributors to global nitrogen fixation (3), and many produce unique secondary metabolites (4). Despite these important traits and substantial interest in developing cyanobacterial strains for biotechnology, there is a paucity and unbalanced distribution of publicly available genomic information from the Cyanobacteria: 40% (29 of 72 species) of the available genomes fall within the closely related marine Prochlorococcus/Synechococcus subclade. Improvements in coverage of sequenced genomes will enable a more accurate and comprehensive understanding of cyanobacterial morphology, niche-adaptation, and evolution.
Taxonomic studies organized the Cyanobacteria into five subsections based on morphological complexity (5). Unicellular forms are split between those that undergo solely binary fission (subsection I, Chroococcales) and those that reproduce through multiple fissions in three planes to create smaller daughter cells, baeocytes (subsection II, Pleurocapsales). Strains in subsection III (Oscillatoriales) divide the vegetative cell solely perpendicular to the growing axis. Organisms in subsections IV (Nostocales) and V (Stigonematales) are able to differentiate specific cells [i.e., heterocysts (for nitrogen fixation)] and may form akinetes (dormant cells) and hormogonia (for dispersal and symbiosis competence). Subsection V is further distinguished by the ability to form branching filaments. Before this study, two subsections (II and V) had no representative genomes, underscoring the dearth in our understanding of these more complex morphological phenotypes.
In this study, 54 strains of cyanobacteria were chosen to improve the distribution of sequenced genomes.
The approach is modeled on the phylogenetically driven Genomic Encyclopedia of Bacteria and Archaea (GEBA) (6), and so we refer to our data as the CyanoGEBA dataset (SI Appendix, Table S1 and Dataset S1). The results highlight the value of phylum-wide genome sequencing based on phylogenetic coverage. |