Abstract: | Odoriferous terpene metabolites of bacterial origin have been known for many years. In genome-sequenced Streptomycetaceae microorganisms, the vast majority produces the degraded sesquiterpene alcohol geosmin. Two minor groups of bacteria do not produce geosmin, with one of these groups instead producing other sesquiterpene alcohols, whereas members of the remaining group do not produce any detectable terpenoid metabolites. Because bacterial terpene synthases typically show no significant overall sequence similarity to any other known fungal or plant terpene synthases and usually exhibit relatively low levels of mutual sequence similarity with other bacterial synthases, simple correlation of protein sequence data with the structure of the cyclized terpene product has been precluded. We have previously described a powerful search method based on the use of hidden Markov models (HMMs) and protein families database (Pfam) search that has allowed the discovery of monoterpene synthases of bacterial origin. Using an enhanced set of HMM parameters generated using a training set of 140 previously identified bacterial terpene synthase sequences, a Pfam search of 8,759,463 predicted bacterial proteins from public databases and in-house draft genome data has now revealed 262 presumptive terpene synthases. The biochemical function of a considerable number of these presumptive terpene synthase genes could be determined by expression in a specially engineered heterologous Streptomyces host and spectroscopic identification of the resulting terpene products. 
In addition to a wide variety of terpenes that had been previously reported from fungal or plant sources, we have isolated and determined the complete structures of 13 previously unidentified cyclic sesquiterpenes and diterpenes.
Some 50,000 terpenoid metabolites, including monoterpenes, sesquiterpenes, and diterpenes representing nearly 400 distinct structural families, have been isolated from both terrestrial and marine plants, liverworts, and fungi. In contrast, only a relatively minor fraction of these widely occurring metabolites has been identified in prokaryotes. The first study of bacterial terpenes grew out of an investigation of the characteristic odor of freshly plowed soil reported in 1891 by Berthelot and André (1). Berthelot and André noted that a volatile substance apparently responsible for the typical earthy odor of soil could be extracted from soil by steam distillation. Their attempts to assign a structure to the isolated odor constituent failed, however, when the neutral alcohol resisted oxidative degradation or other conventional chemical modification. The first modern studies of volatile bacterial terpenes were carried out some 75 years later by Gerber and Lechevalier (2) and Gerber (3–7), who speculated that the characteristic odor of cultures of Actinomycetales microorganisms, which are widely distributed in soil, might be caused by volatile terpenes. In addition to determining the structure of Berthelot’s geosmin, shown to be a C12 degraded sesquiterpene alcohol (and giving it its name, which means earth odor) (2, 3), Gerber (4) also isolated and determined the structures of the methylated monoterpene 2-methylisoborneol as well as several other cyclic sesquiterpenes produced by streptomycetes (5–7). In subsequent years, numerous volatile terpenes have been detected in streptomycetes (8–16). 
The three most commonly detected streptomycete terpenoids, geosmin, 2-methylisoborneol, and the tricyclic α,β-unsaturated ketone albaflavenone, are well-known as volatile odoriferous microbial metabolites. The two terpene alcohols are, in fact, the most frequently found secondary metabolites in actinomycetes (8, 11, 17), filamentous Cyanobacteria (18–20), and Myxobacteria (21), and they are also produced by a small number of fungi (22–24). The production of 2-methylisoborneol is associated with a characteristic scent, whereas albaflavenone, which was first isolated from cultures of a highly odoriferous Streptomyces albidoflavus species, is best described as earthy and camphor-like (25).
Figure. The structures of the major known terpenes produced by bacteria.
Cyclic monoterpene, sesquiterpene, and diterpene hydrocarbons and alcohols are formed by variations of a universal cyclization mechanism that is initiated by enzyme-catalyzed ionization of the universal acyclic precursors geranyl diphosphate (GPP), farnesyl diphosphate (FPP), and geranylgeranyl diphosphate (GGPP) to form the corresponding allylic cations. These parental branched, linear isoprenoid precursors are themselves synthesized by mechanistically related electrophilic condensations of the 5-carbon building blocks dimethylallyl diphosphate and isopentenyl diphosphate. The several thousand known or suspected terpene synthases from plants and fungi have a strongly conserved level of overall amino acid sequence similarity, thus making possible the application of local alignment methods, such as the widely used BLAST algorithm, for the discovery of genes encoding presumptive terpene synthases from plant and fungal sources. Despite the relatively high level of overall sequence conservation, however, assignment of the actual biosynthetic cyclization product of each fungal or plant terpene synthase has remained beyond the reach of available bioinformatic methods. 
The discovery and biochemical characterization of bacterial terpene synthases represent an even greater challenge, because unlike the plant and fungal enzymes, bacterial terpene synthases not only exhibit no significant overall amino acid sequence similarity to those from plants and fungi but also typically display relatively low levels of mutual sequence similarity. To address this challenge, we recently described the successful application of an alternative genome mining strategy for the discovery of previously unidentified bacterial terpene synthases based on the use of hidden Markov models (HMMs) and protein families database (Pfam) searching methods (26). These initial efforts identified a large number of previously unrecognized bacterial terpene synthase candidates, including the discovery of the previously unidentified synthase for the methylated monoterpene 2-methylisoborneol, and led to the heterologous expression of the relevant genes that produce 2-methylisoborneol and 2-methylenebornane from 2-methylgeranyl diphosphate (27). We subsequently refined and expanded the set of HMM parameters, using exclusively the group of newly predicted bacterial terpene synthases as the reference set, in contrast to the original HMM model (PF03936), which had been based on plant terpene synthases. Using these newly refined parameters, we then succeeded in identifying a previously unrecognized ortholog of 2-methylisoborneol synthase in the cyanobacterium Pseudanabaena limnetica str. Castaic Lake (28). 
Application of this second generation HMM model allowed, in total, the discovery of 140 predicted terpene synthases of bacterial origin.
We now report the development of a third generation HMM model trained by the previously identified 140 bacterial terpene synthases that has expanded the number of predicted bacterial terpene synthases to 262 from within the most complete set of predicted proteins incorporated in the most recent collection of public databases and in-house draft genome sequences of streptomycete microorganisms. Among the newly identified gene sequences, a subset selected by phylogenetic analysis has been expressed in a specially engineered heterologous Streptomyces host, and the resultant terpenes have been identified and structurally characterized. |
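The candidate-selection step of an HMM-based search like the one described above can be illustrated with a short sketch. It is not the authors' pipeline. It assumes the profile search was run with a HMMER-style tool that writes a per-sequence hit table (the layout produced by `hmmsearch --tblout`), and the protein names, the model name `terpene_synth_bact`, the scores, and the E-value cutoff are all hypothetical values chosen for illustration.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    target: str    # name of the protein that matched the profile
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def parse_tblout(lines: Iterable[str]) -> List[Hit]:
    """Parse per-sequence hits from HMMER `--tblout`-style text."""
    hits = []
    for line in lines:
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Column 0: target name; 4: full-sequence E-value; 5: full-sequence score.
        hits.append(Hit(fields[0], float(fields[4]), float(fields[5])))
    return hits


def select_candidates(hits: Iterable[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (smallest) first.

    The default cutoff is an illustrative assumption, not a value from the text.
    """
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)


# Hypothetical search output: three proteins scored against one profile.
EXAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias ...
protA - terpene_synth_bact - 1.2e-40 140.3 0.1 2.0e-40 139.6 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
protB - terpene_synth_bact - 3.0e-02 12.1 0.0 4.1e-02 11.7 0.0 1.1 1 0 0 1 1 1 0 hypothetical protein
protC - terpene_synth_bact - 5.5e-12 48.9 0.3 7.0e-12 48.5 0.3 1.0 1 0 0 1 1 1 1 hypothetical protein
"""

candidates = select_candidates(parse_tblout(EXAMPLE.splitlines()))
for hit in candidates:
    print(hit.target, hit.evalue, hit.score)
```

In an iterative scheme of the kind the text outlines, sequences retained at this step would serve as the reference set for building the next, refined profile. How strict the cutoff should be is a design choice that trades sensitivity against false positives.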