Consequences of domain insertion on sequence-structure divergence in a superfold
Authors: Chetanya Pandya, Shoshana Brown, Ursula Pieper, Andrej Sali, Debra Dunaway-Mariano, Patricia C. Babbitt, Yu Xia, Karen N. Allen
Abstract: Although the universe of protein structures is vast, these innumerable structures can be categorized into a finite number of folds. New functions commonly evolve by elaboration of existing scaffolds, for example, via domain insertions. Thus, understanding structural diversity of a protein fold evolving via domain insertions is a fundamental challenge. The haloalkanoic dehalogenase superfamily serves as an excellent model system wherein a variable cap domain accessorizes the ubiquitous Rossmann-fold core domain. Here, we determine the impact of the cap-domain insertion on the sequence and structure divergence of the core domain. Through quantitative analysis on a unique dataset of 154 core-domain-only and cap-domain-only structures, basic principles of their evolution have been uncovered. The relationship between sequence and structure divergence of the core domain is shown to be monotonic and independent of the corresponding type of domain insert, reflecting the robustness of the Rossmann fold to mutation. However, core domains with the same cap type share greater similarity at the sequence and structure levels, suggesting interplay between the cap and core domains. Notably, results reveal that the variance in structure maps to α-helices flanking the central β-sheet and not to the domain–domain interface. Collectively, these results hint at intramolecular coevolution where the fold diverges differentially in the context of an accessory domain, a feature that might also apply to other multidomain superfamilies.

The universe of protein structures is vast and diverse, yet these innumerable structures can be categorized into a finite number of folds (1). Ideally, the protein fold has a robust yet evolvable architecture to deliver chemistry, bind interaction partners, or provide scaffolding. A popular strategy for the acquisition of new function(s) is the topological alteration of the fold to provide a new evolutionary platform.
More frequently, existing and stable scaffolds are elaborated to attain diversity that is due to accumulation of stochastic, independent, and near-neutral mutations in the protein sequence. In a large number of cases, the expansion of functional space has been achieved by the tandem fusion of two or three domains to form evolutionary modules known as supradomains (2). An analysis of catalytic domains fused to the nucleotide-binding Rossmann domain has revealed that the sequential order of their connections is conserved because each pairing arose from a single recombination event (3). Another common structural embellishment is that of domain insertion(s) into existing folds (4), a strategy that is ubiquitous in all structural classes, i.e., all α, all β, α + β, and α/β (5). For example, members of the A, B, and Y DNA polymerase superfamilies, Rab geranylgeranyl transferase superfamily, and alcohol dehydrogenase superfamily have inserted different domains into the native fold to fine-tune their cellular functions (6–8). The analysis of such noncontiguous domain organization has been facilitated by the availability of structures bearing insertions of domains that also occur as independent folds. It has been estimated that 9% of domain combinations observed in protein-structure databases are insertions (5). However, the way in which the sequence–structure relationship changes within a protein fold in the context of such domain insertions has yet to be fully understood. In this study, we assess how the insertion of an accessory domain affects the sequence–structure relationship of the Rossmann fold, a superfold used by at least 10 different protein superfamilies (9).

Function-driven changes come with their own costs: most molecular modifications of proteins tend to be thermodynamically destabilizing (10).
Although long hypothesized (11), it has been shown only recently that the stability of a fold promotes evolvability by allowing a high degree of structural plasticity (12). As a consequence, protein folds follow a power-law distribution where a few intrinsically stable folds, referred to as superfolds, have numerous members, and a multitude of folds have few members (9). Due to this interplay between stability and evolvability, it has been suggested that superfolds are compatible with a much larger set of sequences than other folds (13). This proposal raises the question of how protein sequence and structural diversity are related to one another. Pioneering work by Chothia and Lesk (14) illustrated that structural similarity is correlated with sequence similarity. Although the 3D structure retains the common fold during neutral drift, it undergoes subtle changes as sequence diverges, mainly due to packing modifications and backbone conformational changes. In a focused study, Halaby et al. (15) have shown that sequence diverges to a greater extent than structure in the Ig fold. More recently, Panchenko and coworkers noted a similar trend in a systematic study spanning 81 homologous protein families (16).

We queried how large inserts into a protein fold shape the relationship between sequence and structure divergence, using as a model system the Haloalkanoate Dehalogenase Superfamily (HADSF), where the inserts into a Rossmann catalytic domain impart substrate specificity. The HADSF is a highly successful family, and, with close to 80,000 members to date (17), it is one of the largest enzyme superfamilies. The HADSF is well-represented in all domains of life, and the majority of its members catalyze phosphate ester hydrolysis (18, 19). HADSF members have attained functional diversity via accessorization of the conserved core Rossmann-fold domain by the insertion of a cap domain.
The Rossmann fold is a primordial nucleotide-binding fold that plays a significant role in maintenance and evolution of life (6). Structurally, it is organized as a three-layered sandwich made up of multiple α/β units. The fold is similar across superfamilies, apart from stochastic thermal fluctuations and structural divergence. Notably, the inserted cap domains in the HADSF have not yet been observed as independent folds in extant organisms (the term domain is defined here as an apparently stable arrangement of secondary structural units). The presence of the cap domains leads to a natural classification of the superfamily into different structural classes: C0 (no or minimal cap insert), C1 (α-helical cap insert after the first strand), C2 (α-helical and β-strand cap insert after the third strand, further subdivided into C2a and C2b depending on topology), and C1 + C2 (inserts in both positions), based on topology and location of the insert (20) (SI Appendix, Fig. S1). These characteristics make the HADSF an excellent model system for the study of the sequence–structure–function relationship in multidomain proteins.

The unique cap–core architecture of the HADSF raises several intriguing biophysical and biochemical questions regarding molecular evolution. In a typical HADSF member, the substrate leaving group and the phosphoryl group are bound by the cap and core domains, respectively. However, binding and catalysis cannot be independent of one another (reflected in the definition of the specificity constant kcat/Km). How this functional codependence between the two domains manifests itself in the structure is an important question with implications for evolution and rational design of multidomain proteins. As cap and core domains form a single polypeptide chain, substrate binding and catalysis are inherently linked although the details of this linkage have yet to be defined. Another perplexing issue is the evolutionary mechanism of cap-domain evolution.
Did it involve a rapid stage where the cap domain was grafted onto the core domain, followed by a slow stage where neutral mutations were accumulated? Or was there a gradual and continuous change where a small cap domain was inserted followed by subsequent duplication and elaboration? Herein, we attempt to answer such questions by analyzing a unique dataset of core-domain-only and cap-domain-only structures using quantitative informatics analyses. The relationship between sequence and structure divergence in the core fold is shown to be monotonic, as is generally the case, and notably, to be independent of the corresponding cap type. However, core domains with the same cap type bear a greater similarity at the sequence and structure level than do the core domains with different cap types, suggesting interplay between the cap and core domains. Surprisingly, we find that the variation between cap types maps to the flanking helices of the Rossmann fold rather than to the interface, suggesting that the core has changed more globally to accommodate the cap. Overall, our results suggest that the structure space of a superfamily has an underlying organizing principle despite its diversity.
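As an illustration of what a "monotonic" relationship between sequence and structure divergence means in quantitative terms, the sketch below computes a Spearman rank correlation, which equals 1 when one quantity never decreases as the other increases. This is not the authors' analysis pipeline: the function names and the toy values are hypothetical, and the text above does not state which statistic or divergence measures were used.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical pairwise divergences for illustration only (NOT data from the study).
seq_divergence = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
struct_divergence = [0.4, 0.7, 0.9, 1.4, 1.9, 2.8]

rho = spearman_rho(seq_divergence, struct_divergence)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic is used here because monotonicity does not require the relationship to be linear; computing it separately for core-domain pairs grouped by cap type would be one way to ask whether the trend depends on the insert.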
Keywords: directed evolution; phosphoryl transferase; protein evolution; structural bioinformatics; HAD superfamily