High-throughput developability assays enable library-scale identification of producible protein scaffold variants
Authors:Alexander W Golinski  Katelynn M Mischler  Sidharth Laxminarayan  Nicole L Neurock  Matthew Fossing  Hannah Pichman  Stefano Martiniani  Benjamin J Hackel
Institution:Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN, 55455
Abstract:Proteins require high developability (quantified by expression, solubility, and stability) for robust utility as therapeutics, diagnostics, and in other biotechnological applications. Measuring traditional developability metrics is low throughput in nature, often slowing the developmental pipeline. We evaluated the ability of 10 variations of three high-throughput developability assays to predict the bacterial recombinant expression of paratope variants of the protein scaffold Gp2. Enabled by a phenotype/genotype linkage, assay performance for 10^5 variants was calculated via deep sequencing of populations sorted by proxied developability. We identified the most informative assay combination via cross-validation accuracy and correlation feature selection and demonstrated the ability of machine learning models to exploit nonlinear mutual information to increase the assays’ predictive utility. We trained a random forest model that predicts expression from assay performance that is 35% closer to the experimental variance and trains 80% more efficiently than a model predicting from sequence information alone. Utilizing the predicted expression, we performed a site-wise analysis and predicted mutations consistent with enhanced developability. The validated assays offer the ability to identify developable proteins at unprecedented scales, reducing the bottleneck of protein commercialization.
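
The correlation feature selection step mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean absolute feature-to-target correlation and r_ff is the mean absolute feature-to-feature correlation. The source does not give the authors' exact implementation, so the assay names, the toy numbers, and the function names `pearson` and `cfs_merit` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cfs_merit(features, target, subset):
    """CFS merit of a feature subset: rewards correlation with the target
    and penalizes redundancy among the chosen features."""
    k = len(subset)
    # Mean absolute feature-to-target correlation.
    r_cf = sum(abs(pearson(features[name], target)) for name in subset) / k
    # Mean absolute feature-to-feature correlation (zero for a single feature).
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: three assay-score columns for four variants.
# "protease_repeat" duplicates "protease" to mimic a redundant assay condition.
toy_scores = {
    "protease": [1.0, -1.0, 1.0, -1.0],
    "split_gfp": [1.0, 1.0, -1.0, -1.0],
    "protease_repeat": [1.0, -1.0, 1.0, -1.0],
}
expression = [2.0, 0.0, 0.0, -2.0]  # hypothetical measured expression

merit_single = cfs_merit(toy_scores, expression, ("protease",))
merit_complementary = cfs_merit(toy_scores, expression, ("protease", "split_gfp"))
merit_redundant = cfs_merit(toy_scores, expression, ("protease", "protease_repeat"))

# The complementary pair scores higher than the redundant pair,
# which in turn is no better than the single assay alone.
print(merit_single, merit_complementary, merit_redundant)
```

In this toy case, adding an uncorrelated but informative assay raises the merit, while adding a duplicate assay does not. This is the behavior that makes the heuristic useful for choosing among assay combinations.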

A common constraint across diagnostic, therapeutic, and industrial proteins is the ability to manufacture, store, and use intact and active molecules. These protein properties, collectively termed developability, are often associated with quantitative metrics such as recombinant yield, stability (chemical, thermal, and proteolytic), and solubility (1–5). Despite this universal importance, developability studies are performed late in the commercialization pipeline (2, 4) and are limited by traditional experimental capacity (6). This is problematic because 1) proteins with poor developability limit practical assay capacity for measuring primary function, 2) optimal developability is often not observed with proteins originally found in alternative formats such as display or two-hybrid technologies (7), and 3) engineering efforts are limited by the large gap between observation size (∼10^2) and theoretical mutational diversity (∼10^20). Thus, efficient methods to measure developability would alleviate a significant bottleneck in the lead selection process and accelerate protein discovery and engineering.

Prior advances to determine developability have focused on calculating hypothesized proxy metrics from existing sequence and structural data or developing material- and time-efficient experiments. Computational sequence-developability models based on experimental antibody data have predicted posttranslational modifications (8, 9), solubility (10, 11), viscosity (12), and overall developability (13). Structural approaches have informed stability (14) and solubility (10, 15). However, many in silico models require an experimentally solved structure or suffer from computational structure prediction inaccuracies (16). Additionally, the limited availability of developability data constrains predictive model accuracy (17).
In vitro methods have identified several experimental protocols to mimic practical developability requirements [e.g., affinity-capture self-interaction nanoparticle spectroscopy (18) and chemical precipitation (19) as metrics for solubility]. However, traditional developability quantification requires significant amounts of purified protein. On both fronts, fully quantifying developability requires numerous in silico and/or in vitro metrics (1, 5).

We sought a protein variant library that would benefit from isolation of proteins with increased developability and would demonstrate the broad applicability of the process. Antibodies and other binding scaffolds, comprising a conserved framework and diversified paratope residues, are effective molecular targeting agents (20–24). While significant progress has been achieved in identifying paratopes for optimal binding strength and specificity (25, 26), isolating highly developable variants remains a persistent challenge. One particular protein scaffold, Gp2, has been evolved into specific binding variants toward multiple targets (27–29). Continued study improved charge distribution (30), hydrophobicity (31), and stability (28). While these studies have suggested improvements for future framework and paratope residues (including a disulfide-stabilized loop), a poor developability distribution is still observed (32) (Fig. 1 A and B). Assuming the randomized paratope library lacks a shared primary function, the Gp2 library serves to simulate the universal applicability of the proposed high-throughput (HT) developability assays.

Fig. 1. HT assays were evaluated for the ability to identify protein scaffold variants with increased developability. (A and B) Gp2 variant expression, commonly measured via low-throughput techniques such as the dot blot shown, highlights the rarity of ideal developability. (C and D) The HT on-yeast protease assay measures the stability of the protein of interest (POI) by proteolytic extent.
(E and F) The HT split-GFP assay measures POI expression via recombination of a genetically fused GFP fragment. (G and H) The HT split β-lactamase assay measures the POI stability by observing the change in cell-growth rates when grown at various antibiotic concentrations. (I and J) Assay scores, assigned to each unique sequence via deep sequencing, were evaluated by predicting expression (Fig. 3). (K and L) HT assay capacity enables large-scale developability evaluation and can be used to identify beneficial mutations (Fig. 4).

We sought HT assays that allow protein developability differentiation via cellular properties to improve throughput. Variations of three primary assays were examined: 1) On-yeast stability (Fig. 1 C and D), previously validated to improve the stability of de novo proteins (33), antimicrobial lysins (34), and immune proteins (35), measures proteolytic cleavage of the protein of interest (POI) on the yeast cell surface via fluorescence-activated cell sorting (FACS). We extend the assay by performing the proteolysis under various combinations of denaturing conditions to determine whether different stability attributes (thermal, chemical, and protease specificity) can be resolved; 2) Split green fluorescent protein (GFP, Fig. 1 E and F), previously used to determine soluble protein concentrations (36), measures the assembled GFP fluorescence emerging from a 16–amino acid fragment (GFP11) fused to the POI after recombining with the separately expressed GFP1-10. We extend the assay by utilizing FACS to separate cells with differential POI expression, increasing throughput over the plate-based assay; and 3) Split β-lactamase (Fig. 1 G and H), previously used to improve thermodynamic stability (37) and solubility (38), measures cell growth inhibition via ampicillin to determine functional lactamase activity achieved from reconstitution of two enzyme fragments flanking the POI.
We expand assay capacity by deep sequencing populations grown at various antibiotic concentrations to relate change in cell frequency to functional enzyme concentration.

In this paper, we determined the HT assays’ abilities to predict Gp2 variant developability. We deep sequenced the stratified populations and calculated assay scores (correlating to hypothesized developability) for ∼10^5 Gp2 variants (Fig. 1I). We then converted the assay scores into a traditional developability metric by building a model that predicts recombinant yield (Fig. 1J). The assays’ capacity enabled yield evaluations at >100-fold the traditional assay capacity (Fig. 1K, compared to Fig. 1B) and provided an introductory analysis of factors driving protein developability by identifying beneficial mutations among the predicted developable proteins (Fig. 1L).
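
One way to turn deep sequencing reads from stratified populations into a per-variant assay score is sketched below. This is an illustration under stated assumptions, not the authors' formula: the text above does not specify how scores were computed. Here each sorted population is assigned a numeric bin value, read counts are normalized to within-population frequencies, and a variant's score is its frequency-weighted mean bin value. The function name `assay_scores`, the bin values, and the variant labels are hypothetical.

```python
def assay_scores(counts_by_bin, bin_values):
    """Frequency-weighted mean bin value per variant.

    counts_by_bin: one dict per sorted population, mapping variant -> read count.
    bin_values: numeric value assigned to each population (same order).
    """
    # Normalize to within-population frequencies so that a population
    # sequenced more deeply does not dominate the score.
    freqs = []
    for counts in counts_by_bin:
        total = sum(counts.values())
        freqs.append({v: n / total for v, n in counts.items()} if total else {})

    scores = {}
    for variant in set().union(*counts_by_bin):
        weights = [f.get(variant, 0.0) for f in freqs]
        weight_sum = sum(weights)
        if weight_sum > 0:  # skip variants with no reads in any population
            weighted = sum(w * v for w, v in zip(weights, bin_values))
            scores[variant] = weighted / weight_sum
    return scores


# Hypothetical read counts for three sorted populations (low, mid, high signal).
low = {"variant_A": 90, "variant_C": 10}
mid = {"variant_A": 50, "variant_B": 50}
high = {"variant_B": 80, "variant_C": 20}

scores = assay_scores([low, mid, high], bin_values=[0.0, 1.0, 2.0])

# Variants enriched in the high-signal population receive higher scores.
for variant in sorted(scores, key=scores.get, reverse=True):
    print(variant, round(scores[variant], 3))
```

Scores of this kind could then serve as input features for a model that predicts recombinant yield, as described above. Other scoring choices, such as log enrichment ratios between populations, would fit the same description.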
Keywords:developability  protein engineering  predictive modeling