Peptide-binding specificity prediction using fine-tuned protein structure prediction networks
Authors: Amir Motmaen, Justas Dauparas, Minkyung Baek, Mohamad H. Abedi, David Baker, Philip Bradley
Affiliation: (a) Department of Biochemistry, University of Washington, Seattle, WA 98195; (b) Institute for Protein Design, University of Washington, Seattle, WA 98195; (c) Bioengineering Graduate Program, University of Washington, Seattle, WA 98195; (d) Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195; (e) Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109
Abstract:Peptide-binding proteins play key roles in biology, and predicting their binding specificity is a long-standing challenge. While considerable protein structural information is available, the most successful current methods use sequence information alone, in part because it has been a challenge to model the subtle structural changes accompanying sequence substitutions. Protein structure prediction networks such as AlphaFold model sequence-structure relationships very accurately, and we reasoned that if it were possible to specifically train such networks on binding data, more generalizable models could be created. We show that placing a classifier on top of the AlphaFold network and fine-tuning the combined network parameters for both classification and structure prediction accuracy leads to a model with strong generalizable performance on a wide range of Class I and Class II peptide-MHC interactions that approaches the overall performance of the state-of-the-art NetMHCpan sequence-based method. The peptide-MHC optimized model shows excellent performance in distinguishing binding and non-binding peptides to SH3 and PDZ domains. This ability to generalize well beyond the training set far exceeds that of sequence-only models and should be particularly powerful for systems where less experimental data are available.

Sequence-based methods use large sets of experimentally validated binding and non-binding peptides to assemble position-specific weight matrices or more sophisticated multilayer neural networks that discriminate binder from non-binder peptides (17). Methods such as NetMHCpan are the current state of the art for key biological challenges like major histocompatibility complex (MHC)-peptide binding specificity, which is central to the adaptive immune system (T cell surveillance, differentiation, etc.), since they can readily optimize parameters over large sets of binding and non-binding peptides. However, sequence-based methods cannot incorporate detailed structural information, and as a result they generalize less well, particularly where there is less training data. While structure-based methods have shown promise to fill this gap, they have been limited by their inability to accurately predict protein and peptide backbone changes, which can affect both affinity and specificity. More importantly, they lack a way to optimize many model parameters on the large amounts of peptide-binding data that are often available (8).
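To make the idea of a position-specific weight matrix concrete, the following is a minimal sketch in Python, not the method used by NetMHCpan or by the authors. It assumes equal-length peptides, a uniform amino-acid background, and a simple additive pseudocount; the function names `build_pssm` and `score_peptide` and the toy 9-mer peptides are invented for illustration and are not experimental data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(binders, pseudocount=1.0):
    """Build a log-odds position-specific weight matrix from binder peptides.

    Assumptions (illustrative only): all peptides have the same length and
    the background amino-acid distribution is uniform.
    """
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in binders)
        # Log2 ratio of the smoothed observed frequency to the background.
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_peptide(pssm, peptide):
    """Sum the per-position log-odds weights for one peptide."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Toy, invented 9-mers sharing L at position 2 and V at position 9.
toy_binders = ["ALAKAAAAV", "GLAGAAAAV", "SLASAAAAV"]
pssm = build_pssm(toy_binders)

anchor_match = score_peptide(pssm, "ALAAAAAAV")     # keeps both shared residues
anchor_mismatch = score_peptide(pssm, "ADAAAAAAK")  # replaces both of them
print(anchor_match > anchor_mismatch)
```

A matrix of this kind scores each position independently, so it has no way to represent the backbone changes or inter-residue coupling that the paragraph above identifies as the motivation for structure-aware models.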
Keywords: AlphaFold; fine-tuning; binding specificity prediction; peptide-MHC interactions; structure modeling networks