
MF-SuP-pKa: Multi-fidelity modeling with subgraph pooling mechanism for pKa prediction

Affiliation:	1. Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; 2. CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, China; 3. Tencent Quantum Laboratory, Tencent, Shenzhen 518057, China; 4. Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, China

Abstract:	The acid-base dissociation constant (pK_a) is a key physicochemical parameter in chemical science, especially in organic synthesis and drug discovery. Current methods for pK_a prediction still suffer from a limited applicability domain and a lack of chemical insight. Here we present MF-SuP-pK_a (multi-fidelity modeling with subgraph pooling for pK_a prediction), a novel pK_a prediction model that combines subgraph pooling, multi-fidelity learning, and data augmentation. In our model, a knowledge-aware subgraph pooling strategy was designed to capture the local and global environments around the ionization sites for micro-pK_a prediction. To overcome the scarcity of accurate pK_a data, low-fidelity data (computational pK_a) were used to fit the high-fidelity data (experimental pK_a) through transfer learning. The final MF-SuP-pK_a model was constructed by pre-training on the augmented ChEMBL data set and fine-tuning on the DataWarrior data set. Extensive evaluation on the DataWarrior data set and three benchmark data sets shows that MF-SuP-pK_a outperforms state-of-the-art pK_a prediction models while requiring much less high-fidelity training data. Compared with Attentive FP, MF-SuP-pK_a achieves improvements of 23.83% and 20.12% in mean absolute error (MAE) on the acidic and basic sets, respectively.
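
The pre-train-then-fine-tune idea described in the abstract can be illustrated with a toy sketch. This is not the authors' model: it replaces the graph neural network with a plain linear regressor and uses synthetic data, and every name in it (`pretrain`, `fine_tune`, `run_demo`, the noise levels, the 0.5 offset) is a hypothetical choice made only for illustration. It assumes the low-fidelity labels are plentiful but biased and noisy, while the high-fidelity labels are scarce but accurate.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def add_bias_column(X):
    """Append a column of ones so the intercept is learned as a weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def pretrain(X_low, y_low):
    """Stage 1: least-squares fit on the large low-fidelity set."""
    theta, *_ = np.linalg.lstsq(add_bias_column(X_low), y_low, rcond=None)
    return theta


def fine_tune(theta, X_high, y_high, lr=0.05, steps=500):
    """Stage 2: gradient descent on mean squared error over the small
    high-fidelity set, starting from the pre-trained weights."""
    A = add_bias_column(X_high)
    theta = theta.copy()
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ theta - y_high) / len(y_high)
        theta -= lr * grad
    return theta


def run_demo(seed=0):
    """Return (MAE before fine-tuning, MAE after fine-tuning) on held-out data."""
    rng = np.random.default_rng(seed)
    w_true, b_true = rng.normal(size=5), 7.0  # synthetic ground truth

    def experimental(X):
        # Hypothetical high-fidelity labels: accurate, small noise.
        return X @ w_true + b_true + rng.normal(scale=0.05, size=len(X))

    def computed(X):
        # Hypothetical low-fidelity labels: systematic offset plus larger noise.
        return X @ w_true + b_true + 0.5 + rng.normal(scale=0.3, size=len(X))

    # Many low-fidelity samples, few high-fidelity samples, a held-out test set.
    X_low, X_high, X_test = (rng.normal(size=(n, 5)) for n in (500, 30, 200))

    theta_pre = pretrain(X_low, computed(X_low))
    theta_ft = fine_tune(theta_pre, X_high, experimental(X_high))

    y_test = experimental(X_test)
    A_test = add_bias_column(X_test)
    return mae(y_test, A_test @ theta_pre), mae(y_test, A_test @ theta_ft)


if __name__ == "__main__":
    mae_pre, mae_ft = run_demo()
    print(f"MAE, pre-trained only: {mae_pre:.3f}")
    print(f"MAE, after fine-tuning: {mae_ft:.3f}")
```

In this sketch the pre-trained weights already capture the overall trend from the low-fidelity labels, so the fine-tuning stage mainly has to correct the systematic offset, which a small number of high-fidelity samples can do.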

Keywords:	Graph neural network; Subgraph pooling; Multi-fidelity learning; Data augmentation
This article is indexed in ScienceDirect and other databases.
