HYUNJUNG LEE 1,2, Hyeonseok Kang 3, Woojin Jung 4,5, Hyojin Cho 2, Hwi-yeol Yun 1,2,4, Min-Gul Kim 6, Jung-woo CHAE 1,2,4, Sang-Min Park 2, Sangkeun Jung 3, Soyong Lee 2, Jae Hyun Kim 7
1 Department of Bio-AI Convergence, Chungnam National University (Daejeon, Republic of Korea), 2 College of Pharmacy, Chungnam National University (Daejeon, Republic of Korea), 3 Department of Computer Science and Engineering, Chungnam National University (Daejeon, Republic of Korea), 4 Senior Health Convergence Research Center, Chungnam National University (Daejeon, Republic of Korea), 5 Graduate School of Clinical Pharmacy, CHA University (Pocheon, Republic of Korea), 6 Jeonbuk National University Medical School (Jeonju, Republic of Korea), 7 School of Pharmacy and Institute of New Drug Development, Jeonbuk National University (Jeonju, Republic of Korea)
Objectives: Drug development requires scalable and traceable pharmacokinetic (PK) prediction frameworks that support compound prioritization and model-informed translation across development stages 1–6. Although physiologically based pharmacokinetic (PBPK) modeling provides mechanistic interpretability, conventional workflows still rely on experimentally derived inputs and iterative manual parameterization 7–11. We aimed to develop a modular hybrid AI–PBPK framework that generates PK profiles directly from molecular structure (SMILES) by integrating: (i) prediction of PBPK-relevant ADME properties from heterogeneous real-world datasets, in which many compounds were measured for only a subset of endpoints, using a multi-task, multi-head architecture with feature-wise masking; (ii) estimation of clearance as a PBPK input parameter from metabolism-related interaction features, inferred from drug–enzyme representations, together with physicochemical descriptors; and (iii) propagation of the AI-derived inputs through a PBPK simulator to generate concentration–time profiles and PK summary metrics. We also organized the ADMET layer into a PBPK-input track for exposure simulation and a liability-screening track covering CYP inhibitor status, Ames mutagenicity, carcinogenicity, hERG, and biodegradation.
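Feature-wise masking over partially labeled ADME data can be illustrated as a loss that simply ignores unmeasured endpoints. The sketch below is illustrative only: it uses NumPy rather than a deep-learning framework, assumes missing labels are stored as NaN, and assumes each head's error is averaged over its observed compounds before averaging across heads. The function name `masked_multitask_mse` is hypothetical, and the abstract does not state the weighting actually used.

```python
import numpy as np


def masked_multitask_mse(y_pred, y_true):
    """Average of per-endpoint MSEs, computed only over observed labels.

    y_pred, y_true: float arrays of shape (n_compounds, n_endpoints).
    Assumption of this sketch: unmeasured labels in y_true are NaN.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    mask = ~np.isnan(y_true)                         # True where the endpoint was measured
    filled = np.where(mask, y_true, 0.0)             # placeholder avoids NaN arithmetic
    sq_err = np.where(mask, (y_pred - filled) ** 2, 0.0)
    n_obs = mask.sum(axis=0)                         # observed labels per endpoint (head)
    has_obs = n_obs > 0                              # heads with no labels are skipped
    per_head = sq_err.sum(axis=0)[has_obs] / n_obs[has_obs]
    return float(per_head.mean()) if per_head.size else 0.0


# Two compounds, two endpoints; the second endpoint is missing for compound 0.
y_true = np.array([[1.0, np.nan], [2.0, 3.0]])
y_pred = np.array([[1.5, 10.0], [2.0, 2.0]])
loss = masked_multitask_mse(y_pred, y_true)          # 0.5625
```

The prediction of 10.0 for the unmeasured entry contributes nothing to the loss, so a head is trained only on compounds for which its endpoint was actually observed.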
Methods: We developed an end-to-end hybrid AI–PBPK pipeline within this two-track ADMET framework. In the PBPK-input track, a pretrained transformer-based molecular encoder with endpoint-specific regression heads, trained in a multi-task setting, predicted lipophilicity, ionization, solubility, permeability, plasma protein binding, and unbound fraction from SMILES. Because the integrated ADME datasets were only partially labeled, feature-wise masking was applied so that each endpoint contributed to training only when observed. To estimate clearance, a metabolism-focused drug–target interaction module used drug SMILES and protein sequence representations to infer metabolism-related interaction features across major biotransformation enzyme families. These features were combined with physicochemical descriptors to predict intrinsic hepatic clearance, which was translated into PBPK elimination terms using in vitro–in vivo extrapolation (IVIVE)-style assumptions. The PBPK layer was implemented in PhysioSim, our in-house PBPK simulation framework, with ACAT-style oral absorption and tissue distribution based on published partitioning methods. External evaluation of the integrated workflow used a dataset of 98 orally administered drugs from the Drug Interaction Database (DIDB), with an additional comparison between PhysioSim and PK-Sim on a 9-drug subset. In the liability-screening track, separate ADMET classification models for CYP inhibitor status, Ames mutagenicity, carcinogenicity, hERG, and biodegradation were trained on DrugBank-derived data and evaluated on DIDB-based data.
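The abstract does not specify which IVIVE assumptions convert predicted intrinsic clearance into PBPK elimination terms. One common choice is microsomal scaling followed by the well-stirred liver model, sketched below. The scaling factors (40 mg microsomal protein per g liver, 1800 g liver, 1500 mL/min hepatic blood flow), the function names, and the use of the blood unbound fraction `fu_b` are assumed typical adult values and hypothetical names, not parameters reported for PhysioSim.

```python
def scale_microsomal_clint(clint_ul_min_mg: float,
                           mppgl_mg_per_g: float = 40.0,
                           liver_weight_g: float = 1800.0) -> float:
    """Scale in vitro CLint (uL/min/mg microsomal protein) to whole-liver CLint (mL/min).

    Default MPPGL and liver weight are assumed typical adult values.
    """
    return clint_ul_min_mg * mppgl_mg_per_g * liver_weight_g / 1000.0  # uL -> mL


def well_stirred_hepatic_cl(clint_ml_min: float, fu_b: float,
                            q_h_ml_min: float = 1500.0) -> float:
    """Hepatic blood clearance (mL/min) under the well-stirred liver model.

    fu_b is the unbound fraction in blood (fu_plasma / blood-to-plasma ratio);
    q_h_ml_min is an assumed adult hepatic blood flow.
    """
    unbound_clint = fu_b * clint_ml_min
    return q_h_ml_min * unbound_clint / (q_h_ml_min + unbound_clint)


# Example: CLint = 10 uL/min/mg protein, fu_b = 0.1
clint_liver = scale_microsomal_clint(10.0)             # 720 mL/min
cl_h = well_stirred_hepatic_cl(clint_liver, fu_b=0.1)  # about 68.7 mL/min
extraction_ratio = cl_h / 1500.0                       # hepatic extraction E_H, in [0, 1)
```

The well-stirred form caps hepatic clearance at hepatic blood flow, so even very large predicted CLint values yield a physiologically bounded elimination term in the PBPK model.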
Results: The framework generated PBPK-relevant input parameters and concentration–time profiles directly from SMILES. Within the PBPK-input track, the merged-data multi-head ADME model achieved test-set R²/Pearson r values of 0.810/0.900 for LogP, 0.628/0.797 for solubility, 0.549/0.744 for permeability, 0.355/0.620 for plasma protein binding, 0.213/0.517 for unbound fraction, and 0.177/0.446 for pKa. The metabolism-focused interaction module generated enzyme-level interaction features that were combined with physicochemical descriptors for intrinsic hepatic clearance prediction. In external evaluation of intrinsic clearance prediction (N = 172), the model achieved RMSE = 2.413 and AFE = 1.116, with 52.97% and 70.79% of predictions within 2-fold and 3-fold error, respectively. In external validation of the integrated ML–PBPK workflow using the oral DIDB dataset (N = 98), predictive performance for PK endpoints was low overall: R² was 0.021 for Tmax and 0.219 for Cmax, Pearson correlation coefficients ranged from 0.145 to 0.468, 10.1%–40.0% of predictions were within 2-fold error, and 17.4%–65.7% were within 3-fold error. In the 9-drug comparative analysis, PhysioSim showed overall performance comparable to PK-Sim, with a higher proportion of predictions within fold-error bounds for clearance (%2-fold 44.4% vs. 11.1%; %3-fold 55.6% vs. 11.1%) and Vd (%3-fold 22.2% vs. 0%), whereas PK-Sim showed lower MSE and higher R² for some parameters, including Cmax.
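The fold-error and regression summaries reported above can be computed as in the sketch below. It assumes common definitions: AFE = 10^(mean of log10(predicted/observed)), k-fold accuracy as the share of predictions with 1/k ≤ predicted/observed ≤ k, R² = 1 − SS_res/SS_tot, and RMSE on whatever scale the caller supplies. The abstract does not state whether R² or RMSE were computed on a log scale, and the function name `pk_fold_metrics` is hypothetical.

```python
import numpy as np


def pk_fold_metrics(pred, obs):
    """Fold-error and regression summaries for predicted vs observed PK values.

    Inputs must be strictly positive for the fold-error terms.
    """
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ratio = pred / obs
    log_ratio = np.log10(ratio)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        # AFE > 1 indicates systematic over-prediction, < 1 under-prediction
        "AFE": float(10.0 ** log_ratio.mean()),
        "pct_2fold": float(100.0 * np.mean((ratio >= 0.5) & (ratio <= 2.0))),
        "pct_3fold": float(100.0 * np.mean((ratio >= 1.0 / 3.0) & (ratio <= 3.0))),
        "RMSE": float(np.sqrt(np.mean((pred - obs) ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
        "pearson_r": float(np.corrcoef(pred, obs)[0, 1]),
    }


# Ratios are 2.0, 1.0, 0.375, 1.0: three within 2-fold, all four within 3-fold.
m = pk_fold_metrics(pred=[2.0, 2.0, 1.5, 10.0], obs=[1.0, 2.0, 4.0, 10.0])
```

R² defined as 1 − SS_res/SS_tot equals the squared Pearson r only for a least-squares linear fit, so the two summaries can differ in general; which definition underlies each reported R² is not stated in the abstract.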
Conclusions: We developed a modular hybrid AI–PBPK framework for PK prediction directly from SMILES by integrating multi-task ADME prediction, metabolism-aware feature inference, clearance estimation, and PBPK simulation. The two-track ADMET design separates PBPK parameter generation from developability-related screening, enabling exposure prediction and early risk review within a unified workflow. Although DIDB external validation showed limited predictive accuracy for some PK endpoints, the PhysioSim-based framework reproduced concentration–time profiles from AI-derived inputs and performed broadly comparably to PK-Sim in the comparative evaluation, with favorable fold-error performance for clearance and Vd in particular. The framework provides a scalable and traceable foundation for model-informed PK prediction and for translation from molecular design to downstream development decisions, and the validation results identify where further refinement is needed.
H. Lee, H. Kang, W. Jung, and H. Cho contributed equally to this work; corresponding authors: H.-y. Yun, M.-G. Kim, J.-w. Chae, S.-M. Park, S. Jung, S. Lee, and J. H. Kim
References:
1. Wong CH, et al. Estimation of clinical trial success rates and related parameters. Biostatistics. 2019;20(2):273-286.
2. DiMasi JA, et al. Innovation in the pharmaceutical industry: New estimates of R&D costs. J Health Econ. 2016;47:20-33.
3. Eddershaw PJ, et al. ADME/PK as part of a rational approach to drug discovery. Drug Discov Today. 2000;5(9):409-414.
4. Agrawal M, et al. Large Language Models are Few-Shot Clinical Information Extractors. arXiv. Preprint posted online November 30, 2022:arXiv:2205.12689.
5. Brown TB, et al. Language Models are Few-Shot Learners. arXiv. Preprint posted online July 22, 2020:arXiv:2005.14165.
6. Reimers N, Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv. Preprint posted online August 27, 2019:arXiv:1908.10084.
7. Nye B, et al. A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature. In: Gurevych I, Miyao Y, eds. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics; 2018:197-207.
8. Hu Y, et al. Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach. Bioinformatics. 2023;39(9):btad542.
9. Wang Q, et al. PICO entity extraction for preclinical animal literature. Syst Rev. 2022;11(1):209.
10. Huang DZ, et al. The challenges of generalizability in artificial intelligence for ADME/Tox endpoint and activity prediction. Expert Opin Drug Discov. 2021;16(9):1045-1056.
11. Wang W, et al. A Tutorial on RxODE: Simulating Differential Equation Pharmacometric Models in R. CPT Pharmacometrics Syst Pharmacol. 2016;5(1):3-10.
12. Lacarelle B, et al. Abbott PKS system: A new version for applied pharmacokinetics including Bayesian estimation. Int J Biomed Comput. 1994;36(1):127-130.
13. Jung WJ, et al. Dose Optimization of Vancomycin Using a Mechanism-based Exposure–Response Model in Pediatric Infectious Disease Patients. Clin Ther. 2021;43(1):185-194.e16.
Reference: PAGE 34 (2026) Abstr 11885 [www.page-meeting.org/?abstract=11885]
Poster: Methodology – AI/Machine Learning