Felix Jost (1), Christoph Grebner (2), Hans Matter (2), Henrik Cordes (1)
(1) Sanofi R&D, Drug Metabolism and Pharmacokinetics, Frankfurt, Germany; (2) Sanofi R&D, Integrated Drug Discovery, Frankfurt, Germany
Objectives: Predicting in vivo plasma pharmacokinetic (PK) profiles after intravenous (i.v.) administration of new small molecules is a major challenge in drug discovery. Several model-based (hybrid) PK prediction approaches have been published, which differ in their complexity and in how they combine machine learning (ML) and mechanistic modeling. In one approach, prediction models are trained on noncompartmental analysis (NCA) parameters such as clearance (CL) and volume of distribution (Vss) and predict a monophasic PK curve [1]. Another approach uses a physiologically based PK (PBPK) model parameterized with ML-predicted physicochemical parameters, see [1] and references therein. Approaches without mechanistic models use PK profiles as input and predict the PK profile with a trained neural network [2]. We developed ML models to predict the parameters of one- (CL1, V1), two- (CL1, CL2, V1, V2) and three-compartment (cmt) (CL1, CL2, CL3, V1, V2, V3) models and evaluated their prediction accuracy against the i.v. PK profiles. The proposed hybrid method, based on empirical PK models and ML-predicted parameters, sits between approaches using NCA and those using mechanistic PBPK models.
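To illustrate how a small set of compartmental parameters defines a full concentration-time profile, the following is a minimal sketch of the analytic i.v. bolus solutions for the one- and two-cmt models. It assumes that CL2 denotes the intercompartmental clearance between the central (V1) and peripheral (V2) compartment, which the abstract does not state explicitly; the function names, dose and parameter values are hypothetical.

```python
import math


def conc_one_cmt(t, dose, cl1, v1):
    """Central concentration at time t after an i.v. bolus, one-cmt model."""
    return dose / v1 * math.exp(-cl1 / v1 * t)


def conc_two_cmt(t, dose, cl1, v1, cl2, v2):
    """Central concentration at time t after an i.v. bolus, two-cmt model.

    ASSUMPTION: cl2 is the intercompartmental clearance between the
    central (v1) and the peripheral (v2) compartment.
    """
    k10 = cl1 / v1  # elimination rate constant
    k12 = cl2 / v1  # central -> peripheral
    k21 = cl2 / v2  # peripheral -> central
    s = k10 + k12 + k21
    root = math.sqrt(s * s - 4.0 * k10 * k21)
    alpha = 0.5 * (s + root)  # fast (distribution) phase
    beta = 0.5 * (s - root)   # slow (terminal) phase
    a = dose / v1 * (alpha - k21) / (alpha - beta)
    b = dose / v1 * (k21 - beta) / (alpha - beta)
    return a * math.exp(-alpha * t) + b * math.exp(-beta * t)


# Hypothetical parameter values, for illustration only.
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
profile = [conc_two_cmt(t, dose=100.0, cl1=5.0, v1=10.0, cl2=2.0, v2=20.0)
           for t in times]
```

In such a setup, ML-predicted values for CL1, V1, CL2 and V2 would simply replace the fitted values when simulating the profile.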
Methods: 1006 intravenous PK profiles from 949 different small molecules were extracted from an internal database, and NCA was performed without prior filtering. Molecules for which no SMILES string was available were excluded. For 889 studies of 812 molecules, one-, two- and three-cmt models were fitted to the respective PK profiles using the NCA results as input for the initial estimates. A population PK approach with interindividual variability on all parameters and an exponential error model was used. For each study, the best PK model was chosen (lowest root mean squared error and coefficient of variation < 50% for all model parameters). Next, a separate graph convolutional network was trained for every PK parameter, with the estimated values from the PEs as input. The datasets were divided into training, validation and test sets with a split of 80%, 10% and 10%, respectively. Splitting was done using a chemical diversity-based algorithm (Min-Max-Picking as implemented in RDKit). Models were optimized by monitoring the Pearson correlation score (PCS) for the test dataset. For tuning the hyperparameters, several models with different hyperparameter combinations were built and tested [4]. The best-performing method was used for further calculations. Training was stopped if the loss of PCS did not improve for 20 consecutive steps (early stopping) [3]. NCA, compartmental modeling, descriptive statistics and visualization were performed within RStudio (version 4.2.0). The SAEM algorithm implemented in nlmixr2 (version 2.0.8) was used. All machine learning calculations were performed using an in-house Python toolkit based on DeepChem for machine learning, as described in [3], and RDKit for molecular structures, using internal scripts for data preparation, dataset splitting, training and prediction.
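The model selection rule described in the Methods (lowest root mean squared error among models whose parameters all have a coefficient of variation below 50%) can be sketched as follows. This is an illustration only, not the authors' implementation: the dictionary layout, the function name `select_best_model` and all numbers are hypothetical.

```python
def select_best_model(fits, max_cv_percent=50.0):
    """Return the fit with the lowest RMSE among those whose parameters
    all have a coefficient of variation below max_cv_percent.

    Returns None if no fit passes the CV criterion.
    """
    eligible = [
        fit for fit in fits
        if all(cv < max_cv_percent for cv in fit["cv_percent"].values())
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda fit: fit["rmse"])


# Made-up fit summaries for one study (illustrative values only).
fits = [
    {"model": "one_cmt", "rmse": 0.80,
     "cv_percent": {"CL1": 12.0, "V1": 15.0}},
    {"model": "two_cmt", "rmse": 0.35,
     "cv_percent": {"CL1": 10.0, "V1": 18.0, "CL2": 30.0, "V2": 25.0}},
    {"model": "three_cmt", "rmse": 0.30,
     "cv_percent": {"CL1": 11.0, "V1": 20.0, "CL2": 35.0, "V2": 28.0,
                    "CL3": 95.0, "V3": 120.0}},
]

best = select_best_model(fits)
```

In this made-up example the three-cmt fit has the lowest error but is rejected because two of its parameters are poorly estimated, so the two-cmt fit is selected.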
Results: The automated NCA resulted in 904 NCA results with CL and Vss calculations for 798 molecules. The selection of the best PK model together with visual checks resulted in 694 PEs for 679 compounds, divided into 212 one-, 339 two- and 143 three-cmt models. The ML models for the six PK parameters had a prediction accuracy on the training, test and validation sets of PCS=0.3-0.78, 0.4-0.8 and 0.4-0.8, respectively. In addition, for every molecule the PK profile was predicted using the best PK model with either the estimated or the predicted PK parameters. The PCS and the mean symmetric accuracy (MSA) of the predicted vs. observed concentrations were used as quality criteria. With estimated parameters, the best accuracy was observed for the three-cmt models (MSA=13%, PCS=0.96), followed by the two-cmt (MSA=22%, PCS=0.86) and one-cmt (MSA=87%, PCS=0.51) models. With predicted parameters, the two-cmt models gave the best accuracy (MSA=91%, PCS=0.74), followed by the three-cmt (MSA=134%, PCS=0.46) and one-cmt (MSA=189%, PCS=0.60) models. Visual checks, including Cmax and AUC comparisons, indicated acceptable prediction accuracy, with the largest errors at low concentrations.
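The abstract does not give the formula used for the MSA. The sketch below assumes a commonly used definition of symmetric accuracy, 100 * (exp(median(|ln(predicted/observed)|)) - 1), under which lower values mean better agreement and over- and underprediction by the same factor are penalized equally. The function name and the example concentrations are hypothetical.

```python
import math
import statistics


def symmetric_accuracy(observed, predicted):
    """Symmetric accuracy in percent (0% = perfect agreement).

    ASSUMPTION: uses the median of the absolute log ratios; the abstract
    does not state which exact definition was applied.
    Concentrations must be strictly positive.
    """
    abs_log_ratios = [
        abs(math.log(pred / obs)) for obs, pred in zip(observed, predicted)
    ]
    return 100.0 * (math.exp(statistics.median(abs_log_ratios)) - 1.0)


# Made-up concentrations: every prediction is off by a factor of two.
observed = [1.0, 2.0, 4.0]
predicted = [2.0, 4.0, 8.0]
msa = symmetric_accuracy(observed, predicted)
```

Under this definition, a uniform twofold overprediction and a uniform twofold underprediction both give an MSA of about 100%.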
Conclusions: As a case study, our work demonstrated the use of a hybrid approach to predict PK profiles of new molecules based on predicted PK parameters and the most suitable PK model. The required next step is the development of a PK model classifier, placed upstream of the ML models, to complete the prediction approach.
References:
[1] Mavroudis, D. P. et al. Application of Machine Learning in combination with Mechanistic Modeling to Predict Plasma Exposure of Small Molecules. Front Syst Biol (3), (2023).
[2] Bräm, S. D. et al. Introduction of an artificial neural network-based method for concentration-time predictions. CPT: Pharmacometrics & Systems Pharmacology (11), 745-754 (2022).
[3] Grebner, C. et al. Application of Deep Neural Network Models in Drug Discovery Programs. ChemMedChem (16), 3772-3786 (2021).
Reference: PAGE 32 (2024) Abstr 11235 [www.page-meeting.org/?abstract=11235]
Poster: Methodology – AI/Machine Learning