Development and application of ML based drug renal clearance predictor - PAGE Meeting (Population Approach Group Europe)

Natalia Łapińska, Sebastian Polak ^1,2

1 Jagiellonian University Medical College (Krakow, Poland), 2 Certara UK (Sheffield, UK)

Introduction:
Renal clearance is one of the determinants of systemic exposure for many small molecules and their metabolites. Robust estimation of renal clearance remains a key objective from early drug-candidate screening through to late-stage clinical pharmacology. At the pre-clinical level, renal clearance is characterized using in silico models, in vivo mass-balance and PK studies, kidney tissue distribution, and mechanistic in vitro systems (e.g., transporter assays and renal proximal tubule models) to dissect secretion and reabsorption pathways. At the clinical level, renal clearance is commonly estimated from urinary excretion data combined with plasma concentration profiles, supported by population PK, physiologically based PK, and biomarker-based assessments of renal function.
Objectives:
To develop and validate an ML-based empirical model for predicting human renal clearance.
Methods:
The dataset, derived from a comprehensive literature survey, comprised 1540 experimental renal clearance (CLR) observations corresponding to 462 unique compounds after exclusion of smaller patient populations [1]. Molecular structures were encoded using over 200 RDKit descriptors capturing physicochemical, topological, and functional group information [2]. To account for physiological variability, the model incorporated study-level covariates including body weight, administered dose, route of administration, clearance determination method, and health status. Transporter interaction data (influx/efflux) were retrieved from the Drug Interaction Database (DIDB®) and encoded to reflect active renal transport mechanisms [3]. Model development was performed using the MLJAR Automated Machine Learning (AutoML) tool with compound-level splitting (90% training, 10% test set) [4]. Several scenarios with different input vectors were tested. A PBPK model built in Certara’s Simcyp simulator (V25) [6] was used to simulate the kinetics of an example drug, metformin, under four conditions: 1) without CLR, 2) with clinically estimated CLR, 3) with mechanistically modelled CLR [5], and 4) with the CLR predicted by the developed QSAR model as an input. The simulated metformin kinetics after a single dose were then compared with observed data.
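Compound-level splitting means that all CLR observations for a given compound fall into either the training or the test partition, never both. The following is a minimal sketch in plain Python, assuming a hypothetical record layout (a `compound_id` key per observation), an illustrative seed, and toy placeholder data; it is not the MLJAR procedure used in the study.

```python
import random
from collections import defaultdict


def compound_level_split(records, test_fraction=0.10, seed=0):
    """Split observations so that each compound lands in exactly one partition.

    `records` is a list of dicts with a hypothetical "compound_id" key.
    """
    # Group all observations belonging to the same compound.
    by_compound = defaultdict(list)
    for rec in records:
        by_compound[rec["compound_id"]].append(rec)

    # Shuffle compound identifiers (not observations) reproducibly.
    compound_ids = sorted(by_compound)
    random.Random(seed).shuffle(compound_ids)

    # Reserve roughly `test_fraction` of the compounds for the test set.
    n_test = max(1, round(test_fraction * len(compound_ids)))
    test_ids = set(compound_ids[:n_test])

    train = [r for cid in compound_ids if cid not in test_ids for r in by_compound[cid]]
    test = [r for cid in compound_ids if cid in test_ids for r in by_compound[cid]]
    return train, test


# Toy data: 20 made-up compounds with 3 observations each (placeholder values).
toy = [
    {"compound_id": f"cmpd_{i}", "clr": float(i + j)}
    for i in range(20)
    for j in range(3)
]
train_set, test_set = compound_level_split(toy)
```

Splitting on compound identifiers rather than on individual observations avoids the same structure appearing in both partitions, which would otherwise inflate the apparent test performance.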
Results:
The final predictive system was an ensemble of XGBoost, Random Forest, LightGBM, and ExtraTrees models. The model achieved R² = 0.84 (RMSE = 3.15) on the training set and R² = 0.61 (RMSE = 5.36) on the external test set, demonstrating moderate but robust generalization performance. SHAP analysis identified health status as the dominant covariate, while dose and route of administration showed minimal influence, supporting the hypothesis that intrinsic elimination mechanisms outweigh administration-related factors under linear pharmacokinetic conditions. Structural descriptors contributed through complex, non-linear interactions rather than simple monotonic relationships.
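For readers less familiar with the reported metrics, the sketch below computes RMSE and R² in plain Python. It assumes the usual coefficient-of-determination definition of R² (1 minus the ratio of residual to total sum of squares); the abstract does not state which definition or which CLR units were used, and the numbers in the example are placeholders, not study data.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Placeholder values for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 6.0]

print(f"RMSE = {rmse(obs, pred):.2f}, R2 = {r_squared(obs, pred):.2f}")
```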
The developed model was used to predict CLR for metformin (25 L/h), and the prediction was compared with the clinically estimated value (32.5 L/h). The observed Cmax and AUCinf after a single 500 mg dose of metformin HCl in healthy individuals range between 1 and 1.4 µg/mL and between 5.5 and 8.5 µg·h/mL, respectively [7,8]. The Simcyp-simulated Cmax and AUC after a single 500 mg dose of metformin HCl for the four scenarios described above were: 1) 3.16 µg/mL and 62.88 µg·h/mL (no CLR), 2) 0.95 µg/mL and 6.52 µg·h/mL (clinically estimated CLR = 32.5 L/h), 3) 1.06 µg/mL and 7.55 µg·h/mL (mechanistically modelled CLR), 4) 1.10 µg/mL and 8.07 µg·h/mL (QSAR-predicted CLR = 25 L/h).
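The comparison above can be tabulated with a short script. The sketch below takes the simulated values and observed ranges reported in this abstract and flags whether each simulated Cmax and AUC falls inside the observed range; the dictionary layout and scenario labels are illustrative choices, and a simple range check is an assumption rather than the acceptance criterion used by the authors.

```python
# Observed ranges after a single 500 mg dose (Cmax in ug/mL, AUCinf in ug*h/mL).
OBSERVED_RANGE = {"cmax": (1.0, 1.4), "auc": (5.5, 8.5)}

# Simcyp-simulated values for the four scenarios reported in the abstract.
SIMULATED = {
    "1) no CLR": {"cmax": 3.16, "auc": 62.88},
    "2) clinically estimated CLR": {"cmax": 0.95, "auc": 6.52},
    "3) mechanistically modelled CLR": {"cmax": 1.06, "auc": 7.55},
    "4) QSAR-predicted CLR": {"cmax": 1.10, "auc": 8.07},
}


def within(value, bounds):
    """Return True if value lies inside the closed interval given by bounds."""
    low, high = bounds
    return low <= value <= high


summary = {
    name: {metric: within(value, OBSERVED_RANGE[metric]) for metric, value in vals.items()}
    for name, vals in SIMULATED.items()
}

# Ratio of the QSAR-predicted CLR to the clinically estimated CLR (both in L/h).
clr_ratio = 25.0 / 32.5

for name, flags in summary.items():
    print(name, flags)
print(f"predicted/clinical CLR ratio = {clr_ratio:.2f}")
```

Under this simple check, scenarios 3 and 4 fall inside both observed ranges, scenario 2 falls slightly below the observed Cmax range, and scenario 1 lies well outside both.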
Conclusions:
The proposed empirical model allows CLR to be predicted from relatively simple independent parameters. The model’s precision increases with the level of detail in the input data; even qualitative information on an API’s affinity for transporters allows more robust CLR estimation. In the current study, the predicted CLR value was used to parametrise an example PBPK model built for metformin. In this case, Cmax and AUC were overpredicted when CLR was not considered. Adding the QSAR-predicted CLR made it possible to simulate realistic drug kinetics comparable with the clinical data.
The developed QSAR model can also be used for early prediction of CLR, even at the discovery stage.

References:
[1] Łapińska, N; Polak, S, Human renal clearance of xenobiotics, Mendeley Data, V2, 2025, doi:10.17632/3427x3wzzc.2
[2] RDKit version 2025.3.3 https://www.rdkit.org/
[3] Drug Interaction Database (DIDB®) https://www.certara.com/drug-interaction-database-didb/
[4] Plonska A., Plonski P. MLJAR: State-of-the-Art Automated Machine Learning Framework for Tabular Data, version 0.10.3. https://github.com/mljar/mljar-supervised
[5] Neuhoff, S. et al. (2013). Accounting for Transporters in Renal Clearance: Towards a Mechanistic Kidney Model (Mech KiM). In: Sugiyama, Y., Steffansen, B. (eds) Transporters in Drug Development. AAPS Advances in the Pharmaceutical Sciences Series, vol 7. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8229-1_7
[6] Simcyp simulator https://www.certara.com/software/simcyp-pbpk/
[7] Santos-Caballero N et al., Comparative Pharmacokinetic Study between Metformin Alone and Combined with Orlistat in Healthy Mexican Volunteers, Pharmacology & Pharmacy, 3(3), 2012, doi:10.4236/pp.2012.33040
[8] Lucía Montoya-Eguía S. et al. Comparative Pharmacokinetic Study Among 3 Metformin Formulations in Healthy Mexican Volunteers: A Single-Dose, Randomized, Open-Label, 3-Period Crossover Study, Current Therapeutic Research, 77, 2015, doi:10.1016/j.curtheres.2014.09.003

Reference: PAGE 34 (2026) Abstr 12200 [www.page-meeting.org/?abstract=12200]

Poster: Drug/Disease Modelling - Other Topics