Mélanie Karlsen 1,2, Jérôme Azé 1, Sandra Bringay 1,3, Pascal Poncelet 1, David Fabre 2, David Marchionni 2, Elisa Calvier 2
1 LIRMM, Laboratory of Computer Science, Robotics and Microelectronics in Montpellier, CNRS, Montpellier University (Montpellier, France), 2 Sanofi R&D Montpellier, Pharmacokinetics Dynamics and Metabolism / Translational Medicine and Early Development (Montpellier, France), 3 Applied Mathematics, Computer Science and Statistics (AMIS), Montpellier 3 University (Montpellier, France)
Objective
Covariate model building (CMB) remains one of the most time-consuming and decision-intensive steps in population pharmacokinetic (popPK) modeling. Stepwise covariate modeling (SCM) is widely used [1] for its interpretability and structured search strategy, but it suffers from substantial computational burden. Machine-learning (ML) approaches have been proposed to assist covariate screening [1]; however, they are most often applied as a preliminary, standalone step prior to SCM [2]. Such two-step ML–SCM workflows typically ignore the uncertainty associated with empirical Bayes estimates (EBEs) and rely on fixed covariate preselection rules, which may result in inefficient searches, unstable covariate–parameter relationships, or unnecessary exclusion of relevant covariates.
The objective of this work was to develop an SCM variant that integrates ML directly within each step of the covariate search while accounting for EBE uncertainty. The aim was to improve computational efficiency without compromising model quality or interpretability. This method is referred to as SCM-dML (SCM – dynamic ML).
Methods
In SCM-dML, ML methods are embedded within each forward step of the SCM procedure. At every iteration, ML models are trained on the EBEs of each pharmacokinetic (PK) parameter to rank the importance of the candidate covariates for that parameter. To address the known limitations of EBEs, the covariate search space is adaptively reduced using a quantity referred to as the “retention score”, which reflects EBE uncertainty (the standard error of the individual parameter estimates) relative to the variance of the EBEs across the population. The higher the retention score, the more covariates (up to all) are tested by SCM; the lower the score, the fewer covariates (at least one) are tested. For each continuous covariate-parameter pair, SCM tests only one functional relationship among linear, exponential, and power: the one achieving the lowest root mean square error (RMSE) in a preliminary fit of the covariate on the individual PK parameters.
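The two reduction rules described above can be illustrated with a minimal Python sketch. The abstract does not give the formula of the retention score, so the form used here (mean squared standard error divided by the population variance of the EBEs, capped at 1) is purely an assumption, as is the rule mapping the score to a number of covariates. The log-scale least-squares fits and all function names are likewise hypothetical and are not the authors' implementation.

```python
import math
from statistics import mean, pvariance


def retention_score(ebes, standard_errors):
    """ASSUMED form: mean squared SE of the individual estimates divided by
    the population variance of the EBEs, capped at 1."""
    ratio = mean(se ** 2 for se in standard_errors) / pvariance(ebes)
    return min(1.0, ratio)


def covariates_to_test(ranked_covariates, score):
    """Keep the top of the ML ranking: at least one covariate, up to all.
    The proportional rule is an assumption made for illustration."""
    n_keep = max(1, math.ceil(score * len(ranked_covariates)))
    return ranked_covariates[:n_keep]


def _ols(x, y):
    """Slope and intercept of a simple least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def _rmse(pred, obs):
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(pred, obs)))


def best_functional_form(cov, param):
    """Fit linear, exponential and power relationships of one continuous
    covariate on one individual PK parameter (both strictly positive) and
    return the form with the lowest RMSE on the original scale."""
    log_param = [math.log(p) for p in param]
    fits = {}
    b, a = _ols(cov, param)  # param = a + b * cov
    fits["linear"] = [a + b * c for c in cov]
    b, a = _ols(cov, log_param)  # log(param) = a + b * cov
    fits["exponential"] = [math.exp(a + b * c) for c in cov]
    b, a = _ols([math.log(c) for c in cov], log_param)  # log-log fit
    fits["power"] = [math.exp(a) * c ** b for c in cov]
    rmse = {form: _rmse(pred, param) for form, pred in fits.items()}
    return min(rmse, key=rmse.get), rmse
```

In this sketch, `covariates_to_test(ranking, retention_score(ebes, ses))` would give the covariates passed to the SCM forward step for one parameter, and `best_functional_form` would be called once per continuous covariate-parameter pair before the search starts.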
The proposed method was evaluated using two real popPK datasets corresponding to two distinct compounds, referred to as molecules A and B. For each dataset, predictive performance and model fit were assessed using relative mean prediction error (MPE), relative root mean square error (RMSE), and the Bayesian Information Criterion (BIC); computational cost was reported as total run time. The number of selected covariate-parameter pairs and the fold changes (FC) of PK parameters between the 5th and 95th percentiles of covariates were also reported. SCM-dML was compared against four common CMB approaches: SCMplus [3] (referred to as SCM+), COSSAC [4], covSAMBA [5], and the conventional two-step ML–SCM workflow [2] (referred to as ML-SCM), in which ML-based covariate screening is performed only prior to SCM, with the top three covariates systematically retained regardless of EBE uncertainty. All methods were applied using identical base models, covariate coding, and initial search spaces to ensure a fair comparison.
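For readers unfamiliar with these metrics, the sketch below shows one common way to compute them in Python. The exact definitions used in the study are not stated in the abstract, so the percent scaling of the relative errors, the use of the number of observations in the BIC penalty, and the function names are all assumptions.

```python
import math
from statistics import mean, quantiles


def relative_mpe(pred, obs):
    """Relative mean prediction error, in percent (assumed scaling)."""
    return 100 * mean((p - o) / o for p, o in zip(pred, obs))


def relative_rmse(pred, obs):
    """Relative root mean square error, in percent (assumed scaling)."""
    return 100 * math.sqrt(mean(((p - o) / o) ** 2 for p, o in zip(pred, obs)))


def bic(minus_two_log_lik, n_params, n_obs):
    """BIC = -2 log-likelihood + k ln(n); taking n as the number of
    observations is an assumption (other penalties exist for mixed models)."""
    return minus_two_log_lik + n_params * math.log(n_obs)


def fold_change(effect, covariate_values):
    """Ratio of the typical parameter value at the 95th versus the 5th
    percentile of a covariate; `effect` maps a covariate value to the
    typical parameter value."""
    cuts = quantiles(covariate_values, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]  # 5th and 95th percentiles
    return effect(p95) / effect(p5)
```

As a usage example, a hypothetical power effect of body weight on clearance, `lambda w: (w / 70) ** 0.75`, evaluated over weights whose 5th and 95th percentiles are 45 and 135 kg, gives a fold change of 3 ** 0.75, about 2.28.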
Results
Across both molecules A and B, the BIC ranking (from best to worst) was: SCM+, SCM-dML, ML-SCM, COSSAC, and covSAMBA. In contrast, for predictive performance metrics (MPE and RMSE), COSSAC and covSAMBA consistently ranked highest, followed by ML-SCM, while SCM-dML and SCM+ showed the lowest performance.
Despite these differences in ranking, SCM-dML achieved predictive performance comparable to the other SCM-based approaches (SCM+ and ML-SCM), with maximum absolute differences of 0.39 in relative MPE and 0.09 in relative RMSE. SCM-dML provided a substantial reduction in computational time compared with SCM+, being approximately 2.1- and 6.2-fold faster for molecules A and B, respectively. covSAMBA remained the fastest method overall (5.9- and 10.3-fold faster than SCM-dML).
The numbers of selected covariate-parameter pairs (and FC) for COSSAC, covSAMBA, SCM-dML, ML-SCM and SCM+ were 1 (1.34), 3 ([1.26 – 1.91]), 2 ([1.55, 1.88]), 2 ([1.03, 1.85]) and 7 ([0.826 – 9.42]), respectively, for molecule A, and 1 (2.2), 7 ([1.18 – 4.97]), 7 ([1 – 2.22]), 4 ([1.06 – 2.49]) and 13 ([1.07 – 5.5]), respectively, for molecule B. These tendencies suggest that COSSAC is conservative, while SCM+ is more permissive, and ML-hybrid approaches (SCM-dML, ML-SCM) and covSAMBA fall in between.
Conclusion
SCM+ produced more complex models, which were favored by BIC, whereas parsimonious approaches, particularly COSSAC, were favored by predictive metrics (MPE, RMSE). ML-assisted strategies maintained predictive performance comparable to SCM+ while reducing its computational time and limiting the inclusion of numerous covariate effects. Comparisons on simulated datasets and additional real datasets are needed to confirm these findings.
References:
[1] M. Karlsen, S. Khier, D. Fabre, et al., “Covariate Model Selection Approaches for Population Pharmacokinetics: A Systematic Review of Existing Methods, From SCM to AI,” CPT: Pharmacometrics & Systems Pharmacology 14, no. 4 (2025): 621–639.
[2] E. Sibieude, A. Khandelwal, J. S. Hesthaven, P. Girard, and N. Terranova, “Fast Screening of Covariates in Population Models Empowered by Machine Learning,” Journal of Pharmacokinetics and Pharmacodynamics 48 (2021): 597–609.
[3] R. J. Svensson and E. N. Jonsson, “Efficient and Relevant Stepwise Covariate Model Building for Pharmacometrics,” CPT: Pharmacometrics & Systems Pharmacology 11 (2022): 1210–1222.
[4] G. Ayral, J.-F. S. Abdallah, C. Magnard, and J. Chauvin, “A Novel Method Based on Unbiased Correlations Tests for Covariate Selection in Nonlinear Mixed Effects Models: The COSSAC Approach,” CPT: Pharmacometrics & Systems Pharmacology 10 (2021): 318–329.
[5] M. Prague and M. Lavielle, “SAMBA: A Novel Method for Fast Automatic Model Building in Nonlinear Mixed-Effects Models,” CPT: Pharmacometrics & Systems Pharmacology 11 (2022): 161–172.
Reference: PAGE 34 (2026) Abstr 12156 [www.page-meeting.org/?abstract=12156]
Poster: Methodology - Covariate/Variability Models