Ibtissem Rebai 1, Stephen Duffull 1, Ayman Akil 1, Anna Largajolli 1, Floris Fauchet 1
1 Certara (Princeton, USA)
Introduction
Stepwise covariate modeling (SCM) [1] is the most widely used method for covariate selection. An improved version (called SCM+) was recently introduced with adaptive scope reduction and stage-wise filtering [2,3]. Machine learning (ML) approaches have emerged as promising alternatives, with random forests, neural networks, and support vector regression demonstrating major reductions in computational burden when applied to empirical Bayes estimates (EBEs) [4]. This study evaluates the Boruta feature selection algorithm [5] combined with four different tree-based ML models (i.e., Random Forest, XGBoost, LightGBM, and CatBoost), with or without Lasso pre-processing [6]. This work compares the performance of the ML approach with that of SCM+ across linear and target-mediated drug disposition (TMDD) models.
Methods
A linear model and a TMDD model with a quasi-steady-state approximation were used for simulations [7]. A single-dose design with rich sampling (180 subjects across 6 dose levels) was simulated. Seven covariates were sampled from the NHANES database to preserve a realistic correlation structure: age, sex, race, body weight, albumin, creatinine, and hemoglobin. Six covariate-parameter relationship scenarios were simulated, ranging from no covariate effect (Scenario 0) to multiple correlated effects on clearance (CL) and volume (V1) (Scenario 5), with 100 datasets per scenario. Base models without covariate effects were fitted to all datasets, and the EBEs served as target variables. Lasso regularization (with λ_min or λ_1se) was evaluated as a pre-processing step: λ_min is the λ with the lowest cross-validated error, whereas λ_1se is a slightly larger λ whose error lies within one standard error of the minimum. Boruta was combined with four tree-based ML models (i.e., Random Forest, XGBoost, LightGBM, and CatBoost), and its performance with or without Lasso was assessed using Type I error, sensitivity (S1), and precision (S2), and compared with SCM+ using p-values of 0.05 and 0.01 for the forward and backward steps, respectively. S1 was defined as the ability to identify the correct covariates on the EBEs, possibly together with additional ones, whereas S2 was defined as the ability to identify exclusively the correct covariates.
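As an illustration of how these three metrics could be computed from per-dataset selection results, a minimal Python sketch follows. It is not the code used in this work. The function names and the covariate labels (WT, ALB, SEX, AGE) are hypothetical, S1 is assumed to count every dataset in which all true covariates were selected (with or without extras), and Type I error is assumed to be the percentage of Scenario 0 datasets in which at least one covariate was selected.

```python
from typing import Dict, Iterable, List, Set


def percent(flags: Iterable[bool]) -> float:
    """Percentage of True values in a sequence of per-dataset flags."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags)


def summarize(selections: List[Set[str]], truth: Set[str]) -> Dict[str, float]:
    """Summarize covariate selection over simulated datasets for one parameter.

    selections: one set of selected covariates per dataset.
    truth: the covariates that truly affect the parameter (empty in Scenario 0).
    """
    if not truth:
        # Assumption: with no true effect, any selection is a false positive.
        return {"type_I_error": percent(len(s) > 0 for s in selections)}
    return {
        # S1: all true covariates selected; extra covariates are tolerated.
        "S1": percent(truth <= s for s in selections),
        # S2: exactly the true covariates selected, nothing else.
        "S2": percent(s == truth for s in selections),
    }


# Hypothetical example: body weight (WT) is the only true covariate on CL.
example = [{"WT"}, {"WT", "ALB"}, {"SEX"}, set()]
print(summarize(example, {"WT"}))  # {'S1': 50.0, 'S2': 25.0}
```

Under these definitions S2 can never exceed S1, so the gap between the two quantifies over-selection.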
Results
All parameters from the model runs were well estimated with low shrinkage (RSE < 15%; shrinkage < 5%), except for CL in the TMDD model (moderate shrinkage of 20%).
Linear Model:
In Scenario 0, SCM+ demonstrated excellent Type I error control for CL and V1 (0% and 1%, respectively). Among Boruta variants, LightGBM showed the lowest false positive rate (9% and 13%) vs CatBoost (24% and 20%) and XGBoost (20% and 24%). Boruta Random Forest exhibited inflated Type I error (40% and 43%) and thus was excluded from further evaluation.
In single-covariate scenarios (Scenarios 1 to 4), SCM+ achieved near-perfect S1 and S2 (varying from 97% to 100%), while Boruta LightGBM showed similarly high sensitivity (S1 from 98% to 100%) but moderate precision (S2 from 64% to 90%) due to occasional over-selection of covariates. Results for XGBoost and CatBoost were similar for S1 but degraded for S2.
In Scenario 5 (weight and gender effects on CL and V1), SCM+ achieved S1 and S2 of 0%, consistently failing to detect both covariates simultaneously. Gender was the only covariate selected in 100% of cases. In contrast, Boruta LightGBM substantially outperformed SCM+, achieving S1 of 89% and 95% and S2 of 66% and 74% (for V1 and CL, respectively).
Lasso pre-processing markedly improved the performance of Boruta LightGBM. Both λ_min and λ_1se reduced Type I error to <5% (i.e., comparable to SCM+). In single continuous-covariate scenarios, Lasso λ_min + Boruta LightGBM achieved near-perfect sensitivity and precision (S1 = 99%, S2 = 94% to 96%). For categorical covariate scenarios, Lasso λ_1se + Boruta LightGBM had perfect sensitivity (S1 = 100%) and achieved superior precision (S2 = 96% to 99%) by eliminating correlated continuous covariates prior to Boruta.
In Scenario 5, Lasso λ_min + Boruta LightGBM performed best, with S1/S2 of 93%/75% and 91%/78% (for CL and V1, respectively).
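The two Lasso penalties compared here differ only in how λ is picked from the cross-validation curve. The sketch below shows that selection rule in plain Python, assuming per-fold cross-validated errors are already available for a grid of λ values. It follows the common convention that λ_1se is the largest λ whose mean error is within one standard error of the minimum; the function name and the numbers are hypothetical and are not taken from this study.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def select_lambdas(
    lambdas: Sequence[float], fold_errors: List[List[float]]
) -> Tuple[float, float]:
    """Return (lambda_min, lambda_1se) from per-fold cross-validated errors.

    fold_errors[i] holds the per-fold errors obtained with lambdas[i].
    """
    means = [mean(errs) for errs in fold_errors]
    # Standard error of the mean CV error at each lambda.
    ses = [stdev(errs) / sqrt(len(errs)) for errs in fold_errors]
    # lambda_min: the lambda with the lowest mean CV error.
    i_min = min(range(len(lambdas)), key=lambda i: means[i])
    # lambda_1se: the largest lambda whose mean error stays within
    # one standard error of that minimum (a sparser model).
    threshold = means[i_min] + ses[i_min]
    lam_1se = max(lam for lam, m in zip(lambdas, means) if m <= threshold)
    return lambdas[i_min], lam_1se


# Hypothetical 3-fold errors for four candidate penalties.
grid = [0.01, 0.1, 0.5, 1.0]
errors = [
    [1.10, 1.30, 1.20],
    [0.90, 1.10, 1.00],
    [0.95, 1.15, 1.05],
    [1.40, 1.60, 1.50],
]
print(select_lambdas(grid, errors))  # (0.1, 0.5)
```

Because λ_1se applies a stronger penalty, it retains fewer covariates before Boruta is run, which is one plausible reason for its higher precision in the categorical covariate scenarios.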
TMDD Model:
Lasso λ_min + Boruta LightGBM maintained strong Type I error control (<5%) and high sensitivity (S1 = 99% to 100%). Precision decreased modestly compared with the linear model (a 7% to 10% reduction in S2 for Scenarios 1 to 4), reflecting increased structural complexity.
Conclusion
Boruta combined with LightGBM and a Lasso pre-processing step provided the most reliable balance between sensitivity and precision. SCM+ consistently failed to detect two correlated covariates simultaneously but achieved excellent results in the other scenarios.
References:
1. Jonsson EN, Karlsson MO. Automated covariate model building within NONMEM. Pharm Res. 1998;15:1463-1468. https://doi.org/10.1023/a:1011970125687
2. Svensson RJ, Jonsson EN. Efficient and relevant stepwise covariate model building for pharmacometrics. CPT Pharmacometrics Syst Pharmacol. 2022;11(8):1027-1040.
3. Ahamadi M, et al. Operating characteristics of stepwise covariate selection in pharmacometric modeling. J Pharmacokinet Pharmacodyn. 2019;46(3):273-285.
4. Sibieude E, Khandelwal A, Hesthaven JS, Girard P, Terranova N. Fast screening of covariates in population models empowered by machine learning. J Pharmacokinet Pharmacodyn. 2021;48(4):597-609.
5. Kursa MB, Jankowski A, Rudnicki WR. Boruta—a system for feature selection. Fundamenta Informaticae. 2010;101(4):271-285.
6. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B. 1996;58(1):267-288.
7. Meibohm B, Brockhaus B, Zühlsdorf M, Kovar A. Semi-mechanistic model-based drug development of EMD 525797 (DI17E6), a novel anti-αv integrin monoclonal antibody. PAGE 22 (2013), 11/2015
Reference: PAGE 34 (2026) Abstr 12264 [www.page-meeting.org/?abstract=12264]
Poster: Methodology – AI/Machine Learning