**SAMBA: a new algorithm for automatic construction of nonlinear mixed-effects models.**

Marc Lavielle (1, 2), Mélanie Prague (1)

(1) Inria, France; (2) Ecole Polytechnique, France

**Objectives:** Building a pharmacometric model is a complex process that requires in-depth expertise, advanced statistical methods, sophisticated software tools and, most importantly, time and patience. Correctly identifying all the components of the model is far from easy: one must find the best structural model, determine the nature of the relationship between covariates and individual parameters, select which parameters vary between individuals, identify possible correlations between random effects, and model the residual errors. Our goal is to accelerate and optimize this model-building process by determining, at each step, how best to improve some of the model components.

**Methods:** The procedure for building a model is usually iterative: one fits an initial model to the data, and diagnostic plots and statistical tests allow one to identify possible misspecifications in the proposed model. Then, a new model must be proposed to correct these flaws and improve the predictive capabilities of the model. The choice of this new model is usually not obvious: for example, which covariates should be included in the model when many covariates are available and correlated?

Machine learning is a branch of artificial intelligence in which the machine is trained to learn from past experience. This is exactly what we ask of the machine here: it uses the results obtained with an incorrect model to learn the correct one. We propose the following iterative procedure, called SAMBA (Stochastic Approximation for Model Building Algorithm). The structural model is selected and an initial statistical model is defined to describe the random nature of the data. The population parameters of this first model are estimated, and the individual parameters are sampled from the conditional distribution defined under this first model. These simulated individual parameters are then used, together with the observed data, to select and fit a new statistical model. For example, if we assume linear covariate models, we build the linear model with the lowest BIC for each individual parameter, which is easy and fast since both the individual parameters and the covariates are available at this stage. The method extends easily to other covariate models: a power model, for example, amounts to a linear model for the log parameter in terms of the log covariate. Once the covariate model is selected, the random effects are computed and their correlation and variance structure is selected by minimizing an appropriate criterion (AIC, BIC, ...). Finally, evaluating the structural model at the simulated individual parameters provides predictions that can be used to select an optimal residual error model. Once this new statistical model is identified, it can in turn be fitted to the data, and the procedure is repeated. In this way, the model that minimizes the selected criterion can be found in a few steps.
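The covariate step described above can be illustrated with a minimal Python sketch. It assumes that the simulated individual parameters (on the log scale) and the covariates are already available as plain lists, and it searches all covariate subsets exhaustively, which is only reasonable for a handful of covariates. The function names (`ols_rss`, `bic`, `select_covariates`) and the toy data are hypothetical; this is not the Rsmlx implementation, which may use a different search strategy.

```python
import math
from itertools import combinations


def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting (assumes no collinear columns).
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(k + 1, p):
            factor = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= factor * A[k][c]
    # Back substitution.
    beta = [0.0] * p
    for k in reversed(range(p)):
        tail = sum(A[k][c] * beta[c] for c in range(k + 1, p))
        beta[k] = (A[k][p] - tail) / A[k][k]
    return sum((y[i] - sum(X[i][j] * beta[j] for j in range(p))) ** 2
               for i in range(n))


def bic(rss, n, n_coef):
    """Gaussian BIC up to an additive constant; +1 counts the residual variance."""
    return n * math.log(rss / n) + (n_coef + 1) * math.log(n)


def select_covariates(param, covariates):
    """Return (lowest BIC, covariate names) over all subsets of `covariates`."""
    n = len(param)
    names = sorted(covariates)
    best = None
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            rss = ols_rss([covariates[name] for name in subset], param)
            score = bic(rss, n, len(subset) + 1)  # +1 for the intercept
            if best is None or score < best[0]:
                best = (score, subset)
    return best


# Toy data, made up for illustration: log V1 depends on log BSA only.
log_bsa = [-1, -1, -1, -1, 1, 1, 1, 1]   # centred, scaled covariate
age = [-1, -1, 1, 1, -1, -1, 1, 1]       # unrelated covariate
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
log_v1 = [2.0 + 0.5 * x + e for x, e in zip(log_bsa, noise)]

best_bic, chosen = select_covariates(log_v1, {"log_bsa": log_bsa, "age": age})
print(chosen)
```

Regressing the log parameter on the log covariate, as done here, corresponds to the power covariate model mentioned above. The toy noise is built to be orthogonal to both covariates, so the search keeps `log_bsa` alone.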

Extensions of SAMBA have been developed that allow the introduction of prior information about the model. In this way, a posterior distribution is defined over the set of models, which can be either maximized or sampled.
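As an illustration of how a posterior over models could be built, the sketch below combines a prior with the standard approximation in which the marginal likelihood of a model is proportional to exp(-BIC/2). This is an assumption made for the example, not necessarily the construction used in the SAMBA extensions; the function name `model_posterior`, the model labels and the BIC values are all hypothetical.

```python
import math
import random


def model_posterior(bic_by_model, prior_by_model):
    """Normalized weights proportional to prior * exp(-BIC / 2)."""
    best = min(bic_by_model.values())
    # Subtract the smallest BIC before exponentiating to avoid underflow.
    weights = {m: prior_by_model[m] * math.exp(-(b - best) / 2.0)
               for m, b in bic_by_model.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


# Made-up BIC values for two candidate covariate models.
bic = {"no covariate": 102.0, "V1 ~ BSA": 100.0}

# A flat prior lets the criterion decide; an informative prior can override it.
flat = model_posterior(bic, {"no covariate": 0.5, "V1 ~ BSA": 0.5})
skeptical = model_posterior(bic, {"no covariate": 0.9, "V1 ~ BSA": 0.1})

# Maximizing the posterior picks a single model ...
map_flat = max(flat, key=flat.get)
map_skeptical = max(skeptical, key=skeptical.get)

# ... while sampling from it explores several plausible models.
rng = random.Random(0)
draws = rng.choices(list(flat), weights=list(flat.values()), k=5)
```

With the flat prior the model with the lower BIC is the posterior mode; with a prior that strongly favors the covariate-free model, a BIC difference of 2 is not enough to change the choice.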

**Results:** This method has been implemented in the R package Rsmlx. An extensive simulation study has shown that, for simple PK examples, one or two steps of the proposed method are usually enough to reach the final model. We have also applied the method to cisplatin PK data using a three-compartment PK model. The available covariates are weight, height, body surface area (BSA), age, and sex. We started by fitting the simplest statistical model (no covariates, no correlation, full variability, constant error models). It then took SAMBA less than 10 minutes to find the model with the lowest BIC: V1 is a function of BSA; there is no inter-individual variability (IIV) on V3 and Q3; V1 and V2 are correlated (r=0.82), as are Cl and Q2 (r=0.65); and the residual error model is a combined model.
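The residual error step described in Methods, which selected a combined model for the cisplatin data, can be sketched as follows. The sketch assumes strictly positive predictions, nonzero residuals, and a combined model whose standard deviation is a + b·f, where f is the prediction; this form is an assumption for the example. The function names and the crude grid search are hypothetical and are not taken from Rsmlx.

```python
import math


def neg2_loglik(obs, pred, a, b):
    """-2 log-likelihood of obs ~ Normal(pred, (a + b * pred)^2)."""
    total = 0.0
    for y, f in zip(obs, pred):
        sd = a + b * f
        total += math.log(2.0 * math.pi) + 2.0 * math.log(sd) + ((y - f) / sd) ** 2
    return total


def fit_error_models(obs, pred, grid_size=60):
    """Return {model name: BIC} for constant, proportional and combined errors."""
    n = len(obs)
    res = [y - f for y, f in zip(obs, pred)]
    # Closed-form maximum-likelihood estimates for the one-parameter models.
    a_hat = math.sqrt(sum(r * r for r in res) / n)
    b_hat = math.sqrt(sum((r / f) ** 2 for r, f in zip(res, pred)) / n)
    bic = {
        "constant": neg2_loglik(obs, pred, a_hat, 0.0) + math.log(n),
        "proportional": neg2_loglik(obs, pred, 0.0, b_hat) + math.log(n),
    }
    # Crude grid search for the two-parameter combined model (a > 0, b > 0);
    # a real implementation would use a proper optimizer.
    best = math.inf
    for i in range(1, grid_size + 1):
        for j in range(1, grid_size + 1):
            a = a_hat * i / grid_size
            b = b_hat * j / grid_size
            best = min(best, neg2_loglik(obs, pred, a, b))
    bic["combined"] = best + 2.0 * math.log(n)
    return bic


# Toy data, made up for illustration: errors are exactly 10% of the prediction.
pred = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
signs = [1, -1, 1, -1, 1, -1]
obs = [f * (1.0 + 0.1 * s) for f, s in zip(pred, signs)]

bic = fit_error_models(obs, pred)
selected = min(bic, key=bic.get)
print(selected)
```

On this toy data the proportional model is selected, because the additive term of the combined model cannot improve the fit and only adds a penalty. On real data, where small concentrations carry a noise floor, the combined model can win, as it did in the cisplatin example.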

**Conclusion:** We propose a promising new approach that helps modelers build their models faster and more efficiently. It is a decision-support tool, not a decision-making tool: blind optimization of a statistical criterion may produce a model with no biological meaning. Finally, the method can easily be implemented in any software that uses an MCMC-like algorithm, which is the case for the most commonly used tools in pharmacometrics.

**References:**

[1] Nguyen, T. H. T., et al. (2017). Model evaluation of continuous data pharmacometric models: Metrics and graphics. *CPT: Pharmacometrics & Systems Pharmacology*, Vol 6, 87-109.

[2] Jonsson, E. N., & Karlsson, M. O. (1998). Automated covariate model building within NONMEM. *Pharmaceutical Research*, Vol 15, 1463-1468.

[3] Lavielle, M. (2014). *Mixed Effects Models for the Population Approach: Models, Tasks, Methods and Tools*. Chapman & Hall/CRC Biostatistics Series, CRC Press, Boca Raton, Florida, USA.

[4] Prague, M., & Lavielle, M. (2022). SAMBA: A novel method for fast automatic model building in nonlinear mixed-effects models. *CPT: Pharmacometrics & Systems Pharmacology*, Vol 11, 161-172.

[5] Ayral, G., Si Abdallah, J. F., Magnard, C., & Chauvin, J. (2021). A novel method based on unbiased correlations tests for covariate selection in nonlinear mixed effects models: The COSSAC approach. *CPT: Pharmacometrics & Systems Pharmacology*, Vol 10, 318-329.

[6] Murphy, K. P. (2012). *Machine Learning: A Probabilistic Perspective*. The MIT Press.