IV-001

Announcement of Machine Learning Benchmarking

Mark Sale1, Dr Liang Zhao2

1Certara, 2University of California, San Francisco

Objectives: Provide an objective structure to assess the performance of machine learning (ML) model selection methods in a way that is relevant to real-world pharmacometrics projects.

Methods: The last decade has seen dramatic innovation in ML methods for model selection. These include Automated Model Development (AMD) [1], Neural ODEs [2], IMPRES M [3], SAMBA [4], DeepPUMAS [5], and pyDarwin [6]. We propose to organize an annual benchmarking exercise for ML model selection methods. The prototype for this effort is the Critical Assessment of methods of protein Structure Prediction (CASP) [7], which evaluated algorithms for predicting protein folding in a blinded fashion. The CASP effort is widely thought both to have supported the overall development of these methods and to have built confidence in the approach. A similar, although more limited, effort in pharmacometrics was organized by Girard and Mentré in 2005 to assess the performance of different estimation algorithms for mixed-effect models [8]. First, we would like to collect input from the pharmacometrics community on the need, timing, and format of an annual blinded benchmarking exercise on using ML approaches for NONMEM model selection. If endorsed, the organizers would create up to 3 data sets from models specified in NONMEM. These models would be PK and/or PK/PD. The organizers would publish these data sets along with a "search space". The search space will consist of a list of hypotheses that would be entertained in real-world model selection, such as the number of compartments, absorption model, elimination mechanism, between-subject variability terms, possible covariate relationships, and candidate residual error models. The search space will include the "true" model, i.e., the data-generating process (the NONMEM simulation model). The goal is limited to evaluating the performance of the algorithms and is intended to be independent of the user.
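A search space of the kind described above can be thought of as a set of independent hypotheses, each with a short list of candidate options; the candidate models are their Cartesian product. The sketch below is purely illustrative: all hypothesis names and option lists are hypothetical examples, not the actual benchmark specification or the input format of any of the cited tools.

```python
# Illustrative sketch of a model-selection search space (hypothetical
# names and options; not the benchmark specification or any tool's format).
from itertools import product

search_space = {
    "n_compartments": [1, 2, 3],
    "absorption": ["first_order", "zero_order", "transit"],
    "elimination": ["linear", "michaelis_menten"],
    "bsv_on_clearance": [False, True],
    "covariate_weight_on_cl": [False, True],
    "residual_error": ["additive", "proportional", "combined"],
}

def n_candidate_models(space):
    """Number of distinct model structures implied by the search space."""
    total = 1
    for options in space.values():
        total *= len(options)
    return total

def enumerate_models(space):
    """Yield each candidate model as a dict mapping hypothesis -> option."""
    keys = list(space)
    for combo in product(*space.values()):
        yield dict(zip(keys, combo))
```

Even this toy space implies 3 × 3 × 2 × 2 × 2 × 3 = 216 candidate structures, which illustrates why exhaustive fitting is impractical and search algorithms are of interest.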
To ensure that results are independent of participant skill, the search space will be public. The "analysis objectives" would also be shared with the participants. The analysis objectives would include the criteria used to define a "good" model, with sufficient detail to enable model selection, as might be included in an analysis plan. Sufficient detail might include the criterion for the likelihood ratio test (e.g., 3.84 points per degree of freedom), whether a successful covariance step is required, and other performance criteria such as unbiased prediction of Cmax, AUC, or other pharmacokinetic properties, including prediction of an external validation data set simulated from the same model with a different random seed (with the validation data set remaining blinded to the participants, just as in the real world). Note that the "best" model by the analysis objectives need not be the "true" model, but rather the one that optimizes the analysis objectives. The data sets, search space, and modeling objectives will be made public and announced on public fora (e.g., nmusers). The submission window will be open for 6 months after the announcement. Participants will report to the organizers:
• the algorithm, and all files required to execute the model selection
• the final results of the analysis objectives, or predictions if the objective is an external validation data set
• the run time for the model search

Conclusions: We feel that the development of ML-based model selection methods would be accelerated by unbiased assessment of the methods. The assessment should reflect, to the extent possible, the modeling processes and goals used in real-world pharmacometrics. We therefore would like to collect input on a potential proposal for an annual blinded benchmarking exercise, similar to that done for protein folding in CASP.
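The likelihood ratio criterion mentioned in the analysis objectives can be sketched as a simple acceptance rule: a more complex model is preferred only if the objective function value (OFV, −2 log likelihood in NONMEM) drops by at least the stated threshold per added parameter (3.84 is the chi-square critical value at α = 0.05 with 1 degree of freedom). Function and variable names below are illustrative, not part of the proposal.

```python
# Sketch of the per-degree-of-freedom likelihood ratio acceptance rule
# described in the analysis objectives (illustrative names; 3.84 is the
# chi-square critical value at alpha = 0.05 with 1 df).
def prefer_larger_model(ofv_reduced, ofv_full, added_params,
                        points_per_df=3.84):
    """Return True if the full model's OFV improvement justifies its
    extra parameters under the stated per-degree-of-freedom threshold."""
    delta_ofv = ofv_reduced - ofv_full  # improvement from the extra terms
    return delta_ofv >= points_per_df * added_params
```

For example, a covariate adding one parameter that reduces the OFV by 5.2 points would be retained under this rule, while a reduction of 2.1 points would not.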

[1] Chen X, Nordgren R, Belin S, et al. A fully automatic tool for development of population pharmacokinetic models. CPT Pharmacometrics Syst Pharmacol. 2024;13:1784-1797. doi:10.1002/psp4.13222
[2] Lu J, Bender B, Jin JY, et al. Deep learning prediction of patient response time course from early data via neural-pharmacokinetic/pharmacodynamic modelling. Nat Mach Intell. 2021;3:696-704. doi:10.1038/s42256-021-00357-4
[3] https://pd-value.com/impres-m/current-status-impres-m/ Accessed 6 March, 2025
[4] Prague M, Lavielle M. SAMBA: A novel method for fast automatic model building in nonlinear mixed-effects models. CPT Pharmacometrics Syst Pharmacol. 2022;11:161-172. doi:10.1002/psp4.12742
[5] https://pumas.ai/our-products/deepPumas Accessed 6 March, 2025
[6] Li X, Sale M, Nieforth K, et al. pyDarwin: A machine learning enhanced automated nonlinear mixed-effect model selection toolbox. Clin Pharmacol Ther. 2024;115:758-773. doi:10.1002/cpt.3114
[7] Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XIII. Proteins. 2019;87(12):1011-1020. doi:10.1002/prot.25823. PMID: 31589781; PMCID: PMC6927249.
[8] https://www.page-meeting.org/page/page2005/PAGE2005O08.pdf Accessed 6 March, 2025

Reference: PAGE 33 (2025) Abstr 11658 [www.page-meeting.org/?abstract=11658]

Poster: Methodology - New Modelling Approaches
