EVALUATION OF AUTOMATIC MODEL DEVELOPMENT TOOLS PYDARWIN AND PHARMPY USING CLINICAL POPULATION PK DATA - PAGE Meeting (Population Approach Group Europe)

Sabine Stübler¹, Igor Vasyutin¹, Hugo Maas¹

¹ Boehringer Ingelheim Pharma GmbH & Co KG

Introduction/Objectives: Automatic model development (AMD) aims to accelerate population PK (popPK) model building and to make the selected model structures more robust by replacing subjective manual heuristics with systematic, algorithmic search strategies. We evaluated two AMD tools, pyDarwin [1] and the pharmpy AMD tool [2], on real clinical development datasets. Our objectives were to compare (i) final model quality, assessed by the Bayesian Information Criterion (BIC), (ii) runtime, and (iii) ease of use and practical limitations.
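The BIC comparison in objective (i) can be sketched as follows. This is a minimal illustration assuming the standard definition BIC = OFV + k·ln(n), where OFV is the NONMEM objective function value (-2 log-likelihood up to a constant), k the number of estimated parameters and n the number of observations. Exactly how BIC was computed for the comparison (which parameters are counted, or whether a modified variant was used) is not stated here, so the function names and numbers below are hypothetical.

```python
import math


def bic(ofv: float, n_params: int, n_obs: int) -> float:
    """Standard BIC computed from a NONMEM-style objective function value.

    Assumes ofv equals -2 * log-likelihood up to an additive constant that
    cancels when models are fitted to the same dataset, n_params counts all
    estimated THETA, OMEGA and SIGMA elements, and n_obs is the number of
    observations.
    """
    if n_obs < 1:
        raise ValueError("n_obs must be at least 1")
    return ofv + n_params * math.log(n_obs)


def lower_bic(bic_a: float, bic_b: float) -> str:
    """Report which of two models fitted to the same data has the lower BIC."""
    if bic_a < bic_b:
        return "A"
    if bic_b < bic_a:
        return "B"
    return "tie"


# Illustrative (hypothetical) numbers: model B fits better (lower OFV) but
# uses three more parameters, so the ln(n) penalty favours model A.
bic_a = bic(ofv=1000.0, n_params=8, n_obs=500)
bic_b = bic(ofv=990.0, n_params=11, n_obs=500)
print(lower_bic(bic_a, bic_b))  # A
```

Because the penalty grows with ln(n), a drop of 10 OFV points does not justify three extra parameters when n = 500, since each added parameter costs about 6.2 points.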

Methods: The analysis focused on identifying structural base models for eleven internal case studies (from phases 1 and 2, covering intravenous and oral/subcutaneous administration). A harmonised search space was used for both tools; it included first-order, zero-order, sequential zero- and first-order, and lag-time absorption models; one-, two-, and three-compartment distribution models; first-order, Michaelis-Menten, and combined elimination models; as well as inter-individual variability (IIV) and residual unexplained variability (RUV). Dataset preparation (including BLQ handling and outlier filtering) followed the corresponding manual analyses and was not part of the AMD tool evaluation. To facilitate reuse, we created platform-specific templates for both tools that require only example-specific inputs (dataset, dataset preparation, initial parameter values, administration route), minimising per-example effort. All runs were executed on a high-performance cluster, with pharmpy using 16 CPUs per job and pyDarwin allowing 20 parallel runs. NONMEM 7.6.0, pharmpy 1.7.2, and pyDarwin 3.0.0 were used. The final models from each tool were benchmarked against each other and against the available manual solution for each example.
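The size of the harmonised structural search space described above can be illustrated with a naive enumeration. This is a minimal sketch under stated assumptions: the option labels are hypothetical, "lag time" is counted as its own absorption option (in practice it may be combinable with other absorption models), and IIV and RUV choices are left out, so the real space is larger. Neither pharmpy nor pyDarwin is implied to evaluate every combination in this way.

```python
from itertools import product

# Hypothetical labels for the structural options listed in the Methods.
# Assumption: "lag time" is treated as a separate absorption option.
ABSORPTION = ["first-order", "zero-order", "sequential zero/first-order", "lag time"]
DISTRIBUTION = ["1-compartment", "2-compartment", "3-compartment"]
ELIMINATION = ["first-order", "Michaelis-Menten", "combined"]


def structural_candidates(route: str) -> list[tuple[str, str, str]]:
    """List (absorption, distribution, elimination) combinations.

    Intravenous data have no absorption step, so only distribution x
    elimination is enumerated. IIV and RUV options are not included and
    would multiply the count further.
    """
    if route == "iv":
        return [("none", d, e) for d, e in product(DISTRIBUTION, ELIMINATION)]
    if route == "extravascular":
        return list(product(ABSORPTION, DISTRIBUTION, ELIMINATION))
    raise ValueError(f"unknown route: {route!r}")


print(len(structural_candidates("iv")))             # 9
print(len(structural_candidates("extravascular")))  # 36
```

Each IIV or RUV option considered on top of these structures multiplies the count, which is one reason the search strategy affects runtime.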

Results: In 10 of 11 examples, the pharmpy final model achieved a lower BIC than the pyDarwin final model; in the remaining example, pyDarwin achieved the lower BIC. Runtime was shorter for pharmpy in 3 of 11 examples and shorter for pyDarwin in 8 of 11. In 3 examples, pharmpy was superior with regard to both BIC and runtime, while in 1 example pyDarwin was superior in both metrics. Model structures diverged substantially between the tools: only 1 of 11 comparisons resulted in the same structural model across absorption, distribution, and elimination, and in a further 4 of 11 comparisons the structures matched in two of these three components. Final parameter estimates differed considerably between the two tools, consistent with the different structures selected by each workflow. When compared with the manual solutions, 6 of 11 manual models lay outside the AMD search space (e.g., inclusion of inter-occasion variability, a dose effect on absorption, or saturable clearance). In the remaining 5 examples, pharmpy yielded a final model with a lower BIC than the manual solution in all 5 cases; pyDarwin did so in 1 of 5. Runtimes ranged from 17 minutes to 137 hours for pharmpy and from 6 to 23 hours for pyDarwin. Observed limitations included reduced flexibility in pharmpy's end-to-end AMD function (only a limited subset of subtool options is accessible) and two runs that terminated at or after the final step, which did not affect the conclusions. In addition, because including transit absorption models and inter-occasion variability in the pyDarwin search space is currently too cumbersome, models with these components were excluded from the search spaces of both tools.
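The component-wise structural comparison reported in the Results (the same model across absorption, distribution, and elimination, or agreement in two of the three components) can be expressed as a small scoring function. This is a sketch only: the `Structure` class, its field values, and the example models are hypothetical and correspond neither to any of the eleven case studies nor to a pharmpy or pyDarwin API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    """Hypothetical summary of a final model's structural components."""
    absorption: str
    distribution: str
    elimination: str


def matching_components(a: Structure, b: Structure) -> int:
    """Count how many of the three structural components agree (0 to 3)."""
    pairs = (
        (a.absorption, b.absorption),
        (a.distribution, b.distribution),
        (a.elimination, b.elimination),
    )
    return sum(x == y for x, y in pairs)


def agreement_summary(pairs: list[tuple[Structure, Structure]]) -> Counter:
    """Tally, over examples, how many components the two tools' models share."""
    return Counter(matching_components(a, b) for a, b in pairs)


# Illustrative pair of final models for one hypothetical example
tool_1 = Structure("first-order", "2-compartment", "first-order")
tool_2 = Structure("first-order", "3-compartment", "first-order")
print(matching_components(tool_1, tool_2))  # 2
```

Applied to all eleven tool-to-tool comparisons, a tally of this kind would reproduce the reported split of one full match and four two-of-three matches, with the remaining six matching in at most one component.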

Conclusions: Both AMD tools can substantially reduce manual effort and increase reproducibility, but thorough expert review remains essential. The search spaces available in both tools are limited; consequently, model features that are not included, such as dose-dependent absorption effects or saturable clearance, cannot be identified by the automated workflows. While pharmpy more frequently identified lower-BIC models within the defined space, pyDarwin's global search strategy allowed faster exploration of diverse structural architectures. The results also show that, much like modeller-to-modeller variability, the choice of tool can considerably influence the selected model structure. Overall, no clear winner emerged; each tool showed distinct advantages and limitations.

References:
[1] pyDarwin documentation. https://certara.github.io/pyDarwin/html/index.html
[2] Pharmpy documentation. https://pharmpy.github.io/latest/index.html

Reference: PAGE 34 (2026) Abstr 12118 [www.page-meeting.org/?abstract=12118]

Poster: Methodology - New Modelling Approaches