I-084

TOWARD AUTONOMOUS PHARMACOMETRICS: A MULTI-AGENT FRAMEWORK FOR STRUCTURE DISCOVERY, COVARIATE SELECTION AND GRAY-BOX MODELING

Racym Berrah 1, Jean-Baptiste Woillard 1,2

1 Pharmacology & Transplantation, INSERM U1248 (Limoges, France), 2 Department of Pharmacology, Toxicology and Pharmacovigilance, CHU Limoges (Limoges, France)

Objectives: Developing population pharmacokinetic (PopPK) models is an iterative, expert-dependent process. We present an autonomous multi-agent framework capable of performing end-to-end PopPK modeling. The study aims to (1) validate the reliability of the autonomous engine, built on a PyTorch-based SAEM implementation, against an industry standard (Monolix) and a published model (Woillard et al., 2011), (2) assess the limitations of current LLM code generation (e.g., syntax errors, runtime crashes), and (3) outline a future robust architecture combining JAX acceleration with an Aviary-inspired safety layer for open-ended equation discovery.
Methods: The architecture orchestrates two Large Language Model (LLM) agents: an Architect (strategy and hypothesis generation) and a Coder (implementation)¹. Both LLM agents run locally, without any network connection. They interact with a custom-built PyTorch Stochastic Approximation Expectation-Maximization (SAEM) estimation engine. The framework includes a numerical safety layer (state sanitization and multi-solver fallback) to handle unstable ODE code generated by the LLM. The system operates in a closed loop, refining model structures based on the Bayesian information criterion (BIC) and the objective function value (OFV). We tested the framework on:
1. Theophylline dataset: to benchmark the custom SAEM engine against Monolix.
2. Simulated tacrolimus dataset: 50 patients with 6 observations each, generated from the Woillard et al.² model, including an intentionally random dummy covariate (sex) to test the robustness of the covariate selection procedure.
3. Real-world clinical data: 73 patients with 10 to 12 observations each, to validate performance on real-world data.
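The numerical safety layer named in the Methods (state sanitization and multi-solver fallback) is not specified further in this abstract. The following is a minimal pure-Python sketch of one way such a layer could work, assuming fixed-step solvers: every integration step is sanitized (non-finite states are rejected, small negative amounts are clipped to zero), and a failed integration is retried with progressively more conservative solvers. The function names, the solver chain and the step sizes are hypothetical illustrations, not the authors' PyTorch implementation; the oral one-compartment model reuses the theophylline estimates above only as example numbers.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = List[float]
RHS = Callable[[float, State], State]
Stepper = Callable[[RHS, float, State, float], State]


def sanitize(y: Sequence[float]) -> State:
    """Reject non-finite states; clip small negative amounts (numerical undershoot) to zero."""
    if not all(math.isfinite(v) for v in y):
        raise FloatingPointError("non-finite state")
    return [max(v, 0.0) for v in y]


def euler_step(rhs: RHS, t: float, y: State, dt: float) -> State:
    return [yi + dt * ki for yi, ki in zip(y, rhs(t, y))]


def rk4_step(rhs: RHS, t: float, y: State, dt: float) -> State:
    k1 = rhs(t, y)
    k2 = rhs(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k1)])
    k3 = rhs(t + dt / 2, [yi + dt / 2 * k for yi, k in zip(y, k2)])
    k4 = rhs(t + dt, [yi + dt * k for yi, k in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(rhs: RHS, y0: Sequence[float], times: Sequence[float],
              h: float, stepper: Stepper) -> List[State]:
    """Fixed-step integration from t = 0; returns the state at each sorted observation time."""
    t, y, out = 0.0, sanitize(y0), []
    for t_obs in times:
        while t < t_obs - 1e-12:
            dt = min(h, t_obs - t)
            y = sanitize(stepper(rhs, t, y, dt))  # sanitize after every step
            t += dt
        out.append(list(y))
    return out


# Hypothetical fallback chain, from cheap and coarse to slow and conservative.
FALLBACKS: List[Tuple[str, Stepper, float]] = [
    ("rk4", rk4_step, 0.1),
    ("rk4-fine", rk4_step, 0.01),
    ("euler-fine", euler_step, 0.001),
]


def solve_with_fallback(rhs: RHS, y0: Sequence[float], times: Sequence[float]):
    """Try each solver in turn; return (solver name, states) from the first that succeeds."""
    errors = []
    for name, stepper, h in FALLBACKS:
        try:
            return name, integrate(rhs, y0, times, h, stepper)
        except (ArithmeticError, ValueError) as exc:  # NaN/inf states, overflow, domain errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all solvers failed: " + "; ".join(errors))


# Illustrative oral one-compartment model (amounts; unit dose in the gut at t = 0).
ka, cl, v = 1.69, 0.046, 0.48


def oral_1cpt(t: float, y: State) -> State:
    gut, central = y
    return [-ka * gut, ka * gut - cl / v * central]


solver, states = solve_with_fallback(oral_1cpt, [1.0, 0.0], [0.5, 1.0, 2.0, 6.0, 12.0, 24.0])
print(solver)  # rk4


# A stiff decay for which RK4 with step 0.1 is unstable, so the fallback is used.
def stiff(t: float, y: State) -> State:
    return [-50.0 * y[0]]


print(solve_with_fallback(stiff, [1.0], [30.0])[0])  # rk4-fine
```

Trying the cheap solver first keeps the common case fast; only proposals that blow up (here, a decay rate of 50 per time unit, outside the stability region of RK4 with a step of 0.1) pay for the more conservative solvers.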
Results: The custom SAEM engine demonstrated high reliability. On the theophylline dataset, parameter estimates were close to those from Monolix (custom SAEM vs Monolix: CL 0.046 vs 0.04, V 0.48 vs 0.45, KA 1.69 vs 1.53). On the more complex simulated dataset, the agent recovered the ground-truth structure (transit compartments plus a peripheral compartment) after six iterations and correctly rejected the dummy covariate, indicating that it did not overfit noise. However, the study also highlighted critical limitations: the LLM Coder occasionally generated syntactically incorrect or numerically unstable code, leading to execution crashes. Similar results were obtained when the approach was applied to real-world clinical data. These code-generation failures could be addressed by improving the prompts. The current implementation is computationally expensive, requiring two hours on the theophylline dataset versus one minute for Monolix. Substantial speed-ups are nonetheless already achievable: a JAX-accelerated GPU version reduces this runtime to approximately 7 minutes. The main advantage of the framework is its PyTorch-based core, which allows neural networks to be integrated directly into the model for hybrid (gray-box) modeling. This architecture leaves substantial room for optimization and provides a highly flexible environment for discovering complex biology, free of the structural constraints of traditional platforms, which are highly optimized but rigid.
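The closed loop selects models on BIC/OFV. To illustrate why a noise covariate such as the dummy sex effect tends to be rejected, the sketch below compares BIC between a base model and a model with one extra covariate parameter: the extra parameter is kept only if it lowers the OFV by more than ln(n). The OFV values are hypothetical, the helper names are ours, and using the total number of observations as n is an assumption; mixed-effects software may use other conventions (for example, penalizing with the number of subjects).

```python
import math


def bic(ofv: float, n_params: int, n_obs: int) -> float:
    """BIC = OFV + k * ln(n), where OFV = -2 * log-likelihood.

    Assumption: n is the total number of observations.
    """
    return ofv + n_params * math.log(n_obs)


def keep_covariate(base_ofv: float, base_k: int, cov_ofv: float, cov_k: int, n_obs: int):
    """Keep the covariate only if it lowers BIC; returns (decision, delta BIC)."""
    delta = bic(cov_ofv, cov_k, n_obs) - bic(base_ofv, base_k, n_obs)
    return delta < 0.0, delta


# Hypothetical OFVs (not values from the study); 50 patients x 6 samples = 300 observations.
# A noise covariate buys a small OFV drop that does not pay for its extra parameter:
keep, delta = keep_covariate(1000.0, 10, 998.5, 11, n_obs=300)
print(keep, round(delta, 2))  # False 4.2

# A covariate with a real effect lowers the OFV by far more than ln(300), about 5.7:
print(keep_covariate(1000.0, 10, 960.0, 11, n_obs=300)[0])  # True
```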
Conclusion: This proof of concept confirms that LLM agents can act as effective “Junior Pharmacometricians” but require stringent guardrails. Future work focuses on three axes: (1) HPC acceleration: porting the application entirely to JAX; (2) geometric constraints: using Grassmann flows to model latent physiological states for hybrid modeling; (3) robust orchestration: implementing an Aviary-like (FutureHouse) intermediate layer. This sandbox will allow the LLM to freely propose and explore novel differential equations while enforcing JAX compliance and mathematical validity before execution, so that the system can discover new biology without compromising software stability.
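The planned Aviary-like layer would check LLM-proposed equations before they are executed. A minimal standard-library sketch of such a pre-execution gate is shown below, assuming the LLM returns Python source for a function rhs(t, y, theta): the code must parse, use only whitelisted syntax and functions, expose the expected signature, and return finite derivatives of the right length on a smoke test. The whitelist, the rhs signature and validate_rhs are hypothetical; a JAX version would add a JAX-compliance check (for example, tracing the function with jax.jit), which is not shown here. The AST whitelist, not the emptied builtins, is the main guard, and it does not replace process-level isolation. Requires Python 3.9 or later.

```python
import ast
import math

# Hypothetical whitelists: the only syntax and functions an LLM proposal may use.
ALLOWED_CALLS = {"exp": math.exp, "log": math.log, "sqrt": math.sqrt, "max": max, "min": min}
ALLOWED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg, ast.Return, ast.Assign,
    ast.Expr, ast.Name, ast.Load, ast.Store, ast.Constant, ast.BinOp, ast.UnaryOp,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub, ast.UAdd,
    ast.List, ast.Tuple, ast.Subscript, ast.Call,
)


def validate_rhs(source: str, n_states: int, theta: list) -> tuple:
    """Check an LLM-proposed ODE right-hand side before it reaches the estimator."""
    # 1. Syntax: reject code that does not parse.
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, f"syntax error: {exc.msg}"
    # 2. Whitelist: no imports, attribute access, loops or unknown calls.
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False, f"disallowed syntax: {type(node).__name__}"
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_CALLS
        ):
            return False, "disallowed function call"
    # 3. Interface: exactly one function rhs(t, y, theta).
    fn = tree.body[0] if len(tree.body) == 1 else None
    if not isinstance(fn, ast.FunctionDef) or fn.name != "rhs" or [
        a.arg for a in fn.args.args
    ] != ["t", "y", "theta"]:
        return False, "expected a single function rhs(t, y, theta)"
    # 4. Smoke test: evaluate once and require n_states finite derivatives.
    namespace = {"__builtins__": {}, **ALLOWED_CALLS}
    try:
        exec(compile(tree, "<llm-proposal>", "exec"), namespace)
        dydt = namespace["rhs"](0.0, [1.0] * n_states, theta)
        ok = len(dydt) == n_states and all(math.isfinite(v) for v in dydt)
    except Exception as exc:  # LLM code can fail in arbitrary ways
        return False, f"runtime error: {type(exc).__name__}"
    return (True, "ok") if ok else (False, "wrong shape or non-finite derivative")


GOOD = """
def rhs(t, y, theta):
    ka, cl, v = theta[0], theta[1], theta[2]
    return [-ka * y[0], ka * y[0] - cl / v * y[1]]
"""
BROKEN = "def rhs(t, y, theta):\n    return [-theta[0] * y[0]"
UNSAFE = "def rhs(t, y, theta):\n    return __import__('os').getcwd()"

for src in (GOOD, BROKEN, UNSAFE):
    print(validate_rhs(src, n_states=2, theta=[1.69, 0.046, 0.48]))
```

Only proposals that pass all four checks would be handed to the estimation engine; failures come back as short messages that could be fed to the Coder for another attempt.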

References:
1. Holt S, Liu T, Qian Z, van der Schaar M, Weatherall J. Data-Driven Discovery of Dynamical Systems in Pharmacology Using Large Language Models. Adv Neural Inf Process Syst. 2024;37:96366. doi:10.52202/079017-3053
2. Woillard JB, de Winter BCM, Kamar N, Marquet P, Rostaing L, Rousseau A. Population pharmacokinetic model and Bayesian estimator for two tacrolimus formulations–twice daily Prograf and once daily Advagraf. Br J Clin Pharmacol. 2011;71(3):391-402. doi:10.1111/j.1365-2125.2010.03837.x

Reference: PAGE 34 (2026) Abstr 11937 [www.page-meeting.org/?abstract=11937]

Poster: Methodology – AI/Machine Learning