Martin Soucail 1,2, Romain Ferrara 2, Anne Rodallec 2, Julien Nicolas 3, François Riglet 1, Florence Gattacceca 2, Sylvain Fouliard 1, Sebastien Benzekry 2
1 Quantitative Pharmacology, Translational Medicine, Servier (Gif-sur-Yvette, France), 2 COMPutational pharmacology and clinical Oncology, Centre Inria d'Université Côte d'Azur, Cancer Research Center of Marseille (Marseille, France), 3 Institut Galien Paris Sud, Université Paris-Saclay (Paris, France)
BACKGROUND: Structural model selection for Tumor Dynamics (TD, or Tumor Growth Inhibition, TGI) is traditionally time-consuming and expert-driven, requiring iterative hypothesis formulation, ODE implementation, parameter estimation via nonlinear mixed-effects methods (e.g., Monolix), and assessment of statistical and biological plausibility. Automation approaches such as genetic algorithms [1] can accelerate exploration within predefined finite model spaces, but often struggle with complex dynamics, sparse datasets, or biologically unrealistic models. To address these challenges, we propose a knowledge-guided framework using large language models (LLMs), in which agents iteratively generate, evaluate, and refine mechanistic hypotheses. The aim is to accelerate model development while preserving interpretability and biological plausibility, and potentially to uncover novel model structures.
OBJECTIVE: To evaluate whether LLM agents can autonomously discover and refine population TD models from synthetic and real datasets, including untreated and treated tumor growth in preclinical and clinical contexts.
METHODS: Synthetic datasets, densely sampled from classical tumor growth models with published parameters [2], were used for structural equation discovery, focusing on the underlying ODE dynamics rather than study design. Real datasets [3,4] were analyzed separately to assess the workflow on actual experimental data.
The workflow, inspired by HDTwinGen [5], is an agent-based iterative loop guided by pharmacometric principles:
1) A model-builder LLM agent proposes candidate ODE systems in MLXTRAN format, informed by minimal MLXTRAN syntax documentation and constrained by guidelines for biological plausibility, parsimony, and parameter interpretability.
2) Models are fitted locally using Monolix (SAEM), and diagnostic metrics (e.g., BICc, parameter RSEs, IWRES) are computed.
3) A diagnostic LLM interprets these metrics and suggests structural or parametric refinements, initiating the next iteration.
No confidential data are sent to the LLM, which only uses summary metrics for decision-making. Unlike purely algorithmic approaches relying on a single metric (e.g., BICc with a penalty reflecting typical pharmacometric model-selection choices [6,7]), this workflow evaluates models across multiple diagnostics, including parameter uncertainty, goodness-of-fit, biological plausibility, and parsimony.
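The three-step loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose`, `fit`, and `diagnose` are hypothetical stand-ins for the model-builder agent, the local Monolix run, and the diagnostic agent; the 50% RSE cutoff and all metric values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FitSummary:
    """Summary metrics only; no individual-level data is passed to the agents."""
    model_name: str
    bicc: float             # corrected BIC reported by the fitting tool
    max_rse_percent: float  # largest relative standard error among parameters
    plausible: bool         # outcome of a biological-plausibility check


def acceptable(fit: FitSummary, max_rse: float = 50.0) -> bool:
    # The 50% RSE cutoff is an illustrative assumption, not a value from the abstract.
    return fit.plausible and fit.max_rse_percent <= max_rse


def discovery_loop(
    propose: Callable[[Optional[str], List[FitSummary]], str],
    fit: Callable[[str], FitSummary],
    diagnose: Callable[[FitSummary], Optional[str]],
    max_iter: int = 5,
) -> Tuple[Optional[FitSummary], List[FitSummary]]:
    history: List[FitSummary] = []
    feedback: Optional[str] = None
    best: Optional[FitSummary] = None
    for _ in range(max_iter):
        candidate = propose(feedback, history)  # step 1: builder agent
        summary = fit(candidate)                # step 2: local fit + diagnostics
        history.append(summary)
        # Keep the lowest-BICc model among those passing the other checks.
        if acceptable(summary) and (best is None or summary.bicc < best.bicc):
            best = summary
        feedback = diagnose(summary)            # step 3: diagnostic agent
        if feedback is None:                    # no further refinement suggested
            break
    return best, history


# Hypothetical canned results standing in for real agents and real fits.
ORDER = ["exponential", "logistic", "gompertz"]
CANNED = {
    "exponential": FitSummary("exponential", 1250.0, 12.0, True),
    "logistic": FitSummary("logistic", 1180.0, 85.0, True),
    "gompertz": FitSummary("gompertz", 1175.0, 30.0, True),
}


def propose(feedback: Optional[str], history: List[FitSummary]) -> str:
    return ORDER[len(history)]


def fit(candidate: str) -> FitSummary:
    return CANNED[candidate]


def diagnose(summary: FitSummary) -> Optional[str]:
    if summary.model_name == ORDER[-1]:
        return None
    return f"try a richer structure than {summary.model_name}"


best, history = discovery_loop(propose, fit, diagnose)
print(best.model_name, len(history))
```

In this toy run the logistic candidate is rejected despite a lower BICc than the exponential model because its worst parameter RSE exceeds the cutoff, which is the kind of multi-criteria decision the workflow is meant to support.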
RESULTS:
1) Synthetic datasets
Untreated preclinical TD: The workflow reliably recovered classical growth models (exponential, logistic, Gompertz, power law), often suggesting them from the first prompt (“Starting with simple exponential growth, then exploring more realistic models with carrying capacity (logistic), and models with changing growth rates”). For more complex cases (generalized logistic), the exact ground truth was not recovered, but the selected alternative showed comparable fit and mechanistic plausibility.
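For readers less familiar with the growth laws named above, the sketch below writes their textbook ODE right-hand sides in Python and integrates them with a fixed-step Runge-Kutta scheme. The parameter names (`a`, `K`, `gamma`, `nu`) and all numerical values are illustrative assumptions; the abstract does not give the parameterizations or values used in the study.

```python
import math


def exponential(v, a):
    return a * v                           # dV/dt = a V


def logistic(v, a, K):
    return a * v * (1.0 - v / K)           # dV/dt = a V (1 - V/K)


def gompertz(v, a, K):
    return a * v * math.log(K / v)         # dV/dt = a V ln(K/V)


def power_law(v, a, gamma):
    return a * v ** gamma                  # dV/dt = a V^gamma


def generalized_logistic(v, a, K, nu):
    return a * v * (1.0 - (v / K) ** nu)   # dV/dt = a V (1 - (V/K)^nu)


def rk4(rhs, v0, t_end, n_steps, **params):
    """Integrate dV/dt = rhs(V, **params) from t = 0 to t_end with classical RK4."""
    h = t_end / n_steps
    v = v0
    for _ in range(n_steps):
        k1 = rhs(v, **params)
        k2 = rhs(v + 0.5 * h * k1, **params)
        k3 = rhs(v + 0.5 * h * k2, **params)
        k4 = rhs(v + h * k3, **params)
        v += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v


# Illustrative values only (volume in mm^3, time in days).
V0, A, K, T = 1.0, 0.2, 1000.0, 20.0

v_exp = rk4(exponential, V0, T, 2000, a=A)
v_log = rk4(logistic, V0, T, 2000, a=A, K=K)
v_gomp = rk4(gompertz, V0, T, 2000, a=A, K=K)

# Closed-form solutions used as a sanity check on the integrator.
exact_exp = V0 * math.exp(A * T)
exact_log = K / (1.0 + (K / V0 - 1.0) * math.exp(-A * T))
exact_gomp = K * math.exp(math.log(V0 / K) * math.exp(-A * T))

print(v_exp, v_log, v_gomp)
```

With `nu = 1` the generalized logistic reduces to the logistic model, which illustrates why a simpler alternative can fit data generated from the more complex law almost as well.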
Treated clinical TD (Claret [2], double exponential [8], two-populations [9], Wang [10]): Exact original models were not consistently recovered, yet statistically robust alternatives were identified, some of them more complex than typical pharmacometric parsimony would favor.
PKTD scenarios: Across six datasets (two growth functions, exponential and Gompertz, each combined with three drug-effect models: log-kill, Norton–Simon, and Emax), the true model was recovered in 50% of cases (3 of 6). Deviations in the remainder were minor and preserved overall dynamics.
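The six PKTD scenarios can be enumerated as the product of growth laws and drug-effect terms, as in the sketch below. The functional forms are common textbook parameterizations and are an assumption here, since the abstract does not state the equations; the parameter names (`a`, `K`, `k`, `emax`, `ec50`) and values are hypothetical, and `c` stands for a drug concentration supplied by some PK model.

```python
import math
from itertools import product

# Growth laws g(V); p is a dict of parameters.
GROWTH = {
    "exponential": lambda v, p: p["a"] * v,
    "gompertz": lambda v, p: p["a"] * v * math.log(p["K"] / v),
}

# Drug-effect terms; g is the untreated growth rate g(V), c the concentration.
EFFECT = {
    # Kill rate proportional to concentration and tumor size.
    "log-kill": lambda g, v, c, p: g - p["k"] * c * v,
    # Norton-Simon hypothesis: kill proportional to the untreated growth rate.
    "Norton-Simon": lambda g, v, c, p: g * (1.0 - p["k"] * c),
    # Saturable (Emax) kill rate.
    "Emax": lambda g, v, c, p: g - p["emax"] * c / (p["ec50"] + c) * v,
}


def make_rhs(growth_name, effect_name):
    growth, effect = GROWTH[growth_name], EFFECT[effect_name]

    def rhs(v, c, p):
        return effect(growth(v, p), v, c, p)

    return rhs


SCENARIOS = {(g, e): make_rhs(g, e) for g, e in product(GROWTH, EFFECT)}

# Illustrative parameter values only.
params = {"a": 0.2, "K": 1000.0, "k": 0.5, "emax": 0.4, "ec50": 1.0}

for (g, e), rhs in SCENARIOS.items():
    print(f"{g} + {e}: dV/dt at V=100, C=1 -> {rhs(100.0, 1.0, params):.3f}")
```

All six right-hand sides reduce to the untreated growth law when `c = 0`, which is a quick consistency check when comparing a recovered structure with the generating one.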
2) Real datasets
Untreated preclinical TD [3,4,11,12]: Two experimental systems were analyzed (LLC syngeneic mice [3] and human breast cancer xenografts [4]). The workflow mainly selected (Generalized) Logistic growth based on BIC and identifiability, though some biologically implausible parameters were observed (e.g., initial tumor size fivefold higher than injected cells, carrying capacity only 1,900 mm³ versus a typical 10,000 mm³), for which expert modelers would prefer Gompertz.
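The two plausibility issues mentioned above (an inflated initial size and a low carrying capacity) are the kind of check that could be automated before a model is accepted. The sketch below is one hypothetical way to do it; the function name, the ratio thresholds, and the normalized initial volume are assumptions for illustration, while the 1,900 and 10,000 mm³ values are taken from the text.

```python
def plausibility_flags(v0_est, v0_expected, k_est, k_typical,
                       max_v0_ratio=2.0, min_k_ratio=0.5):
    """Return a list of warnings for estimates far from prior expectations.

    The thresholds are illustrative assumptions, not values from the abstract.
    """
    flags = []
    if v0_est / v0_expected > max_v0_ratio:
        flags.append("initial tumor size well above the injected volume")
    if k_est / k_typical < min_k_ratio:
        flags.append("carrying capacity well below the typical value")
    return flags


# Initial size expressed relative to the injected volume (fivefold, as in the text);
# carrying capacities in mm^3.
flags = plausibility_flags(v0_est=5.0, v0_expected=1.0,
                           k_est=1900.0, k_typical=10000.0)
print(flags)
```

A model raising such flags would not necessarily be rejected, but the flags could be returned to the diagnostic agent alongside BIC and RSE values.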
Treated preclinical PKTD [13]: Across three dose groups and administration routes of a novel subcutaneously injectable polymer prodrug, the workflow identified an Emax-type PK–PD relationship with accurate fit and identifiable parameters.
On average, ~30,000 tokens (≈ 0.10 US$ using Claude Sonnet 4.5) were consumed per experiment, with runtimes from <2 minutes (TD) to 20 minutes (dense PKTD) on a standard workstation (32 GB RAM, 6-core CPU, 12 threads at 3.20 GHz).
CONCLUSIONS: This LLM-driven framework efficiently supports TD model development, frequently recovering true or structurally similar models across synthetic and real datasets. Convergence was rapid, token usage limited, and computational cost manageable, demonstrating practical feasibility. While exact recovery of complex reference models was not always achieved, the proposed structures remained statistically robust and biologically interpretable, providing a reproducible, tractable assistant rather than replacing expert judgment. Future work will explore additional tumor growth and PKTD models to assess robustness, evaluate potential bias toward commonly reported models, and test performance on less conventional structures. We also aim to extend the approach to covariate modeling with automated selection, enabling better capture of inter-individual variability. These steps will help define the limits and scalability of LLM-guided structural discovery in pharmacometrics.
References:
1. Bies, R. R. et al. A Genetic Algorithm-Based, Hybrid Machine Learning Approach to Model Selection. J Pharmacokinet Pharmacodyn (2006).
2. Claret, L. et al. Model-based prediction of phase III overall survival in colorectal cancer on the basis of phase II tumor dynamics. J Clin Oncol (2009).
3. Benzekry, S. et al. Classical Mathematical Models for Description and Prediction of Experimental Tumor Growth. PLoS Comput Biol (2014).
4. Vaghi, C. et al. Population modeling of tumor growth curves and the reduced Gompertz model improve prediction of the age of experimental tumors. PLoS Comput Biol (2020).
5. Holt, S. et al. Automatically Learning Hybrid Digital Twins of Dynamical Systems. Preprint at https://doi.org/10.48550/arXiv.2410.23691 (2024).
6. Richardson, S. et al. A machine learning approach to population pharmacokinetic modelling automation. Commun Med (2025).
7. Pierre, M. et al. Implementation and comparison of seven algorithms for automated model selection in Monolix: two simulation studies and eleven applications. PAGE (2025).
8. Stein, W. D. et al. Tumor Growth Rates Derived from Data for Patients in a Clinical Trial Correlate Strongly with Patient Survival: A Novel Strategy for Evaluation of Clinical Trial Data. Oncologist (2008).
9. Chatterjee, M. et al. Population Pharmacokinetic/Pharmacodynamic Modeling of Tumor Size Dynamics in Pembrolizumab-Treated Advanced Melanoma. CPT: Pharmacometrics & Systems Pharmacology (2017).
10. Wang, Y. et al. Elucidation of relationship between tumor size and survival in non-small-cell lung cancer patients can aid early decision making in clinical drug development. Clin Pharmacol Ther (2009).
11. Benzekry, S. et al. Machine-learning and mechanistic modeling of metastatic breast cancer after neoadjuvant treatment. PLoS Comput Biol (2024).
12. Daskalakis, C. Tumor Growth Dataset. TSHS Resources Portal (2016).
13. Rodallec, A. et al. Model-Driven Scheduling of Nanocarriers: Application to an Anticancer Polymer Prodrug Administered Subcutaneously. Preprint under review at https://inria.hal.science/hal-04937053 (2025).
Reference: PAGE 34 (2026) Abstr 12130 [www.page-meeting.org/?abstract=12130]
Poster: Methodology – AI/Machine Learning