Early go/no-go decisions in clinical trials using a DeepNLME joint tumor growth dynamics and overall survival model - PAGE Meeting (Population Approach Group Europe)

Lorenzo Contento ¹, Kiyoto A. Tanemura ², Lu Chen ², Mohamed Tarek ¹, Amit Roy ¹, Chuanpu Hu ², Anna G. Kondic ²

1 Pumas-AI Inc. (Dover, United States of America), 2 Bristol-Myers Squibb (Lawrenceville, United States of America)

Introduction/Objectives: Overall survival (OS) endpoints in oncology clinical trials require prolonged follow-up, delaying critical go/no-go decisions. Early tumor growth dynamics (TGD) data, such as sum of longest diameters (SLD), become available much sooner and may inform survival predictions. However, traditional summary measures such as objective response rate (ORR) capture only limited features of TGD trajectories. Nonlinear mixed-effects (NLME) modelling can characterize these trajectories more thoroughly, especially through DeepNLME, which combines NLME modelling with scientific machine learning (SciML).
We present a framework for earlier interim go/no-go decisions in randomized clinical trials:
1. Develop a joint TGD-OS model, treatment-agnostic but specialized to the tumor type, using historical data.
2. Apply the model to early data from a new trial to predict population survival curves for the treatment and control arms.
3. Compare the predicted curves (e.g., using scalar statistics) to assess whether the treatment is superior to control; the decision threshold can be chosen to achieve desired error rates.
Since both predictions come from the same treatment-agnostic model, any systematic prediction bias affecting both arms similarly should cancel out. Furthermore, since the trial is randomized, there is no expected covariate shift between the arms, enabling a direct comparison.
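The cancellation argument can be illustrated with a simple additive error model. This is an assumption made here for illustration, not a result from the abstract: let m_T and m_C be the true values of a summary statistic (e.g., median survival) in the treatment and control arms, b a bias shared by both predictions, and ε_T, ε_C arm-specific errors.

```latex
\begin{aligned}
\hat{m}_T &= m_T + b + \varepsilon_T, \qquad
\hat{m}_C = m_C + b + \varepsilon_C, \\
\hat{m}_T - \hat{m}_C &= (m_T - m_C) + (\varepsilon_T - \varepsilon_C).
\end{aligned}
```

Under this assumption the shared term b drops out of the difference, so only arm-specific errors affect the comparison. A bias that differs between arms would not cancel.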

Methods: The TGD model is based on a universal differential equation (UDE), a SciML approach where the response’s time derivative is defined by a neural network (NN) with domain-informed constraints (e.g., enforcing positivity of SLD). Including subject-specific random effects as NN inputs makes the model an NLME model, effectively mapping the space of possible TGD trajectories into a lower-dimensional space. A second NN predicts individualized distributions for the random effects from subject covariates (with subject-dependent mean and subject-independent variance). This second NN reduces predictive uncertainty, especially when very few TGD observations are available, by exploiting covariate information. A log-logistic proportional hazards OS model is fitted using the subject’s covariates and the SLD trajectories predicted by the TGD model using the subject’s empirical Bayes estimates (EBEs), together with derived quantities such as the SLD time derivative.
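A minimal sketch of the UDE idea, under stated assumptions: the network below has random, untrained weights, the function names (`init_mlp`, `mlp`, `simulate_sld`) are hypothetical, and positivity is enforced by modelling log SLD, which is one possible constraint and not necessarily the one used by the authors. The random effects `eta` enter as extra network inputs, so different subjects get different trajectories from the same network.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    return [(rng.normal(0.0, 0.3, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass with tanh hidden layers and a linear output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def simulate_sld(params, sld0, eta, times):
    """Explicit Euler integration of d(log SLD)/dt = NN(log SLD, eta).

    Integrating in log space keeps SLD strictly positive by construction.
    """
    log_sld = np.log(sld0)
    trajectory = [sld0]
    for t0, t1 in zip(times[:-1], times[1:]):
        features = np.concatenate(([log_sld], eta))
        rate = mlp(params, features)[0]          # network output: growth rate
        log_sld = log_sld + (t1 - t0) * rate
        trajectory.append(np.exp(log_sld))
    return np.array(trajectory)

rng = np.random.default_rng(0)
n_eta = 2                                        # number of random effects (assumed)
params = init_mlp(rng, [1 + n_eta, 8, 1])
times = np.linspace(0.0, 1.0, 25)                # time in years (assumed unit)
eta = rng.normal(size=n_eta)                     # one subject's random effects
sld = simulate_sld(params, sld0=60.0, eta=eta, times=times)
```

In a fitted model the weights would be estimated jointly with the random-effect distributions; here they only demonstrate the structure.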
The models were trained on SLD and OS data from 7 non-small cell lung cancer (NSCLC) trials and validated on 2 held-out trials: a successful one (treatment better than control) and an unsuccessful one. For each test trial, 1000 synthetic early trials were generated via discrete event simulation using inter-arrival time distributions fitted to the real enrollment patterns. Each synthetic trial was stopped once a minimum number of subjects had reached a minimum follow-up duration. At that point the trial also contains several subjects with shorter follow-up, and some subjects may have follow-up well beyond the minimum.
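The stopping rule described above can be sketched as follows. This is an illustration under assumptions: inter-arrival times are drawn from an exponential distribution (the abstract only states that the distributions were fitted to real enrollment data), and the function and parameter names are hypothetical.

```python
import numpy as np

def simulate_early_trial(rng, mean_gap_days, n_planned, n_min, min_followup_days):
    """Simulate enrollment and apply a follow-up-based stopping rule.

    Exponential inter-arrival times are an assumption made for this sketch.
    """
    gaps = rng.exponential(mean_gap_days, size=n_planned)
    enroll_day = np.cumsum(gaps)                     # calendar day of each enrollment
    # Stop when the n_min-th enrolled subject reaches the minimum follow-up.
    cutoff_day = enroll_day[n_min - 1] + min_followup_days
    enrolled = enroll_day[enroll_day <= cutoff_day]  # subjects present at the cutoff
    followup_days = cutoff_day - enrolled            # follow-up available per subject
    return cutoff_day, followup_days

rng = np.random.default_rng(42)
cutoff_day, followup_days = simulate_early_trial(
    rng, mean_gap_days=7.0, n_planned=200, n_min=5, min_followup_days=90.0)
```

Subjects enrolled before the fifth one have more than 90 days of follow-up, and subjects enrolled later have less, which matches the mix of follow-up durations described above.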
For each synthetic trial, EBEs were computed from the available early data, and the resulting TGD trajectories were fed into the OS model to predict the survival distribution of each subject. Per-subject survival curves were averaged to obtain population survival curves for each arm. Summary statistics (such as median survival and milestone survival) were extracted from each curve. A score was defined as the difference in a given statistic between treatment and control arms, with higher values indicating treatment benefit. This score yields a binary classifier for trial success once a threshold is chosen; the threshold can be calibrated on trials with known outcomes by studying how error rates vary.
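The averaging and scoring steps can be sketched as below. The per-subject curves here are hypothetical log-logistic survival functions with made-up scale parameters, standing in for the OS model's predictions; all function names are illustrative.

```python
import numpy as np

def loglogistic_survival(times, scale, shape=1.5):
    """Per-subject survival curves S_i(t) = 1 / (1 + (t / scale_i)^shape)."""
    return 1.0 / (1.0 + (times[None, :] / scale[:, None]) ** shape)

def population_curve(subject_curves):
    """Average per-subject survival curves (rows) into one arm-level curve."""
    return subject_curves.mean(axis=0)

def median_survival(times, curve):
    """First grid time at which survival drops to 0.5 or below (inf if never)."""
    below = np.nonzero(curve <= 0.5)[0]
    return times[below[0]] if below.size else np.inf

def milestone_survival(times, curve, t):
    """Survival probability at a milestone time, by linear interpolation."""
    return float(np.interp(t, times, curve))

times = np.linspace(0.0, 60.0, 601)              # months
scale_control = np.linspace(6.0, 14.0, 50)       # hypothetical subject-level scales
scale_treatment = np.linspace(9.0, 21.0, 50)     # hypothetical subject-level scales

pop_control = population_curve(loglogistic_survival(times, scale_control))
pop_treatment = population_curve(loglogistic_survival(times, scale_treatment))

# Score: difference in median survival, treatment minus control.
score = median_survival(times, pop_treatment) - median_survival(times, pop_control)
go = score > 0.0                                 # threshold of zero, as in the Results
```

If neither curve reaches 0.5 on the grid, both medians are infinite and the score is undefined; a milestone statistic avoids that case.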

Results: Using the difference in median survival as the score and a threshold of zero, the classifier achieved a true positive rate (sensitivity) of 79.2% under the least stringent stopping criterion (5 subjects, 90 days follow-up), rising to 94.3% under the most stringent (15 subjects, 180 days). Milestone survival gave consistent results for sensitivity: 70.8-86.1% (6-month), 80.6-94.7% (12-month), and 80.3-94.3% (18-month).
The false positive rate was at most 1.5% with 90 days of follow-up and 0.4% with 180 days, regardless of the statistic chosen.
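For reference, sensitivity and false positive rate at a given threshold can be computed from the scores of synthetic trials as sketched below. The score values are made up for illustration and are unrelated to the results reported above; `error_rates` is a hypothetical helper.

```python
import numpy as np

def error_rates(scores_success, scores_failure, threshold=0.0):
    """Return (true positive rate, false positive rate) for a score threshold.

    scores_success: scores of synthetic trials drawn from a successful trial.
    scores_failure: scores of synthetic trials drawn from an unsuccessful trial.
    A trial is classified as a "go" when its score exceeds the threshold.
    """
    tpr = float(np.mean(np.asarray(scores_success) > threshold))
    fpr = float(np.mean(np.asarray(scores_failure) > threshold))
    return tpr, fpr

# Made-up scores (difference in median survival, in months).
scores_success = [2.1, 0.8, -0.3, 1.5]
scores_failure = [-1.2, -0.4, 0.2, -2.0]

for threshold in (0.0, 1.0):
    tpr, fpr = error_rates(scores_success, scores_failure, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Raising the threshold lowers the false positive rate at the cost of sensitivity, which is the trade-off used to calibrate the decision rule.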

Conclusions: The proposed framework enables reliable early assessment of treatment efficacy using limited interim TGD and survival data. Low error rates were achieved on held-out trials even in the scenario with the least data. The treatment-agnostic design allows a model trained on historical data to be applied to new treatments without retraining, while the NLME-SciML approach leverages NN expressiveness to capture the diversity of TGD trajectories.

Reference: PAGE 34 (2026) Abstr 12022 [www.page-meeting.org/?abstract=12022]

Poster: Oral: Methodology - New Tools