NONMEM IMPLEMENTATION OF NEURAL NETWORKS AS FLEXIBLE MODELS OF HAZARD FUNCTIONS IN JOINT MODELS - PAGE Meeting (Population Approach Group Europe)

Piotr Juszczak ¹, Alberto Russu ²

1 Johnson & Johnson Innovative Medicine (Switzerland), 2 Johnson & Johnson Innovative Medicine (Italy)

Introduction
In recent years, machine learning (ML) models have gained popularity in pharmacometrics. However, these two quantitative fields differ markedly in methodology and assumptions regarding data.

In pharmacometrics, the most common datasets are hierarchical with repeated measures (e.g., biomarker concentrations measured at multiple time points in the same subject), whereas ML typically assumes independent observations.

Pharmacometric model parameters generally have physiological meaning (e.g., production rates), whereas most ML models are non-parametric, with parameters unrelated to underlying processes (e.g., nodes in neural networks or support vectors in support vector machines).

In ML, the emphasis is primarily on prediction accuracy, with interpretability often being secondary (black-box approach). In pharmacometrics, interpretability takes precedence. Nevertheless, ML methods can be valuable in certain pharmacometric and biostatistical applications, such as modelling hazard functions in time-to-event (TTE) analysis [1-3], or in joint models, where the hazard function is estimated by an ML method and biomarker/drug concentrations by mixed-effects models.

When modelling time to a single event (no repeated events), model parameters are often not linked to underlying biological processes. TTE data are usually described by parametric models, e.g. Weibull, Gompertz, or lognormal. Under the same assumptions, a single neural network (NN) can accurately capture diverse hazard functions.
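For orientation, the parametric hazards named above can be written out explicitly. The following is a minimal Python sketch, not the NONMEM code used in this work; the function and parameter names (shape, scale, a, b) are illustrative assumptions that follow the standard textbook parameterisations.

```python
import math

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0.
    shape = 1 gives a constant (exponential) hazard, shape > 1 an increasing
    hazard and shape < 1 a decreasing hazard."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def gompertz_hazard(t, a, b):
    """Gompertz hazard h(t) = a * exp(b * t); increasing for b > 0,
    decreasing for b < 0."""
    return a * math.exp(b * t)

def weibull_survival(t, shape, scale):
    """Weibull survival S(t) = exp(-(t/scale)**shape), i.e. the exponential
    of minus the cumulative hazard."""
    return math.exp(-((t / scale) ** shape))
```

Each parametric family covers only some shapes (for instance, a Weibull hazard is monotone), which is the limitation a more flexible hazard model is meant to address.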

Objectives
This research aimed to:
1. Implement NNs in NONMEM.
2. Evaluate NN performance in TTE models with a variety of hazard function shapes: constant, increasing, decreasing, bathtub and lognormal.
3. Assess NN performance in joint models with dropout modelled as an additional competing hazard function.
4. Compare NN prediction accuracy with that of common parametric models: Exponential, Weibull, Gompertz, lognormal, etc.
5. Compare optimisation times between NNs and parametric models and determine NN complexity needed to model common hazard function shapes.

Results
Accuracy, complexity, and optimisation times for NNs and parametric models were evaluated using simulated hazard functions. Prediction accuracy was quantified using the concordance index (C-index) [4].
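To illustrate what a concordance index measures, the sketch below computes a simple pairwise (Harrell-type) C-index in Python. This is a simplification assumed for illustration: it is not the censoring-weighted estimator of reference [4], it ignores tied event times, and the function name and argument layout are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Pairwise C-index for right-censored data (illustrative only).

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    subject with the shorter time also has the higher predicted risk; ties
    in predicted risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1 indicates that predicted risks rank all comparable pairs correctly, and 0.5 corresponds to uninformative predictions.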

Comparisons in joint modelling were based on simulated data from a published PSA–survival model for prostate cancer [5]. A missing-at-random (MAR) dropout mechanism was introduced, allowing independent estimation and simulation of two competing hazard functions using both NNs and parametric models.
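The idea of estimating two competing hazards independently can be sketched with a toy Python simulation. It assumes constant, mutually independent event and dropout hazards, which is far simpler than the PSA-survival model of [5] and does not reproduce the MAR mechanism; all names and rate values are hypothetical.

```python
import random

def simulate_competing(n, event_rate, dropout_rate, seed=0):
    """Simulate n subjects with two independent constant hazards.
    The observed time is the earlier of the event and dropout times,
    labelled with the cause that occurred first."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        t_event = rng.expovariate(event_rate)
        t_drop = rng.expovariate(dropout_rate)
        if t_event <= t_drop:
            records.append((t_event, "event"))
        else:
            records.append((t_drop, "dropout"))
    return records

def estimate_rate(records, cause):
    """Maximum-likelihood estimate of a constant cause-specific hazard:
    number of occurrences of that cause divided by total time at risk.
    Occurrences of the other cause are treated as censoring."""
    total_time = sum(t for t, _ in records)
    n_cause = sum(1 for _, c in records if c == cause)
    return n_cause / total_time
```

Under these assumptions each hazard is recovered from the same records by treating the other cause as censoring, which is what permits separate estimation of the two functions.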

Optimisation times for NNs and parametric models were comparable. NNs with 2-3 input nodes accurately modelled the hazard function shapes most commonly observed in clinical trials, e.g. constant, increasing and decreasing. For more complex shapes, such as lognormal and bathtub, NNs with 4-5 nodes yielded higher prediction accuracy than parametric models.

NN complexity was adjusted by enabling or disabling nodes of the form max(threshold, THETA(1)*TIME+THETA(2)). However, such max() terms often cause optimisation issues in NONMEM. These issues could be mitigated by careful initialisation or, alternatively, log-scale parameterisation, and by manually adjusting the number of nodes in successive or parallel NONMEM runs until a statistically significant increase in the objective function value is observed.
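The node expression above can be turned into a small hazard model as follows. This Python sketch only takes the node form max(threshold, THETA(1)*TIME+THETA(2)) from the text; the output layer (an exponential of a weighted sum, chosen here to keep the hazard positive), the trapezoidal integration and all function names are assumptions for illustration and may differ from the NONMEM implementation.

```python
import math

def relu_node(t, weight, bias, threshold=0.0):
    """One node: max(threshold, weight*t + bias), with weight and bias in
    the roles of THETA(1) and THETA(2) in the expression from the text."""
    return max(threshold, weight * t + bias)

def nn_hazard(t, nodes, out_weights, out_bias):
    """Hazard from one hidden layer. 'nodes' is a list of (weight, bias)
    pairs. Assumption: the log-hazard is a weighted sum of node outputs."""
    z = out_bias + sum(v * relu_node(t, w, b)
                       for (w, b), v in zip(nodes, out_weights))
    return math.exp(z)

def cumulative_hazard(t_end, nodes, out_weights, out_bias, steps=1000):
    """Trapezoidal approximation of the integral of the hazard on [0, t_end]."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        h0 = nn_hazard(k * dt, nodes, out_weights, out_bias)
        h1 = nn_hazard((k + 1) * dt, nodes, out_weights, out_bias)
        total += 0.5 * (h0 + h1) * dt
    return total

def survival(t_end, nodes, out_weights, out_bias):
    """Survival probability S(t) = exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t_end, nodes, out_weights, out_bias))
```

In this sketch a node whose linear term stays below the threshold over the whole observation period contributes only a constant, so it is effectively disabled; with no nodes at all the model reduces to a constant hazard exp(out_bias).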

NN parameters are less intuitive than parametric ones when the shape of a fitted hazard function needs to be altered to simulate variations of it. On the other hand, a single NN structure (a single NONMEM code) can model a wide range of hazard functions. NNs typically outperform parametric models when the hazard function has a complex shape that standard parametric models cannot adequately capture.

Conclusion
Neural networks can be effectively implemented in NONMEM for modelling hazard functions in time-to-event models, joint models, and dropout scenarios. Their optimisation times are comparable to those of parametric models, they require 2-5 nodes for the hazard functions commonly encountered in clinical trials, and they deliver better accuracy for complex hazard functions. Although less interpretable, NNs are highly flexible and offer an adaptable alternative to parametric hazard models, particularly when hazard functions deviate from common functional forms.

References:
[1] Faghiri F, Kohansal A. Cox proportional hazards model with Bayesian neural network for survival prediction. Scientific Reports, 15, 2025.
[2] Utkin L, Satyukov E, Konstantinov A. SurvNAM: The machine learning survival model explanation. Neural Networks, 147, 2022.
[3] Bräm D, Steiert B, Pfister M, Steffens B, Koch G. Low-dimensional neural ordinary differential equations accounting for inter-individual variability implemented in Monolix and NONMEM. CPT: Pharmacometrics & Systems Pharmacology, 2024.
[4] Uno H, Cai T, Pencina M, D’Agostino R, Wei L. On the C-statistics for evaluating risk prediction procedures with censored survival data. Statistics in Medicine, 2011.
[5] Desmée S, Mentré F, Veyrat-Follet C, Sébastien B, Guedj J. Using the SAEM algorithm for mechanistic joint models characterizing nonlinear PSA kinetics and survival in prostate cancer. Biometrics, 73(1):305–312, 2017.

Reference: PAGE 34 (2026) Abstr 11964 [www.page-meeting.org/?abstract=11964]

Poster: Methodology – AI/Machine Learning