M Ruppert1, M van Noort1
1LAP&P Consultants B.V.
Introduction: Neural Ordinary Differential Equations (N-ODEs) model the dynamics of a system with neural networks. Given an initial state, they learn the differential equations by observing the state of the system at discrete future time points. Since their conception [1], N-ODEs have proven successful in fields including robotics, biology, meteorology and economics. N-ODEs can be seen as residual neural networks [2] in continuous time, which provides several benefits: the observed states do not have to be sampled at regular intervals, observations can contain missing information, and the system is continuously aware of its state, which allows external changes to be introduced at arbitrary time points. While N-ODEs have been applied in the pharmacokinetic/pharmacodynamic (PK/PD) context [3-5], their performance has rarely been assessed systematically for PK/PD models with sparse, irregularly timed observations [6], and the influence of penalty functions is not well studied. We apply N-ODEs to describe the dynamics of synthetic data generated by a two-compartment PK system with multiple dosing (MD).
Objectives:
•To evaluate the performance of N-ODEs with respect to fitting, interpolation and extrapolation of PK data with MD
•To explore penalization techniques to improve performance
•To evaluate the benefit of imposing mass balance restrictions in the N-ODE system
Methods: Six datasets were generated using a linear PK model with six intakes during a 48-hour observation period. The number of time points per dataset ranged from seven to 40, with noiseless observations in the depot, central and peripheral compartments. The data were fitted with an N-ODE model using various loss functions, with 10 to 90 replicates per setting (average 40). The N-ODE model consisted of three inputs, one hidden layer with eight neurons and a 'tanh' activation function, and four outputs representing the flows of amounts.
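The architecture above (three compartment amounts in, eight tanh units, four flows out) can be sketched as follows. The mapping of the four flow outputs onto compartment derivatives (depot→central, central→peripheral, peripheral→central, central→elimination) and the fixed-step Euler integrator are illustrative assumptions, not the authors' exact implementation; note how building derivatives from flows conserves mass apart from elimination, which is one way to impose a mass balance restriction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Network weights: 3 inputs -> 8 hidden (tanh) -> 4 flow outputs.
W1 = rng.normal(scale=0.1, size=(8, 3)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(4, 8)); b2 = np.zeros(4)

def flows(state):
    """Map compartment amounts (depot, central, peripheral) to 4 flows."""
    h = np.tanh(W1 @ state + b1)
    return W2 @ h + b2

def rhs(state):
    """Assemble compartment derivatives from the 4 flows (assumed mapping:
    depot->central, central->peripheral, peripheral->central,
    central->elimination). Mass is conserved apart from elimination."""
    f = flows(state)
    d_depot = -f[0]
    d_central = f[0] - f[1] + f[2] - f[3]
    d_periph = f[1] - f[2]
    return np.array([d_depot, d_central, d_periph])

def integrate(state0, t_grid, doses=None):
    """Fixed-step Euler solver; doses maps time -> depot bolus amount,
    showing how external inputs enter at arbitrary time points."""
    doses = doses or {}
    states = [np.asarray(state0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        s = states[-1] + (t1 - t0) * rhs(states[-1])
        for dt, amt in doses.items():
            if t0 < dt <= t1:            # bolus falls within this step
                s = s + np.array([amt, 0.0, 0.0])
        states.append(s)
    return np.stack(states)

t = np.linspace(0.0, 48.0, 481)
traj = integrate([100.0, 0.0, 0.0], t, doses={8.0: 100.0, 16.0: 100.0})
print(traj.shape)
```

In practice the untrained weights here produce arbitrary dynamics; fitting them against the observed amounts via the loss functions described below is what makes the system learn the PK model.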
The loss functions used the sum of squared differences between predictions and observed data. In addition, several penalties were added to evaluate their influence: negativity of amounts, negativity of flows, the curvature of the ODE solution (to reduce unnecessary fluctuations of the hidden states), and the sum of squares of the weights (L2 regularization). Performance of the fitted N-ODE model was evaluated in three ways: the observation error (root mean squared error (RMSE) of the predictions at the observations); the hidden state error (RMSE of the hidden state prediction, evaluated continuously); and the generalization error (as the hidden state error, but over an unseen extension period of 144 hours with 19 intakes).
Results: In our setting:
•A small L2 regularization penalty reduced the observation error by 58% (sparse data) to 5% (dense data); the hidden state error by 90% (sparse) to 10% (dense); and the generalization error by 91% (sparse) to 45% (dense);
•Penalizing negativity of flows reduced the observation error by 48% to 20%; the hidden state error was reduced marginally, by up to 35%; the generalization error changed by -50% to +20%;
•Penalizing negativity of amounts reduced all three errors by 42% to 3% for all except the densest dataset, for which the errors increased by 30% to 50%;
•Penalizing curvature did not reduce the observation error, but for the sparse datasets it reduced the hidden state error by 6% to 44% and the generalization error by 47% to 61%; for the dense datasets this penalty increased the errors;
•The number of iterations needed to converge was reduced by L2 penalties (18% to 27%) and curvature penalties (11% to 22%);
•Imposing mass balance restrictions aided convergence in combination with L2 and curvature penalties.
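The penalized loss described in the Methods can be sketched as below. The exact penalty forms and weightings are not given in the abstract; the squared-hinge negativity terms, second-difference curvature term, and the penalty weights shown are illustrative assumptions.

```python
import numpy as np

def penalized_loss(pred_amounts, obs_amounts, pred_flows, weights,
                   lam_l2=1e-4, lam_neg_a=1.0, lam_neg_f=1.0, lam_curv=1e-2):
    """Sum-of-squares data loss plus the four penalties from the abstract.
    Penalty forms and lambda values are illustrative, not the authors'."""
    # Data term: sum of squared residuals at the observation times.
    data = np.sum((pred_amounts - obs_amounts) ** 2)
    # Negativity of amounts / flows: penalize only the negative parts.
    neg_a = np.sum(np.minimum(pred_amounts, 0.0) ** 2)
    neg_f = np.sum(np.minimum(pred_flows, 0.0) ** 2)
    # Curvature: second finite differences along the time axis,
    # discouraging unnecessary wiggles in the hidden states.
    curv = np.sum(np.diff(pred_amounts, n=2, axis=0) ** 2)
    # L2 regularization: sum of squares of all network weights.
    l2 = sum(np.sum(w ** 2) for w in weights)
    return (data + lam_neg_a * neg_a + lam_neg_f * neg_f
            + lam_curv * curv + lam_l2 * l2)

# For a perfect, smooth, nonnegative fit only the L2 term is nonzero.
pred = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
loss = penalized_loss(pred, pred, np.ones((3, 4)), [np.ones((2, 2))])
print(loss)
```

Tuning the lambda values is exactly the exploration the Conclusion refers to: the useful magnitudes differed between sparse and dense datasets.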
Conclusion: Imposing extra penalties was helpful in fitting MD PK data, as judged by the three error measures. Sparse datasets typically benefited more than dense datasets. L2 penalties seem most promising. Penalizing negative flows did not improve the hidden state error and could even increase the generalization error. Imposing mass balance restrictions is recommended to ease convergence. The optimal penalty magnitudes are problem-specific and need to be found by exploration. With appropriate penalties, long-term extrapolation was feasible with low error.
References:
1. Chen RTQ, Rubanova Y, Bettencourt J, Duvenaud D (2018) Neural ordinary differential equations. arXiv:1806.07366. https://doi.org/10.48550/arXiv.1806.07366
2. He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv:1512.03385. https://arxiv.org/abs/1512.03385
3. Lu J, Deng K, Zhang X, Liu G, Guan Y (2021) Neural-ODE for pharmacokinetics modeling and its advantage to alternative machine learning models in predicting new dosing regimens. iScience 24. https://doi.org/10.1016/j.isci.2021.102804
4. Bräm D, Nahum U, Schropp J, et al. (2023) Low-dimensional neural ODEs and their application in pharmacokinetics. J Pharmacokinet Pharmacodyn. https://doi.org/10.1007/s10928-023-09886-4
5. Bräm DS, Steiert B, Pfister M, Steffens B, Koch G (2025) Low-dimensional neural ordinary differential equations accounting for inter-individual variability implemented in Monolix and NONMEM. CPT Pharmacometrics Syst Pharmacol 14:5-16. https://doi.org/10.1002/psp4.13265
6. Losada IB, Terranova N (2024) Bridging pharmacology and neural networks: a deep dive into neural ordinary differential equations. CPT Pharmacometrics Syst Pharmacol 13:1289-1296. https://doi.org/10.1002/psp4.13149
Reference: PAGE 33 (2025) Abstr 11653 [www.page-meeting.org/?abstract=11653]
Poster: Methodology – AI/Machine Learning