III-053

ADPO: Automatic-differentiation-assisted parametric optimization algorithm

Keith Nieforth1, Rong Chen1, Mark Sale1, Alex Mazur1, Michael Tomashevskiy1, Shuhua Hu1, James Craig1, Mike Dunlavey1, Robert Leary1

1Certara

Introduction: Automatic differentiation (AD) is a widely used method for calculating gradients in neural networks (NN) [1]. The traditional method for gradient calculation in pharmacometrics (PMX) software is finite difference (FD). FD requires a nontrivial step size and suffers from truncation and round-off errors. The first-order conditional estimation (FOCE) algorithm [2] relies on gradient calculations, as it uses the gradient-based Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton method [3]. Gradients obtained by AD are exact (to machine precision). We hypothesize that the performance of FOCE can be improved by replacing FD with AD.

Objective: To compare the performance of AD-based and FD-based gradient calculation in the FOCE Extended Least Squares (FOCE ELS) algorithm in Phoenix NLME [4].

Methods: The FOCE algorithm alternates between two steps:
• Inner loop: given estimates of the population residual error and the fixed and random effect parameters, optimal empirical Bayesian estimates (EBEs) of the random effect vector ETA are calculated for each subject.
• Outer loop: given the current EBE for each subject, the fixed effect parameter vector THETA, the random effect parameter matrix OMEGA, and the residual error parameters SIGMA are updated.
The algorithm iterates between these steps until the data log-likelihood converges to a local maximum. Because the inner-loop calculation is performed for each individual, the FD gradient calculation is the most CPU-intensive part; this was confirmed by random-pause sampling experimentation [5]. To overcome this bottleneck, we implemented AD for the inner loop of FOCE as ADPO. When evaluating the value of a function (e.g., the joint log-likelihood for each subject) in the inner loop, ADPO's dual-number approach promotes each of the relevant ETA variables to a corresponding dual-number variable by attaching a virtual infinitesimal vector.
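The dual-number promotion described above can be illustrated with a minimal forward-mode AD sketch. This is hypothetical Python, not the Phoenix NLME implementation; the `Dual` class, `promote` helper, and toy `objective` are illustrative names only:

```python
import math

class Dual:
    """Dual number: a real value plus a vector of infinitesimal (gradient) parts."""
    def __init__(self, val, grad):
        self.val = val    # real part: the function value
        self.grad = grad  # dual part: d(value)/d(eta_i) for each ETA

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, [0.0] * len(self.grad))
        return Dual(self.val + other.val,
                    [a + b for a, b in zip(self.grad, other.grad)])
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, [0.0] * len(self.grad))
        # first-order Taylor rule: (x + dx)(y + dy) = xy + (x*dy + y*dx)
        return Dual(self.val * other.val,
                    [self.val * b + other.val * a
                     for a, b in zip(self.grad, other.grad)])
    __rmul__ = __mul__

def exp(x):
    # chain rule for exp, again truncated at first order
    e = math.exp(x.val)
    return Dual(e, [e * g for g in x.grad])

def promote(etas):
    """Promote each ETA to a dual number seeded with a unit infinitesimal vector."""
    n = len(etas)
    return [Dual(v, [1.0 if i == j else 0.0 for j in range(n)])
            for i, v in enumerate(etas)]

# toy objective standing in for a subject's joint log-likelihood
def objective(e1, e2):
    return e1 * e2 + exp(e1)

e1, e2 = promote([0.5, -0.2])
out = objective(e1, e2)
# out.val is the function value; out.grad is its exact gradient w.r.t. each ETA
```

Evaluating `objective` once on promoted inputs yields both the value and the full gradient, which is what makes the approach attractive inside a per-subject optimization loop.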
The computation rules for dual numbers are derived from the Taylor expansion, truncated at first order in the virtual infinitesimal vector. In ADPO the function value is itself represented by a dual number, whose real part is the function value and whose dual vector part holds the gradient of the function with respect to each ETA variable. Thus the function value and its exact gradient are obtained simultaneously and used in the inner loop.

Results: We compared FOCE-FD and FOCE-ADPO on a two-compartment model with first-order clearance and first-order absorption. All subjects receive an oral bolus at time 0, followed by an intravenous (IV) bolus at time 72. The model contains 6 fixed effect parameters (THETA): tvV, tvCl, tvKa, tvV2, tvCl2, tvlogitF; 4 random effect parameters (the diagonal elements of the OMEGA matrix): nV, nCl, nKa, nlogitF; and 1 residual error parameter (SIGMA): CEps. The data set contains 100 subjects, each with 16 observations, simulated from tvV=4.5, tvCl=1.6, tvKa=0.8, tvV2=10.5, tvCl2=0.6, tvlogitF=-1.6, nV=nCl=nKa=nlogitF=0.09, CEps=0.12. Both runs were performed on an Amazon WorkSpaces (AWS) Windows machine with 4 CPU cores at 2.4 GHz and 32 GB of memory, using the default run options in Phoenix NLME and the same initial parameters, perturbed slightly from the simulation values. We find that:
• FOCE-FD finished in 23.17 seconds with 75 outer-loop iterations. Its inner loop took 258321 BFGS iterations and 259004 line searches. Parameter estimates: tvV=4.394, tvCl=1.612, tvKa=0.807, tvV2=10.280, tvCl2=0.599, tvlogitF=-1.635, nV=0.080, nCl=0.082, nKa=0.0997, nlogitF=0.086, CEps=0.120.
• FOCE-ADPO finished in 14.56 seconds (37% reduction) with 71 outer-loop iterations. Its inner loop took 177150 BFGS iterations (32% reduction) and 177264 line searches (32% reduction). Parameter estimates: tvV=4.393, tvCl=1.613, tvKa=0.807, tvV2=10.280, tvCl2=0.598, tvlogitF=-1.635, nV=0.080, nCl=0.083, nKa=0.0998, nlogitF=0.086, CEps=0.120.
The estimates from FOCE-FD and FOCE-ADPO are similar to each other and to the simulation parameters, and both runs converge.

Conclusion: ADPO may improve the performance of FOCE without sacrificing robustness or accuracy. Implementing AD in the outer loop is a possible direction to further improve ADPO's performance.
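The FD truncation and round-off errors mentioned in the Introduction can be demonstrated with a generic sketch, unrelated to the Phoenix NLME code: a central finite difference at several step sizes is compared against the analytic derivative of a toy function (which forward-mode AD would reproduce to machine precision). The function `f` and all names below are illustrative:

```python
import math

def f(x):
    # toy scalar function standing in for an inner-loop objective
    return math.exp(x) * math.sin(x)

def dfdx(x):
    # analytic derivative, the value exact AD would deliver
    return math.exp(x) * (math.sin(x) + math.cos(x))

def fd_central(func, x, h):
    # central finite difference with step size h
    return (func(x + h) - func(x - h)) / (2.0 * h)

x = 1.0
exact = dfdx(x)
errs = {}
for h in (1e-1, 1e-5, 1e-13):
    errs[h] = abs(fd_central(f, x, h) - exact)
    # large h -> truncation error dominates; tiny h -> round-off dominates
    print(f"h={h:g}  abs error={errs[h]:.3e}")
```

No single step size eliminates both error sources, which is why FD needs a "nontrivial" step size and still remains inexact, whereas the dual-number gradient has no step-size parameter at all.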

Poster Session III, June 5, 2025, 9:50 AM – 11:45 AM

References:
1. Baydin, A.G., et al. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 2018. 18(153): p. 1-43.
2. Wang, Y. Derivation of various NONMEM estimation methods. Journal of Pharmacokinetics and Pharmacodynamics, 2007. 34(5): p. 575-593.
3. Dennis Jr, J.E. and Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. 1996: SIAM.
4. Certara. Phoenix NLME. 2025; Available from: https://www.certara.com/software/pkpd-modeling-and-simulation/phoenix-nlme/.
5. Dunlavey, M. Performance tuning with instruction-level cost derived from call-stack sampling. SIGPLAN Not., 2007. 42(8): p. 4-8.

Reference: PAGE 33 (2025) Abstr 11595 [www.page-meeting.org/?abstract=11595]

Poster: Methodology - Estimation Methods
