Dr. Mark Sale1, Dr. Rong Chen1, James Craig1, Michael Tomashevskiy1, Alex Mazur1, Shuhua Hu1, Dr. Mike Dunlavey1, Dr. Bob Leary1, Dr. Keith Nieforth1, Rik de Greef1
1Certara
Objectives: To compare two alternative approaches to gradient computation for parameter estimation with the first-order conditional estimation with interaction (FOCE-I) method in mixed-effect non-linear regression, with respect to efficiency and robustness. Methods: Typically, in FOCE-I, finite differences (FD) are used to approximate the gradient; FD approximates the gradient numerically by evaluating the objective function at multiple points. Almquist [1] first described using sensitivity equations (SE) and reported improved speed in NONMEM; we understand that this method is currently included in the NONMEM FAST option. An alternative to SE is automatic differentiation using dual numbers (AD) [2], a method widely used in training neural networks [3]. We have implemented AD for parameter optimization (ADPO). We present a comparison of the effect of these two exact methods of calculating the gradient in FOCE-I. NONMEM with the FAST option vs NONMEM without FAST was used to evaluate the effect of SE, and NLME with ADPO vs NLME without ADPO was used to evaluate the effect of ADPO. Seventy-two models were constructed using pyDarwin [4]. The models included 1, 2 and 3 compartments with Michaelis-Menten elimination and were specified by ordinary differential equations (ODEs). Other options included an estimated exponent on Km in the Michaelis-Menten expression, a central volume either independent of weight or an allometric function of weight, and a range of OMEGA structures, with up to 4 diagonal elements and several non-diagonal structures. All models included first-order absorption. The data set included 50 subjects with 8 samples each. The DVERK ODE solver was used for all evaluations. The data set construction and parameter values were designed so that the models ranged from well identified to poorly identified, to represent the real-world model selection process. The primary metrics of algorithm performance were execution times for estimation and for the covariance step.
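The contrast between FD and dual-number AD can be sketched in a few lines. The following is a minimal, illustrative Python example only (not the NLME/ADPO implementation, and the toy objective f is made up): the dual part of a dual number propagates an exact derivative through arithmetic, while FD approximates it with an extra function evaluation.

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 = 0; .der carries the exact derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _wrap(self, x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = self._wrap(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, other):
        o = self._wrap(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.der)

def exp(x):
    """exp that works on floats and on Dual numbers (chain rule on the dual part)."""
    if isinstance(x, Dual):
        e = math.exp(x.val)
        return Dual(e, e * x.der)
    return math.exp(x)

# Toy objective resembling a one-compartment concentration term: f(k) = k * exp(-k)
def f(k):
    return k * exp(-k)

def fd_grad(g, x, h=1e-6):
    """Forward finite difference: one extra objective evaluation per parameter."""
    return (g(x + h) - g(x)) / h

def ad_grad(g, x):
    """Forward-mode AD: seed the dual part with 1, read off the exact derivative."""
    return g(Dual(x, 1.0)).der

exact = math.exp(-0.5) * (1 - 0.5)   # analytic derivative at k = 0.5
print(abs(fd_grad(f, 0.5) - exact))  # small truncation/round-off error
print(abs(ad_grad(f, 0.5) - exact))  # exact to machine precision
```

In FOCE-I the same distinction applies to the gradient of the approximate population likelihood: FD introduces step-size-dependent error, whereas SE and AD yield the gradient exactly (up to ODE-solver tolerance).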
Secondary endpoints included the fraction of models with successful convergence and the fraction with a successful covariance step. Results: The tables below give the percent change in execution speed (100*(FD - SE|ADPO)/FD) by a measure of model complexity (N theta = number of estimated THETA elements in the model); N is the number of models included. Only models that completed in < 6 hours without unrecoverable numerical problems were included.

Estimation and covariance, percent faster, FD vs SE:

N theta  N   Estimation % faster,  Covariance % faster,
             mean (95% range)      mean (95% range)
4        6   -46.9 (-113, 47.7)    70.2 (56.5, 88.1)
5        12  13.8 (-73.1, 86.3)    -141 (-1065, 89.4)
6        12  -44.9 (-372, 51)      66.8 (20.7, 96.2)
7        7   -62.5 (-426, 75.1)    15.7 (-144, 88.4)
8        10  -88.7 (-310, 17.4)    62.7 (12.9, 92.3)
9        9   -168 (-630, 13.7)     -4350 (-22913, 89.9)
10       2   43.5 (33.7, 53.2)     87.8 (83.3, 92.3)

FD vs ADPO:

N theta  N   Estimation % faster,  Covariance % faster,
             mean (95% range)      mean (95% range)
4        6   14.3 (-18, 52.7)      18.3 (0.8, 29.4)
5        12  10.5 (-50.1, 38.7)    17.0 (-2.70, 31.9)
6        12  48.0 (1.80, 89.8)     43.2 (-4.90, 93.2)
7        12  79.4 (40.4, 97.1)     66.7 (18.9, 95.0)
8        12  28.3 (-38.0, 72.5)    15.5 (-32.1, 61.6)
9        12  11.3 (-27.2, 46.8)    18.4 (-20.1, 62.7)
10       6   15.1 (-17.5, 58.2)    12.0 (-7.5, 27.8)

The tables below give the percent of models that converged (Success) and had a successful covariance step (Covariance), for FD vs SE and FD vs ADPO.

FD vs SE:

N theta  Success FD  Success SE  Covariance FD  Covariance SE
4        83          100         83             100
5        75          83          50             92
6        75          75          42             67
7        57          86          57             43
8        90          100         60             50
9        56          78          56             11
10       50          100         0              50

FD vs ADPO:

N theta  Success FD  Success ADPO  Covariance FD  Covariance ADPO
4        100         100           100            100
5        100         92            100            92
6        100         100           100            92
7        100         100           100            100
8        100         100           100            100
9        100         100           100            100
10       100         100           100            100

The table below gives the percent change in total execution time for estimation and for covariance across all models that completed, FD vs SE or ADPO:

       Estimation  Covariance
SE     -11.1       66.4
ADPO   66.9        62.6

Conclusions: In this limited sample of 1-, 2- and 3-compartment ODE models across a range of model complexity, both SE and ADPO improved the execution speed of the covariance step by approximately 3 fold (~66% reduction in time). ADPO also improved the execution time for estimation by approximately 3 fold. SE may have improved the likelihood of successful convergence, but not of a successful covariance step. ADPO had no effect on the likelihood of successful convergence or of a successful covariance step.
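The percent-change metric used in the results and the fold speed-up quoted in the conclusions are two views of the same ratio. A small illustrative check (the 90 s / 30 s timings below are hypothetical, not from the study):

```python
def pct_faster(t_fd, t_new):
    """Percent change in execution time vs the FD baseline: 100*(FD - new)/FD."""
    return 100.0 * (t_fd - t_new) / t_fd

def fold_speedup(t_fd, t_new):
    """Fold speed-up vs the FD baseline."""
    return t_fd / t_new

# A hypothetical run taking 90 s with FD and 30 s with an exact-gradient method:
print(pct_faster(90.0, 30.0))    # ~66.7, i.e. a ~66% reduction in time
print(fold_speedup(90.0, 30.0))  # 3.0, i.e. "approximately 3 fold"
```

This is why a ~66% reduction in time and a ~3-fold speed-up describe the same result.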
1. Almquist J, Leander J, Jirstrand M. Using sensitivity equations for computing gradients of the FOCE and FOCEI approximations to the population likelihood. J Pharmacokinet Pharmacodyn. 2015;42(3):191-209. doi:10.1007/s10928-015-9409-1. PMID: 25801663; PMCID: PMC4432110.
2. Neidinger RD. Introduction to automatic differentiation and MATLAB object-oriented programming. SIAM Review. 2010;52(3):545-563. doi:10.1137/080743627.
3. Baydin AG, Pearlmutter BA, Radul AA, Siskind JM. Automatic differentiation in machine learning: a survey. 2018. arXiv:1502.05767 [cs.SC].
4. pyDarwin documentation. https://certara.github.io/pyDarwin/html/index.html. Accessed 5 March 2025.
Reference: PAGE 33 (2025) Abstr 11394 [www.page-meeting.org/?abstract=11394]
Poster: Methodology - Estimation Methods