Charles C Margossian (1), Lu Zhang (1), Sebastian Weber (2), Andrew Gelman (1)
(1) Department of Statistics, Columbia University, USA, (2) Novartis Pharma, Switzerland
Introduction: Doing full Bayesian inference on ODE-based pharmacometrics models is known to be computationally intensive, although the exact nature of the computational bottlenecks is not always well understood. Better diagnosing these challenges can help us develop more scalable solutions. We study the Hamiltonian Monte Carlo (HMC) sampler [1], a state-of-the-art and popular Markov chain Monte Carlo algorithm notably implemented in the open-source software Stan ([2], mc-stan.org), and apply it to a pharmacokinetic example with a nonlinear ODE. Our first observation is that the behavior of the model changes dramatically as HMC explores the parameter space, at times causing the underlying ODE to become stiff and difficult to solve. This is particularly problematic during the warmup phase of the algorithm, when rather extreme parameter constellations are being explored. However, once HMC converges to the region where the probability mass concentrates and its tuning parameters (the step size and the inverse metric) are well adapted, a less robust but faster solver can yield very good performance. We examine the use of the pathfinder, a prototypical optimization technique which finds the relevant region of the parameter space and generates a fast approximation of the inverse metric. Under certain conditions, the pathfinder can replace various phases of HMC's expensive warmup.
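As a toy illustration of the role of these two tuning parameters (not taken from the study itself), the leapfrog integrator underlying HMC uses the step size and the inverse metric as follows; the target, parameter values, and function names below are our own:

```python
import numpy as np

def leapfrog(theta, rho, grad_U, eps, inv_metric, n_steps):
    """Simulate one HMC trajectory: n_steps leapfrog steps of size eps.

    eps is the step size; inv_metric is a diagonal inverse metric;
    grad_U is the gradient of the negative log posterior density U.
    """
    theta, rho = theta.copy(), rho.copy()
    rho -= 0.5 * eps * grad_U(theta)          # initial half step for momentum
    for _ in range(n_steps - 1):
        theta += eps * inv_metric * rho       # full step for position
        rho -= eps * grad_U(theta)            # full step for momentum
    theta += eps * inv_metric * rho
    rho -= 0.5 * eps * grad_U(theta)          # final half step for momentum
    return theta, rho

# Standard-normal target: U(theta) = 0.5 * theta^2, so grad_U(theta) = theta.
grad_U = lambda th: th
inv_metric = np.array([1.0])
theta0, rho0 = np.array([1.0]), np.array([0.5])

theta1, rho1 = leapfrog(theta0, rho0, grad_U,
                        eps=0.1, inv_metric=inv_metric, n_steps=20)

def hamiltonian(theta, rho):
    return 0.5 * theta @ theta + 0.5 * rho @ (inv_metric * rho)

# A well-chosen step size keeps the Hamiltonian nearly constant along the
# trajectory, which is what makes HMC proposals efficient.
print(abs(hamiltonian(theta1, rho1) - hamiltonian(theta0, rho0)))
```

With a poorly adapted step size or inverse metric, the Hamiltonian error grows and proposals are rejected, which is one reason warmup adaptation matters.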
Objectives:
- Examine the computational cost of solving ODEs during the various stages of HMC sampling, specifically the three warmup phases and the sampling phase;
- Measure the benefits of using a stiff ODE integrator during the early stages of HMC and then switching to a less robust but faster non-stiff ODE integrator;
- Compare the pathfinder warmup to the traditional HMC warmup.
Methods: We simulate data for a pharmacokinetic model with Michaelis-Menten clearance and build both an individual and a population model. We fit these models in Stan with various combinations of the stiff backward differentiation formula (BDF) ODE integrator and the non-stiff 4th/5th-order Runge-Kutta (RK45) integrator: either using a single integrator throughout, or switching from BDF to RK45 after or during the warmup phase. We also apply two flavors of the pathfinder: (1) as a tool to quickly find the region where the probability mass concentrates, and (2) as a method to estimate the inverse metric of the HMC sampler. For all these methods, the MCMC relaxation time of the (unnormalized) log posterior density is computed across multiple runs and used as a performance metric.
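The contrast between the two integrator families can be sketched outside of Stan with SciPy's `solve_ivp`; the one-compartment model below and its parameter values are illustrative choices of ours, with a small Michaelis constant mimicking an extreme parameter draw that renders the ODE stiff:

```python
import numpy as np
from scipy.integrate import solve_ivp

# One-compartment model with first-order absorption and Michaelis-Menten
# clearance. Illustrative parameters: a small Km makes elimination very
# fast once the concentration drops, i.e. the ODE becomes stiff.
ka, V, Vm, Km = 1.0, 1.0, 1.0, 1e-4

def rhs(t, y):
    a, c = y  # amount in the gut, concentration in the central compartment
    return [-ka * a, ka * a / V - Vm * c / (Km + c)]

sols = {method: solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], method=method,
                          rtol=1e-6, atol=1e-10)
        for method in ("RK45", "BDF")}

for method, sol in sols.items():
    # The non-stiff RK45 solver needs vastly more right-hand-side
    # evaluations than the stiff BDF solver on this problem.
    print(f"{method}: {sol.nfev} right-hand-side evaluations")
```

The two solvers agree on the solution, but the stability-limited step size of RK45 drives up its cost on stiff configurations, which is the trade-off the study measures within HMC.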
Results: The RK45 integrator yields unstable behavior, with some runs very fast and others very slow. When running chains in parallel, the worst-case run unfortunately dominates the total computation time. The BDF integrator produces more stable behavior, especially during the warmup phase. Combining integrators works well for the individual model, but less so for the population model; moreover, it is difficult to predict which combination of solvers will work best. For the individual model, the pathfinder yields a dramatic speedup because it accurately estimates HMC's inverse metric. On the population model, where estimation of the inverse metric is poor, the pathfinder yields only marginal benefits. We propose a diagnostic, based on Pareto smoothed importance sampling [3], to assess the quality of the approximate inverse metric and find it to be reliable, if somewhat conservative.
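The core idea behind such a diagnostic can be sketched as follows: fit a generalized Pareto distribution to the largest importance ratios between the target and its approximation, and inspect the estimated shape parameter k-hat. The full PSIS procedure of [3] is more elaborate; the toy proposal/target pair below is our own:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)

# Toy setting: a Gaussian approximation (proposal, N(0, 1)) that is
# narrower than the target (N(0, 1.5^2)), giving heavy-tailed ratios.
x = rng.normal(0.0, 1.0, 20_000)
log_r = -0.5 * (x / 1.5) ** 2 - np.log(1.5) + 0.5 * x ** 2  # log target - log proposal

r = np.exp(log_r - log_r.max())          # stabilized importance ratios
s = np.sort(r)
threshold, tail = s[-1001], s[-1000:]    # keep the 1000 largest ratios
k_hat, _, _ = genpareto.fit(tail - threshold, floc=0.0)

# Rule of thumb from [3]: k_hat > 0.7 flags an unreliable approximation.
print(f"k-hat = {k_hat:.2f}")
```

A large k-hat signals that the approximation misses too much of the target's mass for importance-sampling corrections, and, by extension, for trusting derived quantities such as an approximated inverse metric.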
Conclusion: Doing full Bayesian inference requires solving ODEs over a broad range of parameter configurations, which, as we demonstrate, makes it difficult to tune the integrator. This is particularly challenging during the warmup phase of HMC. For this reason, techniques which can shorten, or almost entirely bypass, the warmup phase, such as the pathfinder, strike us as a promising avenue of research, most notably if we can extend the method's ability to construct cheap and performant inverse metrics for HMC.
References:
[1] Betancourt M. (2017) A Conceptual Introduction to Hamiltonian Monte Carlo, arXiv:1710.02434
[2] Carpenter B, et al. (2017) Stan: A Probabilistic Programming Language, Journal of Statistical Software, 76, 1
[3] Vehtari A, et al. (2015) Pareto Smoothed Importance Sampling, arXiv:1507.02646
Reference: PAGE 29 (2021) Abstr 9872 [www.page-meeting.org/?abstract=9872]
Poster: Methodology - Estimation Methods