Stan: an open source probabilistic programming language for high-performance statistical computing

Eric Novik, Daniel Lee

Generable Inc.

Introduction:

Stan (http://mc-stan.org/) is an open source platform for statistical modeling and high-performance statistical computation [1]. Users rely on Stan for statistical modeling, data analysis, and prediction in the biological, social, and physical sciences, engineering, and business.Users specify log density functions in Stan’s probabilistic programming language and get: full Bayesian statistical inference with MCMC sampling (NUTS, HMC)approximate Bayesian inference with variational inference (ADVI)penalized maximum likelihood estimation with optimization (L-BFGS)Stan’s math library provides differentiable probability functions and linear algebra (C++ autodiff). Additional R packages provide expression-based linear modeling, posterior visualization, and leave-one-out cross-validation.

Objectives:

Ability to express arbitrarily complex statistical models including models with ODEs
Stan’s implementation of Hamiltonian Monte Carlo and its ability to efficiently sample from high dimensional distributions making comparisons to Random Walk Metropolis and Gibbs sampling as well as Penalized Maximum Likelihood estimation

Methods:

Most people use Stan for specifying statistical models in the Stan language and fitting them with full Bayesian inference using Stan’s implementation of Hamiltonian Monte Carlo [2]. In drug development, Stan is being used in Pharmacometrics where the ODE solvers exposed in the Stan language are used to express the model for the conditional mean of PK PD models. See, for example, Weber et al. [3].

Stan’s language offers the users a high level interface for specifying statistical and mechanistic models. Stan is a turing complete language, which means that Stan is able to compute any computable function. This offers the level of (almost) arbitrary flexibility allowing users to express, for example, deep hierarchical models (mixed effects) with site specific effects and ODE based patient level effects, and so on.

Stan contains a very efficient implementation of Hamiltonian Monte Carlo (HMC) which allows it to sample from analytically intractable, non-conjugate posterior distributions with complex geometry in high-dimensional parameter spaces. For instance, we fit Stan models with over a million parameters. HMC with NUTS is a modern sampling algorithm which relies on the geometry of the posterior distribution (gradients) to guide the sampling process and scales wells with the number of parameters.

Results:

Stan’s performance has been evaluated in extensive simulations and on the real world datasets in many industries including Pharma and drug development. In the original NUTS paper [2] Section 4 empirical evaluations are reported. The efficiency is evaluated in terms of effective sample size (ESS) normalized by the number of gradient evaluations. In total, we ran 3,200 experiments with HMC and 600 experiments with HMC-NUTS. ESS of a sample is a measure of how many independent samples a set of correlated samples is worth for the purposes of estimating the mean of some function; a more efficient sampler will give a larger ESS for less computation. The algorithm was evaluated on 250-dimensional multivariate normal (MVN) with highly correlated posterior distribution comparing HMC-NUTS (Stan’s default), Random Walk Metropolis (RWM), and Gibbs sampling. NUTS was run with δ = 0.5 for 2,000 iterations, with the first 1,000 iterations being used as burn-in and to adapt. This required about 1,000,000 gradient and likelihood evaluations in total. We ran RWM for 1,000,000 iterations with an isotropic normal proposal distribution. The cost per iteration of RWM is effectively identical to the cost per gradient evaluation of NUTS, and the two algorithms ran for about the same amount of time. Weran Gibbs sampling for 1,000,000 sweeps over the 250 parameters. In summary, RWM has barely begun to explore the posterior space. Gibbs does better, but still has left parts of the space unexplored. NUTS, on the other hand, is able to generate many effectively independent samples.

Conclusions:

Stan is a relatively new programming language but it has demonstrated good performance compared with previous generation of probabilistic programming languages like BUGS and JAGS. In particular Stan scales well in high dimensions, does not require conjugacy, offers ODE solvers, and allows for specification of arbitrary complex statistical and mechanistic models often found in Pharmacometrics.

References:
[1] Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. Stan: A probabilistic programming language. Journal of Statistical Software 76(1). DOI 10.18637/jss.v076.i01
[2] Matthew D. Hoffman, Andrew Gelman. 2011. The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. https://arxiv.org/abs/1111.4246
[3] Sebastian Weber, Andrew Gelman, Daniel Lee, Michael Betancourt, Aki Vehtari, and Amy Racine-Poon. Bayesian aggregation of average data: an application in drug development. Annals of Applied Statistics (Submitted).

Reference: PAGE 27 (2018) Abstr 8760 [www.page-meeting.org/?abstract=8760]

Poster: Methodology - Estimation Methods

PDF poster / presentation (click to open)