Christopher Rackauckas

GPU Acceleration of Pharmaceutical Simulations

Chris Rackauckas (1,2), Yingbo Ma (3), Vijay Ivaturi (2)

(1) Massachusetts Institute of Technology, (2) University of Maryland, Baltimore, Center for Translational Medicine, (3) Julia Computing

Introduction:
GPUs have enabled major gains in computational efficiency in disciplines such as machine learning, but pharmacometricians have not been able to realize the same performance advantages. The Single Program Multiple Data (SPMD) paradigm of GPU computing is difficult to map onto the simulation of the stiff ordinary differential equations (ODEs) found in pharmacokinetic/pharmacodynamic (PK/PD), physiologically-based pharmacokinetic (PBPK), and quantitative systems pharmacology (QSP) models. The difficulty is that SPMD requires the whole GPU to perform the same operation. This pattern has been optimized specifically for linear algebra kernels (such as matrix multiplication), but it is hard to achieve in more general software. GPU acceleration in traditional ODE solver software, such as SUNDIALS [1], relies on these specialized linear algebra kernels and therefore only accelerates applications with large ODE systems (>1000 equations).
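The following is an illustration under stated assumptions, not Pumas or DifferentialEquations.jl code. It makes the SPMD constraint concrete for small models by running one stiff solver program over many parameter sets. It assumes a hypothetical linear oral-dosing model with depot and central compartments, absorption rate `ka` and elimination rate `ke`. It also assumes a fixed-step implicit Euler method, and all function names are hypothetical. Every parameter set executes the identical sequence of operations with the same step count, so each set could be assigned to its own GPU thread. Production stiff solvers typically use adaptive step sizes, which lets trajectories diverge in control flow; this sketch sidesteps that with a fixed step.

```python
"""Hypothetical sketch: the same fixed-step implicit Euler program mapped over
many parameter sets, as each GPU thread would do in an ensemble (SPMD) layout.
Model, names, and step scheme are illustrative assumptions."""
from typing import List, Tuple

State = Tuple[float, float]   # (depot amount, central amount)
Params = Tuple[float, float]  # (ka, ke): absorption and elimination rates


def implicit_euler_step(x: State, ka: float, ke: float, h: float) -> State:
    """One implicit Euler step for the linear model
        depot'   = -ka * depot
        central' =  ka * depot - ke * central
    The system matrix is lower triangular, so (I - h*A) x_new = x is solved
    exactly by forward substitution."""
    depot, central = x
    depot_new = depot / (1.0 + h * ka)
    central_new = (central + h * ka * depot_new) / (1.0 + h * ke)
    return depot_new, central_new


def solve_one(p: Params, dose: float, t_end: float, n_steps: int) -> State:
    """Integrate one trajectory; the trip count does not depend on p."""
    ka, ke = p
    h = t_end / n_steps
    x: State = (dose, 0.0)  # bolus into the depot at t = 0
    for _ in range(n_steps):
        x = implicit_euler_step(x, ka, ke, h)
    return x


def solve_ensemble(param_sets: List[Params], dose: float, t_end: float,
                   n_steps: int) -> List[State]:
    """Map the identical program over all parameter sets (one 'thread' each)."""
    return [solve_one(p, dose, t_end, n_steps) for p in param_sets]


if __name__ == "__main__":
    # ke = 1000 makes the central compartment stiff: with h = 0.2, explicit
    # Euler would amplify errors by |1 - h*ke| = 199 per step, while implicit
    # Euler stays stable.
    params = [(1.0 + 0.1 * i, 1e3) for i in range(5)]
    for (ka, ke), (depot, central) in zip(params, solve_ensemble(params, 100.0, 10.0, 50)):
        print(f"ka={ka:.1f} ke={ke:.0f}: depot={depot:.4g} central={central:.4g}")
```

On a GPU, the list comprehension in `solve_ensemble` would become one kernel launch, with each thread running `solve_one` for its own parameter set.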

Objectives:
– Allow users of the high-level Pumas pharmaceutical simulation software to take advantage of GPU acceleration on the model sizes that typically arise in domain applications, without having to write CUDA code.
– Integrate GPU acceleration into the standard estimation and analysis routines so that all parallelization issues are hidden from the user.

Methods: Automated program generation with CUDAnative.jl [2] over the DifferentialEquations.jl [3] ODE solvers is used to generate viable GPU-accelerated code for small ODE systems (<200 equations) in Pumas. A block-diagonal Jacobian representation with an optimized LU factorization speeds up the implicit solves. Specialized GPU-based map-reduce kernels are implemented to reduce thread desynchronization during event handling (dosing and mtime).
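The following sketches the block-diagonal idea. It assumes that the structure arises from stacking N independent small systems, one per parameter set, into one large system. The Jacobian of the stacked system is then block-diagonal, so each n×n block can be LU-factorized and solved on its own. The code is plain partial-pivoting LU in Python with hypothetical function names; the abstract does not specify the optimized factorization Pumas uses. In an implicit solver, the blocks would be the per-trajectory Newton matrices (e.g. I - γJ).

```python
"""Hypothetical sketch: solving J x = b when J is block-diagonal, by
LU-factorizing each small block independently (partial pivoting)."""
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def lu_factor(a: Matrix) -> Tuple[Matrix, List[int]]:
    """Return (lu, perm) with P*a = L*U; L (unit diagonal) and U share `lu`."""
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest |entry| in column k to row k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("singular block")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], b: Vector) -> Vector:
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]    # apply the row permutation
    for i in range(n):                    # forward substitution, unit L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):          # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def block_diag_solve(blocks: List[Matrix], rhs: Vector) -> Vector:
    """Solve blockdiag(blocks) x = rhs; every block is n x n and independent,
    so each (factorize, solve) pair could run on its own GPU thread."""
    n = len(blocks[0])
    x: Vector = []
    for k, blk in enumerate(blocks):
        lu, perm = lu_factor(blk)
        x.extend(lu_solve(lu, perm, rhs[k * n:(k + 1) * n]))
    return x


if __name__ == "__main__":
    blocks = [[[2.0, 1.0], [1.0, 3.0]], [[0.0, 1.0], [1.0, 0.0]]]
    print(block_diag_solve(blocks, [3.0, 4.0, 5.0, 6.0]))
```

Factorizing blockwise costs O(N·n³) operations instead of O((N·n)³) for a dense LU of the stacked matrix. The blocks also share no data, which is what makes the work parallel.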

Results: Speedups of 175x over an optimized C++ CPU implementation using CVODE are demonstrated on highly stiff internal PK/PD and QSP models with tens to hundreds of ODEs. These speedups were measured when generating trajectories for 10,000 different parameter sets, on a Leucine model (unpublished) and on the Tewari-Beard model [4]. GPU acceleration is also shown in real-world contexts by using multi-GPU cloud infrastructure for large-scale pharmacometric models. It yields orders-of-magnitude speed improvements in NLME simulation and estimation, as well as in QSP applications to global sensitivity analysis and virtual population calculations.

Conclusion: This demonstrates an end-to-end automated GPU approach that turns what was previously seen as an expensive parameter study into a quick library call with hardware support.

References:
[1] A. C. Hindmarsh, P. N. Brown, K. E. Grant, S. L. Lee, R. Serban, D. E. Shumaker, and C. S. Woodward, “SUNDIALS: Suite of Nonlinear and Differential/Algebraic Equation Solvers,” ACM Transactions on Mathematical Software, 31(3), pp. 363-396, 2005. Also available as LLNL technical report UCRL-JP-200037.
[2] T. Besard, C. Foket, and B. De Sutter, “Effective extensible programming: unleashing Julia on GPUs,” IEEE Transactions on Parallel and Distributed Systems, 30(4), pp. 827-841, 2018.
[3] C. Rackauckas and Q. Nie, “DifferentialEquations.jl – A Performant and Feature-Rich Ecosystem for Solving Differential Equations in Julia,” Journal of Open Research Software, 5(1), p. 15, 2017. DOI: http://doi.org/10.5334/jors.151
[4] S. G. Tewari et al., “Dynamics of cross-bridge cycling, ATP hydrolysis, force generation, and deformation in cardiac muscle,” Journal of Molecular and Cellular Cardiology, 96, pp. 11-25, 2016.

Reference: PAGE () Abstr 9242 [www.page-meeting.org/?abstract=9242]

Poster: Oral: Methodology - New Tools