Corinna Maier

Linking Bayesian data assimilation and reinforcement learning for model-informed precision dosing in oncology

C. Maier (1,2), N. Hartung (1), C. Kloft (3), J. de Wiljes (1,4), and W. Huisinga (1)

(1) Institute of Mathematics, University of Potsdam, Potsdam, Germany (2) Graduate Research Training Program PharMetrX: Pharmacometrics and Computational Disease Modeling, Freie Universität Berlin, University of Potsdam, Berlin, Potsdam, Germany (3) Department of Clinical Pharmacy and Biochemistry, Institute of Pharmacy, Freie Universität Berlin, Berlin, Germany (4) Department of Mathematics and Statistics, University of Reading, Whiteknights, UK

Introduction:

Therapy individualization using therapeutic drug/biomarker monitoring could substantially improve efficacy and safety of drug therapies [1]. Current strategies for dose adaptation include model-informed dosing tables [2] or are based on maximum a posteriori (MAP) estimates [3]. These approaches, however, do not fully exploit patient-specific information and lack a quantification of uncertainty. We investigate how Bayesian data assimilation (DA), reinforcement learning (RL), and the combination of both can be used to achieve the full potential of therapy individualization.

Methods:

In a simulation study, we compared these novel frameworks with existing approaches to control neutropenia, the most frequent dose-limiting side effect in multi-cycle treatment with 3-weekly paclitaxel. Since grade 4 neutropenia exposes patients to life-threatening infections and absence of neutropenia (grade 0) has been associated with reduced median survival [4], we defined an optimal dosing strategy as one that avoids grades 0 and 4 and targets grades 1-3.

Through individualized uncertainty quantification and propagation, DA enables comprehensive individual forecasting of the therapeutic outcome [5], which allows both safety and efficacy to be integrated into an objective function determining an optimal dose. Yet integrating individualized uncertainties requires sample approximations of the risk of entering subtherapeutic or toxic ranges, which makes the optimization problem hard and costly to solve.
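The sample approximation of risk mentioned above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the exponential `toy_nadir` model stands in for the actual pharmacokinetic/pharmacodynamic model, the grade thresholds on the neutrophil nadir are assumed values, and the parameter samples are drawn from made-up log-normal distributions rather than from a real posterior.

```python
import math
import random

# Assumed thresholds on the neutrophil nadir (10^9 cells/L), for illustration only.
GRADE0_MIN = 2.0   # nadir at or above this value counts as grade 0
GRADE4_MAX = 0.5   # nadir below this value counts as grade 4


def toy_nadir(baseline, sensitivity, dose):
    """Hypothetical stand-in for a PK/PD model: nadir decays exponentially with dose."""
    return baseline * math.exp(-sensitivity * dose)


def risk(samples, dose):
    """Fraction of parameter samples for which the dose leads to grade 0 or grade 4."""
    n_bad = 0
    for baseline, sensitivity in samples:
        nadir = toy_nadir(baseline, sensitivity, dose)
        if nadir >= GRADE0_MIN or nadir < GRADE4_MAX:
            n_bad += 1
    return n_bad / len(samples)


def best_dose(samples, candidate_doses):
    """Pick the candidate dose with the smallest sample-approximated risk."""
    return min(candidate_doses, key=lambda d: risk(samples, d))


# Made-up "posterior" samples of (baseline, sensitivity); not fitted to any data.
rng = random.Random(0)
posterior_samples = [
    (rng.lognormvariate(math.log(5.0), 0.1), rng.lognormvariate(math.log(0.01), 0.2))
    for _ in range(2000)
]
doses = [100, 125, 150, 175, 200, 225, 250]
chosen = best_dose(posterior_samples, doses)
```

Each call to `risk` simulates the model once per sample, and `best_dose` repeats this for every candidate dose, which hints at why the optimization becomes costly once a realistic model and a fine dose grid are used.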

In model-based RL, model simulations are used to learn how to act best in a given situation with respect to a pre-defined evaluation function (reward). It has been shown that RL can provide an efficient framework for therapy optimization allowing for complex dosing regimens and patient state combinations [6,7]. Computations are performed beforehand, and the resulting dosing policy (strategy) works like a static look-up table that is based on a priori uncertainties and takes the assessed patient state as input. A major concern is the lack of interpretability, as the dosing policy is provided as output without a clear understanding of which factors of the patient state influence the dose selection and in which manner.
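To make the look-up-table idea concrete, the following is a minimal sketch using tabular Q-learning on simulated virtual patients. It is not the algorithm used in the study: the dose levels, the reward values, the grade thresholds, the prior on `sensitivity`, and the exponential toy model in `simulate_cycle` are all assumptions chosen for illustration, and the patient state is reduced to the previous dose and the previously observed grade.

```python
import math
import random

DOSES = [100, 150, 200, 250]   # hypothetical dose levels
START = "start"                # state before the first observation


def grade(nadir):
    """Map a neutrophil nadir (10^9 cells/L) to a grade; thresholds are assumed."""
    for g, limit in enumerate([2.0, 1.5, 1.0, 0.5]):
        if nadir >= limit:
            return g
    return 4


def reward(g):
    """Assumed reward: +1 for grades 1-3, -1 for grade 0, -2 for grade 4."""
    return {0: -1.0, 4: -2.0}.get(g, 1.0)


def simulate_cycle(sensitivity, dose):
    """Toy stand-in for the PK/PD model: baseline 5.0, exponential dose effect."""
    return grade(5.0 * math.exp(-sensitivity * dose))


def learn_policy(n_patients=20000, n_cycles=6, alpha=0.05, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    states = [START] + [(d, g) for d in range(len(DOSES)) for g in range(5)]
    q = {s: [0.0] * len(DOSES) for s in states}
    for _ in range(n_patients):
        # Each virtual patient is drawn from an assumed prior distribution.
        sens = rng.lognormvariate(math.log(0.01), 0.3)
        state = START
        for cycle in range(n_cycles):
            if rng.random() < eps:                      # explore
                a = rng.randrange(len(DOSES))
            else:                                       # exploit current estimates
                a = max(range(len(DOSES)), key=lambda i: q[state][i])
            g = simulate_cycle(sens, DOSES[a])
            nxt = (a, g)                                # (previous dose index, observed grade)
            future = 0.0 if cycle == n_cycles - 1 else gamma * max(q[nxt])
            q[state][a] += alpha * (reward(g) + future - q[state][a])
            state = nxt
    # The learned policy is a static table: state -> recommended dose.
    return {s: DOSES[max(range(len(DOSES)), key=lambda i: q[s][i])] for s in states}


policy = learn_policy()
```

After training, `policy[(3, 4)]` gives the recommended next dose for a patient who received the highest dose level and experienced grade 4. The table itself offers no explanation of why a dose is chosen, which is the interpretability concern raised above.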

To address these limitations of the individual approaches, we propose a combined approach (DA-RL-guided dosing), which makes it possible to improve and individualize not only the parameters of the pharmacokinetic/pharmacodynamic model of the drug under consideration (DA) but also the dosing policy (RL) in the course of treatment. Based on the prior parameter distribution, a dosing policy is learned beforehand (offline) and then refined online, based on the posterior distribution, in relevant and promising state-dose combinations.
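The offline/online split can be sketched roughly as follows, under strong simplifying assumptions. The prior is represented by a set of parameter samples (particles), a single observed nadir reweights them (a basic importance-weighting form of Bayesian updating), and only doses adjacent to the offline recommendation are re-evaluated under the posterior. The toy model, the residual error model, the reward, and all function names are hypothetical and are not taken from the study.

```python
import math
import random

DOSES = [100, 150, 200, 250]   # hypothetical dose levels


def nadir(sens, dose):
    """Toy stand-in for the PK/PD model (baseline 5.0 assumed)."""
    return 5.0 * math.exp(-sens * dose)


def likelihood(observed, predicted, sd=0.2):
    """Assumed log-normal residual error on the observed nadir (unnormalized)."""
    z = (math.log(observed) - math.log(predicted)) / sd
    return math.exp(-0.5 * z * z)


def posterior_weights(particles, dose, observed):
    """Reweight prior particles by how well they explain the observation."""
    w = [likelihood(observed, nadir(s, dose)) for s in particles]
    total = sum(w)
    return [x / total for x in w]


def expected_reward(particles, weights, dose):
    """Posterior-weighted reward: +1 inside the assumed target range, -1 outside."""
    return sum(
        w * (1.0 if 0.5 <= nadir(s, dose) < 2.0 else -1.0)
        for s, w in zip(particles, weights)
    )


def refine(particles, weights, offline_dose, doses=DOSES):
    """Re-evaluate only doses next to the offline recommendation."""
    i = doses.index(offline_dose)
    nearby = doses[max(0, i - 1): i + 2]
    return max(nearby, key=lambda d: expected_reward(particles, weights, d))


rng = random.Random(1)
prior_particles = [rng.lognormvariate(math.log(0.01), 0.3) for _ in range(5000)]

# Hypothetical patient: the offline policy suggested 200, and the observed nadir was low.
weights = posterior_weights(prior_particles, dose=200, observed=0.25)
next_dose = refine(prior_particles, weights, offline_dose=200)
```

Restricting the online step to a small neighborhood of the offline recommendation is one simple way to read "relevant and promising state-dose combinations"; the study may select these combinations differently.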

Results:

Compared to traditional approaches (table/MAP-guided dosing), we found that the proposed approaches, DA-, RL- and DA-RL-guided dosing, substantially reduced the incidence of life-threatening grade 4 as well as subtherapeutic grade 0 neutropenia. In particular, the occurrence of grades 0/4 decreased substantially in later cycles in dosing strategies using individualized uncertainties. To address the interpretability issues of RL dosing policies, we proposed a stratified visualization that allowed the identification of covariates that drive dose decisions. This again highlighted the importance of baseline neutrophil counts, compared to sex or age, in the optimal dose selection. Thus, the combined DA-RL approach brings together many advantages of the single approaches: (i) it efficiently allocates resources between offline and online computation, (ii) it profits from an improved model-derived patient state description (posterior expectation), (iii) the full information provided by the posterior (individualized uncertainties) is used to update the dosing policy, and (iv) it also allows for the identification of relevant covariates by stratifying the expected outcomes.

Conclusion:

By investigating novel dosing strategies using DA and/or RL to control neutropenia, we found that the combination (DA-RL-guided dosing) provides the best of both worlds and constitutes a comprehensive and efficient approach to well-informed therapy individualization. Due to its flexibility, DA-RL-guided dosing can easily be extended to integrate multiple surrogates or endpoints, e.g., tumor growth, survival, or patient-reported outcomes, thereby promising important benefits for future individualized therapies.

References:
[1] Darwich et al. (2017) Clin. Pharmacol. Ther.
[2] Joerger et al. (2016) Ann. Oncol.
[3] Wallin et al. (2010) Basic Clin. Pharmacol. Toxicol.
[4] Di Maio et al. (2006) Nat. Clin. Pract. Oncol.
[5] Maier et al. (2020) CPT Pharmacometrics Syst. Pharmacol.
[6] Yauney & Shah (2018) J. Mach. Learn. Res.
[7] Bartolucci et al. (2019) PAGE 28, Abstr 9148

Reference: PAGE () Abstr 9370 [www.page-meeting.org/?abstract=9370]

Poster: Oral: Methodology - New Tools