Dealing with stochasticity in precision dosing decision-making processes by fully exploiting PK-PD modelling in Reinforcement Learning algorithms. A practical case-study on Vancomycin continuous infusion in ICU patients

Alessandro De Carlo (1), Elena Maria Tosca (1), Paolo Magni (1)

(1) Laboratory of Bioinformatics, Mathematical Modelling and Synthetic Biology (BMS Lab), Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy

Background: The integration of Reinforcement Learning (RL) with PK-PD models to solve precision dosing tasks has gained considerable momentum in pharmacometrics [1–3]. Q-Learning (QL) is the RL algorithm most often used for this purpose, as it provides easily explainable dosing policies by leveraging a discrete representation of both the patient condition and the dose space [1,2,4]. PK-PD models are integrated as a simulation engine within the QL framework to estimate a utility function, Q(s,a), for each pair (s,a) of patient state and dose action [1,2,4,5]. Q(s,a) quantifies the appropriateness of administering dose a when the patient is in state s in order to achieve the therapeutic goal. During training, Q(s,a) is updated with the Bellman equation (sample update strategy) at every occurrence of (s,a) [6].
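
As an illustration of the sample update strategy mentioned above, the following is a minimal sketch of the standard tabular Q-learning update described in [6]. The function name, the nested-list layout of the Q table, and the values of the learning rate alpha and discount factor gamma are assumptions made for this example; they are not taken from the abstract.

```python
def q_sample_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning sample update in place.

    Q       : list of lists, Q[state][action] (hypothetical layout)
    s, a    : indices of the current state and of the dose action taken
    r       : reward observed after taking action a in state s
    s_next  : index of the single next state that was sampled
    alpha   : learning rate (assumed value)
    gamma   : discount factor (assumed value)
    """
    # Target built from ONE sampled transition: reward plus the discounted
    # value of the best action available in the sampled next state.
    td_target = r + gamma * max(Q[s_next])
    # Move the current estimate a fraction alpha toward that target.
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


# Usage: two states, two dose actions.
Q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_sample_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

Because the target depends on a single sampled next state, a noisy patient response changes the target from one visit of (s,a) to the next.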

It is well established that the performance of this hybrid method decreases in the presence of randomness in patient response, represented, for example, by residual unexplained variability (RUV) [1]. In this case, RUV can negatively affect the estimation of Q(s,a) because of the sample update strategy implemented in the QL algorithm [1,2].

Objectives: Here, a new QL/PK-PD modeling approach, exact QL (EQL), is presented to personalize pharmacological treatment in the presence of stochasticity in patient response, modeled as RUV. The Vancomycin precision dosing problem in intensive care unit (ICU) patients was used as a case study to illustrate EQL performance and to compare it with that of QL/PK-PD modeling approaches from the literature.

Methods: A popPK model from the literature [7] was used to describe Vancomycin concentration dynamics under a continuous infusion regimen. The precision dosing problem was formalized considering a clinically realistic scenario for dose selection and concentration target range:

Patient state was described by combining a discretization of the Vancomycin concentration range (inefficacy: <15 mg/L; efficacy: [15-25] mg/L; grade 1 toxicity: (25-30] mg/L; grade 2 toxicity: >30 mg/L) with the previously administered dose.
The QL agent could customize the loading, initial and maintenance dose levels.
The reward function was designed to reward dosing strategies that bring the Vancomycin concentration into the efficacy range.
Based on the median ICU stay during Vancomycin treatment [8], a time frame of 14 days with daily pre-dose PK measurements was assumed for the RL simulation.
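
The state definition above can be sketched as follows. The concentration thresholds come from the text; the integer encoding, the function names, and the idea of indexing the previous dose by its position in a dose grid are assumptions made only for illustration.

```python
def concentration_bin(conc_mg_per_l):
    """Map a Vancomycin concentration (mg/L) to one of the four ranges."""
    if conc_mg_per_l < 15.0:
        return 0  # inefficacy: <15 mg/L
    if conc_mg_per_l <= 25.0:
        return 1  # efficacy: [15-25] mg/L
    if conc_mg_per_l <= 30.0:
        return 2  # grade 1 toxicity: (25-30] mg/L
    return 3      # grade 2 toxicity: >30 mg/L


def encode_state(conc_mg_per_l, prev_dose_idx, n_doses):
    """Combine the concentration bin and the previous dose into one index.

    prev_dose_idx is the position of the previously administered dose in a
    hypothetical grid of n_doses dose levels (the grid is not given in the
    abstract).
    """
    return concentration_bin(conc_mg_per_l) * n_doses + prev_dose_idx


# Usage: concentration in the efficacy range, previous dose at index 2
# of a hypothetical 5-level dose grid.
state = encode_state(20.0, prev_dose_idx=2, n_doses=5)
```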

The PK model of a typical ideal patient was integrated within the QL framework. Three different coefficients of variation (CVs) for RUV (10%, 20% and 30%) were considered in the QL training and test procedures. In EQL, PK model simulations were leveraged to perform an exact update of Q(s,a) using the full characterization of randomness in patient PK. Classic QL with sample updates (QLc) was used as a benchmark. A QL agent was also trained neglecting RUV (QLdet) and then tested in the presence of RUV to assess the effect of completely ignoring the stochastic behavior of patient response. Furthermore, the robustness of EQL to RUV misspecification was assessed by testing the EQL agents in scenarios with a higher or lower CV than that used in training.
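
One way to read the exact update is as an expectation over all possible next states rather than a single sampled one. The sketch below is not the authors' implementation: it assumes a proportional, normally distributed residual error around the model-predicted concentration (the abstract does not state the error model), and all function names, reward values and the discount factor are hypothetical. Since the previous-dose component of the next state is fixed by the chosen action, only the concentration bin is random here.

```python
import math

BIN_EDGES = (15.0, 25.0, 30.0)  # mg/L thresholds from the abstract


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bin_probabilities(c_pred, cv):
    """Probability of each of the four concentration ranges.

    ASSUMPTION: observed concentration ~ Normal(c_pred, (cv * c_pred)^2),
    i.e. a proportional residual error; requires c_pred > 0 and cv > 0.
    """
    sd = cv * c_pred
    cdf = [normal_cdf(edge, c_pred, sd) for edge in BIN_EDGES]
    return [cdf[0], cdf[1] - cdf[0], cdf[2] - cdf[1], 1.0 - cdf[2]]


def expected_target(probs, rewards, next_values, gamma=0.9):
    """Expected Bellman target: sum over next states of p * (r + gamma * V).

    next_values[i] stands for max over actions of Q(next state i, action).
    """
    return sum(p * (r + gamma * v)
               for p, r, v in zip(probs, rewards, next_values))


# Usage: predicted concentration of 20 mg/L with a 20% CV.
probs = bin_probabilities(20.0, 0.20)
rewards = [0.0, 1.0, -1.0, -2.0]     # hypothetical rewards per range
next_values = [0.0, 0.0, 0.0, 0.0]   # hypothetical current value estimates
target = expected_target(probs, rewards, next_values)
```

Under these assumptions the target no longer depends on which outcome happened to be drawn, so repeated visits of (s,a) yield the same update.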

Results: For each CV, EQL agents outperformed QLc agents in terms of the distribution of the total discounted reward. EQL also achieved the highest probability of observing a Vancomycin concentration within the target range at each endpoint. Conversely, QLc agents learned suboptimal dosing strategies, confirming the limits of sample updates. Performance deteriorated markedly when RUV was ignored in training: the QLdet agent was unable to suggest any dosing strategy in some states, because they were never explored during training. Interestingly, the proposed EQL showed good robustness to RUV misspecification. When the absolute relative difference between training and test CV was 0.5, the EQL agent performed very similarly to the case in which RUV was known. In contrast, larger under- or overestimation of RUV could significantly lower EQL performance.

Conclusion: EQL showed promising results in managing stochastic behavior in pharmacological treatments compared with classical approaches. The methodology will be further evaluated by considering other sources of uncertainty, for example parameter estimation uncertainty during treatment monitoring.

References:

[1] Ribba B, Bräm DS, Baverel PG et al. Model enhanced reinforcement learning to enable precision dosing: A theoretical case study with dosing of propofol. CPT: Pharmacometrics & Systems Pharmacology 2022;11:1497–510.
[2] De Carlo A, Tosca EM, Fantozzi M et al. Reinforcement Learning and PK-PD Models Integration to Personalize the Adaptive Dosing Protocol of Erdafitinib in Patients with Metastatic Urothelial Carcinoma. Clinical Pharmacology & Therapeutics 2024;n/a, DOI: 10.1002/cpt.3176.
[3] Ribba B, Dudal S, Lavé T et al. Model-Informed Artificial Intelligence: Reinforcement Learning for Precision Dosing. Clinical Pharmacology & Therapeutics 2020;107:853–7.
[4] De Carlo A, Tosca EM, Magni P. Integrating Reinforcement Learning and PK-PD modelling to enable precision dosing: a multi-objective optimization for the treatment of Polycythemia Vera patients with Givinostat. PAGE 31.
[5] Yauney G, Shah P. Reinforcement Learning with Action-Derived Rewards for Chemotherapy and Clinical Trial Dosing Regimen Selection. In: Doshi-Velez F, Fackler J, Jung K, et al. (eds.). Proceedings of the 3rd Machine Learning for Healthcare Conference. Vol 85. PMLR, 2018, 161–226.
[6] Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press, 2018.
[7] Roberts JA, Taccone FS, Udy AA et al. Vancomycin dosing in critically ill patients: robust methods for improved continuous-infusion regimens. Antimicrob Agents Chemother 2011;55:2704–9.
[8] Tafelski S, Nachtigall I, Troeger U et al. Observational clinical study on the effects of different dosing regimens on vancomycin target levels in critically ill patients: Continuous versus intermittent application. J Infect Public Health 2015;8:355–63.

Reference: PAGE 32 (2024) Abstr 10937 [www.page-meeting.org/?abstract=10937]

Poster: Methodology – AI/Machine Learning