MODEL-INFORMED REINFORCEMENT LEARNING FOR PRECISION DOSING OF CARBOPLATIN - PAGE Meeting (Population Approach Group Europe)

Ilaria Gaiffi ¹, Elena Maria Tosca ¹, Paolo Magni ¹

1 Laboratory of Bioinformatics, Mathematical Modelling and Synthetic Biology (BMS lab), Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy

Background:
Precision dosing refers to tailoring drug doses to individual patient characteristics to maximize the efficacy-safety balance [1]. Adaptive dosing extends precision dosing to repeated treatments, in which doses are adjusted to account for the evolution of the patient's condition, typically tracked through efficacy and toxicity biomarkers [2, 3].
Model-Informed Precision Dosing (MIPD) integrates population PK-PD models and Bayesian estimation techniques to generate patient-specific digital twins, enabling simulation of responses to alternative dosing strategies and optimization of treatment regimens [4, 5]. Because adaptive dosing may require evaluating numerous candidate strategies, recent research has used Reinforcement Learning (RL), a branch of Machine Learning, to identify optimal strategies. RL addresses sequential decision-making problems and is therefore well suited to adaptive dosing, which requires dose adjustment based on periodic patient monitoring [3, 6].
Objective:
A hybrid framework combining PK-PD modeling and RL was evaluated to optimize the dosage of carboplatin, a chemotherapeutic drug, in lung cancer patients. Due to its cytotoxicity, carboplatin induces myelosuppression affecting platelets (PLT) and neutrophils (NT), often requiring dose reductions or treatment interruptions. The model-informed RL framework was therefore developed to identify the optimal protocol for administering the highest possible dose without reaching severe PLT or NT toxicity.
Methods:
The hybrid PK-PD RL framework combined a Q-learning (QL) algorithm with a previously developed PK-PD model describing carboplatin-induced myelosuppression on PLT and NT [7].
The QL elements were based on clinical evidence for the carboplatin-based treatment:
– Actions: dose adjustments expressed as fractions of the standard carboplatin dose calculated for each patient.
– States: defined by the maximum toxicity grade reached at the end of each cycle and the previously administered dose.
– Reward: a function incorporating PLT levels, NT levels and administered dose.
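The three elements above can be illustrated with a minimal tabular Q-learning sketch. Everything numeric here is an assumption made for illustration: the dose fractions, the reward weights, the learning settings, and above all `toy_cycle`, which is a crude placeholder for the PK-PD simulation and not the model from [7]. The state mirrors the description above (worst toxicity grade of the previous cycle and the previously administered dose).

```python
import random
from collections import defaultdict

# Assumed action set: fractions of the patient's standard carboplatin dose.
DOSE_FRACTIONS = [0.5, 0.75, 1.0]
N_CYCLES = 5
# State = (worst toxicity grade in the previous cycle, index of the previous dose).
START_STATE = (0, len(DOSE_FRACTIONS) - 1)


def toy_cycle(dose_fraction, sensitivity, rng):
    """Placeholder for the PK-PD simulation (hypothetical, not the published model).

    Returns the worst toxicity grade (0 to 4) reached during one cycle.
    """
    raw = 4.0 * dose_fraction * sensitivity + rng.uniform(-0.5, 0.5)
    return max(0, min(4, round(raw)))


def toy_reward(grade, dose_fraction):
    """Hypothetical reward: favour high doses, penalise grade 4 toxicity heavily."""
    return dose_fraction - (5.0 if grade >= 4 else 0.0)


def train_agent(sensitivity, episodes=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Train one tabular Q-learning agent on one virtual patient."""
    rng = random.Random(seed)
    n_actions = len(DOSE_FRACTIONS)
    q = defaultdict(float)  # Q[(state, action_index)], initialised to 0
    for _ in range(episodes):
        state = START_STATE
        for _ in range(N_CYCLES):
            # Epsilon-greedy choice of the next dose fraction.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])
            grade = toy_cycle(DOSE_FRACTIONS[action], sensitivity, rng)
            reward = toy_reward(grade, DOSE_FRACTIONS[action])
            next_state = (grade, action)
            # Simplification: the last cycle also bootstraps, i.e. treatment is
            # treated as continuing rather than as a terminal step.
            best_next = max(q[(next_state, a)] for a in range(n_actions))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_dose(q, state):
    """Dose fraction with the highest learned value in the given state."""
    best = max(range(len(DOSE_FRACTIONS)), key=lambda a: q[(state, a)])
    return DOSE_FRACTIONS[best]


q_table = train_agent(sensitivity=1.0)
print("First-cycle dose fraction:", greedy_dose(q_table, START_STATE))
```

Calling `train_agent` once per virtual patient corresponds to the individualized setting, whereas sampling a different patient in each episode would correspond to a single population-level agent.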
To apply QL, virtual patients were generated. For each patient, individual parameters and covariates were extracted to create digital twins for agent training. Two virtual patient populations were considered, with 120 and 41 subjects.
First, a single QL agent was trained on the population of 120 virtual patients with the aim of identifying a general dosing protocol applicable to the entire cohort.
Next, for each patient in the 41-patient population, an individual QL agent was trained. The objective was to train a distinct personalized agent for each patient, so that the treatment protocol could be optimized individually rather than applying a single strategy across subjects.
Finally, a Bayesian updating approach was implemented to better reflect clinical practice, where individual PK-PD parameters are not fully known at baseline but become available over time. The process begins with the population agent selecting the first dose. Using new simulated measurements, individual PK-PD parameters are estimated and used to train an individual QL agent for the next dose selection. This procedure is iteratively repeated to progressively personalize treatment.
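The estimation step of this loop can be illustrated with a deliberately simplified stand-in. The abstract does not specify the estimator, so the sketch below assumes a single log-scale parameter with a normal population prior and normally distributed measurement noise of known variance, for which the posterior has a closed form. The function name and all numbers are hypothetical; a real implementation would estimate the full set of individual PK-PD parameters from PLT and NT measurements.

```python
def update_normal_prior(prior_mean, prior_var, observations, obs_var):
    """Conjugate normal-normal update with known observation variance.

    Returns the posterior mean and variance of the parameter.
    """
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / obs_var
    post_mean = (prior_mean / prior_var + sum(observations) / obs_var) / post_precision
    return post_mean, 1.0 / post_precision


# Hypothetical example: the population prior is refined after each cycle.
mean, var = 0.0, 1.0           # population-level prior (assumed values)
cycle_measurements = [0.8, 1.2, 1.0]  # one simulated observation per cycle (assumed)
for cycle, y in enumerate(cycle_measurements, start=1):
    mean, var = update_normal_prior(mean, var, [y], obs_var=0.5)
    print(f"after cycle {cycle}: mean={mean:.3f}, var={var:.3f}")
```

The posterior variance shrinks with every cycle, which mirrors how the individual estimate moves away from the population value as measurements accumulate, so that the agent retrained at each step relies progressively less on the population prior.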
To facilitate the clinical applicability of this approach, a graphical user interface (GUI) was developed to enable clinicians to simulate the effects of the administered dose on PLT and NT levels, allowing comparison between standard clinical protocol and QL strategies.
Results:
The QL algorithm was generally able to select the highest feasible drug doses while avoiding severe PLT and NT toxicity. To ensure a fair comparison, the three protocols (population agent, individualized agents and Bayesian adaptive approach) were evaluated on the same cohort of 41 virtual patients by simulating five 21-day treatment cycles.
Overall, the individualized agent achieved the best balance between minimizing severe toxicity and delivering high doses, with 98 grade 4 toxicity episodes and a mean of 560.2 mg per administration. The Bayesian adaptive approach showed intermediate performance, while the population-level agent performed less effectively, with 286 toxicity episodes and a mean dose of 576.3 mg per administration.
The developed GUI allows users to easily visualize the effects of administered doses on the patient’s digital twin, thereby facilitating comparison across alternative dosing strategies.
Conclusion:
The proposed integration of MIPD and RL showed the potential to optimize carboplatin dosing by maximizing the administered dose while maintaining acceptable toxicity levels. The better performance of the individualized agent underscores the value of patient-specific modeling, whereas the Bayesian approach supports the feasibility of adaptive parameter updating in realistic dosing scenarios.
A clinician-oriented GUI strengthens the proposed framework by improving its usability and easing its integration into clinical decision-making.

References:
[1] Peck RW. Precision Dosing: An Industry Perspective. Clin Pharmacol Ther. 2021;109:47-50.
[2] Chakraborty B, Murphy SA. Dynamic Treatment Regimes. Annu Rev Stat Appl. 2014;1:447-464.
[3] Tosca EM, De Carlo A, Ronchi D, Magni P. Model-Informed Reinforcement Learning for Enabling Precision Dosing Via Adaptive Dosing. Clin Pharmacol Ther. 2024;116(3):619-636.
[4] Maier C, de Wiljes J, Hartung N, Kloft C, Huisinga W. A continued learning approach for model-informed precision dosing: Updating models in clinical practice. CPT Pharmacometrics Syst Pharmacol. 2022;11(2):185-198.
[5] Keizer RJ, ter Heine R, Frymoyer A, et al. Model-Informed Precision Dosing at the Bedside: Scientific Challenges and Opportunities. CPT Pharmacometrics Syst Pharmacol. 2018;7:785-787.
[6] Ribba B. Reinforcement learning as an innovative model-based approach: Examples from precision dosing, digital health and computational psychiatry. Front Pharmacol. 2023;13:1094281.
[7] PAGE 32 (2024) Abstr 10981 [www.page-meeting.org/?abstract=10981]

Reference: PAGE 34 (2026) Abstr 12189 [www.page-meeting.org/?abstract=12189]

Poster: Methodology – AI/Machine Learning
