Combining Reinforcement Learning and PK-PD models to personalize multiple drug administration: an application to Axitinib-Anti-Hypertensive treatment in metastatic renal cancer patients

Alessandro De Carlo(1), Elena Maria Tosca(1), Nadia Terranova (2), Paolo Magni(1)

(1) Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy; (2) Merck Institute for Pharmacometrics, Ares Trading S.A. (an affiliate of Merck KGaA, Darmstadt, Germany), Lausanne, Switzerland

Objectives: Reinforcement Learning (RL) is a class of Machine Learning algorithms designed to solve sequential decision-making problems [1]. Recently, the integration of RL with PK-PD models has gained momentum as a way to support treatment personalization within the landscape of dose-adaptive therapies [2], [3]. However, the literature on using this approach to personalize the administration of multiple drugs is still limited [4].

The aim of this work is to evaluate the integration of RL and PK-PD models to personalize concomitant drug administrations. As a case study, we considered the axitinib and anti-hypertensive (AH) co-treatment in metastatic renal cell carcinoma (RCC).

Background: Axitinib is an orally administered tyrosine kinase inhibitor of the vascular endothelial growth factor receptors (VEGFR), which are involved in tumor progression, metastasis and angiogenesis [5]. Axitinib has been approved in several countries for the treatment of RCC [5]. As with other VEGFR inhibitors, diastolic blood pressure (dBP) has been proposed as a biomarker of both efficacy and toxicity of Axitinib treatment [6], [7]. Indeed, an increase in dBP has been correlated with higher drug efficacy [8], [9]; conversely, hypertension is the most common adverse event, and AH medications are often prescribed during Axitinib treatment [10], [11], [12]. Furthermore, a BP<180/90 mmHg is required to start Axitinib treatment [8], [10], [11]. Similarly to other targeted anticancer treatments [13], [14], [15], [16], an adaptive dosing strategy based on dBP monitoring has been proposed [17].

Recently, a novel modelling framework was developed to quantitatively describe the effect of Axitinib on dBP, soluble VEGFR (sVEGFR) concentration, tumor size and overall survival (OS) [18].

Methods: Axitinib models available in the literature [18], [19] do not consider the effect of AH co-medications. To consider a more realistic scenario, the AH effect on dBP was introduced by leveraging the inhibition function designed for the Lenvatinib-AH co-treatment [20]. Model steady-state analysis and Monte Carlo simulations using individual parameter distributions were used to identify 75 different types of dBP-OS response patterns in the target population. The implemented RL algorithm was Q-Learning (QL), which was integrated with the empirical Axitinib-AH PK-PD model to simulate all possible dosing scenarios. The QL framework was formalized as follows to provide clinically plausible decisions.

Patient health status was described by a tuple containing the discretized dBP level (i.e., inefficacy if dBP<90 mmHg, efficacy if 90≤dBP<100 mmHg, moderate toxicity if 100≤dBP≤105 mmHg and severe toxicity if dBP>105 mmHg), the previously administered doses of AH and Axitinib and, finally, a flag indicating whether treatment was stopped due to severe toxicity (1) or not (0).
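
As an illustration of this state definition, the following minimal Python sketch encodes the tuple. The 90 and 105 mmHg thresholds come from the text; the 100 mmHg boundary between efficacy and moderate toxicity is assumed here from the target range [90,100) mmHg. The names `dbp_category`, `State` and `make_state` are hypothetical and do not come from the original work.

```python
from typing import NamedTuple


def dbp_category(dbp: float) -> str:
    """Map a dBP value (mmHg) to one of four discrete levels."""
    if dbp < 90:
        return "inefficacy"
    if dbp < 100:  # assumed boundary, taken from the target range [90,100)
        return "efficacy"
    if dbp <= 105:
        return "moderate_toxicity"
    return "severe_toxicity"


class State(NamedTuple):
    """Hypothetical container for the patient health status tuple."""
    dbp_level: str           # discretized dBP level
    prev_axitinib_mg: float  # previously administered Axitinib dose
    prev_ah_dose: int        # previously administered normalized AH dose
    stopped: int             # 1 if treatment was stopped for severe toxicity


def make_state(dbp, prev_axitinib_mg, prev_ah_dose, stopped):
    return State(dbp_category(dbp), prev_axitinib_mg, prev_ah_dose, int(stopped))
```

Because the tuple is hashable, it could be used directly as a key of a tabular Q-function.
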
At the beginning of treatment, QL agents could select only the initial Axitinib dose, as patients should be normotensive before starting the treatment [8], [10], [11]. Then, QL-based Axitinib dose adjustments were defined using realistic clinical constraints (i.e., gradual dose changes, interruption in the presence of severe toxicities, available dose levels consistent with the clinical formulation). Given the absence of a general consensus on AH medications in terms of drug and schedule, QL agents could couple each Axitinib dose level with five integer normalized AH dosages ranging from 0 to 4 [20].
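
One possible way to encode these constraints is sketched below in Python. The Axitinib dose levels (2, 3, 5, 7 and 10 mg b.i.d., with 0 standing for interruption) and the rule that a gradual change means moving by at most one level are assumptions made for illustration; the function name `allowed_actions` is hypothetical.

```python
from itertools import product

# Assumed dose levels (mg b.i.d.); 0.0 stands for treatment interruption.
AXITINIB_LEVELS = [0.0, 2.0, 3.0, 5.0, 7.0, 10.0]
# Five integer normalized AH dosages, as stated in the text.
AH_LEVELS = [0, 1, 2, 3, 4]


def allowed_actions(prev_axitinib, severe_toxicity, first_decision=False):
    """Return the (Axitinib dose, AH dose) pairs available to the agent."""
    if first_decision:
        # Only the initial Axitinib dose is chosen; no AH at treatment start.
        return [(dose, 0) for dose in AXITINIB_LEVELS if dose > 0]
    if severe_toxicity:
        # Severe toxicity forces an interruption of Axitinib.
        axitinib_options = [0.0]
    else:
        # Gradual change (assumption): stay, or move one level up or down.
        i = AXITINIB_LEVELS.index(prev_axitinib)
        axitinib_options = [
            AXITINIB_LEVELS[j]
            for j in (i - 1, i, i + 1)
            if 0 <= j < len(AXITINIB_LEVELS)
        ]
    return list(product(axitinib_options, AH_LEVELS))
```
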
The reward function was designed to simultaneously achieve the following goals: pushing up the Axitinib dose, limiting AH medications and maintaining dBP in the target range, i.e., [90,100) mmHg.
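
A reward with these three goals could take a form like the Python sketch below. The additive structure, the weights and the penalty for severe toxicity are all illustrative assumptions, not the reward used in this work; only the target range, the 105 mmHg threshold, the 10 mg b.i.d. maximum and the 0 to 4 AH scale come from the text.

```python
def reward(dbp, axitinib_mg, ah_dose,
           w_dose=1.0, w_ah=0.5, w_target=2.0, severe_penalty=10.0):
    """Illustrative reward; all weights are hypothetical."""
    r = w_dose * axitinib_mg / 10.0   # favour higher Axitinib doses (max 10 mg)
    r -= w_ah * ah_dose / 4.0         # discourage AH use (max normalized dose 4)
    if 90 <= dbp < 100:
        r += w_target                 # bonus for dBP in the target range
    elif dbp > 105:
        r -= severe_penalty           # strong penalty for severe toxicity
    return r
```

With this shape, a higher Axitinib dose is rewarded only as long as dBP stays in range, so adding AH becomes worthwhile when it brings dBP back into the target interval.
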

For each of the 75 different dBP response patterns, a randomly extracted patient represented by a set of PK-PD parameters was considered and a personal QL agent was trained to tailor the co-medication on that specific individual.
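
The per-patient training can be sketched with tabular Q-Learning as below. This is a minimal Python illustration under strong assumptions: `toy_dbp` is a linear stand-in for the patient-specific PK-PD model (it is not the published model), the reward and dose levels are hypothetical, the clinical constraints on dose changes are ignored, and the hyperparameters are arbitrary.

```python
import random
from collections import defaultdict

AXI = [0.0, 2.0, 3.0, 5.0, 7.0, 10.0]        # assumed dose levels, mg b.i.d.
AH = [0, 1, 2, 3, 4]                         # normalized AH dosages
ACTIONS = [(a, h) for a in AXI for h in AH]


def toy_dbp(axi, ah, baseline=80.0, slope=2.5, ah_effect=3.0):
    """Toy stand-in for one virtual patient (NOT the PK-PD model)."""
    return baseline + slope * axi - ah_effect * ah


def level(dbp):
    """Discretize dBP: 0 inefficacy, 1 efficacy, 2 moderate, 3 severe."""
    return 0 if dbp < 90 else 1 if dbp < 100 else 2 if dbp <= 105 else 3


def reward(dbp, axi, ah):
    """Hypothetical reward: high dose, little AH, dBP in [90,100)."""
    bonus = 2.0 if 90 <= dbp < 100 else -10.0 if dbp > 105 else 0.0
    return axi / 10.0 - 0.5 * ah / 4.0 + bonus


def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Standard Q-Learning update for one transition."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def train(episodes=2000, steps=8, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        state = (0, 0.0, 0)                  # untreated, normotensive start
        for _ in range(steps):
            if rng.random() < eps:           # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda b: Q[(state, b)])
            dbp = toy_dbp(*action)
            next_state = (level(dbp), action[0], action[1])
            q_update(Q, state, action, reward(dbp, *action),
                     next_state, alpha, gamma)
            state = next_state
    return Q
```

Training one such agent per virtual patient, each with its own parameter set in place of the defaults of `toy_dbp`, mirrors the one-agent-per-patient setup described above.
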

Results: QL agents correctly distinguished patients who tolerated higher Axitinib dosages well from those requiring a gradual Axitinib up-titration, with a possible increase of AH medications to compensate for hypertension. The learned policies made it possible to push Axitinib up to its maximum available level (10 mg b.i.d.) in all 75 archetypal virtual patients and to avoid severe hypertension in all scenarios. Furthermore, the administered AH dose was always kept below the highest level, thus avoiding AH overdosing.

Conclusions: RL combined with PK-PD modelling performed well on this co-administration case study, supporting the flexibility and the power of the methodology. Further analyses will focus on directly optimizing patient survival probability by integrating OS within the reward function.

References:
[1]    R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[2]    A. De Carlo, E. M. Tosca, M. Fantozzi, and P. Magni, ‘Reinforcement Learning and PK-PD Models Integration to Personalize the Adaptive Dosing Protocol of Erdafitinib in Patients with Metastatic Urothelial Carcinoma’, Clinical Pharmacology & Therapeutics, Feb. 2024, doi: 10.1002/cpt.3176.
[3]    A. De Carlo, E. M. Tosca, and P. Magni, ‘Integrating Reinforcement Learning and PK-PD modelling to enable precision dosing: a multi-objective optimization for the treatment of Polycithemia Vera patients with Givinostat’, PAGE, vol. 31, [Online]. Available: www.page-meeting.org/?abstract=10580
[4]    J.-N. Eckardt, K. Wendt, M. Bornhäuser, and J. M. Middeke, ‘Reinforcement Learning for Precision Oncology’, Cancers, vol. 13, no. 18, 2021, doi: 10.3390/cancers13184624.
[5]    C. Rothermundt et al., ‘Second-line treatment for metastatic clear cell renal cell cancer: experts’ consensus algorithms.’, World J Urol, vol. 35, no. 4, pp. 641–648, Apr. 2017, doi: 10.1007/s00345-016-1903-6.
[6]    P. Bhargava, ‘VEGF kinase inhibitors: how do they cause hypertension?’, Am J Physiol Regul Integr Comp Physiol, vol. 297, no. 1, pp. R1-5, Jul. 2009, doi: 10.1152/ajpregu.90502.2008.
[7]    M. Jain and R. R. Townsend, ‘Chemotherapy agents and hypertension: a focus on angiogenesis blockade.’, Curr Hypertens Rep, vol. 9, no. 4, pp. 320–328, Aug. 2007, doi: 10.1007/s11906-007-0058-7.
[8]    B. I. Rini et al., ‘Axitinib dose titration: analyses of exposure, blood pressure and clinical response from a randomized phase II study in metastatic renal cell carcinoma.’, Ann Oncol, vol. 26, no. 7, pp. 1372–1377, Jul. 2015, doi: 10.1093/annonc/mdv103.
[9]    B. I. Rini et al., ‘Axitinib in metastatic renal cell carcinoma: results of a pharmacokinetic and pharmacodynamic analysis.’, J Clin Pharmacol, vol. 53, no. 5, pp. 491–504, May 2013, doi: 10.1002/jcph.73.
[10] Y. Tomita et al., ‘Key predictive factors of axitinib (AG-013736)-induced proteinuria and efficacy: a phase II study in Japanese patients with cytokine-refractory metastatic renal cell Carcinoma.’, Eur J Cancer, vol. 47, no. 17, pp. 2592–2602, Nov. 2011, doi: 10.1016/j.ejca.2011.07.014.
[11] M. Eto et al., ‘Overall survival and final efficacy and safety results from a Japanese phase II study of axitinib in cytokine-refractory metastatic renal cell carcinoma.’, Cancer Sci, vol. 105, no. 12, pp. 1576–1583, Dec. 2014, doi: 10.1111/cas.12546.
[12] B. I. Rini et al., ‘Hypertension among patients with renal cell carcinoma receiving axitinib or sorafenib: analysis from the randomized phase III AXIS trial.’, Target Oncol, vol. 10, no. 1, pp. 45–53, Mar. 2015, doi: 10.1007/s11523-014-0307-z.
[13] H.-Y. Min and H.-Y. Lee, ‘Molecular targeted therapy for anticancer treatment’, Experimental & Molecular Medicine, vol. 54, no. 10, pp. 1670–1694, Oct. 2022, doi: 10.1038/s12276-022-00864-3.
[14] D. M. K. Keefe and E. H. Bateman, ‘Tumor control versus adverse events with targeted anticancer therapies’, Nature Reviews Clinical Oncology, vol. 9, no. 2, pp. 98–109, Feb. 2012, doi: 10.1038/nrclinonc.2011.192.
[15] A. Mueller-Schoell et al., ‘Therapeutic drug monitoring of oral targeted antineoplastic drugs’, European Journal of Clinical Pharmacology, vol. 77, no. 4, pp. 441–464, Apr. 2021, doi: 10.1007/s00228-020-03014-8.
[16] S. L. Groenland et al., ‘Precision Dosing of Targeted Therapies Is Ready for Prime Time’, Clinical Cancer Research, vol. 27, no. 24, pp. 6644–6652, Dec. 2021, doi: 10.1158/1078-0432.CCR-20-4555.
[17] Pfizer Laboratories Div Pfizer Inc., ‘INLYTA — axitinib tablet, film coated: Full prescribing information.’ Accessed: Dec. 01, 2023. [Online]. Available: https://labeling.pfizer.com/ShowLabeling.aspx?id=759#section-1.1
[18] E. Schindler, M. Amantea, M. Karlsson, and L. Friberg, ‘A Pharmacometric Framework for Axitinib Exposure, Efficacy, and Safety in Metastatic Renal Cell Carcinoma Patients’, CPT: Pharmacometrics & Systems Pharmacology, vol. 6, no. 6, pp. 373–382, Jun. 2017, doi: 10.1002/psp4.12193.
[19] Y. Chen, B. I. Rini, A. H. Bair, G. M. Mugundu, and Y. K. Pithavala, ‘Population pharmacokinetic-pharmacodynamic modelling of 24-h diastolic ambulatory blood pressure changes mediated by axitinib in patients with metastatic renal cell carcinoma.’, Clin Pharmacokinet, vol. 54, no. 4, pp. 397–407, Apr. 2015, doi: 10.1007/s40262-014-0207-5.
[20] R. J. Keizer et al., ‘A model of hypertension and proteinuria in cancer patients treated with the anti-angiogenic drug E7080.’, J Pharmacokinet Pharmacodyn, vol. 37, no. 4, pp. 347–363, Aug. 2010, doi: 10.1007/s10928-010-9164-2.

Reference: PAGE 32 (2024) Abstr 11170 [www.page-meeting.org/?abstract=11170]

Poster: Methodology - New Modelling Approaches