Combining PKPD models with agent-based modelling and reinforcement learning/artificial intelligence algorithm to optimize cancer treatment

Van Thuy Truong (1,2), Grant D. Lythe (2), James W. T. Yates (3), Paolo Vicini (4), Vincent F. S. Dubois (1)

1. Clinical Pharmacology and Quantitative Pharmacology, Clinical Pharmacology and Safety Sciences, AstraZeneca, Aaron Klug Building, Granta Park, Cambridge, CB21 6GH, UK
2. Department of Applied Mathematics, University of Leeds, Leeds, United Kingdom
3. DMPK, IVIVT, RD Research, GSK, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, United Kingdom
4. Confo Therapeutics, Technologiepark 94, 9052 Ghent (Zwijnaarde), Belgium

Introduction: Cancer is a complex disease, and treatment success depends on tumour characteristics such as growth rate, immune cell infiltration, spatial distribution, emerging drug resistance and mutations, as well as on environmental conditions such as oxygen gradients. While agent-based models are well suited to simulating complex multiscale biological systems [1], finding the optimal treatment schedule is difficult because of the complexity of the system and the large number of components and interactions that must be considered. For example, each cell cycle phase has a different sensitivity to treatment. Docetaxel targets the G2 and M phases [2]. The S, G2 and M phases are the most radiosensitive, while the resting phase G0 is the least affected by radiotherapy [3]. In addition, immune cell types differ in their radiosensitivity. Radiation can increase immune cell infiltration while also increasing PDL1 mutation in cancer cells, which subsequently causes immune effector exhaustion [4]. The success of PD1 antibody treatment depends on the mutation status of the tumour and on immune cell infiltration [5]. Further, the toxicity of therapies needs to be considered, and combining treatment options can be either synergistic or detrimental.

Objectives:

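Introduce a hybrid PKPD model that combines ordinary differential equations (ODEs), partial differential equations (PDEs) and an agent-based model (ABM) to simulate the interaction of tumour and immune cells, environmental conditions and the concentration-dependent effects of different therapies;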

Apply reinforcement learning to optimize treatment and highlight the use of this artificial intelligence algorithm in oncology.

Methods:

We have developed a hybrid PKPD ODE PDE ABM. ODEs simulate the PKPD of drug treatments such as PD1 antibody, docetaxel and DNA damage response inhibitor treatment [6,7,8]. A modified linear-quadratic model describes the effect of radiotherapy on immune and cancer cells [3]. The oxygen and drug gradients are modelled with PDEs. Interactions between tumour and immune cells, and the heterogeneity of cells, are simulated with a 3D ABM. The model is implemented in the Python programming language.

Q-learning is applied to optimize combination therapy. This is a reinforcement learning algorithm in which a learning agent aims to find the optimal action for the current state by collecting rewards and avoiding punishments [9]. To apply this concept to cancer treatment, each state is defined by the number of infiltrated immune cells and the number of cancer cells in a specific cell cycle phase. The possible actions are the different treatment options and their combinations. The algorithm collects the resulting number of eliminated cancer cells as a reward, and side effects such as elimination or exhaustion of immune cells as a punishment. During the learning phase the agent explores the environment by taking random actions and evaluating their outcome, in order to find the best next treatment action for the current state of the system.
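To illustrate the Q-learning loop described above, the following is a minimal Python sketch under stated assumptions; it is not the implementation used in this work. The function `step` is a hypothetical toy stand-in for the hybrid ABM. The discretisation of tumour burden and immune infiltration into five levels, the three actions, all transition probabilities, the punishment weight of 0.5 and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative choices, not values from the model. Only the tabular update rule itself follows standard Q-learning [9].

```python
import random

# Hypothetical action set; the abstract's model also allows combinations.
ACTIONS = ["no_treatment", "chemotherapy", "anti_PD1"]
N_LEVELS = 5  # tumour burden and immune infiltration are each binned to 0..4


def step(state, action, rng):
    """Toy environment (assumption, not the ABM): returns (next_state, reward)."""
    tumour, immune = state
    killed, immune_lost = 0, 0
    if action == "chemotherapy":
        killed = 1
        immune_lost = 1 if rng.random() < 0.5 else 0  # assumed side effect
    elif action == "anti_PD1":
        # Assumed: benefit grows with existing immune infiltration.
        killed = 1 if rng.random() < immune / (N_LEVELS - 1) else 0
    killed = min(killed, tumour)
    immune_lost = min(immune_lost, immune)
    regrowth = 1 if rng.random() < 0.3 else 0   # assumed tumour growth
    recruited = 1 if rng.random() < 0.2 else 0  # assumed immune infiltration
    next_state = (
        min(N_LEVELS - 1, tumour - killed + regrowth),
        min(N_LEVELS - 1, immune - immune_lost + recruited),
    )
    # Reward: eliminated cancer cells; punishment: lost immune cells.
    reward = killed - 0.5 * immune_lost
    return next_state, reward


def q_learning(episodes=2000, horizon=20, alpha=0.1, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {((t, i), a): 0.0
         for t in range(N_LEVELS) for i in range(N_LEVELS) for a in ACTIONS}
    for _ in range(episodes):
        state = (rng.randrange(1, N_LEVELS), rng.randrange(N_LEVELS))
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = step(state, action, rng)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


def greedy_action(q, state):
    """Treatment with the highest learned value for the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q_table = q_learning()
    for s in [(3, 0), (3, 4)]:
        print(s, greedy_action(q_table, s))
```

In this sketch the learned table maps each (tumour, immune) state to a preferred treatment. Replacing `step` with a call into a full simulation would leave the learning loop unchanged, but each training step would then cost one simulation run.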

Results:

We successfully combined reinforcement learning with a hybrid ODE PDE ABM. The preliminary results show that after 10,000 training runs, which take approximately 24 h, the algorithm chooses treatment schedules according to the tumour status. Due to the stochasticity of the ABM, the selected treatment schedule differs between runs, but a pattern can be observed for a given tumour status. With an early treatment start at 50 days, few immune effector cells have infiltrated the tumour microenvironment, and chemotherapy is therefore preferred. With a later starting point at 100 days, PD1 antibody is chosen more frequently. In an immunosuppressive tumour microenvironment, PD1 antibody is chosen to maintain immune function.

Conclusions:

Through simulations we show the power of combining reinforcement learning with a hybrid agent-based model to optimise a complex modelling problem. The algorithm provides insight into the complex interactions during combination therapy with a manageable running time. Different treatment schedules are chosen depending on the tumour composition: less immunogenic tumours receive chemotherapy and DNA damage response inhibitors, whereas PD1 antibody treatment is preferred when immune cell infiltration is increased. Further, the simulation can be personalized by changing the parameters. With the advent of more patient-specific data and complex ABMs, we anticipate that the use of reinforcement learning algorithms will help to elucidate dose scheduling and rationalize combination strategies in oncology and other therapeutic areas.

References:
[1] Truong, Van Thuy, et al. "Step‐by‐step comparison of ordinary differential equation and agent‐based approaches to pharmacokinetic‐pharmacodynamic models." CPT: Pharmacometrics & Systems Pharmacology 11.2 (2022): 133-148.
[2] Han, T. D., D. H. Shang, and Y. Tian. "Docetaxel enhances apoptosis and G2/M cell cycle arrest by suppressing mitogen-activated protein kinase signaling in human renal clear cell carcinoma." Genet Mol Res 15.1 (2016): 1-10.
[3] Powathil, Gibin G., Douglas JA Adamson, and Mark AJ Chaplain. "Towards predicting the response of a solid tumour to chemotherapy and radiotherapy treatments: clinical insights from a computational model." PLoS computational biology 9.7 (2013): e1003120.
[4] Sato, Hiro, et al. "Radiotherapy and PD-L1 expression." Gan to Kagaku ryoho. Cancer & Chemotherapy 46.5 (2019): 845-849.
[5] Kim, J. M., and Daniel S. Chen. "Immune escape to PD-L1/PD-1 blockade: seven steps to success (or failure)." Annals of Oncology 27.8 (2016): 1492-1504.
[6] Lindauer, A., et al. "Translational pharmacokinetic/pharmacodynamic modeling of tumor growth inhibition supports dose‐range selection of the anti–PD‐1 antibody pembrolizumab." CPT: Pharmacometrics & Systems Pharmacology 6.1 (2017): 11-20.
[7] Frances, Nicolas, et al. "Tumor growth modeling from clinical trials reveals synergistic anticancer effect of the capecitabine and docetaxel combination in metastatic breast cancer." Cancer chemotherapy and pharmacology 68 (2011): 1413-1419.
[8] Hamis, Sara, et al. "Bridging in vitro and in vivo research via an agent-based modelling approach: predicting tumour responses to an ATR-inhibiting drug." bioRxiv.
[9] Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8 (1992): 279-292.

PAGE 2023: Methodology - AI/Machine Learning

Reference: PAGE 31 (2023) Abstr 10367 [www.page-meeting.org/?abstract=10367]

Poster: Methodology - AI/Machine Learning