2022 - Ljubljana - Slovenia

PAGE 2022: Methodology - New Modelling Approaches
Benjamin Ribba

Population model-enhanced reinforcement learning to enable precision dosing: case study with dosing of propofol for sedation

Ben Ribba, Dominic Stefan Bräm, Paul Gabriel Baverel, Richard Peck

F. Hoffmann-La Roche Ltd


Precision dosing is a key enabler of precision medicine: it takes the individual characteristics of each patient and their disease into account to individualize the dosing regimen [1].

Reinforcement learning (RL) is an artificial intelligence (AI) technique that mimics how humans and animals learn [2]. Through trial and error, a machine learns which actions to take from the recursive reward feedback it receives when interacting with its environment, so as to maximize a long-term goal. One of the main application areas of RL is the creation of bots for video games. RL also has applications in medicine, in particular for sedation in intensive care units, where it can automatically adjust the dosing of drugs such as propofol.

Like any AI technique, RL requires a large amount of data to work efficiently. To circumvent this limitation, we propose to complement RL with a simulation engine, powered by population modeling, that generates artificial data from which RL can learn.

Our objective is to evaluate and discuss the efficiency of coupling RL with population modeling through the study of precision dosing of propofol.


The pharmacokinetic-pharmacodynamic (PKPD) model of propofol-induced sedation has been reported in the literature [3]. The PK part consists of a linear three-compartment model and an effect compartment. The PD endpoint is the bispectral index (BIS), which characterizes the level of consciousness. A linear relationship with negative slope links the concentration in the effect compartment to BIS. The baseline BIS is fixed at 93.6, representing full consciousness, while the target level of sedation is arbitrarily set at BIS = 55.
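
The structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published model: every rate constant, the central volume and the BIS slope are placeholder values chosen for readability (they are not the estimates of [3]), and the explicit Euler scheme and the names `PKPDParams`, `step` and `bis` are introduced here for illustration only. Only the baseline BIS of 93.6 comes from the text.

```python
from dataclasses import dataclass


@dataclass
class PKPDParams:
    # Placeholder values for illustration only; NOT the published estimates of [3].
    v1: float = 15.0     # central volume (L)
    k10: float = 0.12    # elimination rate constant (1/min)
    k12: float = 0.11    # central -> peripheral 1 (1/min)
    k21: float = 0.055   # peripheral 1 -> central (1/min)
    k13: float = 0.042   # central -> peripheral 2 (1/min)
    k31: float = 0.0033  # peripheral 2 -> central (1/min)
    ke0: float = 0.26    # effect-site equilibration rate constant (1/min)
    bis0: float = 93.6   # baseline BIS (from the text)
    slope: float = 10.0  # assumed BIS drop per mg/L at the effect site


def step(state, dose_mg, p, dt_min=1.0 / 60.0):
    """Advance the model by one explicit-Euler step (default: one second).

    state = (a1, a2, a3, ce): drug amounts (mg) in the three compartments
    and the effect-site concentration (mg/L).
    """
    a1, a2, a3, ce = state
    a1 += dose_mg  # the dose enters the central compartment as a bolus
    c1 = a1 / p.v1
    da1 = -(p.k10 + p.k12 + p.k13) * a1 + p.k21 * a2 + p.k31 * a3
    da2 = p.k12 * a1 - p.k21 * a2
    da3 = p.k13 * a1 - p.k31 * a3
    dce = p.ke0 * (c1 - ce)
    return (a1 + dt_min * da1, a2 + dt_min * da2,
            a3 + dt_min * da3, ce + dt_min * dce)


def bis(state, p):
    """Linear PD link with negative slope: BIS falls as effect-site concentration rises."""
    return p.bis0 - p.slope * state[3]
```

Calling `step` once per second with `dose_mg` set to either zero or a fixed increment reproduces the dose-or-not decision at each time step; `bis` then gives the observable on which the dosing decision is based.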

Each dosing action is atomic, with a one-second interval, consistent with syringe pump precision. The dosing action space is dichotomous: each second, a decision to dose or not must be taken.

The precision dosing problem is: find, for each patient, the dosing regimen that minimizes the root mean square error (RMSE) between the patient’s BIS and the target.
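
Reading RMSE in its usual sense of root mean square error, the objective for a patient i can be written as below. The notation is introduced here for illustration: T is the number of one-second time steps, a_{i,t} in {0, 1} is the dosing decision at step t, and BIS_i(t) is the resulting BIS trajectory; the target value of 55 is the one stated above.

```latex
\mathrm{RMSE}_i(a_{i,1},\dots,a_{i,T})
  = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\bigl(\mathrm{BIS}_i(t) - \mathrm{BIS}_{\mathrm{target}}\bigr)^{2}},
\qquad
\mathrm{BIS}_{\mathrm{target}} = 55,
\qquad
(a_{i,1}^{*},\dots,a_{i,T}^{*})
  = \operatorname*{arg\,min}_{a_{i,t}\in\{0,1\}} \mathrm{RMSE}_i .
```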

The model is used to:

  1. Simulate clinical trial data. Simulations are performed with a population size of 100 subjects and a coefficient of variation (CV) of 10% on the PK parameters. Additional tests include trial population sizes up to 1000 subjects, CV up to 40% and 10% measurement error. In the simulated trials, all patients are treated with a standard dosing protocol, which consists of dosing if the BIS is above the target and refraining from dosing otherwise.
  2. Embed a simulation engine within the RL algorithm to explore dosing scenarios not encountered in the clinical trial data.
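
The first use of the model, trial simulation under the standard protocol, might look like the sketch below. It rests on several assumptions that are not in the abstract: variability is taken to be log-normal, the typical parameter values in `TYPICAL`, the BIS slope `SLOPE` and the per-action dose `DOSE_MG` are placeholders (not the estimates of [3]), the horizon is 30 minutes, integration is explicit Euler, and measurement error is left out. All function names are hypothetical.

```python
import math
import random

TARGET_BIS = 55.0     # target level of sedation (from the text)
BASELINE_BIS = 93.6   # baseline BIS (from the text)

# Placeholder typical values (illustrative, NOT the published estimates of [3]).
TYPICAL = {"v1": 15.0, "k10": 0.12, "k12": 0.11, "k21": 0.055,
           "k13": 0.042, "k31": 0.0033, "ke0": 0.26}
SLOPE = 10.0          # assumed BIS drop per mg/L at the effect site
DOSE_MG = 1.0         # assumed amount delivered by one "dose" action
DT_MIN = 1.0 / 60.0   # one-second step, expressed in minutes


def sample_patient(rng, cv=0.10):
    """Draw individual parameters with log-normal variability of the given CV."""
    sd = math.sqrt(math.log(1.0 + cv ** 2))
    return {k: v * math.exp(rng.gauss(0.0, sd)) for k, v in TYPICAL.items()}


def simulate_standard_protocol(p, n_seconds=1800):
    """Dose whenever BIS is above target; return a list of (BIS, dose) records."""
    a1 = a2 = a3 = ce = 0.0
    trace = []
    for _ in range(n_seconds):
        bis = BASELINE_BIS - SLOPE * ce
        dose = 1 if bis > TARGET_BIS else 0
        trace.append((bis, dose))
        a1 += dose * DOSE_MG
        da1 = -(p["k10"] + p["k12"] + p["k13"]) * a1 + p["k21"] * a2 + p["k31"] * a3
        da2 = p["k12"] * a1 - p["k21"] * a2
        da3 = p["k13"] * a1 - p["k31"] * a3
        dce = p["ke0"] * (a1 / p["v1"] - ce)
        a1, a2, a3, ce = (a1 + DT_MIN * da1, a2 + DT_MIN * da2,
                          a3 + DT_MIN * da3, ce + DT_MIN * dce)
    return trace


def rmse(trace):
    """Root mean square deviation of BIS from the target over one trace."""
    return math.sqrt(sum((b - TARGET_BIS) ** 2 for b, _ in trace) / len(trace))


def simulate_trial(n_subjects=100, cv=0.10, seed=0):
    """Simulate one trial in which every subject follows the standard protocol."""
    rng = random.Random(seed)
    return [simulate_standard_protocol(sample_patient(rng, cv))
            for _ in range(n_subjects)]
```

For the second use, the same per-patient simulator would be called from inside the learning loop with dosing decisions chosen by the RL agent instead of the fixed threshold rule, which is what lets the agent explore regimens absent from the trial data.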

Q-learning is used as the RL algorithm [2]. The patient’s state is defined by BIS levels and PKPD model variables. The reward function is designed to increase from 0 to 1 as BIS gets closer to its target, either from above or below. The long-term goal is measured in terms of RMSE.
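
A minimal tabular Q-learning sketch consistent with this description follows. The abstract does not give the exact reward shape, state discretization or hyperparameters, so everything below is an assumption made for illustration: the reward is taken to be piecewise linear, the state is a BIS bin of width 1, and the learning rate, discount factor and exploration rate are arbitrary defaults.

```python
import random
from collections import defaultdict

TARGET_BIS = 55.0    # from the text
BASELINE_BIS = 93.6  # from the text
ACTIONS = (0, 1)     # 0 = do not dose, 1 = dose


def reward(bis, target=TARGET_BIS, scale=BASELINE_BIS - TARGET_BIS):
    """Assumed shape: 1 at the target, falling linearly to 0 at distance `scale`,
    symmetric above and below the target."""
    return max(0.0, 1.0 - abs(bis - target) / scale)


def discretize(bis, width=1.0):
    """Map a continuous BIS value to an integer bin used as the table key."""
    return int(bis // width)


def choose_action(q, state, rng, epsilon=0.1):
    """Epsilon-greedy policy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Q-table with all entries starting at zero.
q_table = defaultdict(float)
```

Richer state definitions, such as adding compartment concentrations to the BIS bin, only change what `discretize` returns (for example a tuple of bins); the update rule itself is unchanged.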


When only retrospective trial data are analyzed, more than 500 patients are needed for the RL algorithm to deliver a precision dosing solution whose accuracy is within a 2-fold range of the standard protocol.

However, when the PKPD model is used as a simulation engine within the core of the RL algorithm, a population of 100 patients is enough for the algorithm to identify a precision dosing solution with a 60% increase in accuracy relative to the standard dosing solution. This improvement is obtained by learning a non-intuitive behavior: reducing the dosing frequency on average 10 seconds earlier than the standard protocol does. This earlier reduction helps to smooth the oscillations around the target that are typically observed with the standard protocol.

Extending the definition of the patient’s state to include not only the BIS level but also the drug concentration in the model compartments further improves the accuracy of the identified dosing solution by 20%.

Even a minimal level of measurement error, such as 1%, has a major impact on the efficiency of the predictive method. However, intra-individual variability on model parameters of up to 40% does not affect accuracy.


Although fundamentally distinct from the Bayesian inference that pharmacometricians usually use to optimize dose, RL can synergize with population modeling. It presents a unique opportunity to integrate multidimensional data effectively in the search for individualized dosing regimens, in line with a patient-centric medicine development paradigm.

[1] Peck, R.W., Precision Medicine Is Not Just Genomics: The Right Dose for Every Patient. Annu Rev Pharmacol Toxicol, 2018. 58: p. 105-122.
[2] Sutton, R.S. and Barto, A.G., Reinforcement Learning: An Introduction. 2nd ed. 2018.
[3] Marsh, B., et al., Pharmacokinetic model driven infusion of propofol in children. Br J Anaesth, 1991. 67(1): p. 41-8.

Reference: PAGE 30 (2022) Abstr 10023 [www.page-meeting.org/?abstract=10023]
Oral: Methodology - New Modelling Approaches