Conor O'Hanlon (1), Jonas Denck (1), Candice Jamois (2), Clarisse Chavanne (2), Elif Ozkirimli (1), Stefanie Bendels (1), Oscar Brück (3,4,5), Eric Fey (6), Kimmo Porkka (3,4), Ken Wang (2)
(1) Roche Informatics, F. Hoffmann-La Roche AG, (2) Roche Pharmaceutical Research and Early Development, Roche Innovation Center, (3) Comprehensive Cancer Center, Helsinki University Hospital, (4) Department of Oncology, University of Helsinki, (5) Department of Clinical Chemistry, HUS Diagnostic Center, Helsinki University Hospital, (6) iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki and Helsinki University Hospital
Introduction/Objectives: Real-world data (RWD) from electronic health records provide a rich source of information about patient trajectories and outcomes. Machine learning (ML) models can use these large datasets to identify significant features and patterns associated with drug efficacy or safety. However, using clinical RWD in ML modelling can be challenging because the data are often sparse, imbalanced, and incomplete. Pharmacokinetic-pharmacodynamic (PKPD) modelling can provide a biological and mechanistic basis for a prediction task. One opportunity to combine the two approaches is to use PKPD-informed labels for ML classification tasks. PKPD models can use individual- and population-level information to make a prediction at important time points, even when there is no direct observation. This can help to label more patients and amplify the signal from relevant features across the population. Predictive models can support clinical decision-making by identifying baseline features associated with the development of severe neutropenia. The objective of this study was to develop a predictive framework that combines PKPD-informed labelling with ML techniques to accurately predict severe neutropenia risk. Severe neutropenia can be dose limiting in oncology, and few treatment options exist. Severe neutropenia following docetaxel administration was selected as a case example.
Methods: Data were available from oncology electronic health records at the Helsinki University Hospital (HUS), structured in the OMOP common data model. Patients administered docetaxel without other anti-cancer drugs or granulocyte colony-stimulating factor (G-CSF) drugs within ±30 days were eligible for inclusion. Dose and prior information about docetaxel pharmacokinetics were used to predict drug exposure, which drove the drug effect on neutrophils [1]. The Friberg model was used to describe neutrophil dynamics following docetaxel administration [2].
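The structure of the Friberg model [2] can be illustrated with a minimal sketch: one proliferating compartment, three transit compartments, and one circulating compartment, with a linear drug effect and a feedback term. This is not the NONMEM model used in the study. The mono-exponential concentration profile and its parameters (`c0`, `k_el`) are hypothetical placeholders for the published docetaxel pharmacokinetics [1], the defaults for the system parameters are simply the population estimates reported in the Results, and treating the reported exponent of 0.12 as the feedback exponent is an assumption.

```python
import math


def friberg_rhs(state, conc, circ0, mtt, gamma, slope):
    """Right-hand side of the Friberg myelosuppression model (time in days)."""
    prol, t1, t2, t3, circ = state
    ktr = 4.0 / mtt                 # (n + 1) / MTT with n = 3 transit compartments
    e_drug = slope * conc           # linear drug effect on proliferation
    feedback = (circ0 / max(circ, 1e-9)) ** gamma
    return (
        ktr * prol * ((1.0 - e_drug) * feedback - 1.0),  # proliferating cells
        ktr * (prol - t1),                               # transit 1
        ktr * (t1 - t2),                                 # transit 2
        ktr * (t2 - t3),                                 # transit 3
        ktr * (t3 - circ),                               # circulating neutrophils
    )


def simulate_neutrophils(c0=0.5, k_el=10.0, circ0=4.14, mtt=2.4, gamma=0.12,
                         slope=13.8, t_end=21.0, dt=0.01):
    """Integrate the model with a fixed-step RK4 scheme.

    c0 and k_el define a HYPOTHETICAL mono-exponential concentration curve;
    the units of `slope` are not given in the abstract, so the magnitude of
    the simulated effect is illustrative only.
    Returns a list of (time in days, circulating neutrophil count).
    """
    def conc(t):
        return c0 * math.exp(-k_el * t)

    def shifted(state, k, h):
        return tuple(s + h * d for s, d in zip(state, k))

    args = (circ0, mtt, gamma, slope)
    state = (circ0,) * 5            # start at the drug-free steady state
    profile = [(0.0, circ0)]
    for i in range(int(round(t_end / dt))):
        t = i * dt
        k1 = friberg_rhs(state, conc(t), *args)
        k2 = friberg_rhs(shifted(state, k1, dt / 2), conc(t + dt / 2), *args)
        k3 = friberg_rhs(shifted(state, k2, dt / 2), conc(t + dt / 2), *args)
        k4 = friberg_rhs(shifted(state, k3, dt), conc(t + dt), *args)
        state = tuple(
            s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        profile.append(((i + 1) * dt, state[4]))
    return profile


profile = simulate_neutrophils()
t_nadir, nadir = min(profile, key=lambda point: point[1])
print(f"Predicted nadir {nadir:.2f} x10^9 cells/L at day {t_nadir:.1f}")
```

The predicted nadir does not depend on when a patient happened to be sampled, which is the property the PKPD-informed labelling relies on.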
The PKPD analysis was performed using NONMEM (ICON Development Solutions, Maryland, USA) version 7.5.1 and Perl Speaks NONMEM version 5.5.0. Model evaluation involved Visual Predictive Checks and non-parametric bootstrapping to assess parameter uncertainty. The ML model used the XGBoost algorithm for the severe neutropenia prediction task. Only baseline information available prior to the first dose was used (no on-treatment data). Patients with a neutrophil observation below 0.1 ×10^9 cells/L were labelled as positive. However, negative labels are more challenging to assign when data are sparse. A naive approach requires assumptions about when severe neutropenia will occur (e.g. requiring a neutrophil observation > 1.0 ×10^9 cells/L between 5 and 7 days after the docetaxel dose). The PKPD-informed label was determined by the prediction at the individual patient’s nadir and did not require an observation in a specific window. Feature selection was conducted through Sequential Feature Selection and Conditional Permutation Importance. Standard ML classification evaluation metrics were used to assess predictive performance.
Results: After application of exclusion criteria and data cleaning, data from 4477 patients from the HUS database were available for analysis. The population PD parameter estimates (relative standard error, %) were: baseline neutrophils 4.14 ×10^9 cells/L (7.3%), mean transit time 2.4 days (5.1%), Hill exponent 0.12 (17%), and drug effect slope 13.8 (8.3%). A combined (additive and proportional) error model was used to describe the residual error. For the ML model, the AUC-ROC was 0.72, the AUC-PR was 0.50, precision was 1.0, and recall was 0.14. PKPD-enhanced labelling enabled all patients in the dataset to be labelled, compared with 1754 (39%) patients using the naive method. Key predictive features were identified using Conditional Permutation Importance and included neutrophil count prior to dose, dose number, C-reactive protein concentration, and bilirubin concentration.
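The difference between the two labelling strategies described in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, and applying the same 0.1 ×10^9 cells/L cutoff to the model-predicted nadir is an assumption, since the abstract does not state the threshold used for predictions.

```python
from typing import List, Optional, Tuple

SEVERE_THRESHOLD = 0.1      # x10^9 cells/L, positive-label cutoff from the abstract
RECOVERY_THRESHOLD = 1.0    # x10^9 cells/L, naive negative-label cutoff from the abstract
NAIVE_WINDOW = (5.0, 7.0)   # days after dose, example window from the abstract

# One observation is (days since docetaxel dose, neutrophil count in x10^9 cells/L).
Observation = Tuple[float, float]


def naive_label(observations: List[Observation]) -> Optional[int]:
    """Return 1 (severe), 0 (not severe), or None when no label can be assigned."""
    if any(count < SEVERE_THRESHOLD for _, count in observations):
        return 1
    start, end = NAIVE_WINDOW
    if any(start <= day <= end and count > RECOVERY_THRESHOLD
           for day, count in observations):
        return 0
    return None  # no observation in the assumed nadir window


def pkpd_informed_label(observations: List[Observation],
                        predicted_nadir: float) -> int:
    """Label every patient using the individual model-predicted nadir.

    ASSUMPTION: the predicted nadir is compared against the same cutoff
    that is used for observed counts.
    """
    if any(count < SEVERE_THRESHOLD for _, count in observations):
        return 1
    return 1 if predicted_nadir < SEVERE_THRESHOLD else 0


# A patient sampled only on days 2 and 12 cannot be labelled naively,
# but still receives a label from the PKPD prediction.
sparse_patient = [(2.0, 3.1), (12.0, 2.5)]
print(naive_label(sparse_patient))               # None
print(pkpd_informed_label(sparse_patient, 0.8))  # 0
```

Patients for whom the naive rule returns no label are the ones recovered by the PKPD-informed approach.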
Conclusion: Integrating PKPD with ML models provides a larger labelled population from which to gain insights into the features associated with severe neutropenia risk. This framework leverages RWD, mechanism-based PKPD model predictions, and ML algorithms. The framework can be generalized to other interventions and patient outcomes.
[1] Bruno, René, et al. “Population pharmacokinetics and pharmacokinetic-pharmacodynamic relationships for docetaxel.” Investigational New Drugs 19 (2001): 163-169.
[2] Friberg, Lena E., et al. “Model of chemotherapy-induced myelosuppression with parameter consistency across drugs.” Journal of Clinical Oncology 20.24 (2002): 4713-4721.
Reference: PAGE 33 (2025) Abstr 11603 [www.page-meeting.org/?abstract=11603]
Poster: Methodology – AI/Machine Learning