II-067 Melanie Karlsen

Identifying SHAP’s added value to PK covariate modeling on a small dataset

Mélanie Karlsen

LIRMM, Sanofi

Introduction:
Covariate model building can be the most time-consuming step of popPK modeling. To reduce the time required, several machine learning (ML) methods have gained the interest of pharmacometricians, among them the SHapley Additive exPlanations (SHAP) method [1]. This explainable-AI tool has attracted recent interest for its ability both to detect important covariates and to decipher covariate-to-PK-parameter relationships [2, 3, 4, 5, 6], offering the potential to shorten covariate model building. However, computing exact SHAP values can be highly expensive, which raises the question of whether its added value is truly worth the cost.

Objectives:
Examine whether SHAP correctly identifies true covariates and their relationships to PK parameters in the case of a small dataset.

Methods:
First, a dataset was generated using original clinical study data from 74 patients with 9 covariates (5 categorical and 4 continuous) in total. PK concentrations in the dataset were simulated with NONMEM V7.5.1. The population model used for the PK simulations was a one-compartment model with inter-occasion variability on clearance (CL) and volume of distribution (V), inter-individual variability on CL, V and the absorption rate constant (KA), an exponential error model, and linear effects of creatinine clearance (CLCR) on CL and of weight (WT) on V. Subsequently, four ML models were trained on the dataset through leave-one-out cross-validation and evaluated with the normalized root mean squared error (NRMSE) on their ability to predict CL, V and KA at baseline. The models were the Multi-Layer Perceptron (MLP), Random Forests (RF), XGBoost (XGB) and Support Vector Regression (SVR); the computational environment was Python V3.6.15. Each model was then trained on the full dataset for the SHAP procedure. SHAP covariate ranking was based on the mean absolute SHAP value (mean(|SHAP|)) and on the middle 90% span of the SHAP values (span(SHAP, 90%)). Ranking ability was assessed as whether each true covariate was identified as the top-ranked one by both mean(|SHAP|) and span(SHAP, 90%). To assess whether SHAP surpasses state-of-the-art ML feature selection methods [7], the best SHAP model was compared to the following methods: Gini importance, permutation feature importance (both computed with Random Forests) and LASSO. Finally, to assess whether SHAP brought information on the functional relationship between covariates and PK parameters, SHAP values and individual PK parameter values were plotted against each covariate and compared.
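The two ranking statistics can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the study's code: the function name, the (n_patients, n_covariates) layout of the SHAP matrix, and the interpretation of span(SHAP, 90%) as the 5th-to-95th percentile width are assumptions.

```python
import numpy as np

def rank_covariates(shap_values, covariate_names):
    """Rank covariates for one PK parameter by the two summary statistics.

    shap_values: assumed (n_patients, n_covariates) array of SHAP values.
    Returns the covariate names ordered by mean(|SHAP|) and by
    span(SHAP, 90%), most important first.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    # mean(|SHAP|): mean absolute SHAP value per covariate
    mean_abs = np.abs(shap_values).mean(axis=0)
    # span(SHAP, 90%): width of the middle 90% of SHAP values per covariate
    lo, hi = np.percentile(shap_values, [5, 95], axis=0)
    span90 = hi - lo
    order_mean = [covariate_names[i] for i in np.argsort(mean_abs)[::-1]]
    order_span = [covariate_names[i] for i in np.argsort(span90)[::-1]]
    return order_mean, order_span
```

Both statistics reward covariates whose SHAP values are large in magnitude or widely spread, which is why a true covariate with a strong effect should top both rankings.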

Results:
Of the four machine learning models, SVR performed best on average across all PK parameters (NRMSE ≈ 20%). However, the SVR's SHAP values failed to identify the true covariates. Despite worse predictive performance (NRMSE ≈ 25%), Random Forests was the only model whose SHAP values correctly identified the true covariates included in the final model. Gini importance and permutation importance, the two classical feature selection methods based on Random Forests, were likewise the only classical methods that correctly identified the true covariates. This suggests that the ability was an intrinsic property of Random Forests rather than an added value of SHAP in our case.
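For readers unfamiliar with the classical baseline, permutation importance measures how much a model's error worsens when one covariate's column is shuffled, breaking its link to the outcome. A minimal sketch, assuming a generic `predict` callable and a mean-squared-error criterion (the study's exact implementation may differ):

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=20, seed=None):
    """Increase in MSE when each covariate column is shuffled in turn.

    predict: fitted model's prediction function (illustrative assumption);
    X: (n_patients, n_covariates) covariate matrix; y: target PK parameter.
    Larger values mean the model relies more on that covariate.
    """
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the covariate-outcome link
            scores.append(np.mean((predict(Xp) - y) ** 2))
        importances[j] = np.mean(scores) - base_mse
    return importances
```

A covariate the model ignores leaves the error unchanged when shuffled, so its importance is near zero; a true covariate inflates the error and ranks high.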

SHAP values correctly and clearly suggested a linear relationship between covariates and PK parameters for all machine learning models. Although a linear functional form could have been suspected from the PK-parameter-versus-covariate plots, the SHAP values suggested it with much greater confidence.

A single computation of the SHAP values for one PK parameter took on average 43.03 ± 4 seconds.

Conclusion:

With this dataset, SHAP did not demonstrate the potential to be a full-model alternative to classical covariate analyses. For small datasets such as this one, the computational time of SHAP remains low enough to be worth the information it brings on the functional form linking covariates and PK parameters. Unfortunately, this computational time grows exponentially with the number of covariates and further increases with the number of patients in the dataset.
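The exponential growth in covariates comes directly from the Shapley definition: the exact value for each feature averages its marginal contribution over every coalition of the remaining features, i.e. 2^(M-1) coalitions per feature. A brute-force sketch (a generic interventional value function with a background reference, not the study's implementation) makes the cost explicit:

```python
import math
from itertools import combinations
import numpy as np

def exact_shap(f, x, background):
    """Brute-force exact Shapley values for a single instance.

    f: model prediction function on one (M,) vector; x: the instance;
    background: reference values substituted for 'absent' features
    (an illustrative choice of value function). The inner loops visit
    every coalition, hence the exponential cost in M.
    """
    M = len(x)

    def v(subset):
        # coalition members keep their observed value, others the background
        z = background.copy()
        z[list(subset)] = x[list(subset)]
        return f(z)

    phi = np.zeros(M)
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            for S in combinations(others, size):
                # classical Shapley weight |S|! (M-|S|-1)! / M!
                w = math.factorial(size) * math.factorial(M - size - 1) / math.factorial(M)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi
```

For a linear model with a zero background, the exact Shapley value of feature i reduces to its coefficient times its value, which gives a convenient sanity check; practical SHAP implementations avoid this enumeration with model-specific approximations, which is what keeps the runtime tolerable on small datasets like this one.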

References:

  • [1] Janssen, A. (2022). Application of SHAP values for inferring the optimal functional form of covariates in pharmacokinetic modeling. CPT Pharmacometrics Syst Pharmacol.
  • [2] Basu, S. (2022). Predicting disease activity in patients with multiple sclerosis: An explainable machine-learning approach in the Mavenclad trials. CPT Pharmacometrics Syst Pharmacol.
  • [3] Ogami, C. (2021). An artificial neural network-pharmacokinetic model and its interpretation using Shapley additive explanations. CPT Pharmacometrics Syst Pharmacol.
  • [4] Terranova, N. (2022). Pharmacometric modeling and machine learning analyses of prognostic and predictive factors in the JAVELIN Gastric 100 phase III trial of avelumab. CPT Pharmacometrics Syst Pharmacol.
  • [5] Zhu, X. (2022). Machine learning advances the integration of covariates in population pharmacokinetic models: Valproic acid as an example. Front Pharmacol.
  • [6] Lundberg, S.M., Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NIPS.
  • [7] Molnar, C. (2019). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.

Reference: PAGE 32 (2024) Abstr 10808 [www.page-meeting.org/?abstract=10808]

Poster: Methodology – AI/Machine Learning
