III-071 Thao Pham

DeepARV-ChemBERTa: Ensemble Deep Learning to Predict Drug-Drug Interaction of Clinical Relevance with Antiretroviral Therapy

Thao Pham (1), Mohamed Ghafoor (2), Sandra Grañana-Castillo (1), Catia Marzolini (1,3,4), Sara Gibbons (1), Saye Khoo (1), Justin Chiong (1), Dennis Wang (5,6), Marco Siccardi (1,7)

(1) Department of Pharmacology and Therapeutics, University of Liverpool, UK, (2) Department of Computer Science, University of Liverpool, UK, (3) Department of Infectious Diseases and Hospital Epidemiology, Departments of Medicine and Clinical Research, University Hospital Basel, Switzerland, (4) Service of Clinical Pharmacology, University Hospital Lausanne, Switzerland, (5) National Heart and Lung Institute, Imperial College London, UK, (6) Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), Republic of Singapore, (7) esqLABS

Introduction: 

Drug-drug interactions (DDIs) may result in toxicity or treatment failure of antiretroviral therapy (ARV) or comedications. Despite the large number of possible drug combinations in clinical use, only a limited number of clinical DDI studies are conducted before and after marketing approval. Computational prediction of DDIs could provide key evidence for the rational management of complex therapies. A current cutting-edge approach is deep learning, a sub-field of machine learning loosely inspired by biological neural networks, which offers powerful tools for generalising from data by learning mappings between given inputs and outputs through layers of artificial neurons [1]. Transformer-based models are a type of neural network architecture that learns context and semantic information within sequential data via a self-attention mechanism; they have gained significant popularity for their effectiveness in capturing complex patterns and relationships within a sequence [2]. Designed for applications in cheminformatics, ChemBERTa [3], pre-trained on 10 million compounds from PubChem, provides robust molecular representations and holds potential for further fine-tuning for DDI prediction tasks. Our study aimed to assess the potential of deep learning approaches to predict DDIs of clinical relevance between ARVs and comedications.
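The self-attention mechanism mentioned above can be illustrated with a minimal single-head, scaled dot-product attention in pure Python. This is a generic textbook sketch, not ChemBERTa's implementation (which stacks multiple attention heads, learned projections and further layers); the toy token vectors and identity projection matrices are arbitrary assumptions chosen for readability.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over a sequence x of n token vectors."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    # n x n scores: how strongly each token attends to every token in the sequence
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # each output row is an attention-weighted mix of the value vectors
    return matmul(weights, v)


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token vectors
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for row in self_attention(tokens, identity, identity, identity):
        print([round(v, 3) for v in row])
```

Each output vector depends on every token in the sequence, which is how a transformer captures context; in a SMILES model the "tokens" would be sub-strings of the molecule's SMILES string.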

Objectives:

  • Fine-tune the transformer-based model ChemBERTa-2 as a molecular feature extractor for predicting clinical DDI risk.
  • Apply downsampling and algorithm-level class weighting to address the class imbalance arising from the skewed distribution of DDI grading categories (the most severe category comprises less than 10% of the dataset).
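The two imbalance-handling techniques named in the objectives can be sketched in pure Python. This is a minimal illustration under assumptions: the inverse-frequency weighting below is the common heuristic also used by scikit-learn's `class_weight="balanced"` option, and random undersampling down to the size of the smallest class is one simple variant. The abstract does not specify which weighting formula or sampling ratio was used; the function names and toy labels are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class).

    Rare grades (e.g. Red) receive weights above 1 and common grades below 1,
    so every grade contributes equally in total to a weighted loss.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_undersample(
    samples: Sequence[T], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Randomly drop samples from larger classes until all classes match the smallest one."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, n_min))
    return [samples[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    labels = ["Green"] * 6 + ["Amber"] * 3 + ["Yellow"] * 2 + ["Red"]
    pairs = [(f"ARV_{i}", f"drug_{i}") for i in range(len(labels))]  # placeholder pairs
    print(balanced_class_weights(labels))
    # → {'Green': 0.5, 'Amber': 1.0, 'Yellow': 1.5, 'Red': 3.0}
    xs, ys = random_undersample(pairs, labels)
    print(Counter(ys))
```

Class weights typically multiply each sample's loss term, whereas undersampling changes which data the model sees. Undersampling to the rarest grade discards many majority-class pairs, which is one motivation for training several models on different undersampled subsets and combining them.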

Methods:

DDI severity gradings for 30,142 drug pairs were extracted from the Liverpool HIV Drug Interaction database [4]. The SMILES string of each drug was featurised into an embedding by ChemBERTa, a transformer-based model. We developed DeepARV-ChemBERTa, a feed-forward neural network that takes drug embeddings at its input layer and outputs one of four DDI categories: i) Red: drugs should not be co-administered; ii) Amber: interaction of potential clinical relevance, manageable by monitoring/dose adjustment; iii) Yellow: interaction of weak relevance; and iv) Green: no expected interaction. ChemBERTa and its tokenizer were downloaded from seyonec/PubChem10M_SMILES_BPE_450k via Hugging Face [5]. The final hidden layer was adapted to serve as a featuriser that outputs a 768-dimensional embedding for a given drug SMILES. The embeddings of the two drugs in each pair were concatenated to form the input feature of the neural network, which was subsequently fine-tuned for DDI prediction. The imbalance in the distribution of DDI severity grades was addressed by undersampling and by applying ensemble learning during training of DeepARV-ChemBERTa.
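The pair featurisation and classifier described above can be sketched in pure Python. This is a minimal, untrained illustration under stated assumptions: the hidden-layer size, ReLU activation, omission of bias terms, random weight initialisation and the soft-voting rule for combining ensemble members are hypothetical choices not reported in this abstract, and `pair_features`, `FeedForward` and `ensemble_predict` are illustrative names. The 768-dimensional inputs here are random placeholders rather than real ChemBERTa embeddings.

```python
import math
import random
from typing import List

GRADES = ["Red", "Amber", "Yellow", "Green"]
EMB_DIM = 768  # per-drug embedding length reported in the Methods


def pair_features(emb_a: List[float], emb_b: List[float]) -> List[float]:
    """Concatenate two drug embeddings into a single 2 * 768 = 1536-dimensional input."""
    return list(emb_a) + list(emb_b)


class FeedForward:
    """Hypothetical one-hidden-layer network: input -> ReLU hidden layer -> 4-way softmax.

    Weights are random and untrained; bias terms are omitted for brevity.
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def predict_proba(self, x: List[float]) -> List[float]:
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w1]
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        m = max(logits)  # stabilised softmax over the four DDI grades
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


def ensemble_predict(members: List[FeedForward], x: List[float]) -> str:
    """Soft voting: average the members' class probabilities and return the top grade."""
    probs = [net.predict_proba(x) for net in members]
    avg = [sum(p[j] for p in probs) / len(probs) for j in range(len(GRADES))]
    return GRADES[max(range(len(GRADES)), key=lambda j: avg[j])]


if __name__ == "__main__":
    rng = random.Random(42)
    emb_arv = [rng.random() for _ in range(EMB_DIM)]    # placeholder for a ChemBERTa embedding
    emb_comed = [rng.random() for _ in range(EMB_DIM)]  # placeholder for a ChemBERTa embedding
    x = pair_features(emb_arv, emb_comed)
    # in practice each member would be trained on a different undersampled subset
    members = [FeedForward(len(x), 32, len(GRADES), seed=s) for s in range(5)]
    print(len(x), ensemble_predict(members, x))
```

One design consequence worth noting: concatenation is order-sensitive, so (A, B) and (B, A) produce different inputs unless pairs are ordered consistently (for example, ARV first) or both orders are used in training; the abstract does not state which convention was used.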

Results: 

To construct an independent test set for evaluating DeepARV-ChemBERTa, eight ARVs were randomly selected, and all drug pairs involving any of these ARVs (n = 5,824 pairs, approximately 20% of the dataset) were held out from the 5-fold cross-validation training phase. On this independent test set, DeepARV-ChemBERTa predicted clinically relevant DDIs between ARVs and comedications with a weighted mean accuracy of 0.837 ± 0.008, precision of 0.752 ± 0.012, sensitivity of 0.675 ± 0.016 and specificity of 0.878 ± 0.015. The evaluation metrics also included balanced accuracy, the mean of sensitivity and specificity, which was 0.776 ± 0.011.
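The evaluation protocol can be sketched in pure Python. The split groups pairs by ARV so that no held-out ARV appears during training. How the weighted mean metrics were computed is not specified in this abstract; the sketch assumes one-vs-rest metrics per DDI grade, weighted by each grade's share of the test set, with balanced accuracy taken as the mean of sensitivity and specificity, which is consistent with the reported values ((0.675 + 0.878) / 2 ≈ 0.776). Function names and toy data are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def hold_out_by_arv(
    pairs: Sequence[Tuple[str, str]], test_arvs: Set[str]
) -> Tuple[List[int], List[int]]:
    """Split pair indices so every pair involving a held-out ARV lands in the test set."""
    train, test = [], []
    for i, (drug_a, drug_b) in enumerate(pairs):
        (test if drug_a in test_arvs or drug_b in test_arvs else train).append(i)
    return train, test


def weighted_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """One-vs-rest metrics per grade, averaged with weights equal to each grade's support."""
    n = len(y_true)
    totals = {"accuracy": 0.0, "precision": 0.0, "sensitivity": 0.0, "specificity": 0.0}
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = n - tp - fp - fn
        w = (tp + fn) / n  # share of the test set belonging to grade c
        totals["accuracy"] += w * (tp + tn) / n
        totals["precision"] += w * (tp / (tp + fp) if tp + fp else 0.0)
        totals["sensitivity"] += w * tp / (tp + fn)
        totals["specificity"] += w * (tn / (tn + fp) if tn + fp else 0.0)
    # balanced accuracy taken as the mean of sensitivity and specificity
    totals["balanced_accuracy"] = (totals["sensitivity"] + totals["specificity"]) / 2
    return totals


if __name__ == "__main__":
    pairs = [("ARV_A", "drug_1"), ("ARV_B", "drug_2"), ("drug_3", "ARV_A"), ("ARV_C", "drug_4")]
    print(hold_out_by_arv(pairs, {"ARV_A"}))
    print(weighted_metrics(["Red", "Green", "Green", "Amber"], ["Red", "Green", "Amber", "Amber"]))
```

In this sketch, support-weighted sensitivity reduces to the overall fraction of correctly graded pairs, while the one-vs-rest accuracy also credits true negatives for each grade.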

Conclusions:

DeepARV-ChemBERTa uses compound molecular data as a basis for stratifying the clinical risk of a DDI. This approach has the potential to leverage molecular structures associated with DDI risk and to mitigate the effect of DDI class imbalance, thereby improving predictive performance on clinically relevant DDIs. DeepARV-ChemBERTa represents a predictive tool to rationalise the risk of DDIs for novel drug candidates, with applications in the discovery and development phases, supporting the identification of high-risk drug pairs and streamlining the screening process.

References:
[1] LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature. 521(7553), 436-444 (2015).

[2] Gillioz, A. et al. Overview of the Transformer-based Models for NLP Tasks. In 2020 15th Conference on Computer Science and Information Systems (FedCSIS). 179-183 (IEEE, 2020).

[3] Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885 (2020).

[4] Liverpool HIV Interaction Checker. Available at: https://www.hiv-druginteractions.org/. Accessed 1 March 2022.

[5] Hugging Face Model. Available at: https://huggingface.co/seyonec/PubChem10M_SMILES_BPE_450k/tree/main. Accessed 11 March 2023.

Reference: PAGE 32 (2024) Abstr 10856 [www.page-meeting.org/?abstract=10856]

Poster: Methodology – AI/Machine Learning
