DeepCIM: Prediction of myelosuppressive anti-cancer molecules using drug-target interaction datasets

Lee, Taeyeub (1), Thomas, Philipp (2), Serra Traynor, Carlos (3)

(1) Department of Surgery and Cancer, Imperial College London, London, UK; (2) Department of Mathematics, Imperial College London, London, UK; (3) Clinical Pharmacology and Quantitative Pharmacology, Clinical Pharmacology and Safety Science, R&D, AstraZeneca, Cambridge, UK

Objectives: Accurate modelling and analysis of chemotherapy-induced myelosuppression (CIM) in preclinical development is crucial for understanding the potential therapeutic index of new investigational drugs. This work aims to predict the CIM effect of small-molecule drugs using DeepCIM, a novel fully connected deep neural network with three distinct input layers, one for each type of data: tokenised canonical SMILES (a text format representing chemical structures), tokenised protein sequences, and pIC50 values. DeepCIM’s applicability and accuracy are assessed by comparing it against a support vector machine (SVM) baseline in an external validation on drug-target interactions and CIM outcomes drawn from various experimental studies.
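
As a rough illustration of the three input types named above, the following minimal sketch encodes a SMILES string and a protein sequence as integer tokens and converts an IC50 value to pIC50. The abstract does not say how DeepCIM tokenises its inputs, so the character-level scheme, the padding lengths, and the helper names (`build_vocab`, `encode`, `pic50_from_nM`) are all assumptions made for illustration.

```python
import math

def build_vocab(strings):
    """Map every character seen in `strings` to an integer ID (0 is reserved for padding)."""
    chars = sorted({ch for s in strings for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(s, vocab, max_len):
    """Character-level encoding, truncated or zero-padded to `max_len`.

    Assumption: unknown characters share ID 0 with padding. A real SMILES
    tokeniser would also keep two-letter atoms such as Cl and Br together.
    """
    ids = [vocab.get(ch, 0) for ch in s[:max_len]]
    return ids + [0] * (max_len - len(ids))

def pic50_from_nM(ic50_nM):
    """pIC50 = -log10(IC50 in mol/L); the IC50 is given here in nanomolar."""
    return -math.log10(ic50_nM * 1e-9)

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # aspirin, used only as a toy example
protein = "MKTAYIAKQR"                # made-up sequence fragment

smiles_vocab = build_vocab([smiles])
protein_vocab = build_vocab([protein])

smiles_ids = encode(smiles, smiles_vocab, max_len=32)
protein_ids = encode(protein, protein_vocab, max_len=16)
pic50 = pic50_from_nM(100.0)  # an IC50 of 100 nM
```

Fixed-length integer vectors of this kind, together with the scalar pIC50, are one plausible form for the three inputs of a triple-input network.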

Methods: DeepCIM has a triple-input architecture that integrates SMILES strings, protein sequences, and pIC50 values. The input data are processed through successive layers of increasing neuron count to create high-dimensional representations. The network then merges the features derived from each input source in a 1024-neuron layer, followed by dropout layers to prevent overfitting. The output layer performs binary classification of CIM outcomes. Five-fold cross-validation was used to train and test DeepCIM on the DrugBank [1] (16,575 drugs) and BindingDB [2] (2.8 million data points on small molecules and protein targets) datasets. The Davis dataset [3] (experimental affinity measurements for 72 small molecules and 442 protein kinases) was used for external validation.
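
The topology described above can be sketched as a forward pass in plain Python. This is not the authors' implementation: only the three inputs, the widening branches, the 1024-neuron merge layer, dropout, and the binary output come from the abstract. The branch widths (32 then 64, and 8 then 16), the dropout rate of 0.5, the ReLU and sigmoid activations, and the class names `Dense` and `TripleInputNet` are assumptions, and the weights are random and untrained.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class Dense:
    """Fully connected layer with random (untrained) weights and zero biases."""
    def __init__(self, n_in, n_out, rng):
        scale = 1.0 / math.sqrt(n_in)
        self.w = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def __call__(self, x, activation):
        return [activation(sum(wi * xi for wi, xi in zip(row, x)) + b)
                for row, b in zip(self.w, self.b)]

def dropout(x, rate, rng, training):
    """Inverted dropout: units are dropped only while training."""
    if not training:
        return x
    return [0.0 if rng.random() < rate else xi / (1.0 - rate) for xi in x]

class TripleInputNet:
    def __init__(self, smiles_len, protein_len, rng, merge_units=1024, drop_rate=0.5):
        self.rng = rng
        self.drop_rate = drop_rate
        # Each branch widens from layer to layer (widths are assumptions).
        self.smiles_branch = [Dense(smiles_len, 32, rng), Dense(32, 64, rng)]
        self.protein_branch = [Dense(protein_len, 32, rng), Dense(32, 64, rng)]
        self.affinity_branch = [Dense(1, 8, rng), Dense(8, 16, rng)]
        # Merge layer over the concatenated branch features (1024 units, per the abstract).
        self.merge = Dense(64 + 64 + 16, merge_units, rng)
        # Single sigmoid unit for binary classification (assumed output form).
        self.out = Dense(merge_units, 1, rng)

    @staticmethod
    def _run(layers, x):
        for layer in layers:
            x = layer(x, relu)
        return x

    def predict(self, smiles_ids, protein_ids, pic50, training=False):
        merged = (self._run(self.smiles_branch, smiles_ids)
                  + self._run(self.protein_branch, protein_ids)
                  + self._run(self.affinity_branch, [pic50]))
        hidden = dropout(self.merge(merged, relu), self.drop_rate, self.rng, training)
        return self.out(hidden, sigmoid)[0]

rng = random.Random(0)
model = TripleInputNet(smiles_len=32, protein_len=16, rng=rng)
toy_smiles = [rng.randint(0, 20) for _ in range(32)]   # stand-in token IDs
toy_protein = [rng.randint(0, 20) for _ in range(16)]  # stand-in token IDs
probability = model.predict(toy_smiles, toy_protein, pic50=7.0)
```

In practice such a model would be built and trained in a deep learning framework; the pure-Python version only makes the data flow from three inputs to one probability explicit.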

Results: By aligning DrugBank IDs and SMILES strings with BindingDB, we identified 890 drug-target interactions involving 288 protein targets and 112 chemotherapeutics. DeepCIM was trained on amino acid sequence features so that it could be applied to the Davis dataset, which shares only 80 proteins by UniProt ID with BindingDB. In external validation of CIM outcome prediction, DeepCIM outperformed the SVM on every reported metric. DeepCIM achieved an accuracy of 0.79 versus the SVM’s 0.70 for all target drugs in the Davis dataset, and 0.91 versus 0.78 for drugs common to the Davis and BindingDB datasets. DeepCIM also achieved a precision of 0.81 versus the SVM’s 0.61 and an F1 score of 0.89 versus 0.76. The area under the curve (AUC) for DeepCIM was 0.85 versus the SVM’s 0.80 for all target drugs in the Davis dataset, rising to 0.93 versus 0.82 for drugs common to the Davis and BindingDB datasets.
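
For readers less familiar with the metrics reported above, here is a minimal sketch of how accuracy, precision, F1 score, and AUC can be computed for a binary classifier. The labels and scores are invented toy values, not data from the study. The 0.5 decision threshold and the reading of AUC as the area under the ROC curve are assumptions, since the abstract specifies neither.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = myelosuppressive, 0 = non-myelosuppressive (invented values).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.1, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

On this toy data the accuracy, precision, and F1 score are each 0.75 and the AUC is 0.9375. Unlike the other three metrics, AUC is computed from the raw scores and does not depend on the threshold.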

Conclusions: This study trains the novel DeepCIM model on a comprehensive range of structural data encompassing chemical and biological characteristics. DeepCIM improved on the SVM on the Davis benchmark, which evaluates real-world myelosuppressive activity. DeepCIM can predict whether a compound has a myelosuppressive or non-myelosuppressive effect based on the binding affinity values between the ligand structure and the target sequence.

References:
[1] Wishart, D.S., et al., DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res, 2018. 46(D1): p. D1074-D1082.
[2] Gilson, M.K., et al., BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res, 2016. 44(D1): p. D1045-D1053.
[3] Davis, M.I., et al., Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol, 2011. 29(11): p. 1046-1051.

Reference: PAGE 32 (2024) Abstr 11108 [www.page-meeting.org/?abstract=11108]

Poster: Methodology – AI/Machine Learning