II-085

Interpretable Deep Learning Survival Analysis of Alzheimer’s Disease Using Genetic Variants Associated with Metabolic Disorders

Sungwoo Goo 1,2,4, Soyoung Lee 1,4, Jung-woo CHAE 1,2,4, Sangkeun Jung 2,3, Hwi-yeol Yun 1,2

1 College of Pharmacy, Chungnam National University (Yuseong-gu, Korea, Republic of), 2 Department of Bio-AI convergence, Chungnam National University (Yuseong-gu, Korea, Republic of), 3 Department of Computer Science and Engineering (Yuseong-gu, Korea, Republic of), 4 Institute of Drug Research and Development, Chungnam National University (Yuseong-gu, Korea, Republic of)

Objectives: Alzheimer’s disease (AD) is a progressive and multifactorial neurodegenerative disorder resulting from a highly complex interplay of genetic and environmental factors. Predicting the onset of AD is crucial for early intervention; however, traditional statistical approaches, such as the Cox proportional hazards model, face significant limitations. Specifically, these linear models struggle to capture the complex, nonlinear relationships and epistatic (gene-gene) interactions inherent in high-dimensional genomic data. Attempting to explicitly include all potential interaction terms in a linear framework inevitably leads to a combinatorial explosion, rendering the analysis computationally infeasible. While deep learning models offer a solution by autonomously learning these high-order nonlinear patterns, their “black-box” nature severely restricts their clinical interpretability and trustworthiness.
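To make the combinatorial explosion concrete, the short sketch below counts how many explicit interaction terms a linear model would need for a panel of p variants. The panel size of 500 is a hypothetical figure chosen for illustration only; it is not the size of the dataset analysed here.

```python
from math import comb


def n_interaction_terms(p: int, max_order: int) -> int:
    """Count all interaction terms of order 2..max_order among p features."""
    return sum(comb(p, k) for k in range(2, max_order + 1))


p = 500  # hypothetical panel size, for illustration only
for order in (2, 3, 4):
    # Each line shows the cumulative number of terms up to that order.
    print(f"up to order {order}: {n_interaction_terms(p, order)} terms")
```

Even when only pairwise terms are kept, the count grows quadratically in p, so it quickly outnumbers the subjects in a typical cohort.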
Methods: To address the dual challenges of modeling complexity and interpretability, we developed a predictive framework that integrates a feedforward neural network (FFN) with a parametric Weibull accelerated failure time (AFT) survival model. This FFN-Weibull model was trained to predict the time to AD onset using a high-dimensional genetic dataset sourced from the Clinical & Omics Data Archive. The dataset included older adults with and without AD and combined the strongly predictive APOE genotype with a large panel of weaker, noisier single-nucleotide polymorphisms (SNPs) associated with metabolic disorders such as dyslipidemia and type 2 diabetes. We used 5-fold cross-validation and the concordance index (C-index) to compare the predictive performance of our model against a traditional linear Weibull baseline. We then applied Shapley additive explanations (SHAP), an explainable artificial intelligence (XAI) technique, to interpret the neural network, quantify the contribution of each genetic feature, and visualize individual risk profiles.
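The two quantitative ingredients named above, a Weibull AFT likelihood driven by a network output and the C-index, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hidden-layer architecture, the tanh activation, the parametrization (network output = log of the Weibull scale, one shared shape parameter), all weights, and the toy data are hypothetical.

```python
import math


def ffn_log_scale(x, W1, b1, w2, b2):
    """Hypothetical one-hidden-layer network: features x -> log(Weibull scale)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def weibull_aft_nll(log_scales, log_shape, times, events):
    """Mean negative log-likelihood of right-censored Weibull data.

    events[i] is 1 for an observed onset and 0 for a censored subject.
    """
    k = math.exp(log_shape)
    total = 0.0
    for log_scale, t, d in zip(log_scales, times, events):
        z = k * (math.log(t) - log_scale)          # log cumulative hazard
        log_hazard = log_shape - math.log(t) + z   # log h(t)
        total += d * log_hazard - math.exp(z)      # d*log h(t) - H(t)
    return -total / len(times)


def concordance_index(pred_times, times, events):
    """Harrell's C; pairs with tied observed times are skipped in this sketch."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            # Comparable: subject i has an observed event before subject j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if pred_times[i] < pred_times[j]:
                    concordant += 1.0
                elif pred_times[i] == pred_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data and weights, all made up for illustration.
features = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
times = [70.0, 82.0, 65.0]
events = [1, 0, 1]
W1, b1 = [[0.3, -0.2], [-0.1, 0.4]], [0.0, 0.0]
w2, b2 = [0.2, -0.1], math.log(75.0)

log_scales = [ffn_log_scale(x, W1, b1, w2, b2) for x in features]
print("NLL:", weibull_aft_nll(log_scales, math.log(5.0), times, events))
# With a shared shape, the scale ranks subjects the same way as the median time.
print("C-index:", concordance_index([math.exp(s) for s in log_scales], times, events))
```

In training, the network weights and the shape parameter would be adjusted to minimize this loss on each training fold, and the C-index would be computed on the held-out fold.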
Results: The FFN-Weibull model demonstrated robust generalization and consistently outperformed the linear baseline in predictive accuracy, yielding a higher mean test C-index. The SHAP analysis successfully validated the model’s biological plausibility by autonomously isolating the strongest genetic signals from the high-dimensional noise. It accurately identified APOE E4 as the primary driver of early AD onset and APOE E2 as a potent protective factor. Furthermore, SHAP revealed intricate nonlinear dynamics missed by linear models, such as the asymmetric impact of APOE E2’s presence compared to its absence. Interestingly, the model identified complex, and sometimes counterintuitive, effects among the metabolic-disorder-related SNPs, such as specific TBL2 variants. Several genotypes traditionally viewed strictly as metabolic risk factors exhibited protective tendencies against AD onset in specific contexts, strongly suggesting the presence of underlying gene-environment or gene-gene interactions.
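For readers unfamiliar with how Shapley-based attributions treat interactions, the sketch below computes exact Shapley values by enumerating every feature coalition. It is a brute-force illustration, not the estimator used by the SHAP library, and the toy model, its coefficients, and the placeholder features x0, x1, x2 are hypothetical rather than taken from the fitted network.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in the coalition take their observed value; others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: one additive effect plus one pairwise interaction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
# The attributions sum to the gap between the prediction and the baseline output.
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The interaction term is split evenly between x1 and x2, even though neither has any effect on its own, which is the kind of signal a purely additive linear model cannot represent.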
Conclusions: The proposed FFN-SHAP framework provides a highly effective, interpretable tool for unraveling the genetic architecture of complex diseases. By inherently solving the nonlinear challenges of epistasis without requiring predefined interaction terms, the model far exceeds the capabilities of traditional linear analyses. Our approach not only validated established biological knowledge regarding APOE but also generated novel, testable hypotheses concerning the nuanced roles of metabolic SNPs in AD pathogenesis. These findings underscore the critical need for future comprehensive studies that integrate large-scale genetic data with acquired environmental factors to fully decode the complex etiology of Alzheimer’s disease.
Lee S, CHAE J, Jung S, and Yun H contributed equally as corresponding authors.

References:
Stephen T Sherry, M-H Ward, M Kholodov, J Baker, Lon Phan, Elizabeth M Smigielski, and Karl Sirotkin. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research, 29(1):308–311, 2001.
Yasunobu Nohara, Koutarou Matsumoto, Hidehisa Soejima, and Naoki Nakashima. Explanation of machine learning models using Shapley additive explanation and application for real data in hospital. Computer Methods and Programs in Biomedicine, 214:106584, 2022.
Cameron Davidson-Pilon. lifelines: survival analysis in Python. Journal of Open Source Software, 4(40):1317, 2019.
Mélodie Monod, Peter Krusche, Qian Cao, Berkman Sahiner, Nicholas Petrick, David Ohlssen, and Thibaud Coroller. TorchSurv: A lightweight package for deep survival analysis. arXiv preprint arXiv:2404.10761, 2024.
Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Curran Associates, Inc., 2017.
Philip B Verghese, Joseph M Castellano, and David M Holtzman. Apolipoprotein E in Alzheimer’s disease and other neurological disorders. The Lancet Neurology, 10(3):241–252, 2011.
Xiao-Na Zeng, Rui-Xing Yin, Ping Huang, Ke-Ke Huang, Jian Wu, Tao Guo, Quan-Zhen Lin, Lynn Htet Htet Aung, Jin-Zhen Wu, and Yi-Ming Wang. Association of the MLXIPL/TBL2 rs17145738 SNP and serum lipid levels in the Guangxi Mulao and Han populations. Lipids in Health and Disease, 12(1), October 2013.
Chao-Qiang Lai, Mary K. Wojczynski, Laurence D. Parnell, Bertha A. Hidalgo, Marguerite Ryan Irvin, Stella Aslibekyan, Michael A. Province, Devin M. Absher, Donna K. Arnett, and José M. Ordovás. Epigenome-wide association study of triglyceride postprandial responses to a high-fat dietary challenge. Journal of Lipid Research, 57(12):2200–2207, December 2016.
Chunxiao Xu, Rongpan Bai, Dandan Zhang, Zhenli Li, Honghong Zhu, Maode Lai, and Yimin Zhu. Effects of APOA5 -1131T>C (rs662799) on fasting plasma lipids and risk of metabolic syndrome: Evidence from a case-control study in China and a meta-analysis. PLoS ONE, 8(2):e56216, February 2013.

Reference: PAGE 34 (2026) Abstr 12144 [www.page-meeting.org/?abstract=12144]

Poster: Methodology – AI/Machine Learning