Multi-Endpoint Item-Response Modelling for Integrating Several Endpoints in Alzheimer’s Disease Drug Trials

Camille Vong (1), Shuying Yang (2), Massimiliano Germani (3), Chao Chen (2)

Clinical Pharmacology Modelling and Simulation, (1) GSK, Baar, Switzerland; (2) GSK, Stevenage, UK; (3) GSK, Tres Cantos, Spain.

Objectives: Several rating scales are used in Alzheimer’s Disease (AD) to capture the heterogeneity of the disorder across cognition, daily function, behaviour and quality of life. Administering them nonetheless remains an arduous and lengthy procedure for both physician and patient/caregiver [1]. On the one hand, these scales have demonstrated a low efficacy signal-to-noise ratio [2]; on the other hand, there is appreciable overlap among them. The objectives of this analysis were to build a multi-endpoint item-response model (MIRM) integrating these scales, and to test the performance of the MIRM within a conceptual randomized trial framework created entirely from pooled placebo data.

Methods: Individual data on common AD scales (CDR-SOB, MMSE, ADASCOG14, ADCS-ADL, iADRS, EQ5D-Proxy, QoL caregiver/patient) from patients with mild to moderate AD were extracted from the Critical Path Institute’s large, anonymized placebo database [3]. Patients were clustered into “non-responder” and “responder” groups using the unsupervised machine learning (ML) k-means and silhouette methods, then resampled with incrementally increasing weight on the responder set to create hypothetical trials with a 1:1 placebo:active randomization ratio. The final trial was selected based on 1) a target clinical threshold for the CDR-SOB mean change from baseline, evaluated by a mixed model for repeated measures (MMRM) on the two arms, and 2) the ability to preserve conditional exchangeability, evaluated by propensity score distributions and average standardized differences (ASD) on covariates.
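
The resampling and balance check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the patient pool, the `age` covariate, the weight values and the helper names `sample_arm` and `standardized_difference` are all hypothetical, and it assumes the active arm is drawn with extra weight on responders while the placebo arm is drawn unweighted.

```python
import random
import statistics

def sample_arm(patients, n, responder_weight, rng):
    """Draw n patients with replacement; responders get `responder_weight`, others weight 1."""
    weights = [responder_weight if p["responder"] else 1.0 for p in patients]
    return rng.choices(patients, weights=weights, k=n)

def standardized_difference(x_active, x_placebo):
    """Absolute standardized mean difference of one continuous covariate between arms."""
    pooled_sd = ((statistics.variance(x_active) + statistics.variance(x_placebo)) / 2) ** 0.5
    return abs(statistics.mean(x_active) - statistics.mean(x_placebo)) / pooled_sd

rng = random.Random(0)

# Hypothetical placebo pool (assumption): one covariate plus a cluster label.
pool = [{"age": rng.gauss(72, 8), "responder": rng.random() < 0.7} for _ in range(2000)]

# Placebo arm: unweighted draw. Active arm: responders up-weighted (weight 3 is arbitrary).
placebo = sample_arm(pool, 500, 1.0, rng)
active = sample_arm(pool, 500, 3.0, rng)

frac_placebo = sum(p["responder"] for p in placebo) / len(placebo)
frac_active = sum(p["responder"] for p in active) / len(active)

# Covariate balance check; the abstract reports ASD < 0.1 as satisfactory.
asd_age = standardized_difference([p["age"] for p in active],
                                  [p["age"] for p in placebo])

print(f"responder fraction: placebo={frac_placebo:.2f}, active={frac_active:.2f}")
print(f"ASD for age: {asd_age:.3f}")
```

Repeating the draw over a grid of `responder_weight` values would give a family of candidate trials, from which one could be picked by the effect-size and balance criteria listed above.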

Using the item scores of all scales, an MIRM was developed as a compensatory multi-dimensional item response theory (IRT) model using both exploratory (i.e. items loading on each latent variable) and confirmatory (i.e. specific assignment of items to a latent variable using hierarchical cluster analysis) approaches. AIC, item characteristic curves (ICCs), simulation-based diagnostics and the correlation matrix of residuals were used to assess model performance. Symptom severity, as latent variable(s), was estimated for each patient at each assessment time.
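
As an illustration of what "compensatory" means here, the sketch below computes category probabilities for a single item under a standard multidimensional graded-response parameterisation, in which the latent variables enter through one weighted sum. The parameter values and the function name `grm_category_probs` are assumptions for illustration only; the abstract does not give the exact parameterisation used.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def grm_category_probs(theta, slopes, thresholds):
    """P(Y = k | theta), k = 0..K, for one item of a compensatory graded-response model.

    theta      : latent variable values, one per dimension
    slopes     : item discrimination per dimension (same length as theta)
    thresholds : K increasing thresholds (K = 1 gives a binary item)
    """
    # Compensatory: dimensions combine additively, so a low value on one
    # latent variable can be offset by a high value on another.
    linear = sum(a * t for a, t in zip(slopes, theta))
    # Cumulative probabilities P(Y >= k), padded with P(Y >= 0) = 1 and P(Y >= K+1) = 0.
    cum = [1.0] + [logistic(linear - b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item loading on two latent variables.
slopes = (1.2, 0.8)
thresholds = [-1.0, 0.0, 1.5]

probs_low = grm_category_probs((-1.0, -1.0), slopes, thresholds)
probs_high = grm_category_probs((1.0, 1.0), slopes, thresholds)

print([round(p, 3) for p in probs_low])
print([round(p, 3) for p in probs_high])
```

Plotting these probabilities against one latent variable while holding the other fixed gives the item characteristic curves used as diagnostics.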

To test the model’s ability to detect a drug effect, MMRM-based power was assessed for all scales and for the MIRM over 100 bootstrapped datasets of the final trial at different sample sizes (n=100-1200).
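
The bootstrap power calculation can be sketched as the fraction of resampled datasets in which the treatment effect is significant. To keep the example self-contained, a large-sample Welch z-test on a single change-from-baseline value per patient stands in for the MMRM; that substitution, the simulated data and the function names are assumptions, not the analysis actually performed.

```python
import math
import random
import statistics

def two_sided_p(x, y):
    """Large-sample Welch z-test p-value for a difference in means (stand-in for MMRM)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    z = (statistics.mean(x) - statistics.mean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def bootstrap_power(active, placebo, n_per_arm, n_boot=100, alpha=0.05, seed=0):
    """Share of bootstrap datasets (resampled with replacement) with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        a = rng.choices(active, k=n_per_arm)
        p = rng.choices(placebo, k=n_per_arm)
        if two_sided_p(a, p) < alpha:
            hits += 1
    return hits / n_boot

# Hypothetical endpoint values for the two arms of the selected trial (assumption).
rng = random.Random(1)
active = [rng.gauss(0.5, 1.0) for _ in range(400)]
placebo = [rng.gauss(0.0, 1.0) for _ in range(400)]

power_small = bootstrap_power(active, placebo, n_per_arm=20)
power_large = bootstrap_power(active, placebo, n_per_arm=400)
print(f"power at 20 per arm: {power_small:.2f}, at 400 per arm: {power_large:.2f}")
```

Running the same function on each total score and on the estimated latent variable, across a grid of sample sizes, would give power curves comparable across endpoints.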

Results: Based on k-means, patients were split into 2 sets (270 “non-responder” vs. 754 “responder”) with cluster mean changes from baseline of -0.90 vs. 0.32 for CDR-SOB, -0.85 vs. 0.31 for ADASCOG14 and -1.28 vs. 0.46 for iADRS. Based on weighted sampling, the selected trial exhibited arm differences in CDR-SOB (p=2.22e-05), ADASCOG14 (p=0.139929), iADRS (p=0.058463) and ADCS-ADL (p=0.04408). Overlapping propensity score distributions and unweighted ASD scores <0.1 for all covariates (except the under-represented Black or African American category) indicated satisfactory preservation of conditional exchangeability.

A final MIRM based on 86 items (graded-response and binary) and 2 latent variables (480 parameters in total) was found adequate to characterise the underlying disease. Although hierarchical cluster analyses revealed mainly 2 clades of items (predominantly CDR-SOB, ADASCOG14, ADCS-ADL and part of MMSE vs. QoLs, EQ5D-Proxy and the remaining MMSE items), a confirmatory 2-latent MIRM with a correlation of 0.221 performed better (ΔAIC = -8412.7). ICCs and information curves of the final MIRM showed greater disease differential for CDR-SOB and ADCS-ADL items.

Based on the final trial’s original sample size, a statistically significant drug effect was found for the latent variable (p=3.652017e-04). In MMRM-based power assessments on bootstrapped datasets, with 800 subjects the latent variable of the final MIRM yielded 83% power to detect a treatment effect, while CDR-SOB, iADRS and ADCS-ADL alone yielded 94%, 59% and 25% power, respectively.

Conclusions: The emerging results of this analysis demonstrate, as a proof of concept, that several clinical scales can be integrated to inform the underlying latent constructs. They also indicate that using an MIRM rather than total scores could increase the power to detect a treatment effect relative to key individual scales, with the exception of CDR-SOB alone. The latent variable, estimated from multiple rating scales integrated by item-response methodology, may serve as a surrogate clinical endpoint with a high signal-to-noise ratio.

References:
[1] Guideline on medicinal products for the treatment of Alzheimer’s disease and other dementias (CPMP/EWP/553/95 Rev.1). [http://www.emea.europa.eu]
[2] Becker RE, Greig NH. Alzheimer’s Disease Drug Development in 2008 and Beyond: Problems and Opportunities. Curr Alzheimer Res. 2008 Aug;5(4):346–57.
[3] The Critical Path for Alzheimer’s Disease database (CPAD). Available from: https://c-path.org/program/critical-path-for-alzheimers-disease/

Reference: PAGE 32 (2024) Abstr 11271 [www.page-meeting.org/?abstract=11271]

Poster: Methodology - New Modelling Approaches
