III-010

An Item Response Theory-Based Comparison of Multiple Versions of SARA Rating Assessment of Degenerative Ataxia in a SCA3 Cohort

Alzahra Hamdan 1, Fengjing Wang 1, Andrew C. Hooker 1, Xiaomei Chen 1, ESMI Consortium, Matthis Synofzik 2,3, Mats O. Karlsson 1

1 Pharmacometrics Research Group, Department of Pharmacy, Uppsala University (Uppsala, Sweden), 2 Division Translational Genomics of Neurodegenerative Diseases, Center for Neurology and Hertie Institute for Clinical Brain Research, University of Tübingen (Tübingen, Germany), 3 German Center for Neurodegenerative Diseases (DZNE) (Tübingen, Germany)

Background: Spinocerebellar ataxias (SCAs) are a heterogeneous group of rare, autosomal-dominant, progressive diseases that affect gait, balance, and limb coordination (1). The Scale for the Assessment and Rating of Ataxia (SARA) is the most widely used ataxia outcome in registries and clinical trials (2,3). It is a clinician-reported outcome comprising 4 axial items (gait, stance, sitting, and speech) and 4 appendicular items (for limb movements). SARA items are rated on a 5-, 7-, or 9-point ordinal scale, yielding a total score ranging from 0 (no ataxia) to 40 (most severe ataxia).
Ataxia researchers and regulators have criticized the metric properties and meaningfulness of SARA (4,5). Several SARA derivatives have therefore been proposed: three versions of functional SARA (fSARA), according to Moulaire et al. (6), L’Italien et al. (7), and the FDA review of the Aqneursa NDA (5), and two modified SARA versions (mSARAs) with different coarsenings. All fSARAs and mSARAs use a 5-category scale for the axial items, but with different scoring algorithms. fSARAs exclude the appendicular items, whereas mSARAs retain them.
In this work, we aim to evaluate the overall and item-level performance and informativeness of these 6 SARA versions (the original SARA, 3 fSARAs, and 2 mSARAs) using pharmacometric Item Response Theory (IRT) methodology (8) to inform future decisions on their use in clinical trials.

Methods: The dataset was obtained from the European SCA Type 3 (SCA3)/Machado-Joseph Disease Initiative (ESMI) study, a longitudinal, prospective natural history study of SCA3, a highly trial-relevant SCA (9). SARA item-level data from 352 symptomatic SCA3 patients across 1168 visits were used in this analysis. Because natural history data collected with the SARA derivatives are lacking, SARA item scores were transformed to fSARA/mSARA scores using the published mapping algorithms (5-7).
For each SARA version, an IRT model with one underlying latent variable (LV), shared and reflected by all items, was developed. Graded-response models were used to describe the probability of obtaining a certain item score as a function of the LV and item-specific parameters (difficulty and discrimination). Item-level Fisher information was calculated and compared across SARA versions to assess their relative informativeness. Longitudinal IRT models were then built to describe SCA3 disease progression by modeling changes in the LV over time since disease onset. Based on the developed models, trial simulations were performed to compare the sample sizes required by each SARA version to detect a treatment effect with 80% power at a significance level of 0.05. Placebo-controlled (1:1), parallel-design trials of 1 and 2 years were simulated, including SCA3 patients at the most trial-relevant disease stage of mild-to-moderate ataxia (i.e., SARA total scores from 3 to 16) and assuming a hypothetical disease-modifying treatment effect of 100%.
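As an illustration of the graded-response model and item Fisher information described above, the following is a minimal sketch for a single ordinal item. It is not the model code used in this work: the function names, the discrimination value, and the threshold values are hypothetical, and the abstract does not state how item information was aggregated into totals, so only the information at a given LV value is shown.

```python
import math

def cumulative_probs(theta, a, thresholds):
    """P(Y >= k | theta) for k = 0..K, padded with the boundary values 1 and 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]

def category_probs(theta, a, thresholds):
    """P(Y = k | theta) as differences of adjacent cumulative probabilities."""
    cum = cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def item_information(theta, a, thresholds):
    """Fisher information of one item: sum over categories of (dP_k/dtheta)^2 / P_k."""
    cum = cumulative_probs(theta, a, thresholds)
    # Slope of each cumulative curve with respect to theta; the boundary curves are flat.
    slopes = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        dp_k = slopes[k] - slopes[k + 1]
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info

# Hypothetical 5-category item (4 ordered difficulty thresholds), for illustration only.
A_EXAMPLE = 2.0
B_EXAMPLE = [-1.0, 0.0, 1.0, 2.0]

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        probs = category_probs(theta, A_EXAMPLE, B_EXAMPLE)
        info = item_information(theta, A_EXAMPLE, B_EXAMPLE)
        print(theta, [round(p, 3) for p in probs], round(info, 3))
```

Under this sketch, merging adjacent categories or dropping items removes terms from the information sum, which is one way to see why rescoring and item removal could lower informativeness.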

Results: SARA item characteristics indicated good performance, with high discrimination and ordered, well-designed response categories that cover the various ataxia levels in the cohort. A similar performance was observed in a published SARA IRT model from other rare ataxia cohorts (10). Compared with its derivatives, SARA was the most informative, with a total Fisher information of 13.95, followed by mSARAs (12.72-12.84) and then fSARAs (8.18-8.77).
Type I error was controlled for all analyses across all simulated trial scenarios. The Fisher information results were corroborated by the sample size calculations, which showed that the original SARA was the most powerful version. Modifications to SARA, including item removal and rescoring, reduced its statistical power. mSARAs required slightly larger sample sizes than SARA (257-274 vs. 241 in 1-year trials, and 60-63 vs. 57 in 2-year trials), while fSARAs required markedly larger ones (increases of 59-86% in 1-year trials and 40-60% in 2-year trials). To disentangle the effects of the two changes in fSARAs (item rescoring and omission of appendicular items), an additional analysis was conducted using SARA without its appendicular items and without rescoring. This experimental fSARA version required sample sizes comparable to those of the other fSARAs (385 vs. 382-448 in 1-year trials, and 85 vs. 80-91 in 2-year trials), indicating that item removal had a more substantial impact than item rescoring.
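The relative sample size increases quoted above can be checked with simple arithmetic. The sketch below uses only the sample sizes reported in this abstract; the variable and function names are illustrative.

```python
# Sample sizes quoted in the Results (lower and upper ends of each reported range).
REFERENCE = {"1-year": 241, "2-year": 57}  # original SARA
MSARA_RANGE = {"1-year": (257, 274), "2-year": (60, 63)}
FSARA_RANGE = {"1-year": (382, 448), "2-year": (80, 91)}

def relative_increase(n_version, n_reference):
    """Percent increase in required sample size relative to the reference version."""
    return 100.0 * (n_version - n_reference) / n_reference

if __name__ == "__main__":
    for label, ranges in (("mSARA", MSARA_RANGE), ("fSARA", FSARA_RANGE)):
        for duration, (low, high) in ranges.items():
            ref = REFERENCE[duration]
            print(f"{label}, {duration}: "
                  f"{relative_increase(low, ref):.1f}% to "
                  f"{relative_increase(high, ref):.1f}%")
```

After rounding, the fSARA values correspond to the reported 59-86% (1-year) and 40-60% (2-year) increases. The same calculation puts the mSARA increases at roughly 7-14% and 5-11%, which are derived here rather than reported in the abstract.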

Conclusions: IRT analysis of SARA in the SCA3 cohort aligns with and supports previous findings on SARA’s validity as an ataxia outcome. Modifications to SARA, particularly the removal of appendicular items, substantially reduce its informativeness and statistical power, even in clinical trials in mild-to-moderate ataxia.

Acknowledgment: This work was supported by the European Union project: European Rare Disease Research Alliance (ERDERA, #101156595). Model computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) at UPPMAX, partially funded by the Swedish Research Council through grant agreement no. 2025/2-98.

References:

1. Klockgether T, Mariotti C, Paulson HL. Spinocerebellar ataxia. Nat Rev Dis Primers. 2019 Apr 11;5(1):1–21. doi:10.1038/s41572-019-0074-3
2. Schmitz-Hübsch T, du Montcel ST, Baliko L, et al. Scale for the assessment and rating of ataxia: development of a new clinical scale. Neurology. 2006;66(11):1717–20. doi:10.1212/01.wnl.0000219042.60538.92
3. Klockgether T, Synofzik M, Alhusaini S, et al. Consensus Recommendations for Clinical Outcome Assessments and Registry Development in Ataxias: Ataxia Global Initiative (AGI) Working Group Expert Guidance. Cerebellum. 2023 Apr 5. doi:10.1007/s12311-023-01547-z
4. Maas RPPWM, Teerenstra S, Lima M, et al. Differential Temporal Dynamics of Axial and Appendicular Ataxia in SCA3. Mov Disord. 2022;37(9):1850–60. doi:10.1002/mds.29135
5. Aqneursa NDA 219132 Integrated Review. U.S. Food and Drug Administration, Center for Drug Evaluation and Research; [cited 2025 Jun 5]. Available from: https://www.accessdata.fda.gov/drugsatfda_docs/nda/2024/219132Orig1s000IntegratedR.pdf
6. Moulaire P, Poulet PE, Petit E, et al. Temporal Dynamics of the Scale for the Assessment and Rating of Ataxia in Spinocerebellar Ataxias. Mov Disord. 2023;38(1):35–44. doi:10.1002/mds.29255
7. L’Italien G, Popoff E, Rogula B, et al. Development and Validation of SCACOMS, a Composite Scale for Assessing Disease Progression and Treatment Effects in Spinocerebellar Ataxia. Cerebellum. 2024 Oct 1;23(5):2028–41. doi:10.1007/s12311-024-01697-8
8. Ueckert S. Modeling Composite Assessment Data Using Item Response Theory. CPT Pharmacometrics Syst Pharmacol. 2018;7(4):205–18. doi:10.1002/psp4.12280
9. ESMI Ataxia Study Protocols. [cited 2026 Jan 31]. ESMI Ataxia. Available from: https://ataxia-esmi.eu
10. Hamdan A, Hooker AC, Chen X, et al. Item performance of the scale for the assessment and rating of ataxia in rare and ultra-rare genetic ataxias. CPT Pharmacometrics Syst Pharmacol. 2024;13(8):1327–40. doi:10.1002/psp4.13162

Reference: PAGE 34 (2026) Abstr 12154 [www.page-meeting.org/?abstract=12154]

Poster: Drug/Disease Modelling - CNS