**Comparison of item response theory and classical test theory for power/sample size for questionnaire data with various degrees of variability in items' discrimination parameters**

Emilie Schindler, Lena E. Friberg, Mats O. Karlsson

Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden

**Objectives**: Patient-reported outcomes, usually assessed using questionnaires, are increasingly collected in clinical trials to evaluate variables that are not directly quantifiable, such as fatigue, health-related quality of life or pain. Because of their multi-scale nature, their analysis is challenging, and item response theory (IRT) within a non-linear mixed-effects modelling framework [1] offers an alternative to classical test theory based on the total score (TS). The aim of this analysis was to compare IRT and TS approaches for power and sample size calculations based on longitudinal questionnaire data, for different magnitudes of variability in the discrimination parameters across items.

**Methods**: An IRT model was used to simulate item-level data for a 7-item questionnaire in a parallel-group trial with one placebo arm and one active-dose arm, 1000 patients per arm and 6 occasions per patient. Each item was scored from 0 to 4, with the probability of each score described by a proportional odds model. The discrimination and difficulty parameters used for simulation were obtained from IRT modelling of the physical subscale of the baseline Functional Assessment of Cancer Therapy-Breast (FACT-B) in patients with metastatic breast cancer [2]. Four scenarios were simulated, with 0%, 50%, 100% and 200% of the original variability in discrimination parameters. The latent variable $D_i$ was assumed to change over time according to $D_i(t) = D_{i,0} + (\theta_1 x_{grp} + \eta_2)\cdot \mathrm{Time}$, where $D_{i,0} = D_i(0)$ is a standard normally distributed random variable, $\eta_2$ is a random effect on the rate of change, $x_{grp} = 0$ in the placebo group and $x_{grp} = 1$ in the treatment group. Total scores for the TS analysis were calculated as the sum of the simulated item responses. The Monte-Carlo Mapped Power (MCMP) method [3], implemented in the PsN software, was used for power calculation.
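To make the simulation design concrete, here is a minimal NumPy sketch of the data-generating step. It assumes a graded-response (proportional odds) parameterisation, P(Y ≥ j) = logistic(a_k(D − b_kj)) for item k and threshold j. All item parameters, θ1, the variance of η2 and the visit times are hypothetical placeholders, not the FACT-B estimates. The abstract does not state how "percentage of original variability" was applied, so the log-scale rescaling in the hypothetical helper `scale_discrimination` is an assumption.

```python
import numpy as np

# HYPOTHETICAL item parameters (7 items, scores 0-4); not the FACT-B estimates.
BASE_DISCRIM = np.array([1.2, 2.0, 1.5, 0.8, 2.5, 1.0, 1.8])
BASE_THRESH = np.array([-2.0, -0.8, 0.4, 1.6])             # 4 thresholds -> 5 categories
ITEM_SHIFT = np.array([-0.5, 0.0, 0.3, -0.2, 0.6, 0.1, -0.3])
DIFFICULTY = BASE_THRESH[None, :] + ITEM_SHIFT[:, None]     # (7, 4), increasing along each row


def scale_discrimination(a, fraction):
    """Keep the geometric mean of a, scale its between-item spread (ASSUMED: log scale)."""
    log_a = np.log(a)
    return np.exp(log_a.mean() + fraction * (log_a - log_a.mean()))


def simulate_trial(n_per_arm, discrim, theta1=-0.1, omega2=0.05, n_occ=6, rng=None):
    """Simulate item scores and total scores for one placebo and one active arm.

    Latent variable: D_i(t) = D_i0 + (theta1 * x_grp + eta2) * t, with
    D_i0 ~ N(0, 1) and eta2 ~ N(0, omega2); theta1, omega2 and t are HYPOTHETICAL.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_grp = np.repeat([0, 1], n_per_arm)                   # 0 = placebo, 1 = active
    time = np.arange(n_occ, dtype=float)                   # hypothetical visit times
    d0 = rng.standard_normal(x_grp.size)
    eta2 = rng.normal(0.0, np.sqrt(omega2), x_grp.size)
    latent = d0[:, None] + (theta1 * x_grp + eta2)[:, None] * time  # (n, n_occ)

    # Proportional odds: P(Y >= j) = logistic(a_k * (D - b_kj)), j = 1..4
    z = discrim[:, None] * (latent[:, :, None, None] - DIFFICULTY)  # (n, n_occ, 7, 4)
    p_ge = 1.0 / (1.0 + np.exp(-z))                        # decreasing in j
    u = rng.random(p_ge.shape[:3])[..., None]              # one uniform per item response
    items = (u < p_ge).sum(axis=-1)                        # thresholds passed = item score
    total = items.sum(axis=-1)                             # TS: sum over the 7 items (0-28)
    return x_grp, items, total


rng = np.random.default_rng(1)
for frac in (0.0, 0.5, 1.0, 2.0):
    a = scale_discrimination(BASE_DISCRIM, frac)
    x_grp, items, total = simulate_trial(1000, a, rng=rng)
    print(frac, np.round(a, 2), total.mean(axis=0).round(2))
```

Because the cumulative probabilities decrease with the threshold index, counting how many of them exceed a single uniform draw yields a score with exactly the proportional-odds category probabilities. The item array feeds an IRT analysis and `total` feeds the TS analysis, so both approaches are compared on the same simulated patients.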

**Results**: In all four scenarios, the IRT approach required fewer patients than the TS approach to achieve 80% power to detect a drug effect (18%, 20%, 26% and 40% fewer patients for 0%, 50%, 100% and 200% of the original variability in discrimination parameters, respectively). IRT was less sensitive than TS to variability in discrimination parameters.
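The sample sizes at 80% power come from MCMP [3]. A minimal sketch of its core idea follows. It assumes that per-subject objective function value differences (iOFV of a reduced model without drug effect minus iOFV of the full model) are already available from fitting both models once to a large simulated dataset. Subjects are then resampled per arm, the summed difference is compared with the χ² critical value (1 df, α = 0.05), and power is the fraction of replicates exceeding it. The function names, the per-arm resampling details and the bisection search are illustrative choices, not necessarily how PsN organises the computation, and the demo values are placeholders rather than study results.

```python
import numpy as np

# Critical value of a chi-square distribution with 1 df at alpha = 0.05
CRIT_1DF = 3.841458820694124


def mcmp_power(diofv, arm, n_per_arm, crit=CRIT_1DF, n_rep=2000, rng=None):
    """MCMP power for one sample size.

    diofv: per-subject iOFV(reduced) - iOFV(full), both models fitted once to a
           large simulated dataset (reduced model = no drug effect).
    arm:   per-subject arm label; each arm is resampled with n_per_arm subjects.
    """
    rng = np.random.default_rng() if rng is None else rng
    dofv = np.zeros(n_rep)
    for g in np.unique(arm):
        d = diofv[arm == g]
        idx = rng.integers(0, d.size, size=(n_rep, n_per_arm))  # sample with replacement
        dofv += d[idx].sum(axis=1)                              # summed dOFV per replicate
    return float(np.mean(dofv > crit))


def mcmp_sample_size(diofv, arm, target=0.8, n_max=1000, seed=0, **kw):
    """Smallest n per arm with MCMP power >= target (bisection; assumes monotone power)."""
    def power(n):
        # Re-seeding per evaluation keeps the search reproducible.
        return mcmp_power(diofv, arm, n, rng=np.random.default_rng(seed), **kw)

    if power(n_max) < target:
        return None
    lo, hi = 1, n_max                    # invariant: power(hi) >= target
    while lo < hi:
        mid = (lo + hi) // 2
        if power(mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return hi


# Placeholder per-subject values, NOT results from the study
rng = np.random.default_rng(42)
arm = np.repeat([0, 1], 1000)
diofv = rng.normal(0.02, 0.5, arm.size)
print(mcmp_sample_size(diofv, arm))
```

Running this once with iOFV differences from the IRT fits and once with those from the TS fits gives N_IRT and N_TS; the percentages reported above correspond to 1 − N_IRT/N_TS.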

**Conclusions**: The advantage of IRT modelling over the TS approach may increase as the variability in discrimination parameters across items increases.

**References:**

[1] Ueckert S, et al. Improved utilization of ADAS-cog assessment data through item response theory based pharmacometric modeling. Pharm Res. 2014;31(8):2152-65.

[2] Welslau M, et al. Patient-reported outcomes from EMILIA, a randomized phase 3 study of trastuzumab emtansine (T-DM1) versus capecitabine and lapatinib in human epidermal growth factor receptor 2-positive locally advanced or metastatic breast cancer. Cancer. 2014;120(5):642-51.

[3] Vong C, et al. Rapid sample size calculations for a defined likelihood ratio test-based power in mixed effects models. AAPS J. 2012;14(2):176-86.