A bounded integer model for rating and composite scale data
Gustaf J. Wellhagen, Mats O. Karlsson
Department of Pharmaceutical Biosciences, Uppsala University
Objectives: Many clinical endpoints of importance for assessing the efficacy of therapy are obtained from rating or composite scales. Given the complexity of such scale-based outcomes, there is no fully satisfactory modelling strategy for them. Most commonly, they are treated as continuous variables (CV), with the well-recognized problems that the underlying data are not continuous and that data at the scale boundaries cannot be well captured. An alternative strategy is to treat the data as ordered categorical (OC). This approach has a drawback: describing the baseline alone already requires as many parameters as there are categories in the scale, minus one. The aim of this work was to develop a new model that describes rating and composite scale data parsimoniously while respecting the integer nature of the data.
Methods: For a scale with n categories, the probit function (the inverse of the standard normal cumulative distribution function, which yields cut-off values symmetric around 0) is used to divide the area under a standard normal distribution N(0,1) into n equally sized areas through n-1 cut-off values (Z_{1/n} to Z_{(n-1)/n}). To define the bounded integer (BI) model, the probability of each category is obtained as the probability mass between that category's two cut-off values under a distribution N(f(t), g(t)). Both the mean f(t) and the variability g(t) are functions of fixed and random effects, time and covariates (f(Θ,η,t,X) and g(σ,η,t,X)).
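As a minimal sketch of the category probabilities described above, the Python below uses scalar values for f and g. These stand in for f(Θ,η,t,X) and g(σ,η,t,X) at a single observation. The sketch assumes that g is the standard deviation of the latent distribution and numbers categories 0 to n-1. Under these assumptions, category k receives P(Y = k) = Φ((Z_{(k+1)/n} - f)/g) - Φ((Z_{k/n} - f)/g), with Z_{0/n} = -∞ and Z_{n/n} = +∞. The function names are hypothetical and do not come from the authors' implementation.

```python
from statistics import NormalDist
import random

STD_NORMAL = NormalDist()  # reference distribution N(0, 1)


def bi_cutoffs(n):
    """Cut-offs Z_{1/n}, ..., Z_{(n-1)/n}: the probit (inverse standard normal
    CDF) of k/n, which splits N(0, 1) into n equally probable areas."""
    return [STD_NORMAL.inv_cdf(k / n) for k in range(1, n)]


def bi_probabilities(n, f, g):
    """P(Y = k) for k = 0..n-1 when the latent variable follows N(f, g).

    Assumption (hypothetical sketch): g is the standard deviation of the
    latent distribution; f and g are scalars for a single observation.
    """
    latent = NormalDist(mu=f, sigma=g)
    # Pad with -inf/+inf so the lowest and highest categories are open-ended.
    edges = [float("-inf")] + bi_cutoffs(n) + [float("inf")]
    return [latent.cdf(hi) - latent.cdf(lo) for lo, hi in zip(edges, edges[1:])]


def simulate_bi(n, f, g, size, seed=0):
    """Draw integer scores in 0..n-1 from the BI category probabilities."""
    rng = random.Random(seed)
    return rng.choices(range(n), weights=bi_probabilities(n, f, g), k=size)


# Usage on a hypothetical 11-point scale (0-10).
probs = bi_probabilities(11, f=0.5, g=0.8)
print([round(p, 3) for p in probs])
print(simulate_bi(11, f=0.5, g=0.8, size=10))
```

With f = 0 and g = 1, the latent distribution coincides with N(0,1), so every category has probability 1/n. Shifting f or changing g redistributes the probability mass, while the support remains the integers 0 to n-1.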
The BI model was applied to 11-point Likert pain rating scale data [1] and to data from several composite scales: the Unified Parkinson’s Disease Rating Scale (UPDRS) motor subscale, the Alzheimer’s Disease Assessment Scale-Cognitive subscale (ADAS-Cog) and the Positive and Negative Syndrome Scale (PANSS) for schizophrenia. The results of the BI model analyses were compared with previously published or corresponding OC and/or CV models for these data. See Table 1 for additional details on the composite scale data and models.
Additional explorations were performed on UPDRS data. (i) Using simulations from a previously developed item response theory (IRT) model for UPDRS [2], we investigated the relationship between the IRT model characteristics and the corresponding BI and CV models. (ii) We compared BI models with different numbers of categories. Setting the number of BI model categories to the number of categories of the scale is natural, because in principle this allows extrapolation to any possible score. However, a more restricted number of categories, limited by the observed range, could be hypothesized to provide a better fit to the data.
Results: The final BI model for the Likert example described the data better than previously published models. It had an OFV of 47492 with 14 estimated parameters, compared with an OFV of 48902 with 18 parameters when the data were treated as OC [1]. A published CV model [3] for the same data used only 9 parameters. When the BI model was reduced to 9 parameters, implementing the same structural, covariate and variability components, it performed better (OFV 53135) than the published CV model (OFV 55080). The runtime of the BI model was shorter than those of both the CV and OC models. The published OC and CV models for the Likert data contained components for serial correlation (Markov or autoregressive), as did the developed BI models. However, the BI model also performed better than the corresponding CV and OC models when serial correlation was not included.
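The abstract compares models by OFV alone. As a hedged illustration of weighing fit against parameter count, the reported Likert values can be plugged into the Akaike information criterion, AIC = OFV + 2p. AIC is not used in the abstract. The sketch assumes that the OFVs are -2 log-likelihood values on a common scale, as the abstract's direct comparison implies.

```python
def aic(ofv, n_params):
    """Akaike information criterion, assuming OFV = -2 log-likelihood."""
    return ofv + 2 * n_params


# (OFV, number of estimated parameters) as reported for the Likert pain data.
bi_full, oc = (47492, 14), (48902, 18)
bi_reduced, cv = (53135, 9), (55080, 9)

# Positive differences favour the BI model.
delta_full = aic(*oc) - aic(*bi_full)
# With equal parameter counts, the AIC difference equals the OFV difference.
delta_reduced = aic(*cv) - aic(*bi_reduced)
print(delta_full, delta_reduced)  # 1418 1945
```

Because the full BI model has both a lower OFV and fewer parameters than the OC model, the parameter penalty only widens the gap.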
The results for the composite scale examples are summarized in Table 1. In all cases, the CV model had a shorter runtime than the BI model.
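Table 1. A comparison between BI and CV models for composite scale data. #Parameters is the number of estimated parameters, which is identical for the CV and BI models. ΔOFV is the OFV of the CV model minus that of the BI model, so positive values indicate a better fit of the BI model.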
Disease | Scale | Categories | Observed range | #Patients | #Obs | #Parameters (CV = BI) | ΔOFV (CV-BI) | Reference
Parkinson’s disease | UPDRS motor | 109 | 0-80 | 19 | 946 | 16 | 113 | [4]
Parkinson’s disease | UPDRS motor | 133* | 1-77 | 428 | 2720 | 14 | 73 | [2]
Alzheimer’s disease | ADAS-Cog | 71 | 0-70 | 817 | 3594 | 11 | 730 | [5]
Schizophrenia | PANSS | 181 | 30-176 | 1323 | 7728 | 17 | 145 | [6]
Schizophrenia | PANSS | 181 | 30-167 | 1292 | 8520 | 15 | 17 | [7]
* The UPDRS scale was revised in 2007; this number of categories refers to the revised motor subscale.
Simulations from the IRT model for UPDRS showed that both the residual variability (σ) of the CV model and g(t) of the BI model varied with the underlying disability, although the predicted variability was considerably lower for the BI model. In addition, over most of the assessed range, a linear change in IRT disability corresponded to a linear change in the BI model but to a clearly nonlinear (sigmoid) change in the CV model.
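One way to see why a linear change on the latent scale can look sigmoid on the score scale is to compute the expected BI score as a function of the latent mean f. The sketch below again uses scalar f and categories 0 to n-1, and assumes that g is a standard deviation. The values n = 109 and g = 0.3 are hypothetical illustration choices, not estimates from this work. It uses the identity E[Y] = Σ_{k=1}^{n-1} P(Y ≥ k), where P(Y ≥ k) is the latent mass above the cut-off Z_{k/n}.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bi_expected_score(n, f, g):
    """E[Y] for a BI model with categories 0..n-1 and latent N(f, g).

    Assumption: g is the standard deviation of the latent distribution.
    """
    latent = NormalDist(mu=f, sigma=g)
    # Sum of P(Y >= k) over k = 1..n-1, i.e. mass above each cut-off Z_{k/n}.
    return sum(1.0 - latent.cdf(STD_NORMAL.inv_cdf(k / n)) for k in range(1, n))


# Linearly spaced latent means; hypothetical n and g for illustration only.
n, g = 109, 0.3
grid = [-3.0 + 0.5 * i for i in range(13)]  # f from -3.0 to 3.0
scores = [bi_expected_score(n, f, g) for f in grid]
for f, s in zip(grid, scores):
    print(f"f = {f:+.1f}  E[score] = {s:6.1f}")
```

The expected score changes roughly linearly around f = 0 and flattens towards 0 and n-1 at the ends of the grid. This is consistent with, though not a proof of, the sigmoid relationship seen for the CV model, which operates directly on the score scale.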
BI models for UPDRS using either the full scale range (0-133) or only the observed range (0-80) had the same number of parameters and a similar goodness of fit (OFV).
Conclusions: The bounded integer model provides a good description of rating and composite scale data, both in terms of fit and in simulating realistic data. It has consistently fitted better than models treating the same data as either ordered categorical or continuous variables. The BI model has two advantages over OC models: it is parsimonious in the number of estimated parameters, and it can predict categories not present in the data (by both interpolation and extrapolation). Additionally, parsimonious descriptions of the variability observed in the data appear to be easier to implement in these probit-based models than in logit-based models.
BI models use no more parameters than CV models, yet they respect the integer nature of the data and the scale boundaries. Because of residual variability, a standard CV model will predict values outside the possible range. Alternatively, if for example a logit transformation or beta regression is used, it will reach the extreme scores only asymptotically. Avoiding such model misspecification provides a more robust basis for model inference and simulation. The only cost is a longer runtime for the BI models in the composite scale examples, but for all presented examples this was no more than about one hour on a single node.
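To illustrate the boundary problem of a standard CV model, the sketch below computes the probability that a normally distributed residual pushes a prediction outside the scale. The scale (0 to 10), the typical prediction (1.0) and the residual SD (1.5) are hypothetical values chosen for illustration. The function name is likewise hypothetical. A BI model cannot produce such values, because its support is the integers 0 to n-1 by construction.

```python
from statistics import NormalDist


def cv_prob_outside(mean, sigma, lower, upper):
    """Probability that a CV prediction with normal residual error
    (mean `mean`, SD `sigma`) falls outside the scale [lower, upper]."""
    resid = NormalDist(mu=mean, sigma=sigma)
    return resid.cdf(lower) + (1.0 - resid.cdf(upper))


# Hypothetical example: 0-10 pain scale, typical prediction 1.0, residual SD 1.5.
p_out = cv_prob_outside(mean=1.0, sigma=1.5, lower=0, upper=10)
print(round(p_out, 3))

# Far from the boundaries the problem largely disappears.
print(cv_prob_outside(mean=5.0, sigma=1.0, lower=0, upper=10))
```

Close to a boundary, a sizeable share of simulated CV values (here roughly a quarter) would fall below the lowest possible score.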
The BI and IRT models both respect the non-continuous nature of the data, and their similarity is not surprising given that both operate on a similar latent variable scale. Future work will further explore the relationship between these two models and the possibility of joint analyses of total score and item level data.
References:
[1] Schindler & Karlsson. AAPS J. 2017 Sep;19(5):1424-1435
[2] Buatois et al. Pharm Res. 2017 Oct;34(10):2109-2118
[3] Plan et al. Clin Pharmacol Ther. 2012 May;91(5):820-828
[4] Troconiz et al. Clin Pharmacol Ther. 1998 Jul;64(1):106-116
[5] Ito et al. Alzheimers Dement. 2011 Mar;7(2):151-160
[6] Friberg et al. Clin Pharmacol Ther. 2009 Jul;86(1):84-91
[7] Krekels et al. CPT Pharmacometrics Syst Pharmacol. 2017 Aug;6(8):543-551