II-77 Mohamed Gewily

Model Averaging Across Parametric and Non-Parametric Methods as an Aid for Clinical Decision Making

Mohamed Gewily, Gustaf J. Wellhagen, Mats O. Karlsson

Department of Pharmacy, Uppsala University, Uppsala, Sweden

Objectives: Randomized clinical trials (RCTs) are the main tool to generate patient-level data. The main analysis method for such data is the comparison of end-of-treatment response between study arms, where Mixed Models for Repeated Measures (MMRM) are used due to their ability to handle potential patient dropout (1). MMRM are typically predefined, not data-driven, and fully flexible regarding the time course, with separate estimates for each arm and time point. Their variance structures can range from fully saturated to parameter-sparse, but all recognize the individual-level correlations over time.

Pharmacometric nonlinear mixed effects models (NLMEM), typically built sequentially, are used for similar purposes. Although NLMEM can markedly increase the power in efficacy analyses (2), they are seldom used for primary analyses due to (i) type 1 error inflation from data-driven model building and (ii) the risk of model misspecification. Model averaging (MA) addresses these issues by combining several proposed models rather than selecting a single one. It has been used to average across different NLMEM but, to our knowledge, not across a scope that includes MMRM.

In this study we compare standard NLMEM and MMRM strategies to a new technique where we (i) use model averaging, (ii) jointly include both NLMEM and MMRM in the scope, and (iii) focus on accuracy of the end-of-study treatment effect as the primary endpoint.

Methods: A published item response theory (IRT) model (3) was used to simulate MDS-UPDRS motor data for a two-armed RCT (placebo and active treatment) in patients with Parkinson’s disease. The primary endpoint was the mean difference between arms at end-of-treatment. Twelve scenarios were simulated by varying baseline disease status (mild or severe), progression rate (slow or fast) and drug effect mechanism (symptomatic, disease-modifying or combined). Trial duration was 42 months with a default of 4 observations per patient (n=25-100 depending on scenario) and a Weibull dropout model. Further simulation details are given in (4).

The simulated total score data were analyzed with: (i) an IRT-informed NLMEM (5) with either the disease mechanism used for simulation (NLMEM-OK) or misspecified mechanisms (NLMEM-MIS), and (ii) MMRM with a 1st-order autoregressive (MMRM-AR1), unconstrained (MMRM-UN), homogeneous Toeplitz (MMRM-HO) or heterogeneous Toeplitz (MMRM-HT) residual correlation structure. Note that none of the NLMEM and MMRM were the same as the IRT simulation model, which simulated on a discrete and bounded scale.
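
The four MMRM variants differ mainly in how many parameters they spend on the within-patient covariance matrix. The following is a minimal sketch, assuming the textbook definitions of these structures (the abstract does not spell them out); the function names and numeric values are hypothetical and only meant to show the range from parameter-sparse to fully saturated.

```python
def ar1(n, sigma2, rho):
    """AR(1): one common variance, correlation rho**lag (2 parameters)."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def toeplitz_homogeneous(sigma2, lag_corrs):
    """Homogeneous Toeplitz: one variance, one free correlation per lag."""
    corr = [1.0] + list(lag_corrs)  # index = lag
    n = len(corr)
    return [[sigma2 * corr[abs(i - j)] for j in range(n)] for i in range(n)]


def toeplitz_heterogeneous(sds, lag_corrs):
    """Heterogeneous Toeplitz: one SD per visit, one correlation per lag."""
    corr = [1.0] + list(lag_corrs)
    n = len(sds)
    return [[sds[i] * sds[j] * corr[abs(i - j)] for j in range(n)]
            for i in range(n)]


def n_cov_params(n):
    """Number of covariance parameters for n visits under each structure."""
    return {
        "MMRM-AR1": 2,
        "MMRM-HO": n,
        "MMRM-HT": 2 * n - 1,
        "MMRM-UN": n * (n + 1) // 2,  # every variance and covariance is free
    }


# Illustrative values only (not taken from the study)
cov_ar1 = ar1(4, sigma2=2.0, rho=0.5)
cov_ho = toeplitz_homogeneous(2.0, [0.6, 0.4, 0.1])
cov_ht = toeplitz_heterogeneous([1.0, 1.2, 1.5, 2.0], [0.6, 0.4, 0.1])
param_counts = n_cov_params(4)
```

With four visits, this counting gives 2, 4, 7 and 10 covariance parameters for the AR(1), homogeneous Toeplitz, heterogeneous Toeplitz and unconstrained structures, respectively.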

For all fitted models, the treatment effect (i.e. the mean difference in total score between arms at 42 months) was calculated through simulations. Model averaging was performed with weights based on AIC (MA-AIC) or BIC (MA-BIC). A BIC adapted for mixed effects models was used (3).
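
Information-criterion weighting of this kind can be sketched as below. This assumes the commonly used form in which each model's weight is proportional to exp(-Δ/2), with Δ the difference between its AIC (or BIC) and the lowest value in the scope; the abstract does not state the exact formula used, and the function names and numbers here are hypothetical.

```python
import math


def ic_weights(ic_values):
    """Turn AIC or BIC values into normalized model weights.

    Assumes weight proportional to exp(-0.5 * (IC - min IC)).
    """
    best = min(ic_values.values())
    raw = {name: math.exp(-0.5 * (ic - best)) for name, ic in ic_values.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def model_averaged_effect(effects, weights):
    """Weighted average of the per-model treatment effect estimates."""
    return sum(weights[name] * effects[name] for name in effects)


# Illustrative values only (not results from the study)
aic = {"NLMEM": 1000.0, "MMRM-UN": 1004.0, "MMRM-AR1": 1010.0}
effects = {"NLMEM": -3.0, "MMRM-UN": -2.5, "MMRM-AR1": -2.0}

weights = ic_weights(aic)
ma_effect = model_averaged_effect(effects, weights)
```

Because the weights decay exponentially with the criterion difference, a model that is only a few points better can take most of the weight.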

Results: MMRM-UN had the highest likelihood of all MMRM, although MMRM-HT often had a lower AIC. All of the constrained MMRM had lower BIC than MMRM-UN for some scenarios. NLMEM-OK always had higher likelihood than MMRM-UN, indicating that despite its flexibility, MMRM-UN did not adequately describe some aspect(s) of the data. Further exploration with even more flexible MMRM indicated that both the scedasticity and the distributional shape of the residuals could be improved, resulting in higher likelihood.

When NLMEM-OK was part of the model averaging scope, it dominated the weighted average predicted effect; when it was excluded, either NLMEM-MIS, MMRM-UN or MMRM-HT dominated the weighted average.

The average RMSE (min-max) across all 12 scenarios was: NLMEM-OK 0.169 (0.091-0.296), MMRM-UN 0.299 (0.205-0.466), MA-AIC 0.164 (0.084-0.296), MA-AIC (excluding NLMEM-OK from the scope) 0.366 (0.218-0.807), MA-BIC 0.164 (0.091-0.296), and MA-BIC (excluding NLMEM-OK from the scope) 0.359 (0.205-0.821).
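
For reference, a summary of this kind could be computed as in the sketch below, assuming RMSE is taken over the estimated treatment effects from repeated simulated trials against the known true effect within each scenario, and then averaged across scenarios. The helper names and inputs are hypothetical.

```python
import math


def rmse(estimates, true_effect):
    """Root mean squared error of estimated effects against the true effect."""
    squared_errors = [(est - true_effect) ** 2 for est in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize_across_scenarios(rmse_by_scenario):
    """Return (mean, min, max) of per-scenario RMSE values."""
    values = list(rmse_by_scenario.values())
    return sum(values) / len(values), min(values), max(values)


# Illustrative inputs only (not data from the study)
per_scenario = {
    "scenario_1": rmse([-2.8, -3.2, -3.1], true_effect=-3.0),
    "scenario_2": rmse([-1.5, -2.5], true_effect=-2.0),
}
mean_rmse, min_rmse, max_rmse = summarize_across_scenarios(per_scenario)
```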

Conclusions: Model averaging including both NLMEM and MMRM has the potential to provide the best of both worlds for analysis of RCTs: high precision in estimated drug effects through NLMEM when such model(s) adequately represent the data, and robustness through MMRM(s) when NLMEM is not adequate. The results presented are promising, but further improvements of the strategy may come from: (i) tailoring the model averaging strategy to the joint inclusion of MMRM and NLMEM, which differ widely in number of parameters, (ii) exploring other MMRM, as only the most common alternatives were considered here, and (iii) including some NLMEM with higher flexibility than the very parsimonious models used here.

References:
[1] Mallinckrodt CH, Lane PW, Schnell D, Peng Y, Mancuso JP. Recommendations for the Primary Analysis of Continuous Endpoints in Longitudinal Clinical Trials. Drug Inf J DIJ Drug Inf Assoc. 1 Jul 2008;42(4):303-19.
[2] Karlsson KE, Vong C, Bergstrand M, Jonsson EN, Karlsson MO. Comparisons of Analysis Methods for Proof-of-Concept Trials. CPT Pharmacomet Syst Pharmacol. Jan 2013;2(1):e23.
[3] Buatois S, Retout S, Frey N, Ueckert S. Item Response Theory as an Efficient Tool to Describe a Heterogeneous Clinical Rating Scale in De Novo Idiopathic Parkinson’s Disease Patients. Pharm Res. Oct 2017;34(10):2109-18.
[4] Wellhagen GJ, Karlsson MO, Kjellsson MC. Comparison of Precision and Accuracy of Five Methods to Analyse Total Score Data. AAPS J. Jan 2021;23(1):9.
[5] Wellhagen GJ, Ueckert S, Kjellsson MC, Karlsson MO. An Item Response Theory–Informed Strategy to Model Total Score Data from Composite Scales. AAPS J. 16 Mar 2021;23(3):45.

Reference: PAGE 29 (2021) Abstr 9822 [www.page-meeting.org/?abstract=9822]

Poster: Methodology - New Modelling Approaches
