Renwei Zhang 1, Tingjie Guo 1, Li Qin 2, Rik Greef 2, Matthew Zierhut 2, Coen Hasselt 1, Laura Zwep 1
1 Systems Pharmacology and Pharmacy, Leiden Academic Centre for Drug Research, Leiden University (Leiden, The Netherlands), 2 Certara Inc. (Princeton, USA)
Objectives
Model-based meta-analysis (MBMA) integrates data across clinical trials to estimate treatment effects. Including summary-level patient characteristics (covariates) can explain between-study variability (BSV) and reduce bias in treatment effect estimates [1]. However, missing covariates are common in MBMA. Missing at random (MAR) is the most frequent mechanism, as not all studies collect the same patient characteristics [2]. When covariates contribute substantially to BSV, poor handling of missingness can bias both covariate and treatment effect estimates [3].
Common strategies for handling missing data at the individual level, namely complete case analysis (CCA), single imputation (SI), multiple imputation by chained equations (MICE), and random forest (RF) imputation, have been studied extensively [4], but their performance for aggregate (summary-level) covariates in MBMA is poorly understood. The current study evaluated these four methods for handling missing aggregate prognostic covariates in MBMA. Specifically, we (1) characterized MAR mechanisms in a large database of non-small cell lung cancer (NSCLC) trials, and (2) used these findings to design a simulation study comparing the performance of CCA, SI, MICE, and RF in recovering the prognostic covariates themselves and the corresponding treatment effect estimates.
Methods
Data: Aggregate baseline patient characteristics and progression-free survival (PFS) outcomes were extracted from published NSCLC trials in the CODEx database (Certara, Princeton, NJ) [5].
MAR mechanism: We analyzed 807 trials (phase 2 and 3; randomized controlled and single-arm) to characterize missingness. Logistic regression was used to identify associations between the probability of a covariate being missing and the observed values of other covariates. Covariates examined included age, sex, race, smoking status, performance status (PS), tumor stage, histology, and therapy line.
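The missingness analysis described above can be sketched as a logistic regression of a missingness indicator on an observed covariate. This is a minimal illustration on synthetic data, not the authors' implementation: the variable names (`prop_male`, `race_missing`), the true coefficients used to generate the data, and the plain gradient-ascent fitter are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=1.0, n_iter=3000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood (single predictor, for illustration)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Synthetic trials (hypothetical): proportion of males is always observed,
# and the chance that race is unreported rises with it.
rng = random.Random(0)
prop_male = [rng.uniform(0.3, 0.9) for _ in range(300)]
race_missing = [
    1 if rng.random() < sigmoid(-3.0 + 4.0 * x) else 0 for x in prop_male
]

b0, b1 = fit_logistic(prop_male, race_missing)

# Odds ratio of race being missing per 10-percentage-point increase in males.
odds_ratio_per_10pct = math.exp(b1 * 0.10)
```

A positive slope `b1` (odds ratio above 1) would indicate that missingness in the target covariate depends on an observed covariate, which is the pattern expected under MAR.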
Simulation study: A complete subset of 125 trials (9 unique treatments) served as the reference dataset. Missingness was introduced under the identified MAR mechanism at a 20% rate, and we applied all four imputation methods. Performance was assessed in two ways: (1) accuracy of imputed covariate values, measured by comparing imputed versus true distributions (mean relative bias and RMSE); and (2) accuracy of treatment effect estimates, measured by bias in hazard ratios (HR) relative to a model fit on the complete dataset. HRs compared each treatment against a carboplatin/paclitaxel reference arm, using absolute PFS time-course models with prognostic covariates.
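One way to introduce MAR missingness at a fixed overall rate is to let the missingness probability depend on an observed covariate through a logistic function, then calibrate the intercept so that the expected missing fraction equals the target. The sketch below assumes a single driver covariate and a hypothetical slope of 3.0; the abstract does not report the coefficients or the exact procedure used, so these are illustrative choices only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibrate_intercept(scores, target_rate, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection search for the intercept a such that
    mean(sigmoid(a + score)) equals target_rate."""
    def mean_prob(a):
        return sum(sigmoid(a + s) for s in scores) / len(scores)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_prob(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ampute_mar(target, driver, slope, target_rate, rng):
    """Set values of `target` to None with a probability that depends
    only on the fully observed `driver` covariate (MAR)."""
    scores = [slope * d for d in driver]
    intercept = calibrate_intercept(scores, target_rate)
    probs = [sigmoid(intercept + s) for s in scores]
    amputed = [None if rng.random() < p else v for v, p in zip(target, probs)]
    return amputed, probs


# Hypothetical reference dataset of 125 trials with complete covariates.
rng = random.Random(1)
n_trials = 125
prop_male = [rng.uniform(0.3, 0.9) for _ in range(n_trials)]
prop_ps0 = [rng.uniform(0.1, 0.6) for _ in range(n_trials)]

ps0_amputed, probs = ampute_mar(
    prop_ps0, prop_male, slope=3.0, target_rate=0.20, rng=rng
)
expected_rate = sum(probs) / n_trials            # calibrated to 0.20
realized_rate = sum(v is None for v in ps0_amputed) / n_trials
```

The expected rate matches the 20% target by construction, while the realized rate in any single simulated dataset varies around it because missingness is drawn at random per trial.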
Results
MAR mechanism: Analysis of the full trial dataset revealed that earlier study year, higher proportion of males, worse PS (PS2), earlier tumor stage (stage 3), and squamous histology were associated with higher missingness in race and smoking status. Based on these associations, we selected males, PS0, stage 3, and squamous histology to define the MAR simulation.
Simulation study: In the MBMA outcome model, higher proportions of PS0, earlier tumor stage, and first-line therapy were associated with longer PFS; these three covariates were therefore selected for missingness simulation. For covariate recovery, MICE and RF better preserved the overall distribution, as indicated by mean relative bias (SI: 6.0%, MICE: 1.8%, RF: 1.9%; median across all treatments). However, MICE showed the highest error in the individual imputed values, as indicated by RMSE (SI: 0.31, MICE: 0.36, RF: 0.22), whereas RF performed best. For treatment effect estimation in MBMA, RF consistently outperformed the other methods, yielding the lowest bias in HR estimates (CCA: 14%, SI: 7.4%, MICE: 8.5%, RF: 3.0%) and the highest coverage rate (CCA: 84%, SI: 93%, MICE: 92%, RF: 94%).
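The distinction between distribution-level and value-level accuracy can be made concrete with a short sketch of the three metrics. The exact definitions used in the study are not given in the abstract; the forms below (relative bias of the mean, RMSE over imputed values, and coverage as the fraction of confidence intervals containing the reference HR) are common conventions assumed here, and all numbers are invented for illustration.

```python
import math


def mean_relative_bias(imputed, true):
    """Relative difference between the mean of the imputed values
    and the mean of the true values (assumed definition)."""
    mean_imp = sum(imputed) / len(imputed)
    mean_true = sum(true) / len(true)
    return (mean_imp - mean_true) / mean_true


def rmse(imputed, true):
    """Root mean squared error over the individual imputed values."""
    sq = [(i - t) ** 2 for i, t in zip(imputed, true)]
    return math.sqrt(sum(sq) / len(sq))


def coverage_rate(intervals, reference):
    """Fraction of (lower, upper) intervals that contain the reference value."""
    hits = sum(1 for lo, hi in intervals if lo <= reference <= hi)
    return hits / len(intervals)


# Invented example: true vs. imputed proportions of PS0 in four trials.
true_ps0 = [0.40, 0.50, 0.30, 0.20]
imputed_ps0 = [0.44, 0.45, 0.33, 0.18]

bias = mean_relative_bias(imputed_ps0, true_ps0)
error = rmse(imputed_ps0, true_ps0)

# Invented HR confidence intervals from four simulation replicates,
# compared against a hypothetical complete-data HR of 0.80.
hr_intervals = [(0.60, 0.90), (0.70, 1.10), (0.85, 1.20), (0.50, 0.78)]
coverage = coverage_rate(hr_intervals, reference=0.80)
```

In this toy example the imputed values have essentially the same mean as the true values, so the relative bias is approximately zero, yet the RMSE is clearly non-zero. This is how a method can preserve the overall distribution while still making larger errors on individual values.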
Conclusions
This work provides a structured framework for characterizing MAR mechanisms in MBMA and evaluating imputation strategies for aggregate prognostic covariates. RF was the most robust method, achieving the lowest imputation error (RMSE) in covariate recovery, with distribution-level bias comparable to MICE, and the lowest bias in treatment effect estimation. These results support the use of RF imputation over the alternative methods when aggregate prognostic covariates are incomplete in MBMA.
References:
[1] Schauer JM et al. Alcohol. (2022); 57(1):35-46
[2] Pigott TD. The Handbook of Research Synthesis and Meta-analysis (3rd ed). (2019); pp. 367-382
[3] Pigott TD et al. Rev Educ Res. (2020); 90(1):24-46
[4] Lee J. Org. Res. Syn. Methods (2022); 14(1):117-136
[5] https://codex.certara.com/
Reference: PAGE 34 (2026) Abstr 11957 [www.page-meeting.org/?abstract=11957]
Poster: Methodology - Covariate/Variability Models