IV-103

COMPARISON OF METHODS TO DEMONSTRATE BIOEQUIVALENCE IN CASE OF HIGH VARIABILITY AND GROUP HETEROGENEITY

Annabelle Walz 1, Daniel Kaschek 1, Brandon Greene 2, Sabine Pestel 2

1 IntiQuan AG (Basel, Switzerland), 2 CSL Behring Innovation GmbH (Marburg, Germany)

Introduction/Objectives: Bioequivalence (BE) studies evaluate the similarity of the pharmacokinetic (PK) profiles of two drugs or formulations [1, 2]. The primary metrics used to assess BE are the area under the concentration-time curve (AUC) and the maximum concentration (Cmax), both of which can be derived from individual concentration-time profiles by non-compartmental analysis [3]. The recommended method for demonstrating BE in a parallel-design study is the average BE approach: BE is concluded if the two-sided 90% confidence intervals (CIs) of the geometric mean ratios of AUC and Cmax fall entirely within the predefined margins of 80-125%, as recommended by the FDA. High variability within a study may prevent the demonstration of BE. If part of that variability is explained by a covariate such as sex, a linear model with drug/formulation and sex as predictors can account for it and thereby increase the likelihood of showing BE. The resulting subgroups may, however, differ in their variability (heteroscedasticity), and linear modelling offers several options for handling this.
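To make the average BE criterion concrete, the following Python sketch computes the geometric mean ratio and its two-sided 90% CI for two independent groups on the log scale and checks the CI against the 80-125% margins. It is an illustration only, not the authors' code: the function name average_be and the example values are hypothetical, and a normal quantile is used as a large-sample stand-in for the t quantile to keep the example free of third-party dependencies.

```python
import math
from statistics import NormalDist, mean, variance


def average_be(auc_test, auc_ref, lower=0.80, upper=1.25, level=0.90):
    """Return (gmr, ci_low, ci_high, is_be) for two independent groups.

    Works on log-transformed values; the CI uses a normal quantile
    (large-sample approximation, an assumption of this sketch).
    """
    log_t = [math.log(x) for x in auc_test]
    log_r = [math.log(x) for x in auc_ref]
    # Difference of log means = log of the geometric mean ratio
    diff = mean(log_t) - mean(log_r)
    # Standard error allowing unequal variances in the two groups
    se = math.sqrt(variance(log_t) / len(log_t) + variance(log_r) / len(log_r))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_low, ci_high = math.exp(diff - z * se), math.exp(diff + z * se)
    is_be = lower <= ci_low and ci_high <= upper
    return math.exp(diff), ci_low, ci_high, is_be


# Hypothetical AUC values for illustration only
reference = [90.0, 100.0, 110.0, 95.0, 105.0] * 4
test = [92.0, 98.0, 112.0, 94.0, 103.0] * 4
print(average_be(test, reference))
```

BE is concluded only when both CI bounds lie inside the margins, so a wider CI (higher variability or fewer subjects) can prevent a BE conclusion even when the point estimate is close to 1.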
Using simulations, we explored the performance of two methods implemented in R:
1. GLS: a restricted maximum likelihood approach implemented in nlme::gls [4] explicitly using the within-group heteroscedasticity structure.
2. SANDWICH: a sandwich estimator approach implemented in sandwich::vcovHC [5, 6] which is applied to the linear regression model to correct the CIs for heteroscedasticity.
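The difference between the two approaches can be sketched in a few lines of Python with NumPy. This is not the R code used in the study: feasible_gls below alternates weighted least squares with re-estimation of one residual variance per group, which only approximates the restricted maximum likelihood fit of nlme::gls, and ols_hc3 computes the HC3 form of the sandwich covariance (the default type in sandwich::vcovHC). All function names, the grouping by sex, and the simulated values are assumptions made for illustration.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: coefficients and model-based covariance."""
    Xw = X * w[:, None]
    cov = np.linalg.inv(X.T @ Xw)  # (X' W X)^-1
    return cov @ (Xw.T @ y), cov


def feasible_gls(X, y, group, n_iter=10):
    """Alternate between a WLS fit and re-estimating one variance per group."""
    w = np.ones(len(y))
    for _ in range(n_iter):
        beta, _ = wls(X, y, w)
        resid = y - X @ beta
        for g in np.unique(group):
            w[group == g] = 1.0 / np.mean(resid[group == g] ** 2)
    return wls(X, y, w)  # final fit with the last set of weights


def ols_hc3(X, y):
    """Ordinary least squares with the HC3 sandwich covariance."""
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, bread, X)
    meat = X.T @ (X * (resid**2 / (1.0 - leverage) ** 2)[:, None])
    return beta, bread @ meat @ bread


# One simulated trial on the log scale (hypothetical parameter values):
# no treatment effect, females higher and less variable than males.
rng = np.random.default_rng(1)
n = 200
treatment = np.tile([0.0, 1.0], n // 2)
sex = np.repeat([0.0, 1.0], n // 2)  # 0 = male, 1 = female
sd = np.where(sex == 1.0, 0.2, 0.6)
log_auc = 4.0 + 0.3 * sex + rng.normal(0.0, sd)
X = np.column_stack([np.ones(n), treatment, sex])

beta_gls, cov_gls = feasible_gls(X, log_auc, sex)
beta_ols, cov_hc3 = ols_hc3(X, log_auc)
print("GLS      treatment effect, SE:", beta_gls[1], np.sqrt(cov_gls[1, 1]))
print("SANDWICH treatment effect, SE:", beta_ols[1], np.sqrt(cov_hc3[1, 1]))
```

In this sketch the sandwich approach leaves the ordinary least squares point estimate unchanged and corrects only its covariance, whereas the GLS-type fit uses the group variances as weights when estimating the coefficients themselves.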

Methods: We conducted a simulation-estimation study to assess how well both methods detect BE (based on AUC) when it is truly present, under multiple conditions. We assumed a trial in which two drugs (treatments) were administered in a parallel design, with treatment groups of equal size and equal sex distribution. Our initial assumption was that treatment had no effect on AUC, whereas females had a higher but less variable AUC than males. We then evaluated how the performance of the two methods depended on these assumptions by varying the variances and the sample size distribution across combinations of sex and treatment (no, medium, or large differences between groups) and by varying the size of the sex effect. Briefly, we simulated AUC in multiple trials according to these assumptions and evaluated each trial by fitting a linear model explaining AUC through effects of sex and treatment, with unequal variances among the groups accounted for by either the GLS or the SANDWICH method. Performance was assessed in terms of bias (how accurately the method estimates the true effect on average), precision (the spread of the estimate across trials), coverage (the probability that the CI contains the true treatment effect), and power (the probability of finding BE when it is truly present).
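The four performance metrics can be computed from repeated simulated trials as in the following Python sketch. It is a hedged illustration, not the study's simulation code: log-normal AUC, the parameter values, the number of trials, the normal-quantile CI, and all function names are assumptions. For brevity only an ordinary least squares fit with an HC3 sandwich standard error is plugged in; any estimator returning an estimate and a standard error could be passed to evaluate instead.

```python
import math
from statistics import NormalDist

import numpy as np


def simulate_trial(rng, n_per_cell=25, sex_effect=0.3, sd_male=0.6, sd_female=0.2):
    """Simulate log(AUC) for a balanced treatment-by-sex layout, no treatment effect."""
    treatment = np.tile(np.repeat([0.0, 1.0], n_per_cell), 2)
    sex = np.repeat([0.0, 1.0], 2 * n_per_cell)  # 0 = male, 1 = female
    sd = np.where(sex == 1.0, sd_female, sd_male)
    log_auc = 4.0 + sex_effect * sex + rng.normal(0.0, sd)
    return treatment, sex, log_auc


def fit_ols_hc3(treatment, sex, y):
    """Treatment effect and HC3 sandwich standard error from an OLS fit."""
    X = np.column_stack([np.ones_like(y), treatment, sex])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, bread, X)
    meat = X.T @ (X * (resid**2 / (1.0 - leverage) ** 2)[:, None])
    cov = bread @ meat @ bread
    return beta[1], math.sqrt(cov[1, 1])


def evaluate(fit, n_trials=500, true_effect=0.0, level=0.90, seed=1):
    """Bias, precision, coverage and power of `fit` over simulated trials."""
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # large-sample approximation
    est, lo, hi = [], [], []
    for _ in range(n_trials):
        e, se = fit(*simulate_trial(rng))
        est.append(e)
        lo.append(e - z * se)
        hi.append(e + z * se)
    est, lo, hi = np.array(est), np.array(lo), np.array(hi)
    return {
        "bias": float(est.mean() - true_effect),
        "precision_sd": float(est.std(ddof=1)),
        "coverage": float(np.mean((lo <= true_effect) & (true_effect <= hi))),
        # BE is declared when the CI lies within log(0.80)..log(1.25)
        "power": float(np.mean((lo >= math.log(0.80)) & (hi <= math.log(1.25)))),
    }


print(evaluate(fit_ols_hc3))
```

Because the simulated treatment effect is zero, every trial is truly bioequivalent, so the fraction of trials in which BE is declared estimates power rather than the type I error.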

Results:
Bias & precision: Both methods estimated a mean drug effect of zero on average (no bias), regardless of the assumptions tested. However, the SANDWICH estimate tended to be more variable, as indicated by larger standard deviations, and was therefore less precise.
Coverage: The 90% CIs appropriately reflected the actual variability of the drug effect estimator in each method, i.e., they were wider for SANDWICH and narrower for GLS. Accordingly, both methods showed the correct coverage of about 90%, regardless of the assumptions tested.
Power: Both methods achieved comparably high power for all tested sample size distributions and sex effect sizes. However, with increasing imbalance in variance between groups, GLS estimated the drug effect more precisely than SANDWICH, which corresponded to narrower 90% CIs and therefore a higher probability of demonstrating BE.

Conclusions: The two methods, GLS and SANDWICH, performed comparably under most of the tested assumptions. They performed equally well across the different sample size distributions and sex effect sizes, and when variances were equal between males and females. However, GLS was superior to SANDWICH when variances differed between the sexes. SANDWICH considers the underlying heteroscedasticity structure only when estimating the width of the 90% CI, whereas GLS already uses this information when estimating the effect size parameters, which results in a higher chance of demonstrating BE when it is present.
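The reason for this difference can be illustrated with a small calculation. In a balanced design with an additive model, the ordinary least squares treatment effect is the equally weighted average of the male and female treatment differences, whereas a GLS-type fit with known group variances weights each difference by its inverse variance. The Python sketch below compares the resulting standard errors; it is an idealisation (variances are treated as known, whereas GLS estimates them), and the function name and values are hypothetical.

```python
import math


def treatment_effect_se(sd_male, sd_female, n_per_cell):
    """Standard error of the treatment effect under two weighting schemes.

    Assumes a balanced design with n_per_cell subjects per sex-by-treatment
    cell and known within-sex standard deviations (log scale).
    """
    var_male = 2 * sd_male**2 / n_per_cell      # variance of the male difference
    var_female = 2 * sd_female**2 / n_per_cell  # variance of the female difference
    se_equal = math.sqrt((var_male + var_female) / 4)           # equal weights
    se_weighted = math.sqrt(1 / (1 / var_male + 1 / var_female))  # inverse-variance weights
    return se_equal, se_weighted


print(treatment_effect_se(0.4, 0.4, 25))  # equal variances
print(treatment_effect_se(0.6, 0.2, 25))  # unequal variances
```

With equal variances the two weighting schemes coincide, which is consistent with the comparable performance reported above; the weighted standard error becomes smaller only once the variances differ.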

References:
[1] US Department of Health and Human Services. FDA Guidance for Industry, Statistical Approaches to Establishing Bioequivalence. http://www.fda.gov/cder/guidance/index.htm. 2001.
[2] US Food and Drug Administration. Bioavailability and Bioequivalence Studies Submitted in NDAs or INDs – General Considerations. https://www.fda.gov/media/121311/download. 2014.
[3] PhUSE CSS Development of Standard Scripts for Analysis and Programming Working Group. Analyses and Displays Associated to Non-Compartmental Pharmacokinetics – With a Focus on Clinical Trials (Version 1.0). 2014.
[4] Pinheiro J, Bates D, R Core Team (2025). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-167, https://CRAN.R-project.org/package=nlme.
[5] Zeileis A, Köll S, Graham N (2020). “Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R.” Journal of Statistical Software, 95(1), 1–36. doi:10.18637/jss.v095.i01.
[6] Zeileis A (2004). “Econometric Computing with HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software, 11(10), 1–17. doi:10.18637/jss.v011.i10.

Reference: PAGE 34 (2026) Abstr 11866 [www.page-meeting.org/?abstract=11866]

Poster: Methodology - Other topics