Igor Vasyutin 1, Sabine Stübler 1, Angèle Fleury 1, Corinna Schoelch 1, Anaïs Glatard 2, Jan-Georg Wojtyniak 2
1 Boehringer Ingelheim Pharma GmbH & Co KG (Ingelheim, Germany), 2 Pharmetheus AB (Uppsala, Sweden)
Objectives:
Early and reliable go/no-go decisions are essential for accelerating drug development in metabolic dysfunction–associated steatohepatitis (MASH). Surrogate biomarkers such as propeptide of type III collagen (PRO-C3) and the enhanced liver fibrosis (ELF) score have the potential to support mechanistic and quantitative decision-making [1], but their ability to reflect treatment effects consistently with those on clinical endpoints must be rigorously evaluated. In this study, we assessed the Decision Equivalence Metric (DEM), a simulation-based quantitative framework, as a practical tool for characterizing surrogate performance [2]. We used pharmacokinetic (PK)/pharmacodynamic (PD) models for PRO-C3 and ELF and an exposure–response (ER) model for improvement in liver histology. Histological improvement was defined as (i) a decrease of at least two points in the nonalcoholic fatty liver disease activity score (NAS), with a decrease of at least one point in the lobular inflammation or ballooning sub-score, and (ii) no worsening of fibrosis, i.e., no increase in fibrosis stage. With these models, we evaluated how well DEM captures decision concordance across a range of sample sizes and compared DEM-based findings with those obtained using the Prentice criteria [3].
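The composite histological endpoint above is a deterministic rule applied to paired biopsy scores. A minimal Python sketch of that rule follows. The `HistologyScores` container and `histological_improvement` function are hypothetical names, and the sub-score ranges in the comments assume standard NASH CRN scoring, which the abstract does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class HistologyScores:
    """Hypothetical container for one biopsy read (assumed NASH CRN ranges)."""
    steatosis: int             # NAS sub-score, 0-3
    lobular_inflammation: int  # NAS sub-score, 0-3
    ballooning: int            # NAS sub-score, 0-2
    fibrosis_stage: int        # fibrosis stage, e.g. 0-4

    @property
    def nas(self) -> int:
        # NAS is the sum of the three activity sub-scores (0-8)
        return self.steatosis + self.lobular_inflammation + self.ballooning


def histological_improvement(baseline: HistologyScores, follow_up: HistologyScores) -> bool:
    """Responder rule as worded in the Objectives:
    (i) NAS falls by >= 2 points, with a >= 1-point fall in lobular
        inflammation or ballooning, and
    (ii) fibrosis stage does not increase.
    """
    nas_drop = baseline.nas - follow_up.nas
    key_subscore_drop = (
        baseline.lobular_inflammation - follow_up.lobular_inflammation >= 1
        or baseline.ballooning - follow_up.ballooning >= 1
    )
    no_fibrosis_worsening = follow_up.fibrosis_stage <= baseline.fibrosis_stage
    return nas_drop >= 2 and key_subscore_drop and no_fibrosis_worsening
```

Note that a two-point NAS drop driven only by steatosis does not qualify, because condition (i) also requires improvement in lobular inflammation or ballooning.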
Methods:
Our analysis used PRO-C3, ELF, and histological improvement data from the Phase II survodutide trial in patients with MASH. Following the DEM framework proposed by the SxP/ISoP group, we (i) developed PK/PD models for PRO-C3 and ELF and an ER model for histological improvement; (ii) simulated 1500 parallel virtual trials under predefined uncertainty structures (between-subject variability and parameter uncertainty); (iii) performed independent hypothesis tests for the surrogate and clinical endpoints; and (iv) computed the DEM, defined as the proportion of simulated trials in which surrogate-based and clinical endpoint–based decisions matched. DEM was evaluated across a range of sample sizes to assess its sensitivity to trial size and variability. In parallel, we applied the classical Prentice criteria to evaluate surrogacy and computed the Freedman proportion explained (PE) for PRO-C3 and ELF. PK/PD and ER models were developed using NONMEM version 7.5, and R 4.4.1 was used for forward simulations and data analysis.
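Steps (ii) to (iv) can be illustrated with a toy simulation. The sketch below is not the authors' implementation, which used NONMEM models and R. The latent-benefit generative model, effect size, responder threshold, z-tests, and the names `simulate_trial` and `decision_equivalence` are all assumptions. They are chosen only to show how DEM is computed as the fraction of virtual trials with matching go/no-go decisions. In this sketch, "go" means a two-sided test at alpha = 0.05 is significant in the favorable direction.

```python
import math
import random
import statistics
from statistics import NormalDist


def simulate_trial(n_per_arm, effect, rng, alpha=0.05):
    """One toy two-arm virtual trial; returns (surrogate_go, clinical_go)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    biomarker = {0: [], 1: []}  # change from baseline (0 = placebo, 1 = active)
    responders = {0: 0, 1: 0}   # histology responder counts
    for arm in (0, 1):
        for _ in range(n_per_arm):
            benefit = effect * arm + rng.gauss(0.0, 1.0)           # assumed latent individual benefit
            biomarker[arm].append(-benefit + rng.gauss(0.0, 0.5))  # biomarker falls as benefit rises
            if benefit + rng.gauss(0.0, 1.0) > 1.0:                # assumed responder threshold
                responders[arm] += 1

    # Surrogate decision: two-sample z-test on mean biomarker change
    se_s = math.sqrt((statistics.variance(biomarker[0]) + statistics.variance(biomarker[1])) / n_per_arm)
    z_s = (statistics.fmean(biomarker[0]) - statistics.fmean(biomarker[1])) / se_s
    surrogate_go = z_s > z_crit  # significant reduction in the active arm

    # Clinical decision: pooled two-proportion z-test on responder rates
    p0, p1 = responders[0] / n_per_arm, responders[1] / n_per_arm
    pooled = (responders[0] + responders[1]) / (2 * n_per_arm)
    se_c = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    clinical_go = se_c > 0 and (p1 - p0) / se_c > z_crit
    return surrogate_go, clinical_go


def decision_equivalence(n_per_arm, n_trials=1500, effect=0.8, effect_sd=0.1, seed=2026):
    """DEM: fraction of virtual trials whose surrogate and clinical decisions agree."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(n_trials):
        trial_effect = rng.gauss(effect, effect_sd)  # crude stand-in for parameter uncertainty
        surrogate_go, clinical_go = simulate_trial(n_per_arm, trial_effect, rng)
        matches += surrogate_go == clinical_go
    return matches / n_trials
```

A sample-size sweep such as `for n in (8, 20, 30, 50): print(n, decision_equivalence(n))` shows how agreement changes with trial size under these toy assumptions. The numbers it produces are illustrative only and are not the study's results.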
Results:
Both PRO-C3 and ELF met three of the four Prentice criteria, reflecting the well-known difficulty of satisfying all of them. Freedman PE values were 23.75% for PRO-C3 and 13.11% for ELF. DEM increased monotonically with sample size. At small sample sizes (8–20 participants per arm), DEM values were lower because of greater variability in both biomarker and clinical endpoints; this variability was more pronounced for PRO-C3 and for early ELF changes. From approximately 30 participants per arm onward, DEM exceeded 90% for both biomarkers, indicating strong agreement between surrogate-based and histology-based conclusions. Notably, the power trajectories for PRO-C3, ELF, and histological improvement closely mirrored DEM behavior.
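Freedman's proportion explained compares the treatment coefficient for the clinical endpoint with and without adjustment for the surrogate: PE = 1 - beta_adj / beta_unadj. The abstract does not specify the regression models used. The sketch below assumes logistic regression for the binary histology endpoint, fitted by Newton-Raphson in plain Python. It uses hypothetical simulated data (`toy_data`) rather than trial data, so it does not reproduce the reported PE values. All function names are illustrative.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_fit(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson; X rows include an intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += w * xi[j] * xi[l]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def freedman_pe(treatment, surrogate, outcome):
    """PE = 1 - (treatment coef. adjusted for surrogate) / (unadjusted treatment coef.)."""
    unadj = logistic_fit([[1.0, t] for t in treatment], outcome)
    adj = logistic_fit([[1.0, t, s] for t, s in zip(treatment, surrogate)], outcome)
    return 1.0 - adj[1] / unadj[1]


def toy_data(n=2000, seed=7):
    """Hypothetical data in which the biomarker change partly mediates the treatment effect."""
    rng = random.Random(seed)
    treatment, surrogate, outcome = [], [], []
    for i in range(n):
        t = i % 2                            # alternate placebo / active
        s = -1.0 * t + rng.gauss(0.0, 1.0)   # biomarker change, lower on treatment
        eta = -1.0 + 0.5 * t - 0.8 * s       # assumed true log-odds of histology response
        treatment.append(t)
        surrogate.append(s)
        outcome.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    return treatment, surrogate, outcome
```

On the log-odds scale, the adjusted and unadjusted coefficients are not directly comparable (logistic regression is non-collapsible). In addition, PE is a ratio of two estimates. PE can therefore be imprecise and may even fall outside [0, 1] in real data.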
Conclusions:
Although neither PRO-C3 nor ELF met all four Prentice criteria, partial fulfillment is common given the stringency of the classical surrogacy definition. DEM provided a quantitative view of decision alignment and indicated that both biomarkers can achieve >90% agreement with histology-based decisions at ~30 participants per arm. However, DEM depends strongly on endpoint power, which is a key limitation: it can confound true surrogate performance with study precision. To address this, joint modeling strategies and power-adjusted DEM formulations are recommended to better characterize surrogate reliability and support model-informed drug development in MASH.
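The power dependence noted above has a simple arithmetic source. Suppose the surrogate and clinical decisions were statistically independent, with powers p_S and p_C. They would still agree with probability p_S * p_C + (1 - p_S) * (1 - p_C). The sketch below computes this baseline. It also shows a Cohen's-kappa-style rescaling of DEM as one illustrative way to discount the baseline. This rescaling is an assumption for illustration, not the power-adjusted formulation the authors recommend, and the function names are hypothetical.

```python
def chance_agreement(power_surrogate: float, power_clinical: float) -> float:
    """Expected decision agreement if the two go/no-go decisions were independent."""
    for p in (power_surrogate, power_clinical):
        if not 0.0 <= p <= 1.0:
            raise ValueError("power must lie in [0, 1]")
    # both "go" + both "no-go"
    return power_surrogate * power_clinical + (1 - power_surrogate) * (1 - power_clinical)


def kappa_adjusted_dem(dem: float, power_surrogate: float, power_clinical: float) -> float:
    """Kappa-style rescaling: agreement beyond the independence baseline (1 = perfect, 0 = chance)."""
    expected = chance_agreement(power_surrogate, power_clinical)
    if expected >= 1.0:
        raise ValueError("baseline agreement is 1; the rescaling is undefined")
    return (dem - expected) / (1.0 - expected)
```

For example, with both powers at 0.95 the independence baseline is already 0.905. An observed DEM of 0.95 then corresponds to a kappa-style value of only about 0.47.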
References:
[1] Sanyal A, Shankar S, Yates K, et al. The Nimble Stage 1 Study Validates Diagnostic Circulating Biomarkers for Nonalcoholic Steatohepatitis. Res Sq [Preprint]. 2023 Jan 19:rs.3.rs-2492725. doi: 10.21203/rs.3.rs-2492725/v1. Update in: Nat Med. 2023 Oct;29(10):2656-2664. doi: 10.1038/s41591-023-02539-6
[2] https://sxpsig.github.io/events/pastevents/posts/webinar-12Sep2023.html
[3] Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med. 1989 Apr;8(4):431-40. doi: 10.1002/sim.4780080407
Reference: PAGE 34 (2026) Abstr 12058 [www.page-meeting.org/?abstract=12058]
Poster: Methodology - Other topics