In silico assessment of adaptive trial design for TB regimen development
Vincent Chang, Patrick PJ Phillips, Marjorie Imperial, Rada M Savic
University of California San Francisco
Current tuberculosis (TB) regimen development is in dire need of rapid innovation: the 6-month standard of care has not changed in over 40 years, while drug development strategies are slow (Phase II and III take >8.5 years) and unsuccessful. In 2020, major TB regimen developers announced two global partnerships (one US-based and one EU-based) whose members are committed to leveraging their compounds. This offers a truly unique opportunity to evaluate numerous regimens in parallel. Accordingly, platform adaptive clinical trial designs have gained significant interest because they permit shorter trials, seamless transitions between phases, and more efficient regimen selection from a larger pool of candidates [1].
We identified four distinct adaptive trial designs for late-stage regimen development: multi-arm group sequential designs for (A) Phase IIc and (B) seamless Phase II/III, and Bayesian response-adaptive randomization designs for (C) Phase IIc and (D) seamless Phase II/III. Phase IIc [2] extends the Phase IIb design [6], which evaluates regimens only by early treatment biomarkers. It adds an evaluation of long-term clinical outcomes, which enables better-informed decisions before moving into Phase III. Seamless Phase II/III [1] evaluates regimens by early treatment biomarkers and then moves seamlessly into Phase III, where clinical outcomes are evaluated with sufficient sample size and statistical power. Multi-arm group sequential design [2,5] saves resources and time by stopping poorly performing arms early at prespecified interim analyses. Bayesian response-adaptive randomization [3] maximizes the value of the data collected throughout the trial by adjusting randomization probabilities in favor of better-performing arms, which maximizes the sample size for the most promising regimens.
The present work aims to (1) evaluate the proposed trial designs using clinical trial simulations, (2) determine optimal adaptive trial design parameters, and (3) provide recommendations on how each design may be applied to achieve the greatest impact.
Clinical trial simulation tools were built in R using previously developed, integrated parametric survival models [4]. These models predict time to culture conversion (TCC) and time to relapse from patient baseline and on-treatment (biomarker) characteristics. Patient phenotypes were sampled from an internal database (N = 3411). The 9 intervention arms comprised 3 simulated regimens (Desirable, Borderline, Poor) at 3 durations (8, 12, and 16 weeks). They were used to explore trial parameters and to test each design's ability to distinguish desirable from poor regimens. The three regimens were specified through TCC and relapse hazard ratios (HR) applied to these models:

- the desirable regimen was a 3-month treatment (TCC HR = 3.4, relapse HR = 0.6);
- the borderline regimen was a 4-month treatment (TCC HR = 1.9, relapse HR = 0.7);
- the poor regimen failed to meet treatment-shortening targets (TCC HR = 1.2, relapse HR = 0.85).

Recruitment was fixed at 10 patients per week.
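The way a regimen's TCC HR shifts simulated conversion times can be illustrated with a minimal Python sketch. It is not the authors' R tool, which uses fitted, covariate-dependent models that are not reproduced here. The sketch assumes a hypothetical Weibull proportional-hazards baseline for TCC with placeholder scale and shape values, and it omits patient covariates and relapse simulation. The regimen HRs and durations are taken from the text.

```python
import math
import random

# HYPOTHETICAL baseline Weibull parameters for time to culture conversion (weeks).
# Placeholders for illustration only, NOT the fitted parameters of the published models.
BASELINE_SCALE_WEEKS = 8.0
BASELINE_SHAPE = 1.5

# TCC and relapse hazard ratios for the three simulated regimens, as stated in the text.
REGIMENS = {
    "Desirable": {"tcc_hr": 3.4, "relapse_hr": 0.6},
    "Borderline": {"tcc_hr": 1.9, "relapse_hr": 0.7},
    "Poor": {"tcc_hr": 1.2, "relapse_hr": 0.85},
}
DURATIONS_WEEKS = (8, 12, 16)


def intervention_arms():
    """Cross the 3 regimens with the 3 durations to give the 9 intervention arms."""
    return [(name, weeks) for name in REGIMENS for weeks in DURATIONS_WEEKS]


def sample_weibull_ph_time(hr, rng, scale=BASELINE_SCALE_WEEKS, shape=BASELINE_SHAPE):
    """Draw one event time from a Weibull proportional-hazards model.

    Baseline cumulative hazard is H0(t) = (t / scale) ** shape. Multiplying the
    hazard by `hr` and inverting S(t) = exp(-hr * H0(t)) at a uniform draw U gives
    t = scale * (-ln(U) / hr) ** (1 / shape).
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return scale * (-math.log(u) / hr) ** (1.0 / shape)


def simulate_tcc(regimen, n_patients, seed=0):
    """Simulate times to culture conversion (weeks) for n patients on one regimen."""
    rng = random.Random(seed)
    hr = REGIMENS[regimen]["tcc_hr"]
    return [sample_weibull_ph_time(hr, rng) for _ in range(n_patients)]


if __name__ == "__main__":
    for name in REGIMENS:
        times = sorted(simulate_tcc(name, 1001, seed=1))
        print(name, "median TCC (weeks):", round(times[500], 2))
```

With a shared seed, each patient's uniform draw is the same across regimens, so a higher TCC HR always yields an earlier simulated conversion. This common-random-numbers choice makes regimen comparisons less noisy.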
Group sequential design parameters included the number of interim analyses, their timing, and the interim criterion. The criterion was either a minimum TCC HR or a maximum relapse percentage that each arm had to meet to continue to the end of the trial. These parameters were explored and optimized to maximize the probability of stopping poor regimen arms while minimizing the probability of stopping desirable regimen arms.
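A simplified Monte Carlo sketch can show how a TCC HR interim rule yields stopping probabilities. It is not the authors' simulation. It assumes exponential conversion times with complete follow-up at the interim, and it estimates the HR as a ratio of exponential rate estimates (events over total follow-up time) rather than from a fitted survival model. The control rate is a hypothetical placeholder. The defaults of 50 patients per arm and a threshold of 1.7 mirror the optimized A design (50 patients per arm, TCC HR threshold 1.7).

```python
import random


def estimate_hr_exponential(times_arm, times_control):
    """Ratio of exponential-rate MLEs (events / total follow-up time).

    ASSUMPTION: every patient's conversion time is observed (no censoring),
    so the event count equals the number of patients in each arm.
    """
    rate_arm = len(times_arm) / sum(times_arm)
    rate_control = len(times_control) / sum(times_control)
    return rate_arm / rate_control


def prob_stopped_at_interim(true_hr, n_per_arm=50, threshold=1.7,
                            control_rate=0.1, n_sims=2000, seed=0):
    """Monte Carlo probability that an arm with `true_hr` is stopped at the interim.

    The arm is stopped when its estimated TCC HR versus control falls below
    `threshold`, i.e. it must reach the threshold to continue.
    `control_rate` (conversions per week) is a hypothetical placeholder.
    """
    rng = random.Random(seed)
    stopped = 0
    for _ in range(n_sims):
        control = [rng.expovariate(control_rate) for _ in range(n_per_arm)]
        arm = [rng.expovariate(control_rate * true_hr) for _ in range(n_per_arm)]
        if estimate_hr_exponential(arm, control) < threshold:
            stopped += 1
    return stopped / n_sims


if __name__ == "__main__":
    for label, hr in (("Poor", 1.2), ("Borderline", 1.9), ("Desirable", 3.4)):
        print(label, prob_stopped_at_interim(hr))
```

Because the HR estimate is a ratio of rates, it does not depend on the placeholder control rate. Under these assumptions, a poor arm (true HR 1.2) is stopped in most replicates and a desirable arm (true HR 3.4) is stopped only rarely.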
Bayesian adaptive randomization was driven by the probability that TS-39 (treatment success at week 39) in arm k is better than in the control arm. Each arm's probability of TS-39 was estimated as $P(\text{TS-39}_k) = \psi_1^{(k)}\phi^{(k)} + \psi_0^{(k)}\bigl(1 - \phi^{(k)}\bigr)$ [3]. Here $\phi^{(k)}$ is the probability of TS-8 (culture conversion by week 8) in arm k, and $\psi_1^{(k)}$ and $\psi_0^{(k)}$ are the conditional probabilities of TS-39 given a positive or negative TS-8, respectively. The prior for TS-8 is a uniform distribution. The priors for TS-39 given a positive and a negative TS-8 outcome are Beta(1, 10) and Beta(10, 1), respectively. Model-generated individual patient TCC and relapse times were converted into binary TS-8 and TS-39 outcomes, which were used to update the priors as data accumulated. Thus, as evidence accumulates that arm k performs better than control, its randomization probability is adjusted upward. Design parameters included three randomization tuning parameters, a, b, and c. These were explored and optimized to maximize the aggressiveness of the adaptive randomization while keeping allocation to the control and best arms equal.
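The posterior step can be sketched in Python with conjugate Beta-binomial updates and Monte Carlo draws of the composite $P(\text{TS-39}_k)$. The sketch assumes the stated Beta priors are given as (alpha, beta) pairs. It also assumes that TS-8 and TS-39 counts are supplied for patients with complete follow-up, ignoring pending outcomes. The function names and the example counts are hypothetical. The mapping from these probabilities to randomization probabilities through the tuning parameters a, b, and c is not specified here and is not reproduced.

```python
import random


def posterior_ts39_draws(n_pos, s_pos, n_neg, s_neg, rng, n_draws=4000,
                         prior_phi=(1.0, 1.0),
                         prior_psi1=(1.0, 10.0),
                         prior_psi0=(10.0, 1.0)):
    """Posterior draws of P(TS-39) = psi1 * phi + psi0 * (1 - phi) for one arm.

    n_pos / n_neg: patients with a positive / negative TS-8 outcome.
    s_pos / s_neg: of those, patients with TS-39 success.
    Priors are (alpha, beta) pairs, updated conjugately with the counts.
    """
    draws = []
    for _ in range(n_draws):
        phi = rng.betavariate(prior_phi[0] + n_pos, prior_phi[1] + n_neg)
        psi1 = rng.betavariate(prior_psi1[0] + s_pos, prior_psi1[1] + n_pos - s_pos)
        psi0 = rng.betavariate(prior_psi0[0] + s_neg, prior_psi0[1] + n_neg - s_neg)
        draws.append(psi1 * phi + psi0 * (1.0 - phi))
    return draws


def prob_better_than_control(arm_counts, control_counts, n_draws=4000, seed=0):
    """Monte Carlo estimate of P(TS-39 in arm k > TS-39 in control | data).

    Each *_counts argument is a (n_pos, s_pos, n_neg, s_neg) tuple.
    """
    rng = random.Random(seed)
    arm = posterior_ts39_draws(*arm_counts, rng=rng, n_draws=n_draws)
    control = posterior_ts39_draws(*control_counts, rng=rng, n_draws=n_draws)
    wins = sum(a > c for a, c in zip(arm, control))
    return wins / n_draws


if __name__ == "__main__":
    # Hypothetical interim counts: (n_pos, s_pos, n_neg, s_neg).
    arm_k = (80, 72, 20, 10)
    control = (50, 35, 50, 20)
    print("P(arm k better than control):", prob_better_than_control(arm_k, control))
```

As data accumulate, the posterior concentrates and this probability moves toward 0 or 1. That movement is what an adaptive randomization rule would then translate into allocation probabilities.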
The optimized A design has one interim analysis at 50 patients per arm with a TCC HR threshold of 1.7. At this interim, poor and desirable regimens have a 92% and a 0.3% chance of being stopped, respectively. The optimized B design has two interim analyses: the first is identical to that of A, and the second occurs at 200 patients per arm with a relapse threshold of 9%. Evaluating regimens by relapse rate at the second interim confers two advantages. The criterion (1) is relevant to the primary clinical endpoint and (2) better distinguishes between regimens of different durations, to which early treatment biomarkers are less sensitive. In the B design, poor regimens have a 100% chance of being stopped by the second interim. Borderline regimens at 8, 12, and 16 weeks have an 85%, 71%, and 54% chance, respectively, and desirable regimens a 38%, 17%, and 4% chance, respectively.
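The second-interim relapse rule of the B design can be illustrated with an exact binomial calculation in Python. The sketch assumes that relapse status is known for all 200 patients per arm at the interim and that relapses are independent with a common true rate. The true relapse rates used in the example are hypothetical, since the text does not report them. An arm is stopped when its observed relapse proportion exceeds 9%.

```python
from math import comb


def prob_stopped_by_relapse(true_relapse_rate, n_per_arm=200, threshold=0.09):
    """Exact binomial probability that the observed relapse proportion exceeds `threshold`.

    ASSUMPTIONS: relapse status is known for all n_per_arm patients at the interim,
    and relapses are independent with probability true_relapse_rate.
    """
    p = true_relapse_rate
    # Probability that the arm continues: observed proportion x / n is at most the threshold.
    p_continue = sum(
        comb(n_per_arm, x) * p**x * (1 - p) ** (n_per_arm - x)
        for x in range(n_per_arm + 1)
        if x / n_per_arm <= threshold
    )
    return 1.0 - p_continue


if __name__ == "__main__":
    # Hypothetical true relapse rates, chosen only for illustration.
    for rate in (0.05, 0.08, 0.10, 0.15):
        print(rate, round(prob_stopped_by_relapse(rate), 3))
```

With 200 patients per arm, the 9% threshold allows at most 18 relapses. Arms whose true relapse rate is well below 9% rarely stop, and arms well above it almost always stop.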
The optimal C design has tuning parameters a = 0.8, b = 0.45, and c = 0.1. These allocate 4.6 times as many patients to the best regimen as to the worst (166 vs. 36 patients) while balancing patients between the control and best arms (ratio = 0.94). This design reliably selects well-performing arms. As enrollment continues, it transitions seamlessly into a D design as the randomization probabilities of poorly performing arms approach 0.
Assume that a classical 10-arm Phase IIc trial requires 1000 patients and 45 months, and that a classical 4-arm Phase III trial requires an additional 1600 patients and 60 months of study time. Under these assumptions, the A design saves on average 170 patients and 4.5 months, and the B design saves 224 patients and 25.8 months. The C/D design does not significantly reduce enrollment or study duration. However, allocating more patients to the best-performing arms yields more observed relapse events than the group sequential designs (on average 13 vs. 7 events). This gives higher confidence in estimates of clinical outcome and optimal treatment duration. The choice of design will depend on trial objectives. Bias in relapse rate estimation was minimal (<4%) across all regimens and designs, except for poor regimens in the Bayesian design. Their relapse rate was underestimated by approximately 17% because of the low sample size (N = 36). We have developed integrated modeling and clinical trial simulation tools that can and will be used by the above-mentioned consortia to implement adaptive clinical trial platforms for TB regimen development.
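The reported savings can be put in relative terms with a short worked calculation in Python. The text does not state which classical benchmark each saving is measured against. This sketch therefore assumes that A (a Phase IIc design) is compared with the classical Phase IIc alone, and that B (a seamless Phase II/III design) is compared with a classical Phase IIc followed by a classical Phase III. The dictionary names are illustrative only.

```python
# Classical benchmarks stated in the text.
CLASSICAL_PHASE_IIC = {"patients": 1000, "months": 45.0}  # 10-arm Phase IIc
CLASSICAL_PHASE_III = {"patients": 1600, "months": 60.0}  # 4-arm Phase III

# Average savings reported for the group sequential designs.
REPORTED_SAVINGS = {
    "A": {"patients": 170, "months": 4.5},
    "B": {"patients": 224, "months": 25.8},
}

# ASSUMPTION: A is compared with the classical Phase IIc alone;
# B with classical Phase IIc followed by classical Phase III.
BASELINES = {
    "A": CLASSICAL_PHASE_IIC,
    "B": {
        "patients": CLASSICAL_PHASE_IIC["patients"] + CLASSICAL_PHASE_III["patients"],
        "months": CLASSICAL_PHASE_IIC["months"] + CLASSICAL_PHASE_III["months"],
    },
}


def summarize(design):
    """Remaining patients/months and percentage savings relative to the assumed baseline."""
    base = BASELINES[design]
    saved = REPORTED_SAVINGS[design]
    return {
        "patients": base["patients"] - saved["patients"],
        "months": round(base["months"] - saved["months"], 1),
        "pct_patients_saved": round(100 * saved["patients"] / base["patients"], 1),
        "pct_months_saved": round(100 * saved["months"] / base["months"], 1),
    }


for d in ("A", "B"):
    print(d, summarize(d))
```

Under these assumptions, A uses 830 patients and 40.5 months (17.0% fewer patients, 10.0% less time). B uses 2376 patients and 79.2 months (8.6% fewer patients, 24.6% less time). Most of B's benefit therefore lies in study duration rather than enrollment.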
[1] Phillips, Patrick PJ, et al. "Innovative trial designs are practical solutions for improving the treatment of tuberculosis." Journal of Infectious Diseases 205.suppl_2 (2012): S250-S257.
[2] Phillips, Patrick PJ, et al. "A new trial design to accelerate tuberculosis drug development: the Phase IIC Selection Trial with Extended Post-treatment follow-up (STEP)." BMC Medicine 14.1 (2016): 51.
[3] Cellamare, Matteo, et al. "A Bayesian response-adaptive trial in tuberculosis: the endTB trial." Clinical Trials 14.1 (2017): 17-28.
[4] Imperial, Marjorie Z., et al. "A patient-level pooled analysis of treatment-shortening regimens for drug-susceptible pulmonary tuberculosis." Nature Medicine 24.11 (2018): 1708-1715.
[5] Boeree, Martin J., et al. "High-dose rifampicin, moxifloxacin, and SQ109 for treating tuberculosis: a multi-arm, multi-stage randomised controlled trial." The Lancet Infectious Diseases 17.1 (2017): 39-49.
[6] Davies, Geraint, et al. "Accelerating the transition of new tuberculosis drug combinations from Phase II to Phase III trials: New technologies and innovative designs." PLoS Medicine 16.7 (2019).