Abraham Vaquero Castro (1), Dr Enrico Grisan (1), Dr Monica Simeoni (2)
(1) London South Bank University, (2) GSK
Introduction: Pharmacokinetic/pharmacodynamic (PK/PD) data analysis is a cornerstone of drug development and of efficacy and safety studies. However, individual-level PK/PD data are difficult to obtain. Ideally, the availability of individual patient data (IPD) would allow methods such as model-based meta-analysis (MBMA) to better characterize the relationships between covariates and PK/PD parameters. Unlike previous research aiming to integrate aggregate data (AD) and IPD in a combined meta-analysis (S. Weber [1], L. Yang [2]), we exploit generative AI to regenerate IPD from AD, with the ultimate aim of enabling an IPD MBMA. To test the methodology, we simulated a scenario in which a generative model is trained on IPD from a single study and then generates IPD from the population statistics of a number of other simulated studies.

Objectives:
• Produce an AI/ML pipeline to regenerate external IPD datasets from a local IPD dataset and published AD population characteristics.
• Compare the ML-generated dataset with the IPD in a simulation experiment.

Methods: The methodology is a simulation-regeneration approach: individual data were simulated with a PK/PD model and regenerated with a machine learning algorithm.

1. PK/PD data simulation: IPD of drug effect on the forced expiratory volume in 1 second (FEV1) were simulated using eq. 1, based on a model structure similar to that described in [3]:

$$\mathrm{FEV1}_{i,j,k}(t) = B_{i,j,k}(\mathrm{COV}, \theta_{B,i}, \eta_{B,ik}) + \mathrm{DP}_{i,j,k}(t, B_{i,j,k}, \theta_{DP,i}, \eta_{DP,ik}) + \mathrm{PBO}_{ik}(t, \theta_{PBO,i}, \eta_{PBO,ik}) + E_{i,j,k,x}(t, x, B_{i,j,k}, \theta_{E,ix}, \eta_{E,ikx}) + e_{i,j,k} \tag{1}$$

where $\mathrm{FEV1}_{i,j,k}(t)$ are the time-varying values for the jth patient in the kth arm of the ith study. B is the FEV1 value at baseline, COV are the subject covariates (Age, Height, Smoker, Severity), DP is disease progression, PBO is the placebo effect, and E is the effect associated with the xth drug, which depends on the parameters ED50 and EMAX.
$\theta$ and $\eta$ are, respectively, the parameter vectors and random effects of the equation terms appearing as subscripts, and $e_{i,j,k}$ represents residual variability. The model was applied to simulate 8 different trials with their IPD: data from Trial 0 are labelled as local (IPD are assumed to be available) and data from Trials 1-7 are labelled as external (only AD are assumed to be available).

2. WGAN synthetic data algorithm: The core computational framework employed a Wasserstein generative adversarial network (WGAN) architecture, with a Long Short-Term Memory (LSTM)-based generator and a dense critic network. The generator is trained to capture temporal dependencies in the trial data, particularly between FEV1(t), COV, ED50, and EMAX. During training, the critic evaluates the generated data to estimate the Wasserstein distance between real and generated data, guiding the generator towards producing realistic outputs. Once the model is trained, a set of covariates is sampled from the AD of Trials 1-7 for each individual to be simulated and used as input to the model, which generates the corresponding FEV1(t) profile.

3. Data comparison: The individual profiles of the trials labelled as external and generated by the WGAN algorithm were fitted to the model described in eq. 1. Additionally, for each subject (described by its set of covariates), the possible distribution of FEV1 (mean and standard deviation) was evaluated by repeatedly sampling the random effects. This made it possible to compute the Z-score of the FEV1 value estimated from the subject's ML-generated profile and to check whether it fell within the limits of the distribution.

Results: When comparing the ML-generated profiles with the external datasets simulated with the PK/PD model, we found an average sum of squared errors of 0.331 (20.4%), with an overall error range of [0.2117, 0.4799].
Across all ML-generated subjects and the corresponding FEV1 estimates, Z-scores were in the range [0.3489, 0.4050], indicating that the ML generation produced PD profiles whose model fit yielded an FEV1 value consistent with the subject’s covariates.

Conclusion: We showed that our method can learn the relationships in the IPD study and apply them to regenerate information that is lost when only population-level statistics are reported. This scalable approach to generating synthetic individual-level pharmacological trial data can provide a foundation for further predictive modelling while mitigating concerns about data privacy and scarcity. Testing of the methodology should be extended to different data types.
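As an illustration of the Z-score check described in the data-comparison step of the Methods, the following is a minimal Python sketch, not the implementation used in this work. It assumes a hypothetical linear covariate model for baseline FEV1 with a single log-normal random effect; the coefficients in `THETA`, the value of `OMEGA_SD`, the function names, and the example subject are all illustrative assumptions and do not come from the study.

```python
import math
import random
import statistics

# Hypothetical fixed effects for a baseline FEV1 model (litres).
# Illustrative values only, not the parameters used in the study.
THETA = {"intercept": 1.5, "age": -0.01, "height": 1.2,
         "smoker": -0.1, "severity": -0.2}
OMEGA_SD = 0.15  # assumed SD of the log-normal random effect


def baseline_fev1(cov, eta):
    """Hypothetical covariate model: typical value scaled by exp(eta)."""
    typical = (THETA["intercept"]
               + THETA["age"] * (cov["age"] - 60)
               + THETA["height"] * (cov["height_m"] - 1.7)
               + THETA["smoker"] * cov["smoker"]
               + THETA["severity"] * cov["severity"])
    return typical * math.exp(eta)


def fev1_distribution(cov, n_samples=5000, seed=0):
    """Mean and SD of FEV1 for one subject, obtained by repeatedly
    sampling the random effect for a fixed set of covariates."""
    rng = random.Random(seed)
    draws = [baseline_fev1(cov, rng.gauss(0.0, OMEGA_SD))
             for _ in range(n_samples)]
    return statistics.fmean(draws), statistics.stdev(draws)


def z_score(fev1_estimate, mean, sd):
    """Standardised distance of an estimate from the subject's distribution."""
    return (fev1_estimate - mean) / sd


# Example subject; 1.05 L stands in for the FEV1 value estimated
# from that subject's ML-generated profile.
subject = {"age": 65, "height_m": 1.75, "smoker": 1, "severity": 2}
mean, sd = fev1_distribution(subject)
z = z_score(1.05, mean, sd)
print(f"mean={mean:.3f} L, sd={sd:.3f} L, z={z:.2f}")
```

A Z-score close to zero means the value estimated from the generated profile lies near the centre of the distribution implied by the subject's covariates; the threshold used to call a value consistent is a design choice that this sketch does not fix.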
[1] S. Weber, A. Gelman, D. Lee, M. Betancourt, A. Vehtari, and A. Racine-Poon, “Bayesian aggregation of average data: An application in drug development,” The Annals of Applied Statistics, vol. 12, no. 3, Sep. 2018, doi: 10.1214/17-AOAS1122.
[2] L. Yang, C. Llanos-Paez, S. Yang, C. Ambery, A. Berges, M. Kjellsson, and M. O. Karlsson, “A combined model meta-analysis of aggregated and individual FEV1 data from randomized COPD trials,” PAGE 32 (2024) Abstr 10848.
[3] C. Llanos-Paez, C. Ambery, S. Yang, M. Beerahee, E. L. Plan, and M. O. Karlsson, “Joint longitudinal model-based meta-analysis of FEV1 and exacerbation rate in randomized COPD trials,” Journal of Pharmacokinetics and Pharmacodynamics, vol. 50, no. 4, pp. 297–314, Aug. 2023, doi: 10.1007/s10928-023-09853-z.
Reference: PAGE 33 (2025) Abstr 11728 [www.page-meeting.org/?abstract=11728]
Poster: Methodology - New Modelling Approaches