AI-augmented QSP simulation of hormonal and endometrial dynamics of IVF/FET cycles to inform external clinical prediction models for miscarriage risk assessment in women of advanced maternal age
Wen Yao Mak1,2, Aole Zheng1, Muyesair Alifu1, Fanyu Zhao3, Yue Wei4,5, Yunrong Shen6, Xiao Zhu1, Qingfeng He1, Xiaoyan Mao7, Xiaoqiang Xiang1
1Department of Clinical Pharmacy and Pharmacy Administration, School of Pharmacy, Fudan University, 2Clinical Research Centre, Penang General Hospital, Institute for Clinical Research, National Institute of Health, 3School of Computer Science and Technology, Fudan University, 4Laboratory of Data Discovery for Health (D24H), 5Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 6Center for Health Outcomes and PharmacoEconomic Research, University of Arizona, 7Department of Assisted Reproduction, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine
Background: Despite advances in assisted reproductive technologies (ART), in vitro fertilization/frozen embryo transfer (IVF/FET) remains a challenge for women of advanced maternal age (AMA)[1,2]. Clinicians often rely on generalized protocols because older women are underrepresented in research, which hinders personalized approaches[3]. Additionally, Asian women show poorer IVF/FET outcomes than Caucasians, including lower fertilization and clinical pregnancy rates, despite receiving similar treatment protocols[4-6]. While AI-driven models are increasingly used to identify potential predictors[7], few attempt to integrate mechanistic models that account for underlying physiological differences. The lack of strong predictive biomarkers of live birth further complicates success prediction[8]. These disparities highlight the urgent need for ethnicity-specific fertility research and personalized treatment approaches to optimize reproductive outcomes after ART. Given the complexity of fertility treatment, AI-augmented QSP models that integrate mechanistic insights with data-driven feature selection provide a robust approach to optimizing IVF/FET treatment for women of AMA. The data-driven component allows systematic screening of large-scale datasets for subtle relationships between covariates, while the mechanistic component offers insight into the complex relationships between individual patient physiology, treatment, and outcome.

Objectives: We aimed to characterize the impact of IVF/FET treatments on the probability of miscarriage in Chinese infertile women in three distinct phases: (1) to train machine learning (ML) models to screen for important features of reproductive ageing; (2) to develop a semi-mechanistic model of the menstrual cycle that integrates these features and characterizes the hormonal and endometrial response to IVF/FET treatment; and (3) to investigate the impact of two treatment protocols (late stimulation vs natural cycle) on miscarriage risk in Chinese women of AMA.
Methods: Clinical features identification: We trained three ML models (random forest (RF), XGBoost, and multilayer perceptron) on a large internal dataset (115,091 IVF/FET cycles) to determine the most influential features affecting live birth and assessed their relevance for inclusion in a QSP model. Model hyperparameters were optimized using grid search with cross-validation. Model performance was evaluated by AUC-ROC and accuracy on the test set. For the best-performing model, feature importance was assessed using SHAP values and Gini importance. The marginal effect of selected features on the predicted outcome was assessed using partial dependence plots (PDPs).

QSP model: Concurrently, a comprehensive menstrual cycle model describing the dynamics of follicle growth rate, hormone concentrations, and endometrium thickness was developed from two published models: the menstrual cycle model of Fischer et al (2021)[9] as the structural backbone, and the uterus model of Arbelaez-Gomez et al (2022)[10] for endometrium dynamics. The Fischer model was partially calibrated using data from the same population as this study. The estradiol (E2) and progesterone (P4) concentration-time profiles generated by the first model served as inputs to the second model to drive endometrial growth over time. Bayesian Optimization (BO) was used to estimate the parameters of the final joint model. Subsequently, Approximate Bayesian Computation (ABC) was performed to evaluate the uncertainty of the parameter estimates. Parameter priors were uniform distributions spanning ±10% around the best-fit estimates from the BO step. The mean absolute error (MAE) between simulated and observed endometrium thickness was calculated. Important clinical features identified by the ML models were then incorporated into the final QSP model either mechanistically or statistically, and model calibration through BO and ABC was repeated.
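The ABC step described above can be illustrated with a minimal rejection-sampling sketch. This is not the study's implementation: `simulate_thickness` is a hypothetical logistic-growth stand-in for the joint QSP model, and the parameter values, baseline thickness, and sampling days are illustrative assumptions. Only the uniform ±10% priors around the best-fit estimates, the MAE distance, and the 3 mm threshold and 1,000 draws reported in the Results are taken from the abstract.

```python
import math
import random


def simulate_thickness(params, days):
    """Hypothetical stand-in for the QSP model: logistic endometrial growth (mm)."""
    growth_rate, max_thickness = params
    baseline = 2.0  # assumed starting thickness in mm (illustrative only)
    ratio = (max_thickness - baseline) / baseline
    return [max_thickness / (1.0 + ratio * math.exp(-growth_rate * t)) for t in days]


def mean_absolute_error(simulated, observed):
    """MAE between simulated and observed endometrium thickness."""
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(observed)


def abc_rejection(best_fit, observed, days, n_draws=1000, threshold_mm=3.0, seed=0):
    """Rejection ABC with uniform priors spanning +/-10% of each best-fit value.

    Assumes all best-fit values are positive. Returns the accepted parameter
    sets and the acceptance rate.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # Draw each parameter uniformly within +/-10% of its best-fit estimate.
        candidate = [rng.uniform(0.9 * p, 1.1 * p) for p in best_fit]
        simulated = simulate_thickness(candidate, days)
        # Keep the draw only if the simulation is close enough to the data.
        if mean_absolute_error(simulated, observed) <= threshold_mm:
            accepted.append(candidate)
    return accepted, len(accepted) / n_draws


# Illustrative usage with synthetic "observations" generated from the best fit.
days = list(range(0, 15))
best_fit = [0.3, 10.0]  # hypothetical growth rate (1/day) and plateau (mm)
observed = simulate_thickness(best_fit, days)
accepted, rate = abc_rejection(best_fit, observed, days)
```

The spread of the accepted parameter sets gives a rough picture of how tightly each parameter is constrained by the data, which is the kind of information the sensitivity analysis in the Results draws on.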
Development and utilization of a miscarriage risk prediction model: A parametric time-to-event (TTE) model of miscarriage was previously developed and validated. Hormone concentrations and endometrium thickness of a typical infertile Chinese woman undergoing either late stimulation or natural cycle were simulated and used to estimate the probability of miscarriage with the TTE model.

Results: Across all ML models, maternal age consistently emerged as the strongest negative predictor of live birth. The RF model had the highest AUC-ROC (0.692) and high sensitivity for negative cases (precision 0.67, recall 0.84, F1-score 0.75). Gini importance also ranked age as the most important feature (Gini score 0.036). Other influential features included endometrium thickness (Gini 0.024) and baseline FSH (Gini 0.020). Calibration of the QSP model through BO identified a set of best-fit parameters that reduced the objective function value (OFV) from 10,863 to 1,844. During ABC, the model was simulated over 1,000 parameter sets, with a resulting acceptance rate of 50.2% (acceptance threshold: MAE of 3 mm). Sensitivity analysis suggested that the parameters governing E2-mediated growth of proliferative (kpro) and secretory (ksec) endometrial cells showed less variation and were more reliably estimated from the data. Given the strong predictive influence of maternal age, it was incorporated as a covariate in the QSP model, where it influenced the synthesis and release of E2 over the menstrual cycle. An exponential decay function (ageFactor), with a lower age threshold below which no change was applied, was implemented on the parameter governing E2 synthesis. Subsequent BO completed successfully and identified an optimized ageFactor of 0.150 (initial value 0.25) and a lower age threshold of 41.9 years.
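One way to picture the age covariate described above is as a multiplier on the E2 synthesis parameter that stays at 1 up to the age threshold and decays exponentially beyond it. The sketch below assumes that specific form, exp(-ageFactor × (age − threshold)), with ageFactor expressed per year; the abstract reports only the fitted values (0.150 and 41.9 years) and not the exact equation, so the functional form and the function name are assumptions.

```python
import math

AGE_FACTOR = 0.150    # optimized decay constant reported in the abstract
AGE_THRESHOLD = 41.9  # lower age threshold in years reported in the abstract


def e2_synthesis_multiplier(age_years, age_factor=AGE_FACTOR, threshold=AGE_THRESHOLD):
    """Assumed form of the age effect on the E2 synthesis parameter.

    Returns 1.0 at or below the threshold (no age effect) and an
    exponentially decaying multiplier above it.
    """
    if age_years <= threshold:
        return 1.0
    return math.exp(-age_factor * (age_years - threshold))


# Illustrative usage: scale a hypothetical baseline E2 synthesis rate by age.
baseline_e2_synthesis = 1.0  # arbitrary units, illustrative only
scaled_by_age = {
    age: baseline_e2_synthesis * e2_synthesis_multiplier(age)
    for age in (35, 42, 45, 48)
}
```

Under this assumed form, the multiplier is continuous at the threshold, so E2 synthesis does not jump when a patient crosses 41.9 years.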
From the AI-augmented QSP model, a typical hormonal and endometrial profile of an infertile Chinese woman was simulated (age: 45 years; endometrium thickness: 6.57 mm after late stimulation and 10.8 mm after natural cycle; E2 on the day of embryo transfer: 181 ng/mL and 524 ng/mL, respectively). This virtual patient profile was used to estimate the probability of miscarriage with the TTE model. The resulting probability of miscarriage in Chinese women of AMA by the end of the first trimester was 37.5% after late stimulation and 47.5% after natural cycle, broadly consistent with published Singaporean data of 55.3%[11].

Conclusion: This study presents a framework that integrates AI-driven feature selection with a QSP model to predict hormonal and endometrial changes in infertile Chinese women undergoing IVF/FET. The AI-augmented model identified maternal age as a key predictor that influences E2 levels and endometrium thickness. Simulation with a clinically validated TTE model suggested that late stimulation may offer an advantage over natural cycle in sustaining pregnancy through the first trimester.