Konstantina Koliopoulou¹, Aristides Dokoumetzidis¹
¹Department of Pharmacy, National and Kapodistrian University of Athens
Introduction: This study focuses on developing a predictive framework that leverages prior knowledge to predict and optimize key quality attributes of multi-unit particulate systems (MUPS). By combining experimental results with Machine Learning (ML) predictions across a range of pharmaceutical pellets, our approach provides a thorough, data-informed perspective on MUPS development. This makes it useful for applications such as formulation development, process optimization, and quality control in manufacturing [1]. Design of Experiments (DoE) has been widely used in pharmaceutical development [2,3]. However, existing approaches often struggle to generalize to untested formulation combinations because they cannot capture formulation-specific deviations, which limits their predictive power when data are sparse. To address this, residual error modeling is incorporated to account for unexplained variability [4]. The proposed framework bridges this gap by combining three components: DoE for structured data collection, ML for baseline predictions, and residual error meta-modeling for formulation-specific adaptation. The result is a more flexible predictive tool that works even with limited experimental data. This approach is aligned with the principles of model-informed drug development (MIDD). It supports informed decision-making while significantly reducing dependence on extensive, time-consuming experimentation [5]. Furthermore, this framework could offer a streamlined route to developing highly tailored drug delivery platforms such as MUPS, which allow precisely engineered modified-release formulations and enable patient-centered dosing [6,7].
Objectives: The primary research question concerns the feasibility, performance, and potential benefits of integrating formulation-specific ML models into a unified predictive framework.
Importantly, a key aspect of the proposed residual error additive meta-modeling approach is that it can capture the formulation-specific behavior of unseen formulations. This improves prediction accuracy and helps overcome data scarcity.
Methods: An experimental dataset obtained through DoE was used to develop supervised ML models. Homogeneous matrix-type pharmaceutical pellets containing ropinirole were manufactured by direct pelletization. The input variables were the carnauba wax concentration (0%, 50%, and 100%) and the following process parameters: spray rate, quantity of binding material, powder feeding rate, and quantity of powder feeder. The response variables were as follows:
- particle size, expressed as Geometric Mean Diameter (GMD) and Geometric Standard Deviation (GSD);
- pellet morphology, expressed as the sphericity index (eR);
- batch yield (%) [8].
ML models were developed using H2O AutoML [9] in R and RStudio (Version 2024.12.0+467, Posit) [10]. Three modeling approaches were evaluated:
[a] four separate ML models, each trained for a specific response variable across all carnauba wax levels;
[b] a stacked ML model strategy integrating the predictions of three formulation-specific ML base models, each trained for a specific response, to improve generalizability;
[c] residual error modeling to capture concentration-specific deviations for each response, using as inputs the base model predictions and the interaction term between carnauba wax concentration and those predictions.
Validation methods included k-fold cross-validation (k = 5 and k = 10) and leave-one-out cross-validation (LOOCV) [11]. Model performance was evaluated using common regression metrics: the Coefficient of Determination (R²), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) [11].
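The validation and scoring steps described above can be illustrated with a short Python/NumPy sketch. It is not the H2O AutoML workflow used in the study. An ordinary least-squares learner (`fit_ols`, hypothetical) stands in for the ANN, and the data are synthetic. Computing the metrics once on the pooled out-of-fold predictions is an assumed convention, chosen here because it also works for LOOCV, where each test fold holds a single point.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R², MSE, RMSE and MAE for one response variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mse = float(np.mean(resid ** 2))
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"R2": 1.0 - ss_res / ss_tot, "MSE": mse,
            "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(resid)))}

def fit_ols(X, y):
    """Hypothetical stand-in learner (least squares with intercept), not the study's ANN."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply the stand-in linear model to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def cross_validate(X, y, k, seed=0):
    """k-fold CV; passing k = len(y) gives leave-one-out CV.

    Each row is predicted exactly once by a model that never saw it,
    and the metrics are computed on these pooled out-of-fold predictions.
    """
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    oof = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        coef = fit_ols(X[train_idx], y[train_idx])
        oof[test_idx] = predict_ols(coef, X[test_idx])
    return regression_metrics(y, oof)

# Hypothetical demo data: 30 runs, 4 process parameters, one response.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(30, 4))
y = 2.0 + X @ np.array([1.0, -0.5, 0.3, 0.8]) + rng.normal(0.0, 0.05, size=30)
for k in (5, 10, len(y)):
    print(k, cross_validate(X, y, k))
```

Swapping `fit_ols`/`predict_ols` for any other regressor leaves the cross-validation loop and the metric definitions unchanged.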
The generalization of the residual error meta-modeling approach was further tested on a sparse dataset. This dataset contained only a limited subset of the experimental combinations at the unseen 75% carnauba wax level. It was intended to simulate real-world conditions, in which only limited experimental data are typically available during the initial phases of formulation development. To compensate for this sparsity, a data augmentation method was used to generate additional synthetic data by adding realistic noise based on natural process variability [12].
Results: Various supervised ML algorithms were tested, and multi-layer artificial neural networks (ANNs) [11] emerged as the best-performing algorithm. The four separate ML models performed well in predicting the key outputs (particle size, pellet morphology, and batch yield) across the tested carnauba wax levels but struggled to generalize to the unseen 75% concentration. The stacked ML models showed improved prediction accuracy through transfer learning, but their generalization remained poor. This suggests that the input-output relationships may be significantly influenced by factors not measured within the structured DoE framework. In other words, the same process parameters may behave differently at the unseen 75% concentration than at the 0%, 50%, and 100% levels used to train the base ML models. Residual error modeling effectively captured the effect of carnauba wax concentration on pellet formation by combining base ML model predictions with correction terms. This confirms its value in addressing concentration-specific trends.
Conclusion: DoE provides a solid foundation for understanding key relationships between formulation aspects, process parameters, and quality attributes. However, ML modeling can add substantial value by interpolating between tested points, accounting for complex interactions, and supporting predictive insights.
Additionally, meta-modeling of the residual error efficiently transfers knowledge from base ML models to a new concentration level, eliminating the need for extensive experimentation and full-scale DoE whenever the formulation is adjusted. This approach supports MIDD by offering an adaptable, predictive framework grounded in experimental data.
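To make this transfer step concrete, here is a minimal Python/NumPy sketch for a single response. It combines residual error additive meta-modeling with noise-based augmentation of sparse data at a new concentration. The final prediction is taken as ŷ = ŷ_base + ê, where ê is the predicted residual. The abstract does not specify the meta-model form, so the following choices are all assumptions:
- a linear least-squares model on an intercept, the base prediction ŷ_base, the wax concentration c (as a fraction), and the interaction c·ŷ_base;
- Gaussian noise added to the responses only;
- the function names (`augment_with_process_noise`, `meta_features`, `fit_residual_meta_model`, `predict_corrected`), which are hypothetical;
- all numbers, which are synthetic placeholders rather than study data.

The sketch pools residuals from several concentration levels. At a single concentration, c·ŷ_base would simply be a rescaled copy of ŷ_base, so the interaction term could not be estimated.

```python
import numpy as np

def augment_with_process_noise(X, Y, noise_sd, n_copies, seed=0):
    """Append noisy copies of sparse observations.

    Assumption: additive zero-mean Gaussian noise on the responses only,
    with one standard deviation per response column; inputs are copied as-is.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    Y_syn = np.repeat(Y, n_copies, axis=0)
    Y_syn = Y_syn + rng.normal(0.0, 1.0, size=Y_syn.shape) * np.asarray(noise_sd, dtype=float)
    return np.vstack([X, np.repeat(X, n_copies, axis=0)]), np.vstack([Y, Y_syn])

def meta_features(y_base, conc):
    """Meta-model design: intercept, base prediction, concentration, interaction."""
    y_base = np.asarray(y_base, dtype=float)
    conc = np.asarray(conc, dtype=float)
    return np.column_stack([np.ones_like(y_base), y_base, conc, conc * y_base])

def fit_residual_meta_model(y_obs, y_base, conc):
    """Least-squares fit of the residual e = observed - base prediction."""
    resid = np.asarray(y_obs, dtype=float) - np.asarray(y_base, dtype=float)
    coef, *_ = np.linalg.lstsq(meta_features(y_base, conc), resid, rcond=None)
    return coef

def predict_corrected(coef, y_base, conc):
    """Additive correction: base prediction plus predicted residual."""
    return np.asarray(y_base, dtype=float) + meta_features(y_base, conc) @ coef

# Hypothetical demo; every number below is a synthetic placeholder.
rng = np.random.default_rng(7)
conc = np.repeat([0.0, 0.5, 1.0], 12)            # trained wax levels (fractions)
y_base = rng.uniform(500.0, 1000.0, conc.size)    # base-model predictions

def true_resid(yb, c):                            # made-up deviation pattern
    return 5.0 + 0.02 * yb - 10.0 * c + 0.1 * c * yb

y_obs = y_base + true_resid(y_base, conc)

# Sparse data at the unseen 75% level, augmented with process-like noise.
yb75 = np.array([600.0, 750.0, 900.0])
X75, Y75 = augment_with_process_noise(
    np.column_stack([yb75, np.full(3, 0.75)]),
    (yb75 + true_resid(yb75, 0.75))[:, None], noise_sd=[5.0], n_copies=5)

coef = fit_residual_meta_model(np.concatenate([y_obs, Y75[:, 0]]),
                               np.concatenate([y_base, X75[:, 0]]),
                               np.concatenate([conc, X75[:, 1]]))
print(predict_corrected(coef, [700.0], [0.75]))
```

A linear meta-model keeps the number of fitted parameters small, which suits sparse data. In practice, the base predictions at the training levels would ideally be out-of-fold, so that their residuals are not artificially close to zero.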
[1] Mendyk, A., et al. (2010). European Journal of Pharmaceutical Sciences, 41, 421–429. https://doi.org/10.1016/j.ejps.2010.07.010
[2] Fukuda, I. (2018). Brazilian Journal of Pharmaceutical Sciences, 54(Special), e01006. https://doi.org/10.1590/s2175-97902018000001006
[3] Politis, S. N., Colombo, P., Colombo, G., & Rekkas, D. M. (2017). Drug Development and Industrial Pharmacy, 43(6), 889–901. https://doi.org/10.1080/03639045.2017.1291672
[4] Kaikousidis, C., et al. (2024). CPT: Pharmacometrics & Systems Pharmacology, 13(9), 1476–1487. https://doi.org/10.1002/psp4.13182
[5] Marshall, S., et al. (2019). CPT: Pharmacometrics & Systems Pharmacology, 8, 87–96. https://doi.org/10.1002/psp4.12372
[6] Szabó, N. (2022). Pharmaceutics, 14, 1299. https://doi.org/10.3390/pharmaceutics14061299
[7] Al-Hashimi, N., Begg, N., Alany, R. G., Hassanin, H., & Elshaer, A. (2018). Pharmaceutics, 10(4), 176. https://doi.org/10.3390/pharmaceutics10040176
[8] Politis, S. (2010). Development of a fast, lean and agile direct pelletization process for the production of modified release spheroids, using experimental design techniques [PhD dissertation]. https://doi.org/10.12681/eadd/26890
[9] LeDell, E., & Poirier, S. (2020). H2O AutoML: Scalable automatic machine learning. 7th ICML Workshop on Automated Machine Learning (AutoML). https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf
[10] Posit Team. (2024). RStudio: Integrated Development Environment for R. Posit Software, PBC. http://www.posit.co/
[11] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning: with Applications in R (2nd ed.). Springer. https://doi.org/10.1007/978-1-0716-1418-1
[12] Freer, D., & Yang, G. Z. (2020). Journal of Neural Engineering, 17, 016041. https://doi.org/10.1088/1741-2552/ab57c0
Reference: PAGE 33 (2025) Abstr 11510 [www.page-meeting.org/?abstract=11510]
Poster: Methodology - New Modelling Approaches