2025 - Thessaloniki - Greece

PAGE 2025: Methodology - New Modelling Approaches
 

Free-form text as an NLME covariate and how Embedding models enable effective use of predictive factors from a wide range of complex data

Niklas Korsbo

PumasAI

Introduction: Patient data in pharmacometrics often extend beyond numerical covariates, encompassing complex structured information (e.g., medical images) or unstructured information (e.g., doctor’s notes, patient self-assessments). While machine learning advancements have increased the potential to extract valuable insights from such data, integration into pharmacometric analyses remains limited. Pre-trained embedding models provide a powerful yet straightforward approach for pharmacometricians to access these advanced machine learning techniques without extensive retraining or large proprietary datasets. We present a novel method leveraging pre-trained embedding models [1, 2] to convert complex patient covariates into numerical vectors for direct integration into nonlinear mixed-effects (NLME) frameworks. Exploiting the mathematical similarity between embeddings and random effects, our approach is broadly applicable but illustrated here using embeddings derived from free-form textual patient self-assessments to inform patient-specific NLME predictions.

Objectives: Demonstrate and evaluate a methodology for incorporating textual patient self-assessments as covariates in a classical population PK/PD NLME model.

Methods: To validate the method under controlled conditions, we generated a synthetic dataset where text-derived covariates had a known but incomplete relationship with patient outcomes. Patient self-assessment texts were created using a large language model conditioned on a wellness score (1–10), integrated into longitudinal data via an indirect-response PK/PD NLME model. Additional, deliberately omitted sources of between-patient variability were introduced to reflect real-world complexity. While real-world validation is a necessary next step, this proof-of-concept study isolates the predictive power of text embeddings within an NLME framework and provides a robust foundation for future applications.
Using this dataset of 100 patients, we first fit a covariate-free NLME model, which encodes each patient’s response as empirical Bayes estimates (EBEs). A pre-trained text embedding model was used without retraining, demonstrating minimal integration effort. Embedding dimensionality was reduced from 384 to 10 via principal component analysis, and a neural network was trained to regress the EBEs on these reduced features. We then incorporated this fitted covariate pipeline into the NLME model, allowing text embeddings to account deterministically for some between-subject variability.

Results: The embedding-augmented NLME model significantly reduced unexplained between-subject variability and improved predictive accuracy, lowering the mean absolute residual error (MAE) from 0.32 to 0.21. When benchmarked against the data-generating model’s theoretical limit, the MAE between model predictions and the best possible prediction decreased from 0.27 to 0.098, indicating that text-derived covariates captured a substantial portion of the explainable variance. These results demonstrate the method’s ability to extract meaningful patient-specific information, even in small datasets with limited predictive power.

Conclusion: Embedding models offer a standardized, effortless way to integrate diverse complex covariates into NLME frameworks, enabling pharmacometricians to leverage rapid advancements from the broader machine learning community. This approach simplifies the use of complex patient data, such as free-form textual assessments, even in low-data scenarios common to pharmacometrics. In our prototype utilizing patient self-assessment texts, embedding models significantly reduced unexplained between-subject variability and improved predictive accuracy, demonstrating effective extraction of predictive information from complex textual data.
The standardized embedding approach means images, text, or EKG data can all be easily converted to fixed-size numerical vectors, allowing effortless switching between data modalities with minimal impact on the existing modeling pipeline. Overall, combining advanced machine learning embedding techniques with NLME models enhances holistic patient characterization and precision in longitudinal predictions, maintaining interpretability and significantly reducing barriers to innovation.
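
Illustrative sketch: the covariate pipeline described in the Methods (embedding, reduction from 384 to 10 dimensions by principal component analysis, regression onto the EBEs) might look roughly like the following. This is a minimal sketch under stated assumptions, not the implementation used in the study. The embeddings and EBEs are random stand-ins, all function and variable names are hypothetical, and a closed-form ridge regression is swapped in for the neural network to keep the example short and deterministic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients, embed_dim, n_components = 100, 384, 10

# Stand-in data (assumption): in practice `embeddings` would come from a
# pre-trained text embedding model and `ebes` from a covariate-free NLME fit.
latent = rng.normal(size=(n_patients, 1))  # hidden per-patient signal
embeddings = latent @ rng.normal(size=(1, embed_dim)) \
    + rng.normal(size=(n_patients, embed_dim))
ebes = 0.5 * latent[:, 0] + 0.3 * rng.normal(size=n_patients)


def pca_fit(X, k):
    """Return the column means and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T  # shapes: (d,), (d, k)


def pca_transform(X, mean, components):
    """Project X onto the principal directions found by pca_fit."""
    return (X - mean) @ components


def ridge_fit(Z, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Stand-in for the neural network regressor described in the abstract.
    """
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    w = np.linalg.solve(Zc.T @ Zc + lam * np.eye(Z.shape[1]),
                        Zc.T @ (y - y_mean))
    b = y_mean - z_mean @ w
    return w, b


# 1. Reduce the embeddings from 384 to 10 dimensions.
mean, components = pca_fit(embeddings, n_components)
features = pca_transform(embeddings, mean, components)

# 2. Regress the EBEs on the reduced features.
w, b = ridge_fit(features, ebes)

# 3. Split each EBE into a part predicted from the text and a remainder
#    that would still have to be modelled as a random effect.
predicted_eta = features @ w + b
residual_eta = ebes - predicted_eta

print("variance of EBEs:           ", np.var(ebes))
print("variance left after fitting:", np.var(residual_eta))
```

In this sketch the drop from the first printed variance to the second plays the role of the reduction in unexplained between-subject variability. Because both numbers are computed on the training data, the comparison is optimistic; an honest evaluation would hold out patients or compare against a known data-generating model, as the abstract does.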


Reference: PAGE 33 (2025) Abstr 11784 [www.page-meeting.org/?abstract=11784]
Oral: Methodology - New Modelling Approaches