III-079

Data-Driven Approach to Covariate Selection in PopPK: Integrating Variational Autoencoders (VAE) and LASSO Regression

Diego Perazzolo1,2, Chiara Castellani1, Prof. Enrico Grisan2

1Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 2School of Engineering, London South Bank University

Introduction / Objectives: Population pharmacokinetic (PopPK) modeling plays a pivotal role in understanding drug behaviour and variability across diverse patient populations, which is crucial for optimizing personalized dosing [1]. Traditional approaches often struggle to capture nonlinear and complex covariate relationships [2]. To overcome these limitations, we propose a fully data-driven and compartment-model-independent framework that leverages Variational Autoencoders (VAEs) [3] to represent pharmacokinetic variability through a structured latent space. By mapping covariates to the VAE-derived latent space using LASSO regression, we identify the most influential factors and systematically assess their importance through L1 regularization. This hybrid methodology offers a scalable and interpretable solution for covariate selection, with promising applications in drug development.
Methods: A dataset comprising 10,000 synthetic pharmacokinetic (PK) profiles of tacrolimus was generated to evaluate the applicability of the proposed framework. Each PK profile simulated tacrolimus concentration (mg/L) over a 48-hour period following a single administered dose, based on a one-compartment model with first-order elimination kinetics, as specified by Chen et al. [5]. The covariates included in the dataset were age, sex, weight, haemoglobin concentration, albumin level, CYP3A5 genotype (single nucleotide polymorphism, SNP), and ethnicity (categorized into Caucasian-American, African-American, Hispanic, Asian, and Other). Clearance (CL) was explicitly modeled as a function of SNP, age, albumin, and haemoglobin according to the parameterized equations provided by Chen et al. [5]. Additionally, two random noise variables were included among the covariates to assess the ability of the proposed method to differentiate relevant predictors from irrelevant ones. To obtain an effective latent representation of the PK profiles, a VAE was trained on these simulated data. The VAE architecture consisted of an encoder network mapping each PK profile to a probabilistic latent distribution, characterized by a mean (μ) and a standard deviation (σ), from which latent variables were sampled [6]. The decoder network reconstructed PK profiles from these latent variables. The training loss function combined the Mean Absolute Error (MAE) of the profile reconstruction with a Kullback-Leibler divergence term. After deriving latent representations from the trained VAE, LASSO regression was applied to map patient-specific covariates to their latent profiles. LASSO regression, incorporating an L1 regularization penalty, was used to enforce sparsity in covariate selection, systematically shrinking the coefficients of irrelevant covariates towards zero.
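The training objective described above (MAE reconstruction plus a Kullback-Leibler term) can be sketched for a single profile as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the prior or the weighting between the two terms, so the standard normal prior, the closed-form KL for a diagonal Gaussian, the weight `beta`, and all function names and example values are assumptions.

```python
import math
import random

def reparameterize(mu, sigma, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per latent dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def mae(profile, reconstruction):
    """Mean absolute error between a PK profile and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(profile, reconstruction)) / len(profile)

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), assuming a standard normal prior."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))

def vae_loss(profile, reconstruction, mu, sigma, beta=1.0):
    """Reconstruction MAE plus beta-weighted KL term (beta is a hypothetical weight)."""
    return mae(profile, reconstruction) + beta * kl_to_standard_normal(mu, sigma)

# Illustrative values only; a real encoder and decoder would produce these.
rng = random.Random(0)
mu, sigma = [0.5, -0.2], [0.8, 1.1]
z = reparameterize(mu, sigma, rng)          # latent sample fed to the decoder
profile = [12.0, 9.5, 7.1, 4.0]             # made-up concentrations
reconstruction = [11.6, 9.9, 7.0, 4.3]      # made-up decoder output
loss = vae_loss(profile, reconstruction, mu, sigma, beta=0.01)
```

The KL term is zero when μ = 0 and σ = 1 in every latent dimension, so it only penalizes encodings that drift away from the assumed prior.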
The impact of different regularization strengths was explored by varying the λ parameter across a comprehensive range (λ = 0.0001, 0.002, 0.005, 0.008, 0.01, 0.1, and 1.0). For each λ, the selected covariates and their associated coefficients were analyzed to determine the robustness, interpretability, and practical utility of the covariate selection process.
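The λ sweep can be illustrated with a small, self-contained sketch. It uses toy data, not the tacrolimus dataset: `y` stands in for one latent dimension, two columns drive it and two are pure noise. The abstract does not name a solver or the exact scaling of the objective, so the coordinate-descent routine, the (1/2n)·‖y − Xw‖² + λ‖w‖₁ objective, the covariate names, and the toy coefficients are all assumptions; the selected sets will therefore not match the reported results.

```python
import random

LAMBDAS = [0.0001, 0.002, 0.005, 0.008, 0.01, 0.1, 1.0]     # grid from the abstract
NAMES = ["relevant_a", "relevant_b", "noise_1", "noise_2"]  # hypothetical covariates

def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become exactly zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w

def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

rng = random.Random(0)
n = 200
cols = [center([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in NAMES]
X = [[col[i] for col in cols] for i in range(n)]
# Toy "latent dimension": depends on the first two covariates only.
y = center([0.6 * X[i][0] - 0.3 * X[i][1] + 0.05 * rng.gauss(0.0, 1.0) for i in range(n)])

results = {lam: lasso_cd(X, y, lam) for lam in LAMBDAS}
for lam, w in results.items():
    selected = [name for name, c in zip(NAMES, w) if c != 0.0]
    print(f"lambda={lam}: selected={selected}")
```

With several latent dimensions, one such regression would be fitted per dimension and the coefficients inspected jointly; how the authors aggregate across dimensions is not stated in the abstract.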
Results: Reconstruction of PK profiles from the VAE’s latent space yielded a mean absolute percentage error (MAPE) of 2.26%. LASSO regression applied to the latent space representations derived from the VAE consistently identified SNP genotype, age, haemoglobin, and albumin as significant covariates influencing tacrolimus PK. Notably, these four covariates were the same variables explicitly used in the clearance calculation during the dataset generation phase, confirming accurate recovery of relevant factors by the proposed framework. These covariates were robustly selected across the tested regularization parameter values (λ: 0.0001, 0.002, 0.005, 0.008, 0.01, 0.1, 1.0). At lower regularization levels (λ ≤ 0.005), additional covariates such as weight and ethnicity occasionally presented small, non-zero coefficients. However, increasing λ (≥ 0.01) led to greater model sparsity, with only SNP genotype retained at the highest levels. At λ = 1.0, none of the covariates were retained. The two randomly generated noise variables were consistently assigned coefficients of zero or close to zero across all λ values, confirming the framework’s ability to discard irrelevant features.
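For reference, the MAPE reported above is conventionally computed as the mean of |observed − predicted| / |observed|, expressed as a percentage. The sketch below shows that definition; the abstract does not say how the error was aggregated across profiles or how near-zero concentrations were handled, so the `eps` guard and the function name are assumptions.

```python
def mape(observed, predicted, eps=1e-8):
    """Mean absolute percentage error; eps guards against division by zero (assumption)."""
    ratios = [abs(o - p) / max(abs(o), eps) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Made-up values: relative errors of 10% and 5% average to 7.5%.
example = mape([100.0, 200.0], [110.0, 190.0])
```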
Conclusions: The proposed integration of Variational Autoencoders and LASSO regression enables robust, data-driven covariate selection in population pharmacokinetics, without requiring prior specification of the underlying PK model. The method consistently identified key covariates used in the generation of the dataset, while effectively discarding irrelevant variables through appropriate tuning of the LASSO regularization parameter (λ). Despite the limitations imposed by the linearity of LASSO in reconstructing full PK profiles, the latent space provided an interpretable structure for covariate influence. Future developments will focus on incorporating non-linear regression methods to improve reconstruction accuracy. The framework offers a scalable and interpretable solution with strong potential for advancing precision dosing and individualized pharmacotherapy.

[1] C. M. T. Sherwin, T. K. L. Kiang, M. G. Spigarelli, and M. H. H. Ensom, “Fundamentals of population pharmacokinetic modelling: Validation methods,” Clinical Pharmacokinetics, vol. 51, pp. 573–590, 2012.
[2] M. M. Hutmacher and K. G. Kowalski, “Covariate selection in pharmacometric analyses: A review of methods,” British Journal of Clinical Pharmacology, vol. 79, no. 1, pp. 132–147, 2015.
[3] D. P. Kingma and M. Welling, “An introduction to variational autoencoders,” Foundations and Trends® in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019.
[4] R. Muthukrishnan and R. Rohini, “Lasso: A feature selection technique in predictive modeling for machine learning,” in 2016 IEEE International Conference on Advances in Computer Applications (ICACA), 2016, pp. 18–20.
[5] D. Chen, Q. Yao, W. Chen, J. Yin, S. Hou, X. Tian, M. Zhao, H. Zhang, L. Yang, and T. Zhou, “Population PK/PD model of tacrolimus for exploring the relationship between accumulated exposure and quantitative scores in myasthenia gravis patients,” CPT: Pharmacometrics & Systems Pharmacology, vol. 12, no. 7, pp. 963–976, 2023.
[6] D. P. Gomari, A. Schweickart, L. Cerchietti, E. Paietta, H. Fernandez, H. Al-Amin, K. Suhre, and J. Krumsiek, “Variational autoencoders learn transferrable representations of metabolomics data,” Communications Biology, vol. 5, no. 1, p. 645, 2022.

Reference: PAGE 33 (2025) Abstr 11621 [www.page-meeting.org/?abstract=11621]

Poster: Methodology - New Modelling Approaches
