CVAE–LASSO: INTEGRATED L1-REGULARIZED COVARIATE SELECTION WITHIN A CONDITIONAL LATENT FRAMEWORK FOR POPULATION PHARMACOKINETIC MODELLING - PAGE Meeting (Population Approach Group Europe)

Diego Perazzolo (1,2), Enrico Grisan (2)

(1) University of Padova, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, Padova, Italy; (2) London South Bank University, School of Computer Science and Digital Technologies, London, UK

Objectives:

Population pharmacokinetic (PopPK) modelling is central to quantifying inter-individual variability and identifying the covariates that drive drug exposure, thereby supporting individualized dosing decisions [1]. Traditional approaches based on nonlinear mixed-effects models often struggle to capture complex or nonlinear relationships [2,3].

Our earlier VAE–LASSO approach for tacrolimus PopPK covariate selection decoupled representation learning from sparse inference [4]. In this work we develop and evaluate an upgraded, integrated framework in which sparse covariate selection directly conditions PK profile generation. Specifically, we aimed to (i) embed an L1-regularized regression module within both the encoder and the decoder of a conditional variational autoencoder (CVAE), so that covariates explicitly guide the reconstruction of concentration-time profiles, (ii) quantify whether the resulting sparse coefficients reliably recover influential covariates under a limited training sample size, and (iii) demonstrate a fast and interpretable covariate screening tool for PopPK analyses [5].

Methods:

Dorzagliatin was used as a case study because a published PopPK model reports covariate effects on clearance, volume and absorption duration [5]. Synthetic concentration-time profiles were generated from a two-compartment disposition model with sequential zero-order and first-order absorption and first-order elimination. Typical values for Q/F (3.02 L/h), Vp/F (26.5 L) and Ka (3.29 h⁻¹) were fixed. Clearance depended on total body weight (TBW), aspartate aminotransferase (AST) and age (AGE); central volume depended on TBW and sex (SEX); and absorption duration depended on feeding time (FOOD) [5].
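The structural model described above can be illustrated with a minimal sketch. It assumes a common reading of sequential absorption in which the dose enters a depot at a constant (zero-order) rate over a duration D1 and is then absorbed into the central compartment with first-order rate Ka. The values of Q/F, Vp/F and Ka are those stated above; the values passed for CL/F, Vc/F and D1 are hypothetical placeholders, not the estimates from [5], and the function name `simulate_profile` and the fixed-step Runge-Kutta integrator are illustrative choices rather than the authors' simulation code.

```python
def simulate_profile(cl, vc, d1, dose=25.0, q=3.02, vp=26.5, ka=3.29,
                     t_end=24.0, dt=0.01):
    """Simulate plasma concentration (mg/L) after a single oral dose (mg).

    cl, vc and d1 are apparent clearance (L/h), central volume (L) and
    zero-order input duration (h); their values are supplied by the caller.
    """

    def deriv(t, y):
        depot, central, periph = y
        # Zero-order release of the dose into the depot until t = d1.
        rate_in = dose / d1 if t < d1 else 0.0
        d_depot = rate_in - ka * depot
        d_central = (ka * depot
                     - (cl / vc) * central      # first-order elimination
                     - (q / vc) * central       # distribution to peripheral
                     + (q / vp) * periph)       # return from peripheral
        d_periph = (q / vc) * central - (q / vp) * periph
        return (d_depot, d_central, d_periph)

    def shifted(y, k, h):
        # Helper for Runge-Kutta stages: y + h * k, element-wise.
        return tuple(a + h * b for a, b in zip(y, k))

    n_steps = int(round(t_end / dt))
    y = (0.0, 0.0, 0.0)
    times, conc = [0.0], [0.0]
    for i in range(n_steps):
        t = i * dt
        # Classical fourth-order Runge-Kutta step.
        k1 = deriv(t, y)
        k2 = deriv(t + dt / 2, shifted(y, k1, dt / 2))
        k3 = deriv(t + dt / 2, shifted(y, k2, dt / 2))
        k4 = deriv(t + dt, shifted(y, k3, dt))
        y = tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(y, k1, k2, k3, k4))
        times.append((i + 1) * dt)
        conc.append(y[1] / vc)
    return times, conc


# Hypothetical individual parameters, for illustration only.
times, conc = simulate_profile(cl=10.0, vc=80.0, d1=0.5)
```

In a full simulation, `cl`, `vc` and `d1` would be computed per subject from the covariate relationships reported in [5] before calling the function.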

Covariates (AGE, TBW, HT, BMI, RBC, ALB, ALT, AST, GGT, CR, TBIL, SEX and FOOD) were sampled from truncated normal distributions using reported medians and bounds; two additional noise covariates (extra 1, extra 2) were added to test specificity and covariate selection capacity.
Two datasets were generated: 300 profiles for training (mirroring a realistic clinical sample size) and 2000 profiles for independent testing, with a 24 h window after a single 25 mg oral dose.
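As a rough illustration of the covariate sampling step, the sketch below draws values from a truncated normal distribution by rejection sampling. The centres, standard deviations and bounds in `SPECS` are hypothetical placeholders (the abstract does not list them, and treating the reported median as the centre of the underlying normal is an assumption), only a subset of covariates is shown, and the distribution used here for the noise covariates is likewise assumed.

```python
import random

# name -> (centre, sd, lower bound, upper bound); all values are hypothetical.
SPECS = {
    "AGE": (45.0, 12.0, 18.0, 75.0),
    "TBW": (70.0, 12.0, 45.0, 110.0),
    "extra 1": (0.5, 0.2, 0.0, 1.0),   # noise covariate, unrelated to PK
    "extra 2": (0.5, 0.2, 0.0, 1.0),   # noise covariate, unrelated to PK
}


def sample_truncated_normal(rng, centre, sd, low, high):
    """Draw from N(centre, sd) and reject values outside [low, high]."""
    while True:
        x = rng.gauss(centre, sd)
        if low <= x <= high:
            return x


def sample_covariates(n, seed=0):
    """Return n covariate records as a list of dicts."""
    rng = random.Random(seed)
    return [
        {name: sample_truncated_normal(rng, *spec)
         for name, spec in SPECS.items()}
        for _ in range(n)
    ]


train_covariates = sample_covariates(300, seed=1)   # training set size
test_covariates = sample_covariates(2000, seed=2)   # independent test set size
```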

In the proposed CVAE–LASSO architecture, covariates are processed by a LASSO layer that enforces sparsity via an L1 penalty [2,7]. The resulting sparse representation conditions both the encoder and the decoder of the CVAE, which extends classical VAEs by conditioning the latent encoding and decoding on auxiliary variables [6]. The training objective combined a mean absolute error (MAE) reconstruction loss, a KL divergence regularization term, and an L1 penalty term weighted by λ.
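The three-term objective can be written out as a small function. This is a sketch under stated assumptions: the latent posterior is taken to be a diagonal Gaussian with a standard normal prior (the usual closed-form KL term), the reconstruction and KL terms are summed without additional weights, and the L1 penalty is applied to the weights of the LASSO layer. The abstract does not specify these details, and the name `cvae_lasso_loss` is hypothetical.

```python
import math


def cvae_lasso_loss(y_true, y_pred, mu, log_var, lasso_weights, lam):
    """Return MAE + KL + lam * L1 for one profile.

    y_true, y_pred : reference and reconstructed concentrations
    mu, log_var    : mean and log-variance of the latent posterior
    lasso_weights  : coefficients of the covariate (LASSO) layer
    lam            : L1 penalty weight (lambda)
    """
    # Reconstruction term: mean absolute error over time points.
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    # KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over
    # latent dimensions (assumed form, not stated in the abstract).
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mu, log_var))
    # Sparsity term: L1 norm of the covariate coefficients.
    l1 = sum(abs(w) for w in lasso_weights)
    return mae + kl + lam * l1
```

Larger values of `lam` increase the cost of non-zero coefficients, which is what drives weakly informative covariates towards zero during training.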

Continuous covariates were min–max scaled and categorical variables were one-hot encoded. The penalty weight λ was tested over {0.01, 0.1, 0.25, 0.5, 0.75, 1, 2, 3}. Reconstruction performance was assessed using MAE, mean absolute percentage error (MAPE), and dynamic time warping (DTW) distance. A post-processing layer using quadratic interpolation was applied to refine smoothness and physiological plausibility.
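For reference, the three reconstruction metrics can be sketched as follows. The DTW shown is the classical dynamic-programming formulation with absolute difference as the local cost and no path-length normalization, and the MAPE skips time points where the reference concentration is zero; both choices are assumptions, since the abstract does not state which variants were used.

```python
def mae(ref, pred):
    """Mean absolute error between two equally sampled profiles."""
    return sum(abs(r - p) for r, p in zip(ref, pred)) / len(ref)


def mape(ref, pred):
    """Mean absolute percentage error, ignoring points where ref == 0."""
    terms = [abs((r - p) / r) for r, p in zip(ref, pred) if r != 0]
    return 100.0 * sum(terms) / len(terms)


def dtw_distance(a, b):
    """Classical DTW distance with |a_i - b_j| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

A DTW value near zero means the generated profile matches the reference in shape even after allowing small shifts along the time axis, which complements the point-wise MAE and MAPE.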

Results:

The integrated CVAE–LASSO produced accurate reconstructions across all λ settings, with MAE in the 10⁻³ mg/L range and MAPE consistently below 5%. Specifically, MAE ranged from 0.0015 to 0.0086 mg/L and MAPE from 1.984% to 4.782% across the tested λ values, while DTW remained close to zero (0.011–0.138), indicating strong temporal alignment between generated and reference trajectories.

Extracted LASSO coefficients showed progressive shrinkage with increasing λ, enabling tunable sparsity. Across all λ values, AGE, TBW, SEX and FOOD consistently retained the largest coefficients, matching the covariates encoded in the dorzagliatin PopPK generation model [5], whereas weaker covariates and the injected noise variables (extra 1 and extra 2) were driven towards zero and effectively excluded at higher λ.
Covariate-count analyses across coefficient thresholds (0.001–0.003) showed a monotonic decrease in the number of selected covariates with increasing λ, followed by a plateau, identifying a stable subset of dominant covariates.
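The covariate-count analysis amounts to counting, for each λ, how many coefficients exceed a threshold in absolute value. The sketch below shows that bookkeeping; the coefficient values in `coefs_by_lambda` are invented purely for illustration and are not results from this study.

```python
def count_selected(coefs, threshold):
    """Number of covariates whose |coefficient| exceeds the threshold."""
    return sum(1 for c in coefs.values() if abs(c) > threshold)


# Hypothetical coefficients for three lambda values (illustration only).
coefs_by_lambda = {
    0.01: {"AGE": 0.040, "TBW": 0.055, "SEX": 0.030, "FOOD": 0.045,
           "ALT": 0.004, "extra 1": 0.0025},
    0.5: {"AGE": 0.020, "TBW": 0.030, "SEX": 0.015, "FOOD": 0.025,
          "ALT": 0.0015, "extra 1": 0.0005},
    3: {"AGE": 0.008, "TBW": 0.012, "SEX": 0.006, "FOOD": 0.010,
        "ALT": 0.0002, "extra 1": 0.0},
}

thresholds = [0.001, 0.002, 0.003]

# For each threshold, list the selected-covariate count in order of lambda.
count_table = {
    thr: [count_selected(coefs_by_lambda[lam], thr)
          for lam in sorted(coefs_by_lambda)]
    for thr in thresholds
}
```

With these made-up numbers the counts fall as λ grows and then settle at four covariates, which is the kind of plateau the analysis looks for.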

Conclusions:

Embedding LASSO as a conditioning mechanism within a CVAE provides an integrated, interpretable framework that jointly reconstructs PopPK concentration-time profiles and yields sparse, data-driven covariate selection.

Using 300 training profiles and evaluating on 2000 independent profiles, the method maintained stable reconstruction quality and consistently recovered the most influential covariates in a more complex PK structure than in our previous work [4]. The approach offers practical control over sparsity through λ, enabling rapid prioritization of covariates for subsequent mechanistic PopPK modelling and reducing dependence on exhaustive stepwise searches commonly used in traditional workflows [3].
Current limitations are linked to the use of simulated, regularly sampled data; future work will extend evaluation to irregular sampling, missing observations, and multi-centre clinical datasets to benchmark against standard PopPK frameworks and assess translational robustness.

References:
[1] C. M. T. Sherwin, T. K. L. Kiang, M. G. Spigarelli, and M. H. H. Ensom, “Fundamentals of population pharmacokinetic modelling: Validation methods,” Clinical Pharmacokinetics, vol. 51, pp. 573–590, 2012.

[2] M. M. Hutmacher and K. G. Kowalski, “Covariate selection in pharmacometric analyses: A review of methods,” British Journal of Clinical Pharmacology, vol. 79, no. 1, pp. 132–147, 2015.

[3] E. N. Jonsson and M. O. Karlsson, “Automated covariate model building within NONMEM,” Pharmaceutical Research, vol. 15, no. 9, pp. 1463–1468, 1998.

[4] D. Perazzolo, C. Castellani, and E. Grisan, “Uncovering population PK covariates from VAE-generated latent spaces,” in 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, July 2025. Available at: https://arxiv.org/abs/2505.02514.

[5] K. Wang, L. Feng, J. Zhang, Q. Zou, F. Xu, Z. Sun, F. Tang, and L. Chen, “Population pharmacokinetic analysis of dorzagliatin in healthy subjects and patients with type 2 diabetes mellitus,” Clinical Pharmacokinetics, vol. 62, no. 10, pp. 1413–1425, 2023.

[6] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.

[7] R. Muthukrishnan and R. Rohini, “Lasso: A feature selection technique in predictive modeling for machine learning,” in 2016 IEEE International Conference on Advances in Computer Applications (ICACA), 2016, pp. 18–20.

Reference: PAGE 34 (2026) Abstr 11886 [www.page-meeting.org/?abstract=11886]

Poster: Methodology - New Modelling Approaches
