Bartosz Bartmanski 1, Ana Victoria Ponce-Bobadilla 2
1 European Molecular Biology Laboratory (Heidelberg, Germany), 2 Certara (Radnor, United States)
Introduction
Model development in population pharmacokinetics is inherently iterative and guided largely by qualitative interpretation of diagnostic plots and fit statistics. However, distinct model pathologies can produce similar diagnostic signatures, particularly under sparse sampling and moderate to high η-shrinkage [1]. In such settings, standard diagnostics, including empirical Bayes estimates (EBEs), IWRES, CWRES, and VPCs, admit multiple plausible interpretations, so different pharmacometricians examining the same outputs may pursue different modeling strategies.
Common challenges include distinguishing structural misspecification from residual error misspecification, separating absorption model inadequacy from sampling-time errors, and differentiating overparameterization from true mechanistic deficiencies. When diagnostics are ambiguous, model development can drift toward unnecessary complexity or inflated uncertainty, which in turn can lead to poor extrapolation and biased statistical conclusions [2].
Machine learning (ML) offers the ability to identify multivariate diagnostic patterns that may not be apparent through visual inspection or univariate metrics. By learning relationships between quantitative diagnostic features and known model pathologies, ML can support structured reasoning about competing explanations.
We propose an uncertainty-aware framework based on interpretable machine learning to rank competing diagnostic hypotheses and suggest plausible next modeling actions. As a proof of concept, we focus on distinguishing structural from residual error misspecification under realistic shrinkage and sparsity conditions.
Methods
Pharmacokinetic data were simulated from a one-compartment model with first-order absorption and time-varying clearance. Log-normal interindividual variability was included on clearance and volume, with combined proportional and additive residual error. Sampling density, interindividual variability magnitude, residual error scale, dose level, and typical parameter values were systematically varied to induce different levels of shrinkage and diagnostic ambiguity and to ensure that conclusions were generalizable and not dependent on specific design settings.
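The data-generating model described above can be sketched for a single subject as follows. This is a minimal illustration, not the authors' simulation code: the functional form of the time-varying clearance, all default parameter values, the function name `simulate_subject`, and the forward-Euler integration are assumptions made here for the sake of a self-contained example.

```python
import math
import random

def simulate_subject(dose, times, tv_cl=5.0, tv_v=50.0, ka=1.0,
                     omega_cl=0.3, omega_v=0.3,
                     sigma_prop=0.1, sigma_add=0.05,
                     cl_change=0.5, k_change=0.1, rng=None, dt=0.01):
    """Simulate (time, prediction, observation) tuples for one subject.

    All defaults are illustrative assumptions, not values from the abstract.
    """
    rng = rng or random.Random()
    # Log-normal interindividual variability on clearance and volume.
    cl0 = tv_cl * math.exp(rng.gauss(0.0, omega_cl))
    v = tv_v * math.exp(rng.gauss(0.0, omega_v))

    def clearance(t):
        # Hypothetical time-varying clearance: moves from cl0 toward
        # cl0 * (1 + cl_change) with rate k_change.
        return cl0 * (1.0 + cl_change * (1.0 - math.exp(-k_change * t)))

    # Forward-Euler integration of the depot and central amounts.
    depot, central, t = dose, 0.0, 0.0
    observations = []
    for t_obs in sorted(times):
        while t < t_obs - 1e-12:
            step = min(dt, t_obs - t)
            absorbed = ka * depot                    # first-order absorption
            eliminated = clearance(t) / v * central  # first-order elimination
            depot -= absorbed * step
            central += (absorbed - eliminated) * step
            t += step
        pred = central / v
        # Combined proportional and additive residual error.
        sd = math.sqrt((sigma_prop * pred) ** 2 + sigma_add ** 2)
        observations.append((t_obs, pred, pred + rng.gauss(0.0, sd)))
    return observations
```

Calling `simulate_subject(100.0, [0.5, 1, 2, 4, 8, 12], rng=random.Random(1))` returns one sparse profile. Varying the arguments across calls mimics the systematic variation of sampling density, variability magnitude, residual error scale, dose level, and typical values described above.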
Three scenarios were evaluated: (i) correct structural and residual model; (ii) structural misspecification; and (iii) residual error misspecification. For each scenario, 450 datasets were generated and fitted with the corresponding candidate model. In the correct scenario, the fitted model matched the data-generating model. Structural misspecification was induced by fitting a constant-clearance model to data simulated with time-varying clearance, while residual error misspecification was induced by fitting a proportional-only error model to data simulated under a combined residual error structure. Each fitted dataset was thus labeled with its true underlying scenario.
Quantitative diagnostic features were extracted from each fitted dataset, including phase-specific CWRES statistics, objective function values, and η-shrinkage metrics. Feature extraction was limited to standard diagnostic metrics and estimated parameters. The datasets were split 70:30 into training and test sets such that the test set contained no dose levels or typical parameter values present in the training set. A gradient boosted classifier was trained to discriminate between scenarios, and SHAP-based feature attribution was applied to interpret feature importance. The framework outputs class probabilities across competing hypotheses, explicitly quantifying diagnostic uncertainty.
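The feature extraction and design-aware split described above might look roughly like the sketch below. The feature names, the `phase_cut` boundary between early and late phases, and the record layout are hypothetical; the shrinkage formula is the standard SD-based definition, 1 − SD(EBE)/ω. Only a split on a single design key is shown, which simplifies the joint hold-out of dose levels and typical parameter values.

```python
import statistics

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

def eta_shrinkage(ebes, omega):
    """SD-based eta-shrinkage: 1 - SD(EBE) / omega."""
    return 1.0 - statistics.stdev(ebes) / omega

def extract_features(tad, cwres, ebes_cl, omega_cl, ofv, phase_cut=2.0):
    """Build a feature dict for one fitted dataset.

    tad is time since last dose; phase_cut (same units) is a hypothetical
    boundary between an early and a late phase of the profile.
    """
    early = [(t, r) for t, r in zip(tad, cwres) if t <= phase_cut]
    late = [(t, r) for t, r in zip(tad, cwres) if t > phase_cut]
    return {
        "cwres_slope_early": ols_slope(*zip(*early)),
        "cwres_slope_late": ols_slope(*zip(*late)),
        "cwres_mean": statistics.fmean(cwres),
        "cwres_sd": statistics.stdev(cwres),
        "shrinkage_cl": eta_shrinkage(ebes_cl, omega_cl),
        "ofv": ofv,
    }

def split_by_design(records, held_out_values, key="dose"):
    """Put every record whose design value is held out into the test set,
    so that the test set shares no such values with the training set."""
    train = [r for r in records if r[key] not in held_out_values]
    test = [r for r in records if r[key] in held_out_values]
    return train, test
```

Splitting on design values rather than at random is what keeps the test set free of settings seen in training; the resulting feature dicts would then be passed to whichever gradient boosting implementation is in use.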
Results
Quantitative diagnostic features discriminated effectively between structural and residual misspecification, with classification performance influenced primarily by shrinkage and sampling design. Across the test scenarios, the classifier achieved an overall macro-F1 of 0.85, indicating strong discriminative ability. In the dense-sampling, low-shrinkage scenario, macro-F1 reached 0.93, reflecting clear separation between diagnostic classes. In the sparser, high-shrinkage scenario, macro-F1 decreased to 0.72. Confusion matrices revealed that misclassification patterns mirrored situations in which modelers commonly disagree, highlighting the inherent ambiguity in diagnostic interpretation.
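For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so each of the three scenarios counts equally regardless of how often it is predicted. A minimal sketch, with hypothetical class labels:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with true labels `["correct", "structural", "residual", "structural"]` and predictions `["correct", "structural", "structural", "structural"]`, the per-class F1 scores are 1.0, 0.8, and 0.0, giving a macro-F1 of 0.6.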
SHAP analysis identified the most influential features for detecting structural misspecification as the phase-specific CWRES linear slope with respect to time since last dose and the mean of η_ka and η_Vc. For residual misspecification, the top-ranked features included objective function value (OFV), the estimated correlation between IIV of clearance and IIV of the central volume, and the standard deviation of η_ka. These results indicate that different subsets of diagnostic metrics drive discrimination for structural versus residual error misspecification, supporting the interpretability of the framework.
Conclusions
By probabilistically ranking competing hypotheses, the framework can reduce unnecessary model iterations, improve transparency in diagnostic reasoning, provide training support for junior pharmacometricians, and mitigate individual modeler bias. It can be integrated into standard pharmacometric workflows as a decision-support layer in which diagnostics are run, features are extracted, hypotheses are ranked, and next modeling steps are recommended. Future extensions include the development of a broader taxonomy of diagnostic failure modes.
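The ranking step of such a decision-support layer could be sketched as below. The action texts in `NEXT_ACTIONS` and the use of normalized entropy as a single uncertainty score are assumptions made for illustration; the abstract states only that class probabilities are reported across hypotheses.

```python
import math

# Hypothetical mapping from each hypothesis to a suggested next step.
NEXT_ACTIONS = {
    "correct": "keep the current model",
    "structural": "revisit the structural model, e.g. test time-varying clearance",
    "residual": "revisit the residual error model, e.g. test a combined error model",
}

def rank_hypotheses(probabilities):
    """Rank hypotheses by class probability and summarize uncertainty.

    Returns (ranked, uncertainty), where ranked is a list of
    (hypothesis, probability, suggested action) tuples and uncertainty is
    the normalized entropy: 0 for a certain call, 1 for a uniform one.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    entropy = -sum(p * math.log(p) for p in probabilities.values() if p > 0)
    uncertainty = entropy / math.log(len(probabilities))
    return [(name, p, NEXT_ACTIONS[name]) for name, p in ranked], uncertainty
```

A high uncertainty score would flag cases where the top-ranked hypothesis should be treated as one of several candidates rather than as a recommendation.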
References:
[1]: Savic, R. M., & Karlsson, M. O. (2009). Importance of shrinkage in empirical Bayes estimates for diagnostics: problems and solutions. The AAPS Journal, 11(3), 558-569.
[2]: Guhl, M., Mercier, F., Hofmann, C., Sharan, S., Donnelly, M., Feng, K., … & Bertrand, J. (2022). Impact of model misspecification on model-based tests in PK studies with parallel design: real case and simulation studies. Journal of Pharmacokinetics and Pharmacodynamics, 49(5), 557-577.
Reference: PAGE 34 (2026) Abstr 11987 [www.page-meeting.org/?abstract=11987]
Poster: Methodology – AI/Machine Learning