James Craig 1, Mark Sale 1, Keith Nieforth 1
1 Certara Data Sciences (Radnor, United States)
Introduction:
Model evaluation in pharmacometrics relies on multiple criteria, including likelihood-based statistics, parameter precision, plausibility of covariate effects, and graphical diagnostics. Among non-numeric diagnostics, the Visual Predictive Check (VPC) remains one of the most widely used tools for assessing predictive performance. By comparing observed and simulated quantiles over the independent variable (typically time), the VPC provides insight into structural adequacy, variability representation, and distributional agreement. It also guides model refinement; for example, systematic underprediction of Cmax may suggest misspecification of absorption kinetics, whereas divergence in variability over time may indicate deficiencies in the variance model. Despite methodological advances such as the Quantified VPC (QVPC) [1] and regression-based binless VPC approaches [2], interpretation of predictive adequacy remains primarily visual, and existing quantitative summaries do not provide a unified scalar metric suitable for automated optimization or model search.
Objectives:
To develop a scalar, interpretable, and optimization-ready metric derived from VPC diagnostics that quantitatively summarizes predictive adequacy while preserving key diagnostic dimensions.
Methods:
The Quantitative Predictive Check (QPC) transforms VPC outputs into a composite predictive quality score. Applicable to both binless [2] and traditionally binned VPCs, QPC operates on observed and simulated quantile curves across the independent variable. Five interpretable components are computed: (i) coverage, defined as the proportion of observed quantiles contained within simulated prediction intervals; (ii) scaled deviation, measuring the magnitude of deviation between observed and simulated medians normalized by the prediction interval half-width; (iii) drift, quantified via Spearman correlation of residuals (observed minus simulated median) across the independent variable to detect monotonic bias; (iv) sharpness, assessing predictive efficiency through relative interval width; and (v) a proper interval (Winkler) score [3] penalizing both missed coverage and unnecessarily wide intervals. The components capture, respectively, calibration, bias, structural trend, variance representation, and interval efficiency, each a distinct aspect of predictive adequacy. Component penalties are aggregated across quantiles and combined using user-defined weights to produce a single scalar QPC score (lower values indicate better predictive performance). The framework supports prediction-corrected VPCs and has been implemented in the open-source tidyvpc [4] R package, enabling integration into existing pharmacometric workflows.
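The five components can be illustrated with a minimal Python sketch for a single quantile curve. This is not the tidyvpc implementation: the function names (`qpc_components`, `qpc_score`), the conversion of coverage and drift into penalties, the normalization of sharpness and interval score by the absolute simulated median, the value of `alpha`, and the equal weights are all assumptions made for illustration only.

```python
from statistics import mean

def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def qpc_components(x, obs, med, lo, hi, alpha=0.05):
    """Penalty per component (lower is better) for one quantile curve.

    obs: observed quantile; med/lo/hi: simulated median and interval
    bounds. Assumes hi > lo and med != 0 at every point (illustrative).
    """
    resid = [o - m for o, m in zip(obs, med)]
    width = [h - l for l, h in zip(lo, hi)]
    covered = [1.0 if l <= o <= h else 0.0 for o, l, h in zip(obs, lo, hi)]
    # Winkler interval score: width plus 2/alpha times any miss distance.
    winkler = [w + (2 / alpha) * (max(l - o, 0.0) + max(o - h, 0.0))
               for o, l, h, w in zip(obs, lo, hi, width)]
    return {
        "coverage": 1.0 - mean(covered),                  # calibration
        "deviation": mean(abs(r) / (w / 2) for r, w in zip(resid, width)),
        "drift": abs(spearman(x, resid)),                 # monotonic bias
        "sharpness": mean(w / abs(m) for w, m in zip(width, med)),
        "interval": mean(s / abs(m) for s, m in zip(winkler, med)),
    }

def qpc_score(components, weights):
    """Weighted sum of component penalties (assumed aggregation rule)."""
    return sum(weights[name] * value for name, value in components.items())

# Toy curve: residuals grow steadily with x and the last point is missed.
x = [1, 2, 4, 8]
obs = [9.0, 8.0, 4.5, 3.5]
med = [10.0, 8.0, 4.0, 2.0]
lo = [8.0, 6.0, 3.0, 1.0]
hi = [12.0, 10.0, 5.0, 3.0]

components = qpc_components(x, obs, med, lo, hi)
weights = {name: 1.0 for name in components}  # hypothetical equal weights
score = qpc_score(components, weights)
```

Keeping the components in a dictionary before weighting mirrors the diagnostic intent described above: the scalar score can be traced back to the component that drives it.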
Results:
In controlled simulation experiments examining structural and variance misspecification, QPC demonstrated sensitivity to distinct model deficiencies. Models with inflated variability achieved visually acceptable coverage yet produced increased QPC scores due to penalties on sharpness and interval efficiency, highlighting degraded predictive precision. Conversely, models with improved structural adequacy and variance explanation yielded consistent reductions in QPC. The metric provided stable and reproducible ranking across competing models, including scenarios where visual interpretation was ambiguous. When incorporated into machine learning–based search procedures, QPC enabled objective discrimination between candidate models and facilitated multi-objective optimization strategies combining likelihood-based and predictive criteria. After generating VPC summaries in tidyvpc (binless or binned; optionally stratified and/or prediction-corrected) and computing vpcstats(), users call a single post-processing function, qpcstats() [5], to obtain qpc.stats and the composite scalar qpc_score. For large-scale model comparisons (e.g., automated search/optimization), sharp_ref and interval_ref can be supplied to anchor the sharpness and interval-score penalties for improved cross-run comparability; component contributions can be tuned via the named weight vector w.
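The inflated-variability finding can be reproduced in spirit with a small Python sketch. The data, the inflation factor, and the helper names (`coverage`, `mean_winkler`, `inflate`) are hypothetical and do not correspond to the simulation experiments or to the tidyvpc API; the sketch only shows why widening prediction intervals can leave coverage unchanged while worsening an interval score.

```python
def coverage(obs, lo, hi):
    """Fraction of observed values falling inside [lo, hi]."""
    return sum(l <= o <= h for o, l, h in zip(obs, lo, hi)) / len(obs)

def mean_winkler(obs, lo, hi, alpha=0.05):
    """Mean Winkler interval score: width plus a penalty for misses."""
    total = 0.0
    for o, l, h in zip(obs, lo, hi):
        total += (h - l) + (2 / alpha) * (max(l - o, 0.0) + max(o - h, 0.0))
    return total / len(obs)

def inflate(med, lo, hi, factor):
    """Widen each interval around its median by `factor` (illustrative)."""
    new_lo = [m - factor * (m - l) for m, l in zip(med, lo)]
    new_hi = [m + factor * (h - m) for m, h in zip(med, hi)]
    return new_lo, new_hi

# Hypothetical quantile curve in which every observation is covered.
obs = [9.0, 8.5, 4.5, 2.5]
med = [10.0, 8.0, 4.0, 2.0]
lo = [8.0, 6.0, 3.0, 1.0]
hi = [12.0, 10.0, 5.0, 3.0]

wide_lo, wide_hi = inflate(med, lo, hi, 3.0)

cov_nominal = coverage(obs, lo, hi)
cov_inflated = coverage(obs, wide_lo, wide_hi)
score_nominal = mean_winkler(obs, lo, hi)
score_inflated = mean_winkler(obs, wide_lo, wide_hi)

print(f"coverage: nominal={cov_nominal}, inflated={cov_inflated}")
print(f"interval score: nominal={score_nominal}, inflated={score_inflated}")
```

Both intervals cover every observation, so a coverage-only check cannot separate the two models; the interval score rises with the wider intervals, which is the behavior the sharpness and interval-efficiency penalties are meant to capture.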
Conclusions:
QPC converts VPC-based evaluation from a qualitative diagnostic into a quantitative, reproducible, and optimization-compatible metric. By decomposing predictive adequacy into interpretable penalty components, the framework preserves diagnostic transparency while enabling objective model comparison and automated model selection. This approach bridges traditional visual diagnostics with modern machine learning–driven model development workflows, supporting improved reproducibility and scalability in pharmacometric evaluation.
References:
[1] Post, T. M., Freijer, J. I., Ploeger, B. A., & Danhof, M. (2008). Extensions to the visual predictive check to facilitate model performance evaluation. Journal of Pharmacokinetics and Pharmacodynamics, 35(2), 185–202.
[2] Jamsen, K. M., Patel, K., Nieforth, K., & Kirkpatrick, C. M. J. (2018). A Regression Approach to Visual Predictive Checks for Population Pharmacometric Models. CPT: Pharmacometrics & Systems Pharmacology, 7(10), 678–686.
[3] Winkler, R. L. (1994). Evaluating Probabilities: Asymmetric Scoring Rules. Management Science, 40(11), 1395–1405.
[4] https://cran.r-project.org/web/packages/tidyvpc/index.html
[5] https://github.com/certara/tidyvpc/blob/master/vignettes/tidyvpc_qpc.pdf
Reference: PAGE 34 (2026) Abstr 12315 [www.page-meeting.org/?abstract=12315]
Poster: Methodology - Model Evaluation