Mark Sale 1, James Craig 1, Keith Nieforth 1
1 Certara (Radnor, USA)
Objectives:
Assess the predictive power of the Quantitative Predictive Check (QPC) as a metric of VPC quality.
Methods:
QPC is a potential objective function for machine learning–based model selection. It is defined as a weighted sum of penalties reflecting discrepancies between simulated and observed data. The components of QPC include:
• Coverage – whether the observed quantile lies within the simulated prediction interval
• Deviation – absolute difference between observed and median prediction scaled by the prediction interval half-width
• Drift – Spearman correlation of residuals (observed – median) versus the independent variable
• Sharpness – relative prediction interval width
• Winkler interval score [1] – proper scoring rule penalizing both interval width and missed observations
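The five components above can be sketched per VPC bin as follows. This is a minimal illustration, not the tidyVPC implementation: the function names, the per-bin inputs (`obs`, `lo`, `med`, `hi`), the aggregation by simple means, the use of the absolute Spearman correlation, and the weights are all assumptions made for the example. The Winkler term uses the standard interval score for a central (1 - alpha) interval.

```python
from statistics import mean

def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def qpc_components(x, obs, lo, med, hi, alpha=0.1):
    """Hypothetical per-bin penalties for one observed quantile.

    x: bin midpoints of the independent variable (e.g. time)
    obs: observed quantile per bin
    lo, med, hi: simulated prediction interval bounds and median per bin
    alpha: nominal miss rate of the interval (0.1 for a 90% interval)
    """
    n = len(x)
    width = [h - l for l, h in zip(lo, hi)]
    resid = [o - m for o, m in zip(obs, med)]
    missed = [o < l or o > h for o, l, h in zip(obs, lo, hi)]
    winkler = [
        w + (2 / alpha) * max(l - o, 0.0) + (2 / alpha) * max(o - h, 0.0)
        for w, o, l, h in zip(width, obs, lo, hi)
    ]
    return {
        "coverage": sum(missed) / n,  # fraction of bins outside the interval
        "deviation": mean(abs(r) / (w / 2) for r, w in zip(resid, width)),
        "drift": abs(spearman(resid, x)),  # trend of residuals along x
        "sharpness": mean(w / abs(m) for w, m in zip(width, med)),
        "winkler": mean(winkler),
    }

def qpc(components, weights):
    """Weighted sum of penalties; the weights are an assumption of this sketch."""
    return sum(weights[name] * value for name, value in components.items())
```

A lower score indicates closer agreement between simulated and observed quantiles; in practice the same calculation would be repeated for each quantile shown in the VPC.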
PK datasets were simulated from three models.
Simulation 1: linear, 2-compartment, first-order absorption; V~WT, CL~CRCL, SEX; small residual error.
Simulation 2: linear, 2-compartment, first-order absorption; V~WT, CL~CRCL, SEX; very large residual error.
Simulation 3: linear, 3-compartment, very slow zero-order infusion (48 hours) followed by first-order absorption; V~WT, CL~CRCL, SEX; very large residual error.
pyDarwin [2] was used to generate >500 random models for each simulated dataset. The randomly generated models included 1, 2, or 3 compartments and combinations of first-order absorption and zero-order infusion, with V and CL modeled as functions of combinations of WT, CRCL, AGE, and SEX. Residual error models included multiplicative and combined additive–multiplicative structures.
These 500+ models for each simulated dataset were then sorted by -2LL. From these models, 10 were selected based on -2LL: the 0th percentile (best model), and the 10th, 20th, …, 90th percentile models. The selected models represented a range from the best to several poorly predictive models.
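The percentile-based selection described above can be sketched as follows. The dictionary layout, the key name `neg2ll`, and the nearest-rank mapping from percentile to list index are assumptions of this example; the abstract does not state how percentiles were mapped to individual models.

```python
def select_by_percentile(models, key="neg2ll", percentiles=range(0, 100, 10)):
    """Sort models by -2LL (lowest first) and pick one model per percentile.

    models: list of dicts, each holding a -2LL value under `key` (hypothetical layout)
    percentiles: integer percentiles of the sorted list; 0 is the best model
    """
    ranked = sorted(models, key=lambda m: m[key])
    n = len(ranked)
    picks = []
    for p in percentiles:
        # Nearest-rank index using integer arithmetic (rounds half up).
        idx = (p * (n - 1) + 50) // 100
        picks.append(ranked[idx])
    return picks
```

With 501 candidate models, for example, this returns the models at sorted positions 0, 50, 100, ..., 450.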
VPCs were generated for each model using tidyVPC [3]. VPC plots included the full time range (linear scale), the early phase (first 6 hours, linear scale), and the full time range (semi-log scale). These VPC plots (30 total; 10 from each of 3 simulated datasets) were then presented in pairs to a convenience sample of 13 experienced modelers, who were asked to select the better VPC from each pair or to declare a tie. No specific guidance was provided regarding evaluation criteria. Rankings were calculated using the Elo method [4].
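As an illustration of how pairwise choices with ties can be turned into ratings, the sketch below applies the standard Elo update. The K-factor of 32, the starting rating of 1500, and the tuple layout of `comparisons` are assumptions; the abstract does not report the settings used.

```python
def elo_ratings(comparisons, k=32.0, start=1500.0):
    """Sequential Elo ratings from pairwise comparisons.

    comparisons: iterable of (plot_a, plot_b, score_a), where score_a is
                 1.0 if plot_a was preferred, 0.0 if plot_b was preferred,
                 and 0.5 for a tie
    k, start: assumed K-factor and initial rating (not taken from the abstract)
    """
    ratings = {}
    for a, b, score_a in comparisons:
        ra = ratings.setdefault(a, start)
        rb = ratings.setdefault(b, start)
        # Expected score of plot_a given the current rating difference.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        ratings[a] = ra + k * (score_a - expected_a)
        ratings[b] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return ratings
```

Because the update is sequential, the resulting ratings depend on the order in which comparisons are processed; averaging over shuffled orders is one common way to reduce that dependence.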
Logistic regression was performed using cumulative link mixed models (R package ordinal [5]). The hypothesis tested was that inclusion of QPC provides additional predictive information beyond -2LL for human VPC rankings. McFadden’s R^2 [6] was calculated for two models: Human Ranking ~ -2LL and Human Ranking ~ -2LL + QPC. Between-reviewer variability was quantified using Kendall’s coefficient of concordance [7].
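For reference, the two summary statistics named above have simple closed forms. The sketch below assumes log-likelihoods are already available from the fitted models and, for Kendall's coefficient, that each reviewer supplies a complete ranking without ties; how the pairwise choices in this study were converted to per-reviewer rankings is not stated, so that input layout is hypothetical.

```python
def mcfadden_r2(loglik_model, loglik_reference):
    """McFadden's pseudo R^2: 1 - LL(model) / LL(reference model)."""
    return 1.0 - loglik_model / loglik_reference

def kendalls_w(rank_matrix):
    """Kendall's coefficient of concordance W without tie correction.

    rank_matrix: one list per reviewer, each ranking the same n items
                 with the ranks 1..n (assumed complete and tie-free)
    """
    m = len(rank_matrix)        # number of reviewers
    n = len(rank_matrix[0])     # number of ranked items
    totals = [sum(reviewer[j] for reviewer in rank_matrix) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))
```

W ranges from 0 (no agreement among reviewers) to 1 (identical rankings).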
Results:
Model and AIC (OFV denotes -2LL):
Base (neither OFV nor QPC): AIC = 1818.02
OFV only: AIC = 1809.07
OFV + QPC: AIC = 1688.41
Model comparison, P value, and McFadden R^2:
OFV vs. base: P value = 9.36e-04; McFadden R^2 = 0.0061
OFV + QPC vs. OFV: P value = 1.66e-28; McFadden R^2 = 0.0687
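The AIC values and P values can be related through a likelihood ratio test. The sketch below assumes that each comparison is between nested models differing by exactly one estimated parameter, which the abstract does not state; under that assumption the likelihood ratio statistic equals the AIC difference plus 2, and its P value follows from a chi-square distribution with 1 degree of freedom.

```python
import math

def lrt_from_aic(aic_reduced, aic_full, extra_params=1):
    """Likelihood ratio statistic recovered from two AIC values.

    AIC = -2LL + 2k, so the drop in -2LL equals the AIC difference
    plus 2 for each extra parameter. extra_params=1 is an assumption.
    """
    return (aic_reduced - aic_full) + 2 * extra_params

def chi2_sf_1df(stat):
    """Upper tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))

# AIC values as reported in the Results.
p_ofv_vs_base = chi2_sf_1df(lrt_from_aic(1818.02, 1809.07))
p_qpc_vs_ofv = chi2_sf_1df(lrt_from_aic(1809.07, 1688.41))
```

Under the one-parameter assumption, both values come out close to the reported P values (9.36e-04 and 1.66e-28), which suggests the reported AIC and P values are mutually consistent.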
Inclusion of QPC substantially improved model fit compared to -2LL alone.
Conclusions:
QPC demonstrated significant predictive power for human VPC plot rankings. Additional work is needed to determine whether incorporating QPC into machine learning–based model selection results in models with systematically improved VPC quality.
Between-reviewer variability was high (Kendall’s coefficient of concordance = 0.051), indicating low agreement. Agreement might have been higher had specific evaluation criteria (e.g., alignment of Cmax) been provided. In this regard, QPC is intentionally non-specific, reflecting overall predictive adequacy rather than targeting specific features. If evaluation of specific time points or features is required, alternative methods (e.g., posterior predictive checks) may be more appropriate.
References:
[1] Winkler RL. Evaluating Probabilities: Asymmetric Scoring Rules. Management Science 1994; 40(11): 1395-1405.
[2] https://certara.github.io/pyDarwin/html/index.html
[3] https://github.com/certara/tidyvpc/blob/master/vignettes/tidyvpc_qpc.pdf
[4] Elo AE. The Rating of Chess Players, Past and Present. New York: Arco; 1978.
[5] https://cran.r-project.org/web/packages/ordinal/index.html
[6] https://eml.berkeley.edu/reprints/mcfadden/zarembka.pdf
[7] Kendall MG. Rank Correlation Methods. London: Griffin; 1948.
Reference: PAGE 34 (2026) Abstr 11962 [www.page-meeting.org/?abstract=11962]
Poster: Methodology - New Modelling Approaches