2010 - Berlin - Germany

PAGE 2010: Methodology- Model evaluation
Paul Baverel

Informativeness of Internal and External Validation Techniques in Various Simulation Settings

Paul G Baverel, Kristin E Karlsson, Mats O Karlsson

Dept of Pharmaceutical Biosciences, Uppsala University, Sweden

Objectives: Internal validation (IV) and external validation (EV) are two well-established statistical procedures for testing the predictive ability of a model. Selection of the appropriate model-based diagnostic is essential, as both techniques present limitations in specific circumstances. The aim was to compare the predictive performance of IV and EV for similarly sized learning datasets and in various simulation settings, with a population validation (PV) scheme used as reference.

Methods: An automated procedure was implemented in PsN [1] to run series of stochastic simulations followed by estimations (SSE) in NONMEM 6.2, coupled to 3 distinct numerical predictive checks (NPC, corresponding to IV, EV, and PV) based on an oral one-compartment PK model. Random effects were included on all model parameters (30% CV) and residual variability was set to 10% CV. For IV and EV, simulated datasets were designed so that the number of individuals (IDs) ranged from 3 to 384, doubling within each new SSE series (i.e. 3, 6, ..., 192, 384). Each individual contributed 3 sampling points obtained from a preceding optimization in PopED 2.0 [2]. For PV, a large number of individuals (1000) was simulated, representing the population pool from which the IV and EV individual samples were drawn. In these settings, from each set of SSE final parameter estimates, NPC was applied to initiate IV, EV, and PV. For IV, the same dataset was used for estimation and prediction, whereas for EV and PV, a new (validating) dataset of similar size to the learning dataset was simulated. Finally, mean errors (MEs) and mean absolute errors (MAEs) of the (IV-PV) and (EV-PV) NPC outcomes were computed as indicators of bias and imprecision, respectively [3].
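The ME and MAE metrics of Sheiner and Beal [3] can be illustrated with a minimal sketch. This is not the authors' code: the arrays below are hypothetical NPC summary outcomes (e.g. fractions of observations falling outside a prediction interval bound) for an IV run and the PV reference, invented purely for illustration.

```python
def mean_error(pred, ref):
    """Bias: average signed difference between paired outcomes (ME)."""
    return sum(p - r for p, r in zip(pred, ref)) / len(pred)

def mean_absolute_error(pred, ref):
    """Imprecision: average magnitude of paired differences (MAE)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)

# Hypothetical NPC outcomes for illustration only (not study data)
iv_npc = [0.12, 0.08, 0.15, 0.11]  # e.g. from the internal validation NPC
pv_npc = [0.10, 0.10, 0.10, 0.10]  # e.g. from the population validation NPC

me = mean_error(iv_npc, pv_npc)            # bias of IV relative to PV
mae = mean_absolute_error(iv_npc, pv_npc)  # imprecision of IV relative to PV
```

A positive ME indicates systematic over-prediction relative to the PV reference, while MAE grows with any scatter around it, which is why the two are reported as complementary indicators.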

Results: At the median of the observations/predictions and across the range of data sizes investigated, both IV and EV were in good agreement with PV, revealing no consistent bias and little imprecision. However, discrepancies between IV and EV occurred at the tails of the observations/predictions distribution (90% prediction interval): for small learning datasets (<48 IDs), IV predictions were more biased and imprecise than EV predictions. As expected, increasing the data size reduced bias and imprecision for both IV and EV.

Conclusions: The results suggest that when the dataset is small (<48 IDs), data splitting followed by EV is recommended, whereas when the dataset is large, the use of IV is advised. However, these outcomes are undoubtedly model- and design-dependent and cannot be generalized.

Acknowledgement: This work was previously presented at the ACoP 2009 conference.

[1] Lindbom L, Karlsson MO, Jonsson EN. Perl-speaks-NONMEM (PsN software). http://psn.sourceforge.net/
[2] PopED, version 2.08 (2008). http://poped.sf.net/
[3] Sheiner LB, Beal SL. Some suggestions for measuring predictive performance. J Pharmacokinet Biopharm 9(4): 503-512 (1981).

Reference: PAGE 19 (2010) Abstr 1916 [www.page-meeting.org/?abstract=1916]
Poster: Methodology- Model evaluation