Independent-model diagnostics for a priori identification and interpretation of outliers from a full pharmacokinetic database: correspondence analysis, Mahalanobis distance and Andrews curves
Nabil Semmar, Saik Urien, Bernard Bruguerolle and Nicolas Simon
Laboratory of Clinical Pharmacology, EA3784, Medical School of Marseilles, Marseilles, France
Objectives: This work aimed to extract outliers from a full PK dataset independently of any PK model. Such an analysis provides information, before any modelling, on the variability structure of a PK (or PD) population and on its degree of homogeneity or heterogeneity. The usefulness of a priori outlier diagnostics was illustrated by a positive link between the degree of outlyingness of concentration values and their prediction error under a PK (PD) model. This can help the modeller select the most appropriate model among several candidates.
Methods: Outlier diagnostics addressed both the extraction of outlying PK (PD) profiles (subjects) and the identification of outlying concentration values within them. Multivariate diagnostics were applied: all the concentration-time values of a PK (PD) profile were combined into a single scalar, from which the subject was classified as outlier or non-outlier (1). The outlying concentrations in each outlying profile were then identified by recomputing this scalar with one concentration omitted at a time. To examine the outliers from different angles, three multivariate diagnostics based on three different distance metrics were applied: Andrews curves (Euclidean distance) (5, 6), correspondence analysis (Chi-square distance) (2-4) and the jackknifed Mahalanobis distance (Mahalanobis distance) (7). These analyses were carried out with Excel (9), ADE-4 (8) and JMP (10) respectively. After the three detection methods had been applied, the outliers were classified by the number of times they were detected (0 to 3). The three diagnostics were illustrated on a full PK dataset of orally administered capecitabine (11). A posteriori, the dataset was modelled with NONMEM using a first-order absorption model, and normalized prediction distribution errors (NPDE) of the concentrations were computed from the modelling results (12). The link between the a posteriori and a priori results was examined by analysing the absolute NPDE values in relation to the number of times (0 to 3) each concentration had been identified as an outlier.
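The subject-level and concentration-level steps described above can be sketched for the Mahalanobis metric as follows. This is a minimal illustration, not the authors' implementation (which used JMP): it assumes the concentrations form a complete matrix with one row per subject and one column per sampling time, uses a common leave-one-subject-out definition of the jackknifed distance, and uses a pseudo-inverse in case the covariance matrix is near-singular. The function names are hypothetical, and no detection threshold is applied; subjects and concentrations are only ranked.

```python
import numpy as np


def jackknife_mahalanobis(X):
    """Squared Mahalanobis distance of each subject (row of X) from the mean of
    the other subjects, using the covariance matrix of the other subjects."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    d2 = np.empty(n)
    for i in range(n):
        others = np.delete(X, i, axis=0)  # leave subject i out
        mu = others.mean(axis=0)
        S = np.cov(others, rowvar=False)
        diff = X[i] - mu
        d2[i] = diff @ np.linalg.pinv(S) @ diff
    return d2


def concentration_influence(X, i):
    """For subject i, the drop in its jackknifed distance when each sampling
    time (column) is removed; the largest drop points to the concentration
    that contributes most to the subject's outlyingness."""
    X = np.asarray(X, dtype=float)
    full = jackknife_mahalanobis(X)[i]
    drops = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        reduced = np.delete(X, j, axis=1)  # omit one concentration at a time
        drops[j] = full - jackknife_mahalanobis(reduced)[i]
    return drops


# Toy data (simulated, not from the capecitabine study):
# 30 subjects x 4 sampling times, with one implausible value planted.
rng = np.random.default_rng(0)
X = rng.normal(10.0, 1.0, size=(30, 4))
X[0, 2] += 15.0

d2 = jackknife_mahalanobis(X)
worst = int(np.argmax(d2))  # most outlying subject
suspect_time = int(np.argmax(concentration_influence(X, worst)))
```

On this toy matrix the planted subject should rank first and the planted sampling time should show the largest drop; with real data, a cut-off (for example a chi-square quantile) would still have to be chosen.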
Results: The outliers confirmed a priori by all three diagnostics were the most atypical concentrations: they had atypical absolute values (Euclidean distance), atypical relative values (Chi-square distance) and an atypical location on the PK profiles (Mahalanobis distance). According to Andrews curves (Euclidean distance), the outlying concentrations had atypically high absolute values, corresponding to very high absorption peaks. Correspondence analysis (Chi-square distance) identified outlying concentrations as ones that were high relative both to the concentrations at other times in the same subject and to the concentrations at the same time in the whole population; these corresponded to early or delayed absorption peaks (peaks at unusual times). The jackknifed Mahalanobis distance identified outliers as concentrations associated with atypical variations, e.g. an increase instead of a decrease. A typical case is a patient showing peak concentrations while the rest of the population is in the elimination phase.
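The two other metrics discussed above can be illustrated with a short sketch. Andrews' function maps a profile x = (x1, x2, x3, ...) to f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + ..., for -π ≤ t ≤ π; the integrated squared difference between two curves equals π times the squared Euclidean distance between the profiles, which is why Andrews curves flag concentrations with atypical absolute values. The chi-square distance used by correspondence analysis compares each subject's relative profile (concentrations divided by the subject's total) with the average profile, weighting each sampling time by the inverse of its share of the grand total, so it flags peaks that are high relative to both the subject and the population at that time. This is a sketch under stated assumptions, not the Excel or ADE-4 procedure used in the study: the concentration matrix is treated as a non-negative table, and the function names and toy values are hypothetical.

```python
import numpy as np


def andrews_curve(x, t):
    """Andrews (1972) function f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t
    + x4 sin 2t + x5 cos 2t + ... evaluated at the points t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    f = np.full_like(t, x[0] / np.sqrt(2.0))
    for k in range(1, len(x)):
        h = (k + 1) // 2  # harmonic: 1, 1, 2, 2, 3, ...
        f += x[k] * (np.sin(h * t) if k % 2 == 1 else np.cos(h * t))
    return f


def chi2_distance_to_centroid(N):
    """Squared chi-square distance between each row profile of a non-negative
    table N (subjects x sampling times) and the average row profile."""
    P = np.asarray(N, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)          # row masses (subjects)
    c = P.sum(axis=0)          # column masses = average profile
    profiles = P / r[:, None]  # each subject's relative profile
    return (((profiles - c) ** 2) / c).sum(axis=1)


# Euclidean property of Andrews curves on two toy profiles:
# the integral over one period of (f_x - f_y)^2 equals pi * ||x - y||^2.
# A uniform grid without the endpoint integrates these trigonometric
# polynomials exactly (up to rounding).
t = np.linspace(-np.pi, np.pi, 2048, endpoint=False)
x, y = np.array([4.0, 7.0, 3.0, 1.0]), np.array([5.0, 2.0, 3.5, 0.5])
integral = 2 * np.pi * np.mean((andrews_curve(x, t) - andrews_curve(y, t)) ** 2)

# Toy table: the last subject peaks at the last sampling time, not the first.
N = np.array([[10, 5, 1], [9, 6, 1], [11, 5, 2], [2, 5, 10]])
d2 = chi2_distance_to_centroid(N)
```

Note the design difference: scaling a whole profile up leaves its chi-square distance unchanged (only the shape matters), whereas it changes the Andrews curve, which mirrors the absolute versus relative distinction drawn in the Results.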
After PK modelling, a positive correlation was found between the absolute NPDE and the number of times each concentration had been detected as an outlier: the more often a concentration was detected a priori (0 to 3 times), the higher its absolute NPDE a posteriori.
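The a priori/a posteriori comparison can be sketched as a count of detections per concentration followed by a rank correlation with the absolute NPDE. The abstract does not state which correlation coefficient was used; Spearman's rank correlation (computed here as the Pearson correlation of average ranks, so that ties among the 0 to 3 counts are handled) is an assumption, as are the function names and the toy flags and NPDE values, which are not study data.

```python
import numpy as np


def detection_count(*flags):
    """Number of diagnostics (0 to len(flags)) that flagged each concentration."""
    return np.sum([np.asarray(f, dtype=bool) for f in flags], axis=0)


def average_ranks(a):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation with tie handling (Pearson on average ranks)."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


# Hypothetical flags from the three diagnostics for five concentrations.
andrews = [False, False, False, True, True]
ca = [False, False, True, True, True]
mahal = [False, False, False, False, True]
npde = np.array([0.1, -0.5, 1.2, -2.0, 3.1])

counts = detection_count(andrews, ca, mahal)  # 0, 0, 1, 2, 3
rho = spearman(counts, np.abs(npde))
```

A rank-based coefficient is a natural choice here because the detection count is an ordinal score with only four possible values.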
Conclusions: Applying multivariate diagnostics to extract outliers from a full PK (PD) dataset provided key information on the variability of a PK (PD) population independently of any PK (PD) modelling. This a priori variability analysis was based on identifying outlying concentrations in outlying subjects, which correspond to extreme PK (PD) states according to a given distance metric. Using three distance metrics (Euclidean, Chi-square and Mahalanobis) was advantageous in two ways. First, some concentrations were detected as outliers under only one or two criteria, which can help clinicians classify and interpret outlying patients according to the detection criterion (metric). Second, some outlying concentrations were confirmed by all three diagnostics and were considered the most significant outliers. The number of times (0 to 3) each concentration was detected as an outlier a priori was positively correlated with its absolute NPDE a posteriori, i.e. after fitting a PK model to the dataset. This can help the modeller choose, among several candidate PK (PD) models, the one that best fits the most strongly confirmed outliers.
(1) R. Gnanadesikan. Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York, 1977.
(2) M.J. Greenacre. Theory and Applications of Correspondence Analysis. Academic Press, London, 1984.
(3) M.J. Greenacre. Correspondence Analysis in Practice. Academic Press, London, 1993.
(4) F. Mortier and A. Bar-Hen. Influence and sensitivity measures in correspondence analysis. Statistics 38: 207-215 (2004).
(5) D. Andrews. Plots of high-dimensional data. Biometrics 28: 125-136 (1972).
(6) V. Barnett. The ordering of multivariate data (with discussion). J. R. Stat. Soc. 139: 318-354 (1976).
(7) R. Swaroop and W.R. Winter. A statistical technique for computer identification of outliers in multivariate data. NASA Technical Note D-6472 (1971).
(8) J. Thioulouse, D. Chessel, S. Dolédec and J.M. Olivier. ADE-4: a multivariate analysis and graphical display software. Statistics and Computing 7: 75-83 (1996).
(9) C. Frye, W.S. Freeze and F.K. Buckingham. Microsoft Office Excel 2003 Programming Inside Out. Microsoft Press, Washington, 2004.
(10) SAS Institute Inc. JMP 3.2. SAS Institute, Cary, North Carolina, 1987.
(11) S. Urien, K. Rezaï and F. Lokiec. Pharmacokinetic modelling of 5-FU production from capecitabine: a population study in 40 adult patients with metastatic cancer. J. Pharmacokinet. Pharmacodyn. 32: 817-833 (2005).
(12) K. Brendel, E. Comets, C. Laffont, C. Laveille and F. Mentré. Metrics for external model evaluation with an application to the population pharmacokinetics of gliclazide. Pharm. Res. 23: 2036-2049 (2006).