Britta Steffens (1,2), Marc Pfister (1,2), Andrew Atkinson (1), Sven Wellmann (2,3), Gilbert Koch (1,2)
(1) Pediatric Pharmacology and Pharmacometrics, University Children’s Hospital Basel (UKBB), University of Basel, Basel, Switzerland, (2) NeoPrediX AG, Basel, Switzerland, (3) Department of Neonatology, Hospital St. Hedwig of the Order of St. John of God, University Children's Hospital Regensburg (KUNO), University of Regensburg, Regensburg, Germany
Introduction:
Clinical decision support systems are becoming increasingly important in daily clinical practice. Most current tools are designed to diagnose medical conditions and provide either a yes/no answer or the probability that a certain event will occur at pre-defined time points [1]. Such tools are generally based on machine learning approaches and do not provide information on the severity of the medical condition [1,2]. Statistical measures such as sensitivity, specificity, and the area under the receiver operating characteristic curve are commonly used to determine the prediction accuracy of diagnostic tools. This sensitivity/specificity approach has gained importance as an easy-to-understand accuracy metric for such tools, especially in the context of medical device regulations.
Decision support tools based on pharmacometrics (PMX) methods predict the time course of a medical condition. PMX-based tools therefore provide information on the dynamics of disease progression rather than a simple yes/no outcome at a given time point. To apply the sensitivity/specificity approach in this setting, the physician has to determine a clinically relevant threshold, e.g. a level that indicates a confirmed diagnosis, taking the disease progression into account. The translation into this setting is therefore not straightforward.
Objectives:
- Introduce two approaches for translating the sensitivity/specificity concept into the context of PMX-based tools.
- Present a case study of bilirubin kinetics in neonates in which two total serum bilirubin measurements are used to predict bilirubin changes up to 48 hours after the last measurement.
Methods:
First, the binary outcome variable is obtained by comparing each observed level with the threshold y pre-determined by the physician. Second, a so-called “acceptance range” Racc is defined for the difference between each prediction and the corresponding observation.
We present two approaches:
1) Parametric (Bland-Altman, B-A) approach: Racc is defined by the Bland-Altman (1-α) limits of agreement:
$$R_{acc} = MW_{diff} \pm z_{1-\alpha/2} \cdot SD_{diff}$$
where $MW_{diff}$ is the mean and $SD_{diff}$ the standard deviation of the prediction differences (predicted minus observed level), and $z_{1-\alpha/2}$ is the (1-α/2)-quantile of the standard normal distribution.
2) Empirical approach: Racc is pre-defined based on empirical data and clinical experience.
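Both ways of obtaining Racc can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, differences are taken as predicted minus observed level, SDdiff is assumed to be the sample standard deviation, and the empirical range is assumed to be symmetric around zero. With α = 0.10 the first function gives 90% limits of agreement.

```python
from statistics import NormalDist, mean, stdev


def ba_acceptance_range(predicted, observed, alpha=0.10):
    """Parametric (B-A) approach: (1 - alpha) limits of agreement.

    Assumes differences are predicted minus observed level and that
    SDdiff is the sample standard deviation (n - 1 denominator).
    """
    diffs = [p - o for p, o in zip(predicted, observed)]
    mw_diff = mean(diffs)                     # MWdiff
    sd_diff = stdev(diffs)                    # SDdiff (needs >= 2 subjects)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}
    return (mw_diff - z * sd_diff, mw_diff + z * sd_diff)


def empirical_acceptance_range(half_width):
    """Empirical approach: symmetric range fixed in advance, e.g. +/-70 umol/L."""
    return (-half_width, half_width)


# Toy values in umol/L (illustrative only, not study data)
predicted = [110, 95, 130, 100]
observed = [100, 100, 120, 100]
print(ba_acceptance_range(predicted, observed))  # 90% limits of agreement
print(empirical_acceptance_range(70))            # +/-70 umol/L range
```

The B-A range adapts to the observed bias and spread of the prediction errors, whereas the empirical range is fixed by clinical judgement and does not depend on the validation data.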
Based on either one of these approaches, the standard confusion matrix can be constructed as follows:
- True positive: Subject with confirmed diagnosis (i.e. observed level > y) whose predicted level is either > y, or ≤ y with a prediction difference within Racc
- True negative: Healthy subject (i.e. observed level ≤ y) whose predicted level is either ≤ y, or > y with a prediction difference within Racc
- False positive: Healthy subject with a predicted level > y and a prediction difference above the upper limit of Racc
- False negative: Subject with confirmed diagnosis with a predicted level ≤ y and a prediction difference below the lower limit of Racc
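The four rules above can be expressed as a single classification function. This is a hedged sketch: the function name and the threshold value of 250 µmol/L in the usage lines are hypothetical (the abstract does not state the threshold), Racc boundaries are treated as inclusive, and the difference is assumed to be predicted minus observed. For a misclassified subject the difference is negative (false negative) or positive (false positive), so the code simply treats any difference outside Racc as a misclassification.

```python
def classify(observed, predicted, y, r_acc):
    """Assign one subject to a confusion-matrix cell.

    y is the physician's threshold; r_acc = (lower, upper) is the acceptance
    range for the difference predicted - observed (assumed direction).
    Boundaries are treated as inclusive (an assumption).
    """
    lower, upper = r_acc
    within = lower <= predicted - observed <= upper
    if observed > y:                  # confirmed diagnosis
        if predicted > y or within:
            return "TP"
        return "FN"                   # predicted <= y, difference outside Racc
    if predicted <= y or within:      # healthy subject (observed <= y)
        return "TN"
    return "FP"                       # predicted > y, difference outside Racc


# Hypothetical threshold and acceptance range in umol/L, for illustration only
y, r_acc = 250, (-70, 70)
for obs, pred in [(300, 240), (300, 200), (200, 260), (200, 300)]:
    print(obs, pred, classify(obs, pred, y, r_acc))
```

Note how a prediction that falls on the "wrong" side of y but close to the observation (e.g. 240 vs. 300) still counts as correct, which is the point of introducing Racc.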
Results:
This sensitivity/specificity concept was applied to validate the accuracy of a PMX-based tool that predicts which neonates are at risk of hyperbilirubinemia. The tool was developed on a training dataset (n=342 neonates) from the University Children’s Hospital Basel, Switzerland [3]. Sensitivity and specificity were calculated for an independent validation dataset from the University Children’s Hospital Regensburg, Germany, including neonates born at > 34 weeks of gestation [3]. The threshold defining hyperbilirubinemia was set to a clinically relevant level according to international guidelines [4]. For the B-A approach, the 90% limits of agreement were used (scenario 1). For the empirical approach, the acceptance range was defined as ±70 and ±85 µmol/L (scenarios 2 and 3, respectively). Sensitivity and specificity were comparable for both approaches (sensitivity / specificity with 95% CI):
- Scenario 1: 93% [88%;98%] / 92% [87%;97%]
- Scenario 2: 98% [95%;100%] / 92% [87%;97%]
- Scenario 3: 98% [95%;100%] / 97% [94%;100%]
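Once every subject has been assigned to a cell, sensitivity is TP/(TP+FN) and specificity is TN/(TN+FP). The sketch below computes both with a 95% confidence interval. The abstract does not say which interval method was used; the Wald (normal-approximation) interval clipped to [0, 1] is an assumption here, and Wilson or Clopper-Pearson intervals are common alternatives. Function names and counts are hypothetical and do not reproduce the reported scenarios.

```python
import math
from collections import Counter
from statistics import NormalDist


def proportion_ci(k, n, level=0.95):
    """Proportion k/n with a Wald (normal-approximation) CI clipped to [0, 1].

    The CI method is an assumption; requires n > 0.
    """
    p = k / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def sensitivity_specificity(cells, level=0.95):
    """cells: iterable of 'TP', 'FN', 'TN', 'FP' labels, one per subject."""
    counts = Counter(cells)  # missing labels count as 0
    sens = proportion_ci(counts["TP"], counts["TP"] + counts["FN"], level)
    spec = proportion_ci(counts["TN"], counts["TN"] + counts["FP"], level)
    return sens, spec


# Toy counts (not the Regensburg validation data)
cells = ["TP"] * 45 + ["FN"] * 5 + ["TN"] * 90 + ["FP"] * 10
sens, spec = sensitivity_specificity(cells)
print("sensitivity %.0f%% [%.0f%%;%.0f%%]" % tuple(100 * v for v in sens))
print("specificity %.0f%% [%.0f%%;%.0f%%]" % tuple(100 * v for v in spec))
```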
Conclusions:
The sensitivity/specificity approach for diagnostic tools can be translated into the context of PMX-based tools, provided a clinically reasonable and relevant treatment threshold is known. Moreover, by using empirical data and clinical experience, the Bland-Altman-based approach may be simplified into an easy-to-understand and pragmatic metric of prediction accuracy for clinical practice.
References:
[1] Daunhawer I, Kasser S, Koch G, Sieber L, Cakal H, Tütsch J, Pfister M, Wellmann S, Vogt JE (2019). Enhanced early prediction of clinically relevant neonatal hyperbilirubinemia with machine learning. Pediatr Res 86(1):122-127
[2] Koch G, Pfister M, Daunhawer I, Wilbaux M, Wellmann S, Vogt JE (2020). Pharmacometrics and Machine Learning Partner to Advance Clinical Data Analysis. Clin Pharmacol Ther 107(4):926-933
[3] Koch G, Wilbaux M, Kasser S, Schumacher K, Steffens B, Wellmann S, Pfister M (submitted). Leveraging predictive pharmacometrics-based algorithms to enhance perinatal care – application to neonatal jaundice. Frontiers in Pharmacology
[4] Bhutani VK, Committee on Fetus and Newborn; American Academy of Pediatrics (2011). Phototherapy to prevent severe neonatal hyperbilirubinemia in the newborn infant 35 or more weeks of gestation. Pediatrics 128(4):e1046-52
Reference: PAGE 30 (2022) Abstr 10187 [www.page-meeting.org/?abstract=10187]
Poster: Methodology - Model Evaluation