IV-11 Undine Falkenhagen

Structural model selection: Is the chi-square distribution appropriate for likelihood ratio tests?

Undine Falkenhagen (1,2), Charlotte Kloft (3), Wilhelm Huisinga (2)

(1) PharMetrX Graduate Research Training Program: Pharmacometrics & Computational Disease Modelling, Freie Universität Berlin/Universität Potsdam, (2) Mathematical Modelling and Systems Biology, Institut für Mathematik, Universität Potsdam, (3) Institut für Pharmazie, Freie Universität Berlin

Objectives: Many software packages used in pharmacokinetic modelling fit and select models using objective functions based on the log-likelihood. This model selection framework corresponds to a likelihood ratio test comparing models of different complexity: a model with more parameters is chosen only if it reduces the objective function value by more than a specific threshold relative to a simpler model [1]. This threshold is commonly chosen as a quantile of a chi-square distribution, a choice that originates from Wilks’ theorem on the distribution of the likelihood ratio test statistic (LRTS) [2]. The theorem states that, under certain conditions and if the simpler model is the true model, the LRTS is asymptotically chi-square distributed with degrees of freedom equal to the number of parameters added in the more complex model. Using the (1-alpha)-quantile of the chi-square distribution as the threshold would then ensure that the type I error equals alpha. However, the premises of Wilks’ theorem, including the identifiability of the parameters, are not always fulfilled. In these cases the distribution of the LRTS does not necessarily follow a chi-square distribution and can deviate substantially from it [3, 4]. One example is the comparison of one- and two-compartment models [5]. Our objective was to illustrate and quantify, in a simulation study, the differences between the chi-square distribution and the correct distribution of the LRTS, including the implications for type I error and power.
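The threshold rule described above can be illustrated with a minimal Python sketch. It uses only the standard library and relies on two closed forms: a chi-square variable with one degree of freedom is the square of a standard normal variable, and the chi-square distribution with two degrees of freedom has the cumulative distribution function 1 - exp(-x/2). The function names and the objective function values in the usage lines are hypothetical and chosen for illustration only.

```python
import math
from statistics import NormalDist


def chi2_quantile_df1(p):
    """p-quantile of chi-square(1): the square of a standard normal quantile."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


def chi2_quantile_df2(p):
    """p-quantile of chi-square(2), whose CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - p)


def prefer_complex_model(ofv_simple, ofv_complex, threshold):
    """Select the larger model only if the drop in the objective function
    value (-2 log-likelihood) exceeds the threshold."""
    return (ofv_simple - ofv_complex) > threshold


alpha = 0.05
threshold_df1 = chi2_quantile_df1(1.0 - alpha)
threshold_df2 = chi2_quantile_df2(1.0 - alpha)
print(f"df=1 threshold: {threshold_df1:.3f}, df=2 threshold: {threshold_df2:.3f}")

# Hypothetical objective function values, for illustration only.
print(prefer_complex_model(ofv_simple=250.0, ofv_complex=245.0,
                           threshold=threshold_df2))
```

With two added parameters, the rule uses the second threshold. Whether that threshold actually delivers the intended alpha level is the question addressed in this work.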

Methods: We considered the comparison of classical one- and two-compartment models. These models differ by two parameters, so one would commonly use the quantile of a chi-square distribution with two degrees of freedom. To quantify the deviation of the correct test statistic distribution from the chi-square distribution, we simulated 10,000 concentration-time profiles from a classical one-compartment model and calculated the LRTS for each. We assumed a multiplicative, log-normally distributed residual error and varied the volume of distribution, clearance, magnitude of the residual error and sampling time points of the one-compartment model. The resulting distributions of the simulated test statistics were compared to the chi-square distribution with two degrees of freedom, in particular with respect to their 95%-quantiles.
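The simulation procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study. It assumes intravenous bolus dosing, so the one-compartment model is a straight line on the log scale, and it assumes hypothetical values for dose, clearance, volume, residual error and sampling times. A coarse grid search over biexponential curves stands in for a proper optimizer. Because the grid search can miss the maximum likelihood estimate, the computed LRTS values are lower bounds, and the output should not be read as a reproduction of the reported results. With the residual variance profiled out, the LRTS is n log(RSS1/RSS2), where RSS1 and RSS2 are the residual sums of squares of the two fits on the log scale.

```python
import math
import random


def simulate_log_conc(times, dose, cl, v, sigma, rng):
    """Log-concentrations from a one-compartment IV bolus model with
    multiplicative log-normal residual error (additive on the log scale)."""
    return [math.log(dose / v) - (cl / v) * t + rng.gauss(0.0, sigma)
            for t in times]


def rss_one_cmt(times, y):
    """RSS of the one-compartment fit, a straight line on the log scale."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    slope = sxy / sxx
    return sum((yi - y_mean - slope * (t - t_mean)) ** 2
               for t, yi in zip(times, y))


def biexp_shapes(times, rates, fractions):
    """log((1-f)*exp(-a*t) + f*exp(-b*t)) on a grid with a > b."""
    shapes = []
    for i, a in enumerate(rates):
        for b in rates[:i]:
            for f in fractions:
                shapes.append([math.log((1.0 - f) * math.exp(-a * t)
                                        + f * math.exp(-b * t))
                               for t in times])
    return shapes


def rss_two_cmt(y, shapes):
    """Smallest RSS over the grid; the log-scale intercept is profiled out."""
    n = len(y)
    best = math.inf
    for shape in shapes:
        resid = [yi - s for yi, s in zip(y, shape)]
        mean = sum(resid) / n
        best = min(best, sum((r - mean) ** 2 for r in resid))
    return best


def lrts(times, y, shapes):
    """-2 log likelihood ratio with the residual variance profiled out."""
    rss1 = rss_one_cmt(times, y)
    rss2 = min(rss1, rss_two_cmt(y, shapes))  # nested models: RSS2 <= RSS1
    return len(y) * math.log(rss1 / rss2)


def empirical_quantile(values, p):
    """Simple order-statistic estimate of the p-quantile."""
    ordered = sorted(values)
    index = max(0, min(len(ordered) - 1, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


# All settings below are hypothetical and chosen for illustration only.
rng = random.Random(1)
times = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 12.0, 24.0]   # sampling times (h)
rates = [0.02 * 1.6 ** k for k in range(12)]          # candidate rates (1/h)
fractions = [k / 10 for k in range(1, 10)]            # weight of slow phase
shapes = biexp_shapes(times, rates, fractions)

n_rep = 300  # the study used 10,000 profiles; reduced here to keep it fast
sim = [lrts(times,
            simulate_log_conc(times, dose=100.0, cl=5.0, v=50.0,
                              sigma=0.2, rng=rng),
            shapes)
       for _ in range(n_rep)]

q95_sim = empirical_quantile(sim, 0.95)
q95_chi2 = -2.0 * math.log(0.05)  # 95% quantile of chi-square(2)
alpha_actual = sum(x > q95_chi2 for x in sim) / len(sim)
print(f"simulated 95% quantile: {q95_sim:.2f}")
print(f"chi-square(2) 95% quantile: {q95_chi2:.2f}")
print(f"rejection rate at the chi-square threshold: {alpha_actual:.3f}")
```

Repeating the run for other parameter values and sampling designs gives one simulated quantile per setting, which can then be compared with the fixed chi-square quantile.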

Results: While the chi-square distribution is independent of design and model parameters, the correct distribution of the LRTS depends on the parameter values of the one-compartment model and on the sampling time points. None of the simulated distributions coincided with the chi-square distribution with two degrees of freedom; some coincided with a chi-square distribution with one degree of freedom. In all considered cases, the quantiles of the chi-square distribution were larger than the quantiles of the simulated distribution. The use of the chi-square quantiles is therefore more restrictive than intended: aiming for an alpha level of 5% resulted in an actual alpha level of approximately 1-2%. As a consequence, the power of the test can be reduced, making it more likely that a one-compartment model is accepted where a two-compartment model would be correct. The 95%-quantiles of the simulated test statistics deviated by up to two-fold from the 95%-quantile of the chi-square distribution.
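The reported alpha levels can be checked for plausibility in the special case where the LRTS follows a chi-square distribution with one degree of freedom while the threshold is taken from the chi-square distribution with two degrees of freedom. In that case the actual alpha level has a closed form, which the following Python sketch evaluates using only the standard library. It comes out at roughly 1.4%, within the reported range. The function name is hypothetical, and the calculation covers only this one case, not the other simulated distributions.

```python
import math
from statistics import NormalDist


def chi2_df1_survival(x):
    """P(X > x) for X ~ chi-square(1), using X = Z**2 with Z standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(math.sqrt(x)))


nominal_alpha = 0.05
threshold_df2 = -2.0 * math.log(nominal_alpha)   # 95% quantile of chi-square(2)
actual_alpha = chi2_df1_survival(threshold_df2)  # type I error if LRTS ~ chi-square(1)
print(f"threshold: {threshold_df2:.2f}, actual alpha: {actual_alpha:.4f}")
```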

Conclusion: The commonly used chi-square quantiles, which depend only on the number of parameters and not on the specific model and design, are not very accurate as thresholds for model selection decisions. The deviations can have a substantial influence on type I error and power. Therefore, simulating the quantiles rather than using the chi-square quantiles should be considered.

References:
[1] L. B. Sheiner. Analysis of pharmacokinetic data using parametric models. III. Hypothesis tests and confidence intervals. Journal of Pharmacokinetics and Biopharmaceutics, 14, 539-555, 1986.
[2] S. S. Wilks. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. The Annals of Mathematical Statistics, 9(1), 60-62, 1938.
[3] J. Pinheiro and D. Bates. Theory and Computational Methods for Linear Mixed-Effects Models. Springer New York, 57-96, 2000.
[4] Y. Shao and X. Liu. Asymptotics for likelihood ratio tests under loss of identifiability. The Annals of Statistics, 31, 807-832, 2003.
[5] T. Machewitz. Likelihood-Ratio-Test bei nicht identifizierbaren Parametern. Master’s thesis, Universität Potsdam, 2016.

Reference: PAGE 28 (2019) Abstr 9090 [www.page-meeting.org/?abstract=9090]

Poster: Methodology - Model Evaluation
