A simulation-based evaluation of a hidden Markov model to characterize disease transitions using frequently sampled spirometry data
Ludvig Jakobsson1,2,3, Jacob Leander3, Marcus Baaz1, Philip Gerlee2, Mats Jirstrand1
1Fraunhofer-Chalmers Research Centre for Industrial Mathematics, 2Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, 3Clinical Pharmacology and Quantitative Pharmacology, Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca
Introduction: Hidden Markov models (HMMs) have previously been used to describe disease dynamics characterized by a set of discrete states (1). These models consist of an observable longitudinal process and an unobservable latent state process modelled as a discrete-time Markov chain. Inference for these models involves both estimating the latent state probabilities and obtaining maximum likelihood estimates of the model parameters. One possible application of HMMs is in respiratory diseases, such as asthma and chronic obstructive pulmonary disease, in which the risk of an exacerbation, an acute worsening event, is a primary endpoint in clinical trials. In general, the incidence of exacerbations is low, leading to long clinical trials. A composite event endpoint, CompEx (2), has therefore been developed to reduce the necessary clinical trial duration by making use of home-measured spirometry data, such as peak expiratory flow (PEF). An alternative approach is to model PEF with worsening events of varying magnitudes included explicitly in a dynamic model. This can be done using an HMM. In this work, a simulation study was conducted to check for bias in the estimates across several simulation scenarios.
Objective: Investigate how HMMs can be used to make inferences about discrete disease transitions using simulated data meant to resemble PEF in asthmatic patients.
Methods: The observable data were modelled as a discrete-time Gaussian process with state-dependent mean and variance. The latent state process was modelled as a discrete-time Markov chain with two states, describing two hypothetical disease states. Data were simulated from this model for a population, with individual parameters drawn from known probability distributions. These parameter distributions were chosen so that the simulated data resembled clinical PEF data obtained from home-measured spirometry.
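As an illustration of the simulation model described above, the following is a minimal sketch of a two-state HMM with Gaussian emissions and individual parameters drawn from population distributions. It is written in Python using only the standard library, whereas the original work was performed in R. The function names (simulate_hmm, simulate_population), the choice of beta, normal, and uniform distributions, and all numerical values (for example a baseline mean near 400 and a relative drop of 15 to 35 percent in the second state) are assumptions made for this example and are not taken from the study.

```python
import random


def simulate_hmm(T, trans, means, sds, init=(0.5, 0.5), seed=0):
    """Simulate T time steps of a two-state HMM with Gaussian emissions.

    trans[i][j] is the probability of moving from state i to state j.
    Returns the latent state sequence and the observed series.
    """
    rng = random.Random(seed)
    states, obs = [], []
    # Draw the initial state from the initial distribution.
    s = 0 if rng.random() < init[0] else 1
    for _ in range(T):
        states.append(s)
        # Emission: Gaussian with state-dependent mean and standard deviation.
        obs.append(rng.gauss(means[s], sds[s]))
        # Transition: next state drawn from row s of the transition matrix.
        s = 0 if rng.random() < trans[s][0] else 1
    return states, obs


def simulate_population(n, T, seed=0):
    """Simulate n individuals, each with its own parameters.

    All distributions and values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        p01 = rng.betavariate(2, 38)   # P(state 0 -> state 1), mean 0.05
        p10 = rng.betavariate(4, 36)   # P(state 1 -> state 0), mean 0.10
        mu0 = rng.gauss(400.0, 40.0)   # baseline level in state 0
        drop = rng.uniform(0.15, 0.35)  # relative drop in state 1
        means = (mu0, mu0 * (1.0 - drop))
        sds = (20.0, 30.0)
        trans = [[1.0 - p01, p01], [p10, 1.0 - p10]]
        states, obs = simulate_hmm(T, trans, means, sds,
                                   seed=rng.randrange(2**31))
        population.append({"p01": p01, "p10": p10, "means": means,
                           "sds": sds, "states": states, "obs": obs})
    return population
```

Calling simulate_population(100, 1000) would give 100 hypothetical individuals with 1000 observations each. Shorter series (for example T=100) combined with small transition probabilities will often contain no transitions at all, which is the situation the Results identify as problematic for per-individual estimation.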
The estimation was performed using an implementation of the expectation-maximization (EM) algorithm in which the forward-backward (FB) algorithm was used to calculate state probabilities, and the parameters were estimated to maximize the expected complete-data log-likelihood (an instance of EM also known as the Baum-Welch algorithm (3)). The estimation procedure was performed per simulated individual and evaluated by how well the resulting histograms of parameter estimates resembled the shape of the parameter distributions used for generating the data. The estimates of the latent state probabilities were evaluated using the mean square error (MSE) as well as a confusion matrix. As the probability of transitioning between states was hypothesized to differ between treatment arms, two populations were simulated with different transition probabilities. These parameters were then estimated as above, and the ability to discriminate between the two groups, in terms of statistically significantly different estimates, was evaluated. Estimates were evaluated for varying data lengths (T=100 to T=1000) to investigate the impact of observation length. All simulation, estimation, and analysis were performed in R.
Results: For long data series, the distributions of parameter estimates accurately resembled the parameter distributions used for generating the data. This, in combination with statistical tests, showed that discrimination between treatment groups by transition probability estimates was effective. The predicted states from the FB algorithm were shown to align well with the true states. For shorter data series, however, the increased occurrence of cases without any transitions between states made inference of several parameters difficult. In these cases, bias was observed in the estimates. Previously known numerical issues with the Baum-Welch algorithm, as well as overfitting, were also noted during these simulations.
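The state probability step referred to in the Methods can be sketched as a scaled forward-backward pass for an HMM with Gaussian emissions. This is a minimal Python illustration using only the standard library, not the authors' R implementation. The function names (normal_pdf, forward_backward) and the example values are assumptions. Rescaling the forward variables at each time step is one common way to avoid the numerical underflow that affects an unscaled Baum-Welch implementation on long series.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and std. dev. sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def forward_backward(obs, trans, means, sds, init):
    """Scaled forward-backward pass.

    Returns gamma, where gamma[t][k] = P(state_t = k | all observations),
    and the log-likelihood of the observations under the given parameters.
    """
    T, K = len(obs), len(init)
    dens = [[normal_pdf(y, means[k], sds[k]) for k in range(K)] for y in obs]

    # Forward pass: alpha[t] is normalized to sum to one at every step,
    # and the normalizing constants are kept in scale[t].
    alpha = [[0.0] * K for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for k in range(K):
            if t == 0:
                prior = init[k]
            else:
                prior = sum(alpha[t - 1][j] * trans[j][k] for j in range(K))
            alpha[t][k] = prior * dens[t][k]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]

    # Backward pass, using the same scaling constants.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for k in range(K):
            beta[t][k] = sum(trans[k][j] * dens[t + 1][j] * beta[t + 1][j]
                             for j in range(K)) / scale[t + 1]

    # Smoothed state probabilities and log-likelihood.
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    loglik = sum(math.log(c) for c in scale)
    return gamma, loglik


# Illustrative usage with made-up values: a short series that drops
# from the first state's level to the second state's level and back.
example_obs = [400.0, 410.0, 395.0, 300.0, 290.0, 305.0, 400.0]
example_gamma, example_loglik = forward_backward(
    example_obs,
    trans=[[0.9, 0.1], [0.1, 0.9]],
    means=(400.0, 300.0),
    sds=(15.0, 15.0),
    init=(0.5, 0.5),
)
```

In a full Baum-Welch iteration, these smoothed probabilities (together with the pairwise transition probabilities, omitted here for brevity) would be used to update the transition matrix, means, and variances, and the pass would be repeated until the log-likelihood stops increasing.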
Conclusions: The implemented algorithm accurately estimated state probabilities and parameters when applied to longer data series in which the dynamics of the HMM were explicitly present. However, for this approach to be viable for the intended application, the issues concerning individuals without any transitions must be addressed. Possible avenues for this are mixed-effects HMMs, as described in (1), or a sequential Monte Carlo approach.
(1) Lavielle, M. Journal of Pharmacokinetics and Pharmacodynamics 45(1), 91–105 (2018)
(2) Fuhlbrigge, A. L. et al. The Lancet Respiratory Medicine 5, 577–590 (2017)
(3) Bilmes, J. A. Berkeley, CA: International Computer Science Institute, pp. 7–13 (1998)