2011 - Athens - Greece

PAGE 2011: Other topics - Methodology
Ron Keizer

The bootstrap of Stepwise Covariate Modeling using linear approximations

Ron J Keizer, Akash Khandelwal, Andrew C Hooker, Mats O Karlsson

Uppsala University

Background: Stepwise covariate modeling (scm) is a tool for automated building of a covariate model on top of a base structural model [1]. Previously we have shown that using a linear approximation to the base model in the scm provides results similar to those of an scm based on the original model, while greatly reducing computation times [2]. Performing a bootstrap analysis of the scm can provide valuable information, e.g. about the type I error of covariate inclusion (through inclusion of a random covariate), correlations between covariate inclusions, and influential individuals. However, a bootstrap of a regular (non-linearized) scm may require considerable computation time, taking weeks or even months depending on model complexity and size of the dataset. A bootstrap of a linearized scm can be performed much faster, typically within a day.

Objectives: To compare results obtained with the linearized and non-linearized bootstrap-scm.

Methods: Two real datasets were used for which a structural model had been developed earlier. Dummy covariates were introduced into the datasets based on the randomized original covariates, to allow for investigation of type I error for covariate inclusion. Three methods of performing the bootstrap-scm were implemented: (i) linearized scm, with bootstrap based on the original dataset, (ii) linearized scm, with bootstrap based on the dataset with derivatives obtained from linearizing the base model and fitting the original dataset (for a faster, but more approximate, linearized scm), (iii) non-linearized scm with bootstrap based on the original dataset.
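The data-preparation steps described above (dummy covariates from randomized original covariates, and bootstrap resampling) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the record layout (a list of dicts with an `ID` key), the `DUMMY_` prefix, the `ORIG_ID` column, and both function names are assumptions made for illustration, and the covariate is assumed to be constant within an individual. Resampling is done at the level of individuals, which is an assumption the abstract does not state explicitly.

```python
import random


def add_dummy_covariate(records, covariate, rng):
    """Add a dummy covariate by permuting `covariate` across individuals.

    Assumes one covariate value per individual (time-constant covariate).
    The dummy keeps the original distribution but has no true relation
    to the individual's data, so its inclusion reflects type I error.
    """
    ids = sorted({r["ID"] for r in records})
    value_by_id = {r["ID"]: r[covariate] for r in records}
    shuffled = [value_by_id[i] for i in ids]
    rng.shuffle(shuffled)
    dummy_by_id = dict(zip(ids, shuffled))
    return [{**r, "DUMMY_" + covariate: dummy_by_id[r["ID"]]} for r in records]


def bootstrap_by_individual(records, rng):
    """Resample individuals with replacement, keeping each one's records together.

    Sampled individuals get new consecutive IDs so that an individual
    drawn twice is treated as two separate subjects.
    """
    ids = sorted({r["ID"] for r in records})
    sampled = [rng.choice(ids) for _ in ids]
    out = []
    for new_id, old_id in enumerate(sampled, start=1):
        for r in records:
            if r["ID"] == old_id:
                out.append({**r, "ID": new_id, "ORIG_ID": old_id})
    return out
```

Each bootstrap dataset produced this way would then be passed to the scm; that step depends on the estimation software and is not sketched here.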

Results obtained with the linearized and non-linearized bootstrap methods were compared using histograms of covariate inclusion and plots showing the distribution of covariate model size. Based on the full covariate models constructed for each bootstrap sample, several additional diagnostic plots were constructed to study the variability and correlations in covariate inclusion, and the prevalence of influential individuals in the dataset. Additionally, the 200 final covariate models obtained in the bootstrap-scms were re-estimated using the original dataset, and the OFV compared to the OFV of the final models obtained from the scm on the original dataset.
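The summary statistics behind these comparisons can be sketched as follows, assuming each bootstrap sample's final covariate model is available as a set of included relations. The string labels (e.g. `"CL~WT"`) and the function names are hypothetical and are not taken from the scm tool's output format.

```python
from collections import Counter
from itertools import combinations


def inclusion_rates(models):
    """Fraction of bootstrap models that include each covariate relation."""
    n = len(models)
    counts = Counter(c for m in models for c in set(m))
    return {c: k / n for c, k in counts.items()}


def model_size_distribution(models):
    """Number of bootstrap models per covariate model size."""
    return Counter(len(set(m)) for m in models)


def pair_inclusion_rates(models):
    """Fraction of bootstrap models that include both relations of a pair."""
    n = len(models)
    counts = Counter(
        pair for m in models for pair in combinations(sorted(set(m)), 2)
    )
    return {pair: k / n for pair, k in counts.items()}


# Toy example with four bootstrap models (illustrative values only).
models = [{"CL~WT", "V~WT"}, {"CL~WT"}, {"CL~WT", "V~SEX"}, set()]
rates = inclusion_rates(models)          # CL~WT: 0.75, V~WT: 0.25, V~SEX: 0.25
sizes = model_size_distribution(models)  # size 2: 2 models, size 1: 1, size 0: 1
pairs = pair_inclusion_rates(models)     # each observed pair: 0.25
```

Comparing a pair's joint rate with the product of its single-relation rates gives a rough indication of whether two relations tend to be included together or to replace each other.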

Results: Covariate inclusion rates were very similar between the two linearized bootstrap-scm methods. Generally, the linearized bootstrap-scm showed slightly lower covariate inclusion rates than the non-linearized bootstrap-scm. Distributions of covariate model sizes were highly similar between the two linearized methods, and were also similar to those of the non-linearized bootstrap-scm. Diagnostic plots for the bootstrap-scm included covariate inclusion rates for single covariates, inclusion rates for combinations of covariates (correlation), histograms of the most common combinations of covariates, the distribution of covariate model size, and plots to study influential individuals. Overall, these plots showed similar results for the linearized and non-linearized bootstrap-scm. Interestingly, when the final (non-linearized) models obtained in the bootstrap procedures were refitted on the original dataset, a small fraction (~10% for both datasets) showed a lower OFV than the final model from the original scm. For both datasets, both linearized methods completed within a day, while the non-linearized bootstrap-scm took several days to complete. The linearized bootstrap based on the dataset with derivatives (ii) was fastest.
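The abstract does not describe how the influential-individual plots were constructed. One possible screen, shown here purely as an assumption, compares the inclusion rate of a covariate relation between bootstrap samples that contain a given individual and samples that do not. The input layout and the function name are hypothetical.

```python
def individual_influence(samples, covariate):
    """Difference in inclusion rate of `covariate` with vs. without each individual.

    `samples` is a list of (ids_in_sample, included_relations) pairs, one per
    bootstrap sample. Individuals that appear in every sample, or in none,
    are skipped because one of the two rates is undefined.
    """
    all_ids = set().union(*(ids for ids, _ in samples))
    result = {}
    for i in sorted(all_ids):
        with_i = [covariate in inc for ids, inc in samples if i in ids]
        without_i = [covariate in inc for ids, inc in samples if i not in ids]
        if with_i and without_i:
            result[i] = sum(with_i) / len(with_i) - sum(without_i) / len(without_i)
    return result


# Toy example (illustrative values only): individual 1 drives inclusion of CL~WT.
samples = [
    ({1, 2}, {"CL~WT"}),
    ({1, 3}, {"CL~WT"}),
    ({2, 3}, set()),
    ({2, 3}, set()),
]
influence = individual_influence(samples, "CL~WT")
```

A large positive difference for one individual would suggest that the relation is selected mainly when that individual is present in the bootstrap sample.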

Conclusion: This analysis showed that linearization of the model allows the implementation of a bootstrap-scm within a reasonable time-span, while producing results comparable to a bootstrap-scm based on the original non-linearized model. Several diagnostic plots were proposed for the bootstrap-scm to aid the construction of the covariate model.

[1] Karlsson & Jonsson. PAGE 7 (1998) Abstr 678 [www.page-meeting.org/?abstract=678]
[2] Khandelwal et al. PAGE 19 (2010) Abstr 1925 [www.page-meeting.org/?abstract=1925]

Reference: PAGE 20 (2011) Abstr 2161 [www.page-meeting.org/?abstract=2161]
Poster: Other topics - Methodology