Jason Chittenden (1), Jan Huisman (1), Kevin Dykstra (1)
(1) qPharmetra, LLC
Objectives: Creating NONMEM or similar datasets often involves interactions among multiple groups: a vendor that creates the raw datasets; the pharmacometrician who specifies the desired content and format of the analysis dataset; data programmers who construct the dataset according to specifications; and quality control personnel who verify the dataset contents. Tools that preserve the rigor of this traditional approach while reducing cycle time and enabling pharmacometricians to take greater ownership of the final dataset can increase the efficiency of the overall modeling and simulation workflow.
Methods: PMDatR builds on popular R packages such as dplyr [1] and tidyr [2], so much of its syntax is already familiar and well documented. The key features that facilitate data workflows include: functions for transforming, reformatting (such as automatically unstacking covariate columns in result domains like LB), and verifying inputs; computation of ADDL dosing; unit-aware columns and transformations; and automated code generation from a settings file. Common transformations include: automatic conversion of date/time formats; filling forward; change from baseline; time after dose and similar calculations; and filling of missing covariate values. The settings file, provided in YAML format, allows for integration with customized graphical user interfaces and can also supply sensible, standardized default settings.
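To make two of the transformations named above concrete, the following is a minimal sketch of filling forward and change from baseline, applied per subject. PMDatR itself is an R package; plain Python is used here only to illustrate the logic, and none of this is PMDatR's API. The function names, the column names (ID, TIME, WT), and the choice of baseline (each subject's first non-missing value) are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def fill_forward(records, column, key="ID", order="TIME"):
    """Carry the last non-missing value of `column` forward within each subject."""
    rows = sorted(records, key=itemgetter(key, order))
    filled = []
    for _, subject_rows in groupby(rows, key=itemgetter(key)):
        last_seen = None  # reset per subject so values never leak across IDs
        for row in subject_rows:
            row = dict(row)  # copy, so the input records are left untouched
            if row[column] is None:
                row[column] = last_seen
            else:
                last_seen = row[column]
            filled.append(row)
    return filled


def change_from_baseline(records, column, key="ID", order="TIME"):
    """Add `<column>_CFB`. Assumption: baseline is the subject's first non-missing value."""
    rows = sorted(records, key=itemgetter(key, order))
    result = []
    for _, subject_rows in groupby(rows, key=itemgetter(key)):
        baseline = None
        for row in subject_rows:
            row = dict(row)
            value = row[column]
            if baseline is None and value is not None:
                baseline = value
            row[column + "_CFB"] = None if value is None else value - baseline
            result.append(row)
    return result


# Hypothetical body-weight records with gaps (None marks a missing value).
records = [
    {"ID": 1, "TIME": 0, "WT": 70.0},
    {"ID": 1, "TIME": 24, "WT": None},
    {"ID": 1, "TIME": 48, "WT": 68.5},
    {"ID": 2, "TIME": 0, "WT": None},
    {"ID": 2, "TIME": 24, "WT": 82.0},
]

filled = fill_forward(records, "WT")
with_cfb = change_from_baseline(filled, "WT")
```

Note that filling forward leaves a leading gap untouched (subject 2 at TIME 0 stays missing), which is why a separate rule for filling missing covariate values is still needed.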
The overall process for dataset construction is as follows: 1) load source data and convert it to standardized formats using customizable mappings; 2) assign each source dataset to a ‘type’ of data (observation, dose, merged covariates, event covariates) and apply mappings and transformations; 3) stack event-type data and merge covariates by key columns; 4) apply transformations and filters that require the entire dataset (e.g., time after dose).
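Steps 3 and 4 can be sketched as follows. This is an illustration of the general stack-then-merge pattern in plain Python, not PMDatR code (PMDatR is an R package). The function name build_dataset, the TAD column name, and the example values are assumptions; EVID follows the NONMEM convention of 1 for a dose record and 0 for an observation.

```python
def build_dataset(doses, observations, covariates, key="ID"):
    """Stack dose and observation events, merge covariates, and add time after dose."""
    # Step 3a: stack the event-type records, tagging each with its EVID.
    events = [dict(d, EVID=1) for d in doses]
    events += [dict(o, EVID=0) for o in observations]
    # Order by subject and time; at equal times, doses sort before observations.
    events.sort(key=lambda r: (r[key], r["TIME"], -r["EVID"]))

    # Step 3b: merge subject-level covariates by the key column.
    covariates_by_key = {c[key]: c for c in covariates}

    # Step 4: time after dose needs the full stacked, ordered dataset.
    last_dose_time = {}
    dataset = []
    for event in events:
        row = {**covariates_by_key.get(event[key], {}), **event}
        row.setdefault("AMT", None)  # observations carry no dose amount
        row.setdefault("DV", None)   # doses carry no observed value
        if event["EVID"] == 1:
            last_dose_time[event[key]] = event["TIME"]
        previous_dose = last_dose_time.get(event[key])
        # Records before the first dose have no time after dose.
        row["TAD"] = None if previous_dose is None else event["TIME"] - previous_dose
        dataset.append(row)
    return dataset


# Hypothetical source data for a single subject.
doses = [
    {"ID": 1, "TIME": 0.0, "AMT": 100.0},
    {"ID": 1, "TIME": 24.0, "AMT": 100.0},
]
observations = [
    {"ID": 1, "TIME": -1.0, "DV": 0.0},
    {"ID": 1, "TIME": 2.0, "DV": 5.1},
    {"ID": 1, "TIME": 26.0, "DV": 6.3},
]
covariates = [{"ID": 1, "WT": 70.0, "SEX": "F"}]

dataset = build_dataset(doses, observations, covariates)
```

Time after dose is computed last because it depends on the ordering of doses and observations together, which exists only after the event data have been stacked.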
Results: PMDatR is already in use in-house, where it enables standardization of scripting style and quality control efforts. It is also in use at a major pharmaceutical company, where it underpins a dataset creation tool whose graphical user interface provides settings to the PMDatR package and collects and displays results. Linking PMDatR as a back-end to a graphical user interface allows for additional features such as: drag-and-drop selection of columns and transformations; point-and-click selection of options and templates; syntax and semantic error checking; and additional help features for less experienced R programmers.
Conclusion: PMDatR provides a framework for pharmacometric dataset creation that is useful both as a standalone R package that provides a few additional tools for data manipulation and as a powerful back-end to more feature-rich, graphical user interface-based applications that can integrate it into a data management ecosystem. In both cases, a reusable, templatized workflow can result in faster dataset creation and improved dataset quality.
References:
[1] Hadley Wickham and Romain Francois (2016). dplyr: A Grammar of Data Manipulation. R package version 0.5.0. https://CRAN.R-project.org/package=dplyr
[2] Hadley Wickham (2016). tidyr: Easily Tidy Data with `spread()` and `gather()` Functions. R package version 0.4.0. https://CRAN.R-project.org/package=tidyr
Reference: PAGE 27 (2018) Abstr 8761 [www.page-meeting.org/?abstract=8761]
Poster: Methodology - Other topics