Pharmacometrics workflow: standards for provenance capture and workflow definition
Jonathan Chard (1), Justin Wilkins (2), Amy Cheung (3), Evan Wang (4), Mike K Smith (5), Phylinda Chan (5), Gareth Smith (6), Richard Kaye (1), Maria Luisa Sardu (7), Stuart Moodie (8)
(1) Mango Solutions, (2) Occams, (3) Astra Zeneca, (4) Eli Lilly, (5) Pfizer, (6) Cyprotex, (7) Merck Serono, (8) Eight Pillars Ltd
Objectives: To develop a standard, implemented in a workflow software tool and based on existing standards, for capturing the full range of activities performed and entities involved in a pharmacometric analysis. Capturing the provenance of task outputs ("how was this created?") and providing knowledge management for the pharmacometrics workflow ("how did we get to this model?") facilitates reproducibility, sharing, and communication of results with others. Using this standard, we can visualise and reproduce the steps taken during an analysis, capture decisions, assumptions and key steps, and support the process of quality control, as suggested in the definition of Model-Informed Drug Discovery and Development (MID3).
Methods: Several existing workflow tools and provenance capture standards were evaluated [3,4,5]. The PROV-O ontology was selected for its wide adoption, extensibility, and suitability for capturing provenance and the relationships between activities and entities within and across projects. Analysis artefacts, actions, information and relationships were mapped onto concepts defined within PROV-O. Tools were developed to support the pharmacometric workflow: storing files in Git, generating provenance information representing the steps taken by the pharmacometrician, and querying the captured information to visualise, report, and regenerate the artefacts within an analysis.
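As an illustration of mapping analysis steps onto PROV-O concepts, the following minimal sketch records activities, agents and entities as subject-predicate-object triples and then traces the lineage of an output. The property names (prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith, prov:wasDerivedFrom) are PROV-O terms; the ProvGraph class, its methods, and all identifiers such as "file:data.csv" are hypothetical and are not the tools developed in this work.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Hypothetical in-memory store of PROV-O style triples (illustration only)."""
    triples: set = field(default_factory=set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def record_activity(self, activity, agent, used, generated):
        """Record one analysis step: who ran it, what it read, what it wrote."""
        self.add(activity, "rdf:type", "prov:Activity")
        self.add(agent, "rdf:type", "prov:Agent")
        self.add(activity, "prov:wasAssociatedWith", agent)
        for entity in used:
            self.add(entity, "rdf:type", "prov:Entity")
            self.add(activity, "prov:used", entity)
        for output in generated:
            self.add(output, "rdf:type", "prov:Entity")
            self.add(output, "prov:wasGeneratedBy", activity)
            for entity in used:
                self.add(output, "prov:wasDerivedFrom", entity)

    def lineage(self, entity):
        """Return every entity that `entity` was derived from, transitively."""
        seen = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            for s, p, o in self.triples:
                if s == current and p == "prov:wasDerivedFrom" and o not in seen:
                    seen.add(o)
                    stack.append(o)
        return seen


# Hypothetical two-step analysis: estimate a model, then plot diagnostics.
graph = ProvGraph()
graph.record_activity(
    "activity:estimate-001", "agent:analyst",
    used=["file:data.csv", "file:model.mod"],
    generated=["file:run001.lst"],
)
graph.record_activity(
    "activity:plot-001", "agent:analyst",
    used=["file:run001.lst"],
    generated=["file:gof.png"],
)
ancestors = graph.lineage("file:gof.png")
```

Here `ancestors` contains the run output, the dataset and the model file, which is the kind of "how was this created" query the captured provenance is meant to answer. A production implementation would more likely serialise these statements as RDF rather than hold them in a Python set.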
Results: The standard allows tracking of users, software tools, and files in an analysis, while capturing assumptions, decisions and relationships extending beyond input to output. Information can be captured at multiple levels of detail, allowing a reviewer to understand key decisions taken during an analysis, or to trace through the software used to generate results. It is possible to identify project artefacts that are out of date (e.g. a diagnostic plot that should be recreated because the dataset has changed), and to re-run the corresponding activities. Analysts can apply this information to generate documentation, from run records to complete reports. Knowledge sharing between team members is enhanced, which avoids duplication of work and increases quality and reproducibility. Traceability helps reviewers and regulators to evaluate assumptions, results and conclusions.
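One way to identify out-of-date artefacts from captured provenance can be sketched as follows, assuming that each output's record stores a content hash of every input at the time the output was generated. The record layout, the function names and the file names are hypothetical, chosen only to illustrate the idea; they do not describe the implemented tool.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content hash used to detect that a file has changed."""
    return hashlib.sha256(content).hexdigest()


def stale_outputs(records, current):
    """Return outputs that need regenerating, in an order suitable for re-running.

    records: list of {"output": name, "inputs": {name: digest at generation time}}
    current: {name: digest of the file as it is now}
    An output is stale if any input has changed or is itself stale.
    """
    stale = []
    changed = True
    while changed:  # repeat until staleness has propagated downstream
        changed = False
        for record in records:
            if record["output"] in stale:
                continue
            for name, recorded in record["inputs"].items():
                if name in stale or current.get(name) != recorded:
                    stale.append(record["output"])
                    changed = True
                    break
    return stale


# Hypothetical project: the dataset was edited after the model run.
old_data, new_data = digest(b"ID,TIME,DV\n1,0,0\n"), digest(b"ID,TIME,DV\n1,0,0.1\n")
model, listing = digest(b"$PROBLEM example"), digest(b"run output")

records = [
    {"output": "run001.lst", "inputs": {"data.csv": old_data, "model.mod": model}},
    {"output": "gof.png", "inputs": {"run001.lst": listing}},
]
current = {"data.csv": new_data, "model.mod": model, "run001.lst": listing}

to_rerun = stale_outputs(records, current)
```

In this example `to_rerun` lists the model run first and the diagnostic plot second: the plot's direct input has not changed on disk, but it is flagged because it depends on an output that is itself out of date.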
Conclusions: Capturing structured information with software tools helps to ensure data integrity, facilitating QC and adoption of MID3 concepts.
Acknowledgements: This work is presented on behalf of the DDMoRe project (www.ddmore.eu).
[1] EFPIA MID3 Workgroup. Good Practices in Model-Informed Drug Discovery and Development (MID3): Practice, Application and Documentation. CPT: Pharmacometrics Syst. Pharmacol. 2016. doi: 10.1002/psp4.12049
[2] Timothy Lebo, Satya Sahoo, Deborah McGuinness. PROV-O: The PROV Ontology. https://www.w3.org/TR/prov-o/
[3] Paolo Missier, Saumen Dey, Khalid Belhajjame, Victor Cuevas-Vicenttin, Bertram Ludascher. D-PROV: Extending the PROV Provenance Model with Workflow Structure. https://www.usenix.org/system/files/conference/tapp13/tapp13-final3.pdf
[4] Khalid Belhajjame, Oscar Corcho, Daniel Garijo, Jun Zhao, Paolo Missier, David Newman, Raul Palma, Sean Bechhofer, Esteban Garcia Cuesta, Jose Manuel Gomez-Perez, Graham Klyne, Kevin Page, Marco Roos, Jose Enrique Ruiz, Stian Soiland-Reyes, Lourdes Verdes-Montenegro, David De Roure, Carole A. Goble. Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse. http://users.ox.ac.uk/~oerc0033/preprints/sepublica2012.pdf
[5] Khalid Belhajjame, Jun Zhao, Daniel Garijo, Matthew Gamble, Kristina Hettne, Raul Palma, Eleni Mina, Oscar Corcho, José Manuel Gómez-Pérez, Sean Bechhofer, Graham Klyne, Carole Goble. Using a suite of ontologies for preserving workflow-centric research objects. http://www.sciencedirect.com/science/article/pii/S1570826815000049
[6] Git Distributed Version Control System. https://git-scm.com/