Jawad Iqbal¹, Shahad Mahmud¹, Galib Shams¹, Shetu Mohanto¹, Khomveer Singh¹, Shazzad Hossain¹, Emily Nieves¹
¹Delineate Inc. (Cambridge, United States)
Objectives:
Literature-based workflows are integral to drug development research, yet they remain largely manual and resource-intensive. These workflows span evidence synthesis for Model-Based Meta-Analysis (MBMA), population of parameter databases, curation of training datasets for specialized AI models, and model reuse, in which the mathematical models described in publications are reconstructed as executable code. Delineate’s software addresses these challenges with specialized generative AI methods that cover the full pipeline, from systematic literature review and automated graph digitization to the construction of analysis-ready datasets formatted for platforms such as NONMEM. Complementing this, Delineate’s Model Copilot automates the reconstruction of runnable model code directly from PDFs and supplementary files, with built-in validation against the results reported in the source publication. This work evaluates Delineate’s accuracy and performance across these key literature-based workflows.
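As an illustration of what an analysis-ready dataset formatted for NONMEM typically involves, the sketch below merges dose and observation records into rows built from the standard NONMEM data items ID, TIME, AMT, DV, EVID and MDV. It is a minimal sketch, not Delineate’s exporter: the `Dose` and `Observation` classes, the `to_nonmem_rows` and `to_csv` helpers, the use of 0 (rather than `.`) in unused DV and AMT cells, and the rule that dose rows come first at tied times are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical record types for illustration; not Delineate's data model.
@dataclass
class Dose:
    subject: int
    time_h: float      # time after first dose, hours
    amount_mg: float

@dataclass
class Observation:
    subject: int
    time_h: float
    conc: float        # measured concentration (units assumed consistent)

COLUMNS = ["ID", "TIME", "AMT", "DV", "EVID", "MDV"]

def to_nonmem_rows(doses, observations):
    """Merge dose and observation records into NONMEM-style rows sorted by ID, then TIME."""
    rows = []
    for d in doses:
        # Dose event: EVID=1, MDV=1 (no observed DV on this row).
        rows.append({"ID": d.subject, "TIME": d.time_h, "AMT": d.amount_mg,
                     "DV": 0.0, "EVID": 1, "MDV": 1})
    for o in observations:
        # Observation event: EVID=0, MDV=0, AMT unused.
        rows.append({"ID": o.subject, "TIME": o.time_h, "AMT": 0.0,
                     "DV": o.conc, "EVID": 0, "MDV": 0})
    # Assumption: at tied times, dose rows precede observation rows.
    rows.sort(key=lambda r: (r["ID"], r["TIME"], -r["EVID"]))
    return rows

def to_csv(rows):
    """Render rows as comma-separated text with a header line."""
    lines = [",".join(COLUMNS)]
    lines += [",".join(str(r[c]) for c in COLUMNS) for r in rows]
    return "\n".join(lines)

# Hypothetical single-subject example: one 100 mg dose, two concentration samples.
rows = to_nonmem_rows([Dose(1, 0.0, 100.0)],
                      [Observation(1, 1.0, 2.5), Observation(1, 0.0, 0.0)])
print(to_csv(rows))
```

Real datasets usually carry further columns (covariates, CMT, RATE, or ADDL and II for repeated dosing); which ones are needed depends on the model being fitted.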
Methods:
The performance of Delineate’s core capabilities was evaluated across three key domains: automated systematic literature review, literature-based database curation, and model replication. For the literature review and database curation domains, Delineate’s outputs were benchmarked against manually curated datasets. Literature review capabilities were assessed by replicating the literature review component of three published MBMA studies. Database curation performance was evaluated across a range of tasks, including plot digitization and MBMA-specific tasks such as extracting dosing events from publication text and retrieving baseline patient characteristics from text and tables. Accuracy metrics were computed for each task.
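The abstract does not define how point-level accuracy is scored for plot digitization. One common convention, sketched below under that assumption, counts a ground-truth point as detected when an extracted point lies within a tolerance of it, measured as a fraction of the axis ranges, with each extracted point allowed to match only once. The function name `point_detection_accuracy` and the 2% default tolerance are hypothetical.

```python
import math

def point_detection_accuracy(truth, extracted, x_range, y_range, tol=0.02):
    """Fraction of ground-truth points matched by an extracted point (hypothetical metric).

    Distances are scaled by the axis ranges, so `tol` is a fraction of the
    plot extent. Matching is greedy and one-to-one: each extracted point can
    satisfy at most one ground-truth point.
    """
    if not truth:
        raise ValueError("ground truth must contain at least one point")
    unused = list(extracted)
    matched = 0
    for tx, ty in truth:
        best_i, best_d = None, tol
        for i, (ex, ey) in enumerate(unused):
            d = math.hypot((ex - tx) / x_range, (ey - ty) / y_range)
            if d <= best_d:          # closest candidate within tolerance so far
                best_i, best_d = i, d
        if best_i is not None:
            matched += 1
            unused.pop(best_i)       # consume the extracted point
    return matched / len(truth)

# Hypothetical digitization: two of four ground-truth points are recovered.
truth = [(0, 0), (1, 10), (2, 20), (3, 30)]
extracted = [(0.01, 0.1), (1.0, 10.2), (2.5, 20)]
print(point_detection_accuracy(truth, extracted, x_range=3, y_range=30))  # 0.5
```

Greedy matching is simple but not optimal in crowded plots; an assignment-based matcher would be a stricter alternative.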
Model replication accuracy was assessed across 10 models spanning a range of complexity, from population pharmacokinetic (PK) models to larger Quantitative Systems Pharmacology (QSP) models. Structural accuracy was assessed component by component, and simulation error was quantified by comparing simulation results digitized from the source publications against the outputs of the replicated models.
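Comparing a replicated model against a published figure requires evaluating the simulated trajectory at the time points digitized from that figure; the Results report the resulting error as NRMSE. The sketch below assumes piecewise-linear interpolation of the simulation and normalization of the RMSE by the range of the digitized values. Both choices are assumptions (NRMSE is also commonly normalized by the mean), and the helper names `interp` and `nrmse` are hypothetical.

```python
import bisect
import math

def interp(t, ts, ys):
    """Piecewise-linear interpolation of (ts, ys) at t; ts must be increasing.
    Values outside the simulated range are clamped to the end points."""
    if t <= ts[0]:
        return ys[0]
    if t >= ts[-1]:
        return ys[-1]
    j = bisect.bisect_right(ts, t)
    t0, t1, y0, y1 = ts[j - 1], ts[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

def nrmse(obs_t, obs_y, sim_t, sim_y):
    """RMSE between digitized points and the interpolated simulation,
    normalized (by assumption) by the range of the digitized values."""
    pred = [interp(t, sim_t, sim_y) for t in obs_t]
    mse = sum((p - o) ** 2 for p, o in zip(pred, obs_y)) / len(obs_y)
    span = max(obs_y) - min(obs_y)
    if span == 0:
        raise ValueError("digitized values have zero range")
    return math.sqrt(mse) / span

# Hypothetical data: replicated model output and points digitized from a figure.
sim_t, sim_y = [0, 1, 2, 3, 4], [0, 2, 4, 6, 8]
obs_t, obs_y = [0.5, 1.5, 2.5], [1.0, 3.0, 5.5]
print(round(nrmse(obs_t, obs_y, sim_t, sim_y), 4))  # 0.0642
```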
Results:
For automated systematic literature review, Delineate’s literature review agent successfully identified all publications present in the ground truth sets of three published MBMA studies, achieving 100% recall of relevant records. Additionally, the search space was reduced by an average of 89.4%, substantially decreasing the number of records requiring expert review while preserving full coverage of relevant literature.
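Recall and search-space reduction, as used above, reduce to simple ratios: recall is the share of ground-truth publications that appear among the records kept for review, and reduction is one minus the share of candidate records that still require expert review. The sketch below uses hypothetical IDs and counts, and assumes the reported average is a mean of per-study reductions, which the abstract does not state.

```python
def recall(kept_ids, ground_truth_ids):
    """Share of ground-truth publications present among the records kept for review."""
    truth = set(ground_truth_ids)
    return len(truth & set(kept_ids)) / len(truth)

def search_space_reduction(n_candidates, n_kept):
    """Fraction of candidate records removed before expert review."""
    return 1 - n_kept / n_candidates

# Hypothetical example: IDs the agent kept vs. IDs the published MBMA included.
kept = {"pmid1", "pmid2", "pmid3", "pmid9"}
included = {"pmid1", "pmid2", "pmid3"}
print(recall(kept, included))  # 1.0

# Hypothetical per-study (candidates, kept) counts; averaged as a simple mean (assumption).
counts = [(2000, 200), (500, 60)]
reductions = [search_space_reduction(n, k) for n, k in counts]
print(sum(reductions) / len(reductions))
```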
Database curation performance was evaluated across data extracted from 60 publications, with Delineate achieving strong accuracy across all tasks: plot point detection (94.3%), error bar detection (87.1%), plot-associated data accuracy (93.8%), dosing record extraction (90.4%), and baseline patient characteristic extraction (87.0%).
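The abstract does not specify how extraction accuracy is scored for dosing records and baseline characteristics. A plausible field-level scheme, sketched here as an assumption, pairs extracted and ground-truth records by a key such as study arm, compares numeric fields within a relative tolerance and text fields case-insensitively, and reports the fraction of ground-truth fields reproduced. The record schema and the names `values_match` and `field_accuracy` are hypothetical.

```python
import math

def values_match(expected, actual, rel_tol=1e-6):
    """Numbers match within a relative tolerance; strings match case-insensitively."""
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        return math.isclose(expected, actual, rel_tol=rel_tol)
    if isinstance(expected, str) and isinstance(actual, str):
        return expected.strip().lower() == actual.strip().lower()
    return expected == actual

def field_accuracy(truth, extracted, key_fields):
    """Fraction of ground-truth fields reproduced by the extraction (hypothetical metric).

    Records are paired on `key_fields` (e.g. study arm); a ground-truth record
    with no extracted partner counts every one of its fields as wrong."""
    def key(r):
        return tuple(r.get(k) for k in key_fields)
    by_key = {key(r): r for r in extracted}
    total = correct = 0
    for t in truth:
        e = by_key.get(key(t), {})
        for field, value in t.items():
            if field in key_fields:
                continue
            total += 1
            if field in e and values_match(value, e[field]):
                correct += 1
    if total == 0:
        raise ValueError("no scorable fields in ground truth")
    return correct / total

# Hypothetical dosing records for two arms of one study.
truth = [{"arm": "A", "dose": 10, "unit": "mg", "freq": "QD"},
         {"arm": "B", "dose": 20, "unit": "mg", "freq": "BID"}]
extracted = [{"arm": "A", "dose": 10.0, "unit": "MG", "freq": "qd"},
             {"arm": "B", "dose": 25, "unit": "mg", "freq": "BID"}]
print(field_accuracy(truth, extracted, key_fields=("arm",)))  # 5 of 6 fields match
```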
Model replication accuracy was assessed across 10 benchmark models spanning population PK to QSP systems. Structural reconstruction accuracy, evaluated using a component-wise assessment matrix covering species, parameters, assignments, and full ODE expressions, ranged from 95% to 100% depending on model complexity. Simulation fidelity, assessed by overlaying reconstructed model outputs on digitized publication figures and quantifying error via normalized root mean square error (NRMSE), yielded NRMSE values of 2% to 10% across all benchmark models, demonstrating high structural and dynamic agreement with the source publications.
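A component-wise assessment matrix of this kind can be approximated by scoring each component separately: species by name, parameters by value, and assignments and ODE right-hand sides by expression equivalence. The sketch below is hypothetical throughout; in particular it swaps in a randomized numeric identity test (evaluating both expressions at random symbol values) in place of symbolic comparison, which is a heuristic rather than a proof of equivalence, and it uses a restricted `eval` that should only be applied to trusted expression strings.

```python
import math
import random

_SAFE_FUNCS = {"exp": math.exp, "log": math.log, "sqrt": math.sqrt}

def expr_equivalent(a, b, names, trials=20, seed=0):
    """Heuristic: treat two expression strings as equal if they agree numerically
    at random positive values of all symbols. eval runs with no builtins;
    intended for trusted expression strings only."""
    rng = random.Random(seed)
    env = {"__builtins__": {}, **_SAFE_FUNCS}
    for _ in range(trials):
        point = {n: rng.uniform(0.1, 10.0) for n in names}
        try:
            va, vb = eval(a, env, point), eval(b, env, point)
        except (NameError, ZeroDivisionError, ValueError, SyntaxError):
            return False
        if not math.isclose(va, vb, rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True

def structural_scores(ref, rep):
    """Per-component share of reference items reproduced in the replicated model."""
    names = sorted(set(ref["species"]) | set(ref["parameters"]) | set(ref["assignments"]))
    scores = {"species": len(set(ref["species"]) & set(rep["species"])) / len(ref["species"])}
    scores["parameters"] = sum(
        1 for k, v in ref["parameters"].items()
        if k in rep["parameters"] and math.isclose(v, rep["parameters"][k], rel_tol=1e-6)
    ) / len(ref["parameters"])
    for comp in ("assignments", "odes"):
        ok = sum(1 for k, e in ref[comp].items()
                 if k in rep[comp] and expr_equivalent(e, rep[comp][k], names))
        scores[comp] = ok / len(ref[comp])
    return scores

# Hypothetical one-compartment model with first-order absorption.
ref = {"species": ["Ad", "Ac"],
       "parameters": {"ka": 1.2, "CL": 5.0, "V": 50.0},
       "assignments": {"C": "Ac / V"},
       "odes": {"Ad": "-ka * Ad", "Ac": "ka * Ad - CL / V * Ac"}}
rep = {"species": ["Ad", "Ac"],
       "parameters": {"ka": 1.2, "CL": 5.0, "V": 55.0},   # V reproduced incorrectly
       "assignments": {"C": "Ac / V"},
       "odes": {"Ad": "-(ka * Ad)", "Ac": "ka*Ad - (CL*Ac)/V"}}
print(structural_scores(ref, rep))  # parameters score is 2/3 because V differs
```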
Conclusions:
Delineate demonstrated high accuracy across all evaluated literature-based workflows, including evidence synthesis, database curation, and model replication, establishing it as a viable solution for automating labor-intensive tasks. Beyond reducing manual effort, Delineate lowers the barrier to sophisticated quantitative analyses such as MBMA and QSP modeling, and opens the door to literature mining applications that were previously not feasible at scale.
Reference: PAGE 34 (2026) Abstr 12133 [www.page-meeting.org/?abstract=12133]
Poster: Methodology – AI/Machine Learning