II-084

Homoscedastic Uncertainty Weighting-Enhanced Multi-Task Multi-Granularity NER for Pharmacokinetic Parameters

Lara Carter 1, Watjana Lilaonitkul 1, Frank Kloprogge 1, Joseph Standing 1

1 University College London (London, United Kingdom)

Objectives: The exponential growth of pharmacological information in unstructured sources such as scientific publications and clinical trial reports presents a barrier to large-scale pharmacometric data integration. Current biomedical text mining systems remain sensitive to annotation sparsity and small training sets, which hinders reliable extraction of the quantitative pharmacokinetic (PK) parameters and associated covariates required for model-informed drug development. Improving robustness under data-constrained conditions is therefore critical.
We present an enhanced BioBERT-based multi-task, multi-granularity named entity recognition (NER) framework designed to improve learning stability through homoscedastic uncertainty-based task weighting. The model jointly learns token-level PK entity recognition alongside auxiliary token- and sentence-level objectives, with adaptive uncertainty weighting regulating task contributions during optimisation. This formulation acts as an uncertainty-aware regulariser, stabilising gradient updates and reducing overfitting in low-resource regimes. Applied to a gold-standard corpus of PubMed-indexed PK articles, the approach improves robustness in extracting PK entities and associated numerical values and units, particularly under simulated data scarcity.
Our objectives were: (i) to evaluate multi-task, multi-granularity learning for PK NER; (ii) to assess homoscedastic uncertainty-based loss weighting; (iii) to determine robustness under data scarcity; and (iv) to characterise task interaction dynamics in an uncertainty-weighted system.

Methods: We adapted the multi-task, multi-granularity BioBERT framework of Tong et al. [1] to PK parameter extraction, using hard parameter sharing across four classification heads built on a base BioBERT model: (i) primary token-level NER; (ii) token-level multi-token classification (mtCLS) distinguishing single- vs multi-token entities; (iii) sentence-level binary classification (bCLS) for PK presence; and (iv) sentence-level multi-class classification (mCLS) estimating mentions per sentence.
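As an illustration of how the three auxiliary targets could be derived from the primary NER annotation, the following is a minimal sketch. It assumes BIO-tagged sentences; the function name `auxiliary_labels`, the label strings `SINGLE`/`MULTI`, and the capping of the mention count at `max_count` are hypothetical choices made for this example and are not taken from the abstract or from [1].

```python
def auxiliary_labels(tags, max_count=2):
    """Derive auxiliary targets from one sentence's BIO tags.

    Returns (mt_cls, b_cls, m_cls):
      mt_cls: per-token label, "SINGLE", "MULTI" or "O" (assumed label names)
      b_cls:  1 if the sentence contains any entity, else 0
      m_cls:  number of entity mentions, capped at max_count (assumed bucketing)
    """
    spans = []    # (start, end) token spans, end exclusive
    start = None  # start index of the entity currently being read
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i))  # close the previous entity
            start = i
        elif tag.startswith("I-") and start is not None:
            continue                      # still inside the current entity
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # entity running to sentence end

    mt_cls = ["O"] * len(tags)
    for s, e in spans:
        label = "MULTI" if e - s > 1 else "SINGLE"
        for i in range(s, e):
            mt_cls[i] = label

    b_cls = int(bool(spans))
    m_cls = min(len(spans), max_count)
    return mt_cls, b_cls, m_cls
```

Because all three targets are computed from the existing token-level annotation, auxiliary supervision of this kind would require no additional manual labelling.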
Evaluation used two gold-standard corpora: PK2020 (where only PK parameter entities were labelled) and PK2022 (expanded labels: PK parameter, VALUE, UNIT, RANGE, COMPARE).
We compared: (1) a fine-tuned single-task BioBERT baseline [2]; (2) multi-task learning with heuristic loss weighting; (3) homoscedastic uncertainty-weighted multi-task learning [3]; and (4) hybrid weighting schemes. Training used batch size 32, maximum sequence length 256, dropout 0.1, the Adam optimiser (learning rate 3×10⁻⁵) with warmup and linear decay, weight decay 0.01, early stopping (patience 3), and 10 matched seeds.
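For reference, the uncertainty weighting of [3] can be sketched as follows. This is a minimal illustration in plain Python and not the implementation used in this work: it assumes the classification form of the objective in [3], parameterised by a log-variance s_i = log σ_i² per task, so that each task contributes exp(−s_i)·L_i + 0.5·s_i. In practice the s_i would be trainable parameters optimised jointly with the encoder; the function and argument names are hypothetical.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using per-task log-variances.

    task_losses: list of scalar losses L_i, one per task
    log_vars:    list of s_i = log(sigma_i^2), one per task (learned in practice)

    Each task contributes exp(-s_i) * L_i + 0.5 * s_i. The first term
    down-weights tasks with high estimated uncertainty; the second term
    penalises large s_i so the weights cannot all collapse to zero.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + 0.5 * s
    return total


def effective_weights(log_vars):
    """Multiplicative weight exp(-s_i) currently applied to each task loss."""
    return [math.exp(-s) for s in log_vars]
```

With all s_i initialised to zero, every task has weight 1 and the objective reduces to an unweighted sum of the task losses; the weights then drift apart as training proceeds.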
Robustness was evaluated via progressive downsampling (75%, 50%, 25%, 10%, 7.5%, 5%, 2.5%) with class balance preserved. Repeated-measures ANOVA assessed configuration effects and configuration × data-size interactions. We also conducted auxiliary ablation and pairwise task interaction analysis using cosine similarity of shared encoder representations.
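The progressive downsampling step can be sketched as a stratified subsample drawn per class. The sketch below is illustrative only and rests on assumptions: "class" is taken to be a single label per sentence (for example, whether the sentence contains a PK entity), and each class keeps at least one example. The abstract does not state how class balance was defined, and the function name is hypothetical.

```python
import random


def stratified_downsample(labels, fraction, seed=0):
    """Return sorted indices of a subsample that keeps class proportions.

    labels:   one class label per training sentence (assumed definition of "class")
    fraction: share of each class to keep, e.g. 0.25 for the 25% condition
    seed:     fixes the draw so runs can be matched across configurations
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    keep = []
    for label in sorted(by_class):
        members = by_class[label]
        # Keep the same fraction of every class, but never drop a class entirely.
        k = max(1, round(len(members) * fraction))
        keep.extend(rng.sample(members, k))
    return sorted(keep)
```

Reusing the same seed for every model configuration at a given fraction would give each configuration an identical training subset, in the spirit of the matched seeds described above.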

Results: Under full-data conditions on PK2020, differences between multi-task variants and the single-task baseline were minimal. Mean NER F1-scores for multi-task models were ~85% versus 83.11% for the single-task model, with no significant effect of model formulation (p = 0.9998), suggesting a ceiling effect of the BioBERT encoder.
Under data scarcity, training size significantly influenced performance. For PK2020, ANOVA revealed significant main effects of data size (p < 0.001) and model configuration (p < 0.001), and a significant interaction (p < 0.001), indicating differential robustness. In extreme low-data regimes, the single-task baseline exhibited greater variance across seeds, whereas multi-task models, particularly the uncertainty-weighted variant, showed more stable scaling and clear performance gains relative to the single-task state-of-the-art. These effects were amplified on the higher-cardinality PK2022 dataset, where the multi-task model better captured complex patterns under severe data limitation. Auxiliary task ablation produced small, inconsistent effects, suggesting robustness was not driven by any single objective. Pairwise interaction analysis showed a significant effect of task pairing on cosine similarity (p < 0.001): sentence-level tasks were most aligned, while sentence-token pairs were more dissimilar. Weighting strategy did not significantly alter this structure, indicating that robustness arises from complementary multi-granular representations rather than isolated task effects.

Conclusions: Multi-task, multi-granularity learning did not significantly outperform a strong single-task BioBERT baseline when data were abundant. Its primary benefit emerged under data-scarce conditions, where auxiliary supervision acted as a regulariser, reducing variance and slowing degradation in low-resource regimes, particularly for the higher-cardinality PK2022 dataset. Homoscedastic uncertainty weighting did not change full-data mean performance but contributed to stable optimisation under limited supervision. Overall, for PK NER, multi-task learning appears most valuable for improving generalisation and stability under annotation scarcity rather than maximising peak F1 in high-resource settings.
Robust low-resource extraction of PK entities and quantitative attributes strengthens the evidentiary foundation for integrating AI-driven methods into large-scale parameter aggregation, thereby supporting more reliable pharmacometric modelling.

References:
[1] Yiqi Tong, Yidong Chen, and Xiaodong Shi. “A Multi-Task Approach for Improving Biomedical Named Entity Recognition by Incorporating Multi-Granularity Information”. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics, Aug. 2021, pp. 4804-4813. url: https://aclanthology.org/2021.findings-acl.426.
[2] Ferran Gonzalez Hernandez et al. “Named entity recognition of pharmacokinetic parameters in the scientific literature”. In: Scientific Reports 14.1 (2024), p. 23485. doi: 10.1038/s41598-024-60077-6.
[3] Alex Kendall, Yarin Gal, and Roberto Cipolla. “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics”. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018, pp. 7482-7491.

Reference: PAGE 34 (2026) Abstr 11895 [www.page-meeting.org/?abstract=11895]

Poster: Methodology – AI/Machine Learning