Staff View: Ontology-based Semantic Similarity Measures for Clustering Medical Concepts in Drug Safety :: Library Catalog

Bibliographic Details
Main Authors:	Painter, Jeffery L; Haguinet, François; Powell, Gregory E; Bate, Andrew
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; I.2.4; G.3; H.3.3
Online Access:	https://arxiv.org/abs/2503.20737

_version_	1866908314549354496
author	Painter, Jeffery L; Haguinet, François; Powell, Gregory E; Bate, Andrew
author_facet	Painter, Jeffery L; Haguinet, François; Powell, Gregory E; Bate, Andrew
contents	Semantic similarity measures (SSMs) are widely used in biomedical research but remain underutilized in pharmacovigilance. This study evaluates six ontology-based SSMs for clustering MedDRA Preferred Terms (PTs) in drug safety data. Using the Unified Medical Language System (UMLS), we assess each method's ability to group PTs around medically meaningful centroids. A high-throughput framework was developed with a Java API and Python and R interfaces to support large-scale similarity computations. Results show that while path-based methods perform moderately, with F1 scores of 0.36 for WUPALMER and 0.28 for LCH, intrinsic information content (IC)-based measures, especially INTRINSIC-LIN and SOKAL, consistently yield better clustering accuracy (F1 score of 0.403). Validated against expert review and standard MedDRA queries (SMQs), our findings highlight the promise of IC-based SSMs in enhancing pharmacovigilance workflows by improving early signal detection and reducing manual review.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_20737
institution	arXiv
publishDate	2025
record_format	arxiv
title	Ontology-based Semantic Similarity Measures for Clustering Medical Concepts in Drug Safety
topic	Computation and Language; I.2.4; G.3; H.3.3
url	https://arxiv.org/abs/2503.20737
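The abstract names path-based measures (WUPALMER, LCH) and an intrinsic-IC measure (INTRINSIC-LIN) scored by F1 against reference groupings. As a minimal sketch, assuming the textbook definitions of Wu-Palmer, Leacock-Chodorow and Lin, with a Seco-style intrinsic IC computed from the hierarchy alone, the Python below computes these measures on a hypothetical toy is-a tree. It then scores a simple threshold cluster around a centroid with F1. The hierarchy, term names, threshold, reference set and clustering rule are all illustrative assumptions. This is not the authors' Java API or their clustering procedure, it does not use real UMLS or MedDRA data, and SOKAL is not shown.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent). Real UMLS/MedDRA
# hierarchies are far larger and allow multiple parents; this is a tree.
PARENT = {
    "cardiac_disorder": "root",
    "hepatic_disorder": "root",
    "arrhythmia": "cardiac_disorder",
    "heart_failure": "cardiac_disorder",
    "atrial_fibrillation": "arrhythmia",
    "tachycardia": "arrhythmia",
    "hepatitis": "hepatic_disorder",
    "jaundice": "hepatic_disorder",
}
CONCEPTS = {"root", *PARENT}


def ancestors(c):
    """Path from c up to the root, inclusive: [c, parent, ..., root]."""
    path = [c]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(c):
    # Node-counting depth: the root has depth 1.
    return len(ancestors(c))


def lcs(a, b):
    """Least common subsumer: the deepest shared ancestor (tree case)."""
    shared = set(ancestors(b))
    return next(x for x in ancestors(a) if x in shared)


def wu_palmer(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


MAX_DEPTH = max(depth(c) for c in CONCEPTS)


def lch(a, b):
    # Leacock-Chodorow: shortest path counted in nodes, scaled by 2 * max depth.
    s = lcs(a, b)
    path_nodes = ancestors(a).index(s) + ancestors(b).index(s) + 1
    return -math.log(path_nodes / (2 * MAX_DEPTH))


def hyponym_count(c):
    # Number of proper descendants of c.
    return sum(1 for x in CONCEPTS if x != c and c in ancestors(x))


def intrinsic_ic(c):
    # Intrinsic IC from the hierarchy alone: leaves get 1, the root gets 0.
    return 1 - math.log(hyponym_count(c) + 1) / math.log(len(CONCEPTS))


def intrinsic_lin(a, b):
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2 * intrinsic_ic(lcs(a, b)) / denom if denom else 0.0


def cluster(centroid, candidates, sim, threshold):
    """Hypothetical rule: keep candidates at least `threshold` similar to the centroid."""
    return {c for c in candidates if sim(centroid, c) >= threshold}


def f1(predicted, reference):
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


candidates = ["atrial_fibrillation", "tachycardia", "heart_failure", "hepatitis", "jaundice"]
reference = {"atrial_fibrillation", "tachycardia"}  # hypothetical SMQ-style reference set
found = cluster("atrial_fibrillation", candidates, intrinsic_lin, 0.25)
print(sorted(found), round(f1(found, reference), 3))
# ['atrial_fibrillation', 'heart_failure', 'tachycardia'] 0.8
```

In this toy tree, Wu-Palmer rates atrial_fibrillation and tachycardia at 0.75, while intrinsic Lin gives 0.5. Intrinsic Lin also gives 0 for any pair whose only shared ancestor is the root. Because the IC comes from descendant counts rather than corpus frequencies, no external annotation data is needed.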

Similar Items