Bibliographic Details
Main Authors: Painter, Jeffery L, Haguinet, François, Powell, Gregory E, Bate, Andrew
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2503.20737
_version_ 1866908314549354496
author Painter, Jeffery L
Haguinet, François
Powell, Gregory E
Bate, Andrew
contents Semantic similarity measures (SSMs) are widely used in biomedical research but remain underutilized in pharmacovigilance. This study evaluates six ontology-based SSMs for clustering MedDRA Preferred Terms (PTs) in drug safety data. Using the Unified Medical Language System (UMLS), we assess each method's ability to group PTs around medically meaningful centroids. A high-throughput framework with a Java API and Python and R interfaces was developed to support large-scale similarity computations. Results show that while path-based methods perform moderately, with F1 scores of 0.36 for WUPALMER and 0.28 for LCH, intrinsic information content (IC)-based measures, especially INTRINSIC-LIN and SOKAL, consistently yield better clustering accuracy (F1 score of 0.403). Validated against expert review and standard MedDRA queries (SMQs), our findings highlight the promise of IC-based SSMs in enhancing pharmacovigilance workflows by improving early signal detection and reducing manual review.
format Preprint
id arxiv_https___arxiv_org_abs_2503_20737
institution arXiv
publishDate 2025
record_format arxiv
title Ontology-based Semantic Similarity Measures for Clustering Medical Concepts in Drug Safety
topic Computation and Language
I.2.4; G.3; H.3.3
url https://arxiv.org/abs/2503.20737
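The abstract names WUPALMER as one of the path-based measures. As a rough illustration only, the sketch below computes a Wu-Palmer style similarity, 2 * depth(LCS) / (depth(a) + depth(b)), over a tiny hand-made is-a hierarchy. The hierarchy, the concept names, and the function names are assumptions made up for this example; they are not MedDRA or UMLS content and not the Java/Python/R framework described in the preprint.

```python
# Toy is-a hierarchy (child -> parent). Hypothetical data, NOT real MedDRA/UMLS.
PARENT = {
    "cardiac disorder": "disorder",
    "hepatic disorder": "disorder",
    "arrhythmia": "cardiac disorder",
    "tachycardia": "arrhythmia",
    "bradycardia": "arrhythmia",
}


def path_to_root(concept):
    """Return the chain [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking upward from b, the first node also above a is the
    # lowest common subsumer (LCS) in a tree-shaped hierarchy.
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(wu_palmer("tachycardia", "bradycardia"))
    print(wu_palmer("tachycardia", "hepatic disorder"))
```

In this toy tree, sibling concepts under the same parent score higher than concepts whose only shared ancestor is the root, which is the behaviour a path-based measure relies on when grouping terms around a centroid. A real ontology allows multiple parents per concept, so the single-parent lookup here is a simplification.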