Bibliographic Details
Main Authors: Kim, Hajung; Kim, Chanhwi; Sohn, Jiwoong; Beck, Tim; Rei, Marek; Kim, Sunkyu; Simpson, T Ian; Posma, Joram M; Lain, Antoine; Sung, Mujeen; Kang, Jaewoo
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2501.09744
_version_ 1866916568828477440
author Kim, Hajung
Kim, Chanhwi
Sohn, Jiwoong
Beck, Tim
Rei, Marek
Kim, Sunkyu
Simpson, T Ian
Posma, Joram M
Lain, Antoine
Sung, Mujeen
Kang, Jaewoo
contents The objective of BioCreative8 Track 3 is to extract key phenotypic medical findings embedded in electronic health record (EHR) texts and then normalize these findings to their Human Phenotype Ontology (HPO) terms. However, the diverse surface forms of phenotypic findings make it challenging to normalize them to the correct HPO terms. To address this challenge, we explored various models for named entity recognition and applied data augmentation techniques, such as synonym marginalization, to enhance the normalization step. Our pipeline achieved an exact extraction and normalization F1 score 2.6% higher than the mean score of all submissions to the challenge. Furthermore, in terms of the normalization F1 score, our approach surpassed the average performance by 1.9%. These findings contribute to the advancement of automated medical data extraction and normalization techniques, showcasing potential pathways for future research and application in the biomedical domain.
format Preprint
id arxiv_https___arxiv_org_abs_2501_09744
institution arXiv
publishDate 2025
record_format arxiv
title KU AIGEN ICL EDI@BC8 Track 3: Advancing Phenotype Named Entity Recognition and Normalization for Dysmorphology Physical Examination Reports
topic Artificial Intelligence
url https://arxiv.org/abs/2501.09744