Bibliographic Details
Main Authors: Nguyen, Mai H., Likhite, Shibani, Tang, Jiawei, Mahendran, Darshini, McInnes, Bridget T.
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2405.18605
_version_ 1866912873957031936
author Nguyen, Mai H.
Likhite, Shibani
Tang, Jiawei
Mahendran, Darshini
McInnes, Bridget T.
contents The extraction of chemical-gene relations plays a pivotal role in understanding the intricate interactions between chemical compounds and genes, with significant implications for drug discovery, disease understanding, and biomedical research. This paper presents a dataset created by merging the ChemProt and DrugProt datasets to augment sample counts and improve model accuracy. We evaluate the merged dataset using two state-of-the-art relation extraction algorithms: Bidirectional Encoder Representations from Transformers (BERT), specifically BioBERT, and Graph Convolutional Networks (GCNs) combined with BioBERT. While BioBERT excels at capturing local context, it may benefit from incorporating global information essential for understanding chemical-gene interactions. This can be achieved by integrating GCNs with BioBERT to harness both global and local context. Our results show that merging the ChemProt and DrugProt datasets yields significant improvements in model performance, particularly in the CPR groups shared between the datasets. Incorporating global context through GCNs can also help increase overall precision and recall in some CPR groups compared with using BioBERT alone.
format Preprint
id arxiv_https___arxiv_org_abs_2405_18605
institution arXiv
publishDate 2024
record_format arxiv
title Merged ChemProt-DrugProt for Relation Extraction from Biomedical Literature
topic Computation and Language
Artificial Intelligence
Information Retrieval
Molecular Networks
url https://arxiv.org/abs/2405.18605