Bibliographic Details
Main Authors: Wang, Xindi, Mercer, Robert E., Rudzicz, Frank
Format: Preprint
Published: 2024
Subjects:
Online Access: https://arxiv.org/abs/2405.19084
author Wang, Xindi
Mercer, Robert E.
Rudzicz, Frank
contents The International Classification of Diseases (ICD) is an authoritative medical classification system of different diseases and conditions for clinical and management purposes. ICD indexing assigns a subset of ICD codes to a medical record. Since human coding is labour-intensive and error-prone, many studies employ machine learning to automate the coding process. ICD coding is a challenging task, as it needs to assign multiple codes to each medical document from an extremely large hierarchically organized collection. In this paper, we propose a novel approach for ICD indexing that adopts three ideas: (1) we use a multi-level deep dilated residual convolution encoder to aggregate the information from the clinical notes and learn document representations across different lengths of the texts; (2) we formalize the task of ICD classification with auxiliary knowledge of the medical records, which incorporates not only the clinical texts but also different clinical code terminologies and drug prescriptions for better inferring the ICD codes; and (3) we introduce a graph convolutional network to leverage the co-occurrence patterns among ICD codes, aiming to enhance the quality of label representations. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures.
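The third idea in the abstract, a graph convolutional network over ICD code co-occurrence, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy label sets, and the choice of the common symmetric normalization with self-loops are all assumptions made for illustration, and the record does not say how the paper builds or normalizes its graph.

```python
import math


def cooccurrence_adjacency(label_sets, num_labels):
    """Count how often each pair of distinct labels appears in the same record."""
    adj = [[0.0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    adj[i][j] += 1.0
    return adj


def normalize_with_self_loops(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (an assumed choice)."""
    n = len(adj)
    a_hat = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Self-loops guarantee every degree is at least 1, so no division by zero.
    deg = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(a_norm, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W), written with plain lists."""
    n = len(features)
    d_in = len(weight)
    d_out = len(weight[0])
    # Mix each label's features with those of labels it co-occurs with.
    mixed = [
        [sum(a_norm[i][k] * features[k][c] for k in range(n)) for c in range(d_in)]
        for i in range(n)
    ]
    # Linear projection followed by ReLU.
    return [
        [max(0.0, sum(mixed[i][c] * weight[c][o] for c in range(d_in)))
         for o in range(d_out)]
        for i in range(n)
    ]


# Hypothetical toy data: four label indices and three records' label sets.
toy_records = [{0, 1}, {0, 1, 2}, {2, 3}]
adjacency = cooccurrence_adjacency(toy_records, num_labels=4)
a_norm = normalize_with_self_loops(adjacency)

# One-hot initial label features and an arbitrary 4x2 projection matrix.
initial_features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
projection = [[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0], [0.1, 0.2]]
label_representations = gcn_layer(a_norm, initial_features, projection)
```

After one step, each label's representation blends its own features with those of the labels it co-occurs with, which is the general sense in which co-occurrence patterns can inform label representations. A real model would learn the projection weights and typically stack more than one layer.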
format Preprint
id arxiv_https___arxiv_org_abs_2405_19084
institution arXiv
publishDate 2024
record_format arxiv
title Auxiliary Knowledge-Induced Learning for Automatic Multi-Label Medical Document Classification
topic Computation and Language
url https://arxiv.org/abs/2405.19084