Bibliographic Details
Main Authors: Gomes, Gonçalo; Coutinho, Isabel; Martins, Bruno
Format: Preprint
Published: 2024
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2402.03172
author Gomes, Gonçalo
Coutinho, Isabel
Martins, Bruno
contents Although the International Classification of Diseases (ICD) has been adopted worldwide, manually assigning ICD codes to clinical text is time-consuming, error-prone, and expensive, motivating the development of automated approaches. This paper describes a novel approach for automated ICD coding, combining several ideas from previous related work. We specifically employ a strong Transformer-based model as a text encoder and, to handle lengthy clinical narratives, we explored either (a) adapting the base encoder model into a Longformer, or (b) dividing the text into chunks and processing each chunk independently. The representations produced by the encoder are combined with a label embedding mechanism that explores diverse ICD code synonyms. Experiments with different splits of the MIMIC-III dataset show that the proposed approach outperforms the current state-of-the-art models in ICD coding, with the label embeddings significantly contributing to the good performance. Our approach also leads to properly calibrated classification results, which can effectively inform downstream tasks such as quantification.
format Preprint
id arxiv_https___arxiv_org_abs_2402_03172
institution arXiv
publishDate 2024
record_format arxiv
title Accurate and Well-Calibrated ICD Code Assignment Through Attention Over Diverse Label Embeddings
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2402.03172