Bibliographic Details
Main Authors: Pan, Yilin; Shi, Yanpei; Zhang, Yijia; Lu, Mingyu
Format: Preprint
Published: 2024
Subjects: Audio and Speech Processing; Artificial Intelligence; Computation and Language; Sound
Online Access: https://arxiv.org/abs/2410.07277
author Pan, Yilin
Shi, Yanpei
Zhang, Yijia
Lu, Mingyu
contents Speech is commonly used to build automatic Alzheimer's dementia (AD) detection systems, because acoustic and linguistic abilities decline in people living with AD from the early stages. However, speech carries not only AD-related local and global information but also information unrelated to cognitive status, such as age and gender. In this paper, we propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, shifted-window multi-head attention, originally proposed to extract local and global information from images, is used to design our acoustic-based system. To decouple the effect of age and gender on acoustic feature extraction, both are used as extra inputs to the acoustic system. For the linguistic part, rhythm-related information, which varies significantly between people living with and without AD, is removed when the audio recordings are transcribed. To compensate for this removed information, character-level transcripts are used as an extra input to a word-level BERT-style system. Finally, Swin-BERT combines the acoustic features learned by the proposed acoustic-based system with the proposed linguistic-based system. The experiments use the two datasets provided by the international dementia detection challenges, ADReSS and ADReSSo. The results show that both the proposed acoustic and linguistic systems perform better than or comparably to previous research on the two datasets. The proposed Swin-BERT system achieves superior results, with F-scores of 85.58% on ADReSS and 87.32% on ADReSSo.
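The abstract names the shifted-window multi-head attention of the Swin Transformer as the basis of the acoustic system. As a minimal sketch of only the windowing step of that technique, assuming a small 2-D grid stands in for a spectrogram-like feature map, the code shows the regular non-overlapping window partition and the cyclic shift by half a window that produces the "shifted" windows. The function names are hypothetical; the attention computation, the attention mask for wrapped-around cells, and the age and gender inputs are omitted. This is not the authors' implementation.

```python
# Hypothetical helpers: a sketch of Swin-style (shifted) window partitioning,
# not the Swin-BERT code. Pure Python lists stand in for a 2-D feature map.

def cyclic_shift(grid, shift):
    """Roll a 2-D grid up and left by `shift` cells on both axes (wrap-around).
    Equivalent to rolling by (-shift, -shift) before shifted-window attention."""
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r + shift) % rows][(c + shift) % cols] for c in range(cols)]
            for r in range(rows)]


def window_partition(grid, win):
    """Split a 2-D grid into non-overlapping win x win windows, row-major order."""
    rows, cols = len(grid), len(grid[0])
    if rows % win or cols % win:
        raise ValueError("grid dimensions must be divisible by the window size")
    windows = []
    for r0 in range(0, rows, win):
        for c0 in range(0, cols, win):
            windows.append([row[c0:c0 + win] for row in grid[r0:r0 + win]])
    return windows


def shifted_windows(grid, win):
    """Shift by half a window, then partition, so the new windows straddle
    the boundaries of the regular windows."""
    return window_partition(cyclic_shift(grid, win // 2), win)


if __name__ == "__main__":
    # 4 x 4 toy "feature map" whose cell value is 4 * row + col.
    g = [[4 * r + c for c in range(4)] for r in range(4)]
    print(window_partition(g, 2)[0])  # regular top-left window
    print(shifted_windows(g, 2)[0])   # shifted top-left window
```

Alternating regular and shifted partitions in successive layers lets information flow between neighbouring windows while each attention call stays local. In the shifted layout, the bottom-right windows combine cells that wrapped around from opposite edges; the Swin Transformer prevents those cells from attending to each other with a mask, which this sketch does not implement.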
format Preprint
id arxiv_https___arxiv_org_abs_2410_07277
institution arXiv
publishDate 2024
record_format arxiv
title Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection
topic Audio and Speech Processing
Artificial Intelligence
Computation and Language
Sound
url https://arxiv.org/abs/2410.07277