Staff View: Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection :: Library Catalog

Bibliographic Details
Main Authors:	Pan, Yilin; Shi, Yanpei; Zhang, Yijia; Lu, Mingyu
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2410.07277

_version_	1866910643618054144
author	Pan, Yilin; Shi, Yanpei; Zhang, Yijia; Lu, Mingyu
author_facet	Pan, Yilin; Shi, Yanpei; Zhang, Yijia; Lu, Mingyu
contents	Speech is commonly used to build automatic Alzheimer's dementia (AD) detection systems, because acoustic and linguistic abilities decline in people living with AD from the early stages. However, speech carries not only AD-related local and global information but also information unrelated to cognitive status, such as age and gender. In this paper, we propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, our acoustic-based system is built on shifted-window multi-head attention, which was originally proposed to extract local and global information from images. To decouple the effect of age and gender on acoustic feature extraction, both are supplied as an extra input to the acoustic system. For the linguistic part, rhythm-related information, which varies significantly between people living with and without AD, is removed when the audio recordings are transcribed. To compensate for the removed rhythm-related information, character-level transcripts are used as an extra input to a word-level BERT-style system. Finally, Swin-BERT combines the acoustic features learned by the proposed acoustic-based system with the linguistic-based system. The experiments use the two datasets provided by international dementia detection challenges: ADReSS and ADReSSo. The results show that the proposed acoustic and linguistic systems each perform better than or comparably to previous research on the two datasets. The proposed Swin-BERT system achieves superior results, with F-scores of 85.58% on ADReSS and 87.32% on ADReSSo.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_07277
institution	arXiv
publishDate	2024
record_format	arxiv
title	Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection
topic	Audio and Speech Processing; Artificial Intelligence; Computation and Language; Sound
url	https://arxiv.org/abs/2410.07277
