MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Nurfidausi, Annisaa Fitri; Mancini, Eleonora; Torroni, Paolo
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language; Machine Learning; Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2510.14922

_version_	1866908906269179904
author	Nurfidausi, Annisaa Fitri; Mancini, Eleonora; Torroni, Paolo
author_facet	Nurfidausi, Annisaa Fitri; Mancini, Eleonora; Torroni, Paolo
contents	Depression is a widespread mental health disorder, yet its automatic detection remains challenging. Prior work has explored unimodal and multimodal approaches, with multimodal systems showing promise by leveraging complementary signals. However, existing studies are limited in scope, lack systematic comparisons of features, and suffer from inconsistent evaluation protocols. We address these gaps by systematically exploring feature representations and modelling strategies across EEG, together with speech and text. We evaluate handcrafted features versus pre-trained embeddings, assess the effectiveness of different neural encoders, compare unimodal, bimodal, and trimodal configurations, and analyse fusion strategies with attention to the role of EEG. Consistent subject-independent splits are applied to ensure robust, reproducible benchmarking. Our results show that (i) the combination of EEG, speech and text modalities enhances multimodal detection, (ii) pretrained embeddings outperform handcrafted features, and (iii) carefully designed trimodal models achieve state-of-the-art performance. Our work lays the groundwork for future research in multimodal depression detection.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_14922
institution	arXiv
publishDate	2025
record_format	arxiv
title	TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG
topic	Artificial Intelligence; Computation and Language; Machine Learning; Audio and Speech Processing; Signal Processing
url	https://arxiv.org/abs/2510.14922
