Bibliographic details
Main authors: Nurfidausi, Annisaa Fitri; Mancini, Eleonora; Torroni, Paolo
Format: Preprint
Published: 2025
Subjects:
Online access: https://arxiv.org/abs/2510.14922
author Nurfidausi, Annisaa Fitri
Mancini, Eleonora
Torroni, Paolo
contents Depression is a widespread mental health disorder, yet its automatic detection remains challenging. Prior work has explored unimodal and multimodal approaches, with multimodal systems showing promise by leveraging complementary signals. However, existing studies are limited in scope, lack systematic comparisons of features, and suffer from inconsistent evaluation protocols. We address these gaps by systematically exploring feature representations and modelling strategies across EEG, together with speech and text. We evaluate handcrafted features versus pre-trained embeddings, assess the effectiveness of different neural encoders, compare unimodal, bimodal, and trimodal configurations, and analyse fusion strategies with attention to the role of EEG. Consistent subject-independent splits are applied to ensure robust, reproducible benchmarking. Our results show that (i) the combination of EEG, speech and text modalities enhances multimodal detection, (ii) pretrained embeddings outperform handcrafted features, and (iii) carefully designed trimodal models achieve state-of-the-art performance. Our work lays the groundwork for future research in multimodal depression detection.
format Preprint
id arxiv_https___arxiv_org_abs_2510_14922
institution arXiv
publishDate 2025
record_format arxiv
title TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG
topic Artificial Intelligence
Computation and Language
Machine Learning
Audio and Speech Processing
Signal Processing
url https://arxiv.org/abs/2510.14922