Bibliographic Details
Main Authors: Wang, Yusong; Mao, Keyu; Obi, Takao; Shao, Minghao; Funakoshi, Kotaro
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2605.23328
author Wang, Yusong
Mao, Keyu
Obi, Takao
Shao, Minghao
Funakoshi, Kotaro
contents Emotion Recognition in Conversation (ERC) is a core component of affective computing, yet existing sign language emotion datasets focus primarily on isolated sentences and lack conversational context. Models trained exclusively on isolated utterances show degraded performance in real-world scenarios because they cannot use the preceding dialogue flow. To address this structural limitation, we introduce the ERC task to sign language video analysis and propose the eJSL Dialog dataset. Constructed from scripts in the STUDIES corpus, the dataset contains 1,920 video samples organized into 480 unique dialogues. We systematically benchmark this dataset with models ranging from isolated visual networks to multimodal conversational architectures. The results reveal a domain gap when generic multimodal conversational emotion recognition models are applied to sign language. These findings demonstrate a clear need for context-aware visual feature extractors specific to sign language, and they indicate that scaling up conversational datasets to support large-scale pre-training is a necessary next step for future research.
format Preprint
id arxiv_https___arxiv_org_abs_2605_23328
institution arXiv
publishDate 2026
record_format arxiv
title Emotion Recognition in Sign Language Conversation
topic Computation and Language
url https://arxiv.org/abs/2605.23328