MARC21 :: Library Catalog

Bibliographic Details
Main Authors:	Fan, Qi; Li, Yutong; Xin, Yi; Cheng, Xinyu; Gao, Guanglai; Ma, Miao
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2409.04447

_version_	1866916384875741184
author	Fan, Qi; Li, Yutong; Xin, Yi; Cheng, Xinyu; Gao, Guanglai; Ma, Miao
author_facet	Fan, Qi; Li, Yutong; Xin, Yi; Cheng, Xinyu; Gao, Guanglai; Ma, Miao
contents	The Multimodal Emotion Recognition challenge MER2024 focuses on recognizing emotions from audio, language, and visual signals. In this paper, we present our solutions submitted to the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI), which addresses the scarcity of annotated data in emotion recognition. First, to address class imbalance, we adopt an oversampling strategy. Second, we propose a modality representation combinatorial contrastive learning (MR-CCL) framework on the trimodal input data to build robust initial models. Third, we explore a self-training approach to expand the training set. Finally, we improve prediction robustness through a multi-classifier weighted soft voting strategy. The proposed method proves effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard. Our project is available at https://github.com/WooyoohL/MER2024-SEMI.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_04447
institution	arXiv
publishDate	2024
record_format	arxiv
title	Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
topic	Sound; Artificial Intelligence; Audio and Speech Processing
url	https://arxiv.org/abs/2409.04447
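The abstract does not say which oversampling strategy was used. As a point of reference only, plain random oversampling, which duplicates randomly chosen minority-class samples until every class matches the size of the largest one, can be sketched in standard-library Python. The function name `random_oversample`, the seed, and the toy emotion labels are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Hypothetical helper: duplicate random minority-class samples until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)  # keep all originals
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))  # sample with replacement
            out_y.append(cls)
    return out_x, out_y

# Toy, made-up data: "happy" is the majority class.
x = ["a1", "a2", "a3", "a4", "b1", "c1", "c2"]
y = ["happy", "happy", "happy", "happy", "sad", "angry", "angry"]
bx, by = random_oversample(x, y)
print(Counter(by))  # every class now has 4 samples
```

Random duplication is the simplest option; it adds no new information and can encourage overfitting to the repeated samples, which is one reason the record pairs it with other techniques.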
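The details of MR-CCL are given in the paper, not in this record. The sketch below is a generic stand-in, not the authors' framework: it sums a symmetric InfoNCE loss over every pair of modality embeddings (audio, text, visual) in a batch, so embeddings of the same sample are pulled together and embeddings of different samples are pushed apart. The temperature of 0.1, the toy 2-D embeddings, and all function names are assumptions.

```python
import itertools
import math

def _normalize(v):
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(za, zb, temperature=0.1):
    """One-directional InfoNCE: row i of za should match row i of zb,
    and every other row of zb in the batch serves as a negative."""
    za = [_normalize(v) for v in za]
    zb = [_normalize(v) for v in zb]
    loss = 0.0
    for i, a in enumerate(za):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in zb]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        loss += log_z - logits[i]  # cross-entropy with the positive at index i
    return loss / len(za)

def pairwise_contrastive_loss(modalities, temperature=0.1):
    """Symmetric InfoNCE summed over every pair of modalities
    (audio-text, audio-visual, text-visual for three inputs)."""
    total = 0.0
    for (_, za), (_, zb) in itertools.combinations(modalities.items(), 2):
        total += 0.5 * (info_nce(za, zb, temperature) + info_nce(zb, za, temperature))
    return total

# Toy batch of two samples with hypothetical 2-D embeddings per modality.
batch = {
    "audio": [[1.0, 0.0], [0.0, 1.0]],
    "text": [[0.9, 0.1], [0.1, 0.9]],
    "visual": [[1.0, 0.2], [0.2, 1.0]],
}
aligned = pairwise_contrastive_loss(batch)
# Reversing the text rows breaks the sample-level correspondence.
shuffled = pairwise_contrastive_loss({**batch, "text": batch["text"][::-1]})
print(aligned < shuffled)  # True
```

In a real system the embeddings would come from trainable encoders and the loss would be computed with a tensor library so gradients can flow; the pure-Python version only shows the arithmetic.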
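Self-training is described only at a high level in the abstract. A common form, assumed here, trains a model, pseudo-labels the unlabeled samples it predicts with confidence at or above a threshold, moves them into the training set, and retrains. The toy nearest-centroid classifier on 1-D features, the 0.9 threshold, and the round limit are hypothetical placeholders for the real emotion classifiers.

```python
import math

def train_centroids(data, num_classes):
    """Toy stand-in classifier: one mean per class on 1-D features.
    Assumes every class has at least one labeled sample."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for x, y in data:
        sums[y] += x
        counts[y] += 1
    return [s / c for s, c in zip(sums, counts)]

def predict_proba(centroids, xs):
    """Softmax over negative squared distance to each class centroid."""
    out = []
    for x in xs:
        scores = [-(x - c) ** 2 for c in centroids]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def self_train(labeled, unlabeled, num_classes, threshold=0.9, rounds=3):
    labeled, pool = list(labeled), list(unlabeled)
    model = train_centroids(labeled, num_classes)
    for _ in range(rounds):
        probs = predict_proba(model, pool)
        # Keep only confident predictions as pseudo-labels: index -> class.
        picked = {i: max(range(num_classes), key=p.__getitem__)
                  for i, p in enumerate(probs) if max(p) >= threshold}
        if not picked:
            break  # nothing confident enough; stop early
        labeled += [(pool[i], c) for i, c in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
        model = train_centroids(labeled, num_classes)  # retrain on the larger set
    return model, labeled, pool

labeled = [(0.0, 0), (0.2, 0), (5.0, 1), (5.2, 1)]
unlabeled = [0.1, 0.3, 2.5, 4.9, 5.1]
model, grown, left = self_train(labeled, unlabeled, num_classes=2)
print(left)  # [2.5]: the ambiguous midpoint never reaches the threshold
```

The threshold trades quantity for quality: a lower value adds more pseudo-labeled samples but risks reinforcing the model's own mistakes.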
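Weighted soft voting averages the class-probability outputs of several classifiers using per-classifier weights and predicts the class with the highest averaged probability. The weights and probability values below are made up for illustration; how the authors chose their weights is not stated in this record.

```python
def weighted_soft_vote(prob_lists, weights):
    """prob_lists[k][i] is classifier k's probability vector for sample i.
    Returns the argmax of the weight-averaged probabilities per sample."""
    total = sum(weights)
    n_samples = len(prob_lists[0])
    n_classes = len(prob_lists[0][0])
    preds = []
    for i in range(n_samples):
        avg = [sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) / total
               for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds

# Hypothetical outputs of three classifiers on two samples (two classes).
clf_a = [[0.7, 0.3], [0.2, 0.8]]
clf_b = [[0.35, 0.65], [0.4, 0.6]]
clf_c = [[0.35, 0.65], [0.7, 0.3]]
models = [clf_a, clf_b, clf_c]
print(weighted_soft_vote(models, weights=[0.5, 0.3, 0.2]))  # [0, 1]
print(weighted_soft_vote(models, weights=[1, 1, 1]))        # [1, 1]
```

On the first sample the weights change the decision: giving `clf_a` more weight lets its confident vote for class 0 outweigh the two milder votes for class 1.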
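The reported 88.25% is a weighted average F-score. Assuming the usual definition (per-class F1 averaged with weights proportional to each class's number of true samples, as with scikit-learn's `f1_score(..., average="weighted")`), the metric can be computed as follows; the labels are toy data.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    n = len(y_true)
    total = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += (tp + fn) * f1  # tp + fn is the class's support
    return total / n

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8333
```

Here classes 0 and 1 each have F1 = 0.8 and class 2 has F1 = 1.0, so the support-weighted mean is (3 * 0.8 + 2 * 0.8 + 1 * 1.0) / 6 = 5/6. Classes that never occur in `y_true` have zero support and therefore do not affect the result.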

Similar Items