Bibliographic Details
Main Authors: Fan, Qi, Li, Yutong, Xin, Yi, Cheng, Xinyu, Gao, Guanglai, Ma, Miao
Format: Preprint
Published: 2024
Subjects: Sound; Artificial Intelligence; Audio and Speech Processing
Online Access: https://arxiv.org/abs/2409.04447
_version_ 1866916384875741184
author Fan, Qi
Li, Yutong
Xin, Yi
Cheng, Xinyu
Gao, Guanglai
Ma, Miao
contents The Multimodal Emotion Recognition challenge MER2024 focuses on recognizing emotions using audio, language, and visual signals. In this paper, we present our submission solutions for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI), which tackles the issue of limited annotated data in emotion recognition. Firstly, to address the class imbalance, we adopt an oversampling strategy. Secondly, we propose a modality representation combinatorial contrastive learning (MR-CCL) framework on the trimodal input data to establish robust initial models. Thirdly, we explore a self-training approach to expand the training set. Finally, we enhance prediction robustness through a multi-classifier weighted soft voting strategy. Our proposed method is validated to be effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard. Our project is available at https://github.com/WooyoohL/MER2024-SEMI.
format Preprint
id arxiv_https___arxiv_org_abs_2409_04447
institution arXiv
publishDate 2024
record_format arxiv
title Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples
topic Sound
Artificial Intelligence
Audio and Speech Processing
url https://arxiv.org/abs/2409.04447