Staff View: Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition :: Library Catalog

Bibliographic Details
Main Authors:	Li, Ming; Liu, Yong-Jin; Liu, Fang; Sheng, Huankun; Fan, Yeying; Wei, Yixiang; Luo, Minnan; Zhang, Weizhan; Wang, Wenping
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2602.20530

_version_	1866914346906419200
author	Li, Ming; Liu, Yong-Jin; Liu, Fang; Sheng, Huankun; Fan, Yeying; Wei, Yixiang; Luo, Minnan; Zhang, Weizhan; Wang, Wenping
author_facet	Li, Ming; Liu, Yong-Jin; Liu, Fang; Sheng, Huankun; Fan, Yeying; Wei, Yixiang; Luo, Minnan; Zhang, Weizhan; Wang, Wenping
contents	Emotion recognition from multi-modal physiological and behavioral signals plays a pivotal role in affective computing, yet most existing models remain constrained to the prediction of singular emotions in controlled laboratory settings. Real-world human emotional experiences, by contrast, are often characterized by the simultaneous presence of multiple affective states, spurring recent interest in mixed emotion recognition as an emotion distribution learning problem. Current approaches, however, often neglect the valence consistency and structured correlations inherent among coexisting emotions. To address this limitation, we propose a Memory-guided Prototypical Co-occurrence Learning (MPCL) framework that explicitly models emotion co-occurrence patterns. Specifically, we first fuse multi-modal signals via a multi-scale associative memory mechanism. To capture cross-modal semantic relationships, we construct emotion-specific prototype memory banks, yielding rich physiological and behavioral representations, and employ prototype relation distillation to ensure cross-modal alignment in the latent prototype space. Furthermore, inspired by human cognitive memory systems, we introduce a memory retrieval strategy to extract semantic-level co-occurrence associations across emotion categories. Through this bottom-up hierarchical abstraction process, our model learns affectively informative representations for accurate emotion distribution prediction. Comprehensive experiments on two public datasets demonstrate that MPCL consistently outperforms state-of-the-art methods in mixed emotion recognition, both quantitatively and qualitatively.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_20530
institution	arXiv
publishDate	2026
record_format	arxiv
title	Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition
topic	Machine Learning; Sound; Audio and Speech Processing
url	https://arxiv.org/abs/2602.20530
