Staff View: Compound Expression Recognition via Large Vision-Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Yu, Jun; Lu, Xilong
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.11241

_version_	1866912274477744128
author	Yu, Jun; Lu, Xilong
author_facet	Yu, Jun; Lu, Xilong
contents	Compound Expression Recognition (CER) is crucial for understanding human emotions and improving human-computer interaction. However, CER faces challenges due to the complexity of facial expressions and the difficulty of capturing subtle emotional cues. To address these issues, we propose a novel approach leveraging Large Vision-Language Models (LVLMs). Our method employs a two-stage fine-tuning process: first, pre-trained LVLMs are fine-tuned on basic facial expressions to establish foundational patterns; second, the model is further optimized on a compound-expression dataset to refine visual-language feature interactions. Our approach achieves advanced accuracy on the RAF-DB dataset and demonstrates strong zero-shot generalization on the C-EXPR-DB dataset, showcasing its potential for real-world applications in emotion analysis and human-computer interaction.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_11241
institution	arXiv
publishDate	2025
record_format	arxiv
title	Compound Expression Recognition via Large Vision-Language Models
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2503.11241