| Main Authors: | Zhang, Liyun, Sha, Xuanmeng, Wu, Shuqiong, Liu, Fengkai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.20894 |
Similar Items
A Unified Evaluation Framework for Multi-Annotator Tendency Learning
by: Zhang, Liyun, et al.
Published: (2025)
MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues
by: Zhang, Liyun
Published: (2024)
3DGesPolicy: Phoneme-Aware Holistic Co-Speech Gesture Generation Based on Action Control
by: Sha, Xuanmeng, et al.
Published: (2026)
3DFacePolicy: Audio-Driven 3D Facial Animation Based on Action Control
by: Sha, Xuanmeng, et al.
Published: (2024)
SpikEmo: Enhancing Emotion Recognition With Spiking Temporal Dynamics in Conversations
by: Yu, Xiaomin, et al.
Published: (2024)
Towards Open-Vocabulary Video Semantic Segmentation
by: Li, Xinhao, et al.
Published: (2024)
EmoVLM-KD: Fusing Distilled Expertise with Vision-Language Models for Visual Emotion Analysis
by: Lee, SangEun, et al.
Published: (2025)
CLAIP-Emo: Parameter-Efficient Adaptation of Language-supervised models for In-the-Wild Audiovisual Emotion Recognition
by: Chen, Yin, et al.
Published: (2025)
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
by: Cong, Gaoxiang, et al.
Published: (2024)
Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding
by: Pan, Zhaoyan, et al.
Published: (2026)
Modality-Aware Contrastive and Uncertainty-Regularized Emotion Recognition
by: Zhuang, Yan, et al.
Published: (2026)
VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
by: Chen, Shengkai, et al.
Published: (2025)
Open-Vocabulary Audio-Visual Semantic Segmentation
by: Guo, Ruohao, et al.
Published: (2024)
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
by: Zhang, Hanlei, et al.
Published: (2024)
QuMATL: Query-based Multi-annotator Tendency Learning
by: Zhang, Liyun, et al.
Published: (2025)
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
by: Ma, Ziyang, et al.
Published: (2024)
VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints
by: Tang, Jinghua, et al.
Published: (2024)
Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation
by: Ye, Chengyang, et al.
Published: (2024)
Multimodal Emotion Recognition with Large Language Models
by: Zhang, Hongrui, et al.
Published: (2026)
Physics-Aware Novel-View Acoustic Synthesis with Vision-Language Priors and 3D Acoustic Environment Modeling
by: Fan, Congyi, et al.
Published: (2026)
SimLabel: Similarity-Weighted Iterative Framework for Multi-annotator Learning with Missing Annotations
by: Zhang, Liyun, et al.
Published: (2025)
Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation
by: Shen, Nanhan, et al.
Published: (2026)
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)
CEM-Net: Cross-Emotion Memory Network for Emotional Talking Face Generation
by: Wu, Kangyi, et al.
Published: (2025)
Towards Open-Vocabulary Audio-Visual Event Localization
by: Zhou, Jinxing, et al.
Published: (2024)
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)
Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation
by: Liu, Dancheng, et al.
Published: (2025)
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
by: Wu, Xiongwei, et al.
Published: (2024)
Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection
by: Zhu, Sa, et al.
Published: (2026)
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation
by: Chao, Jianghan, et al.
Published: (2025)
SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)
An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
by: Wang, Jianlu, et al.
Published: (2025)
Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation
by: Yi, Zijian, et al.
Published: (2024)
AV-EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Omni-modal LLMS with Audio-visual Cues
by: Zhou, Dingkun, et al.
Published: (2025)
High-Fidelity 3D Gaussian Human Reconstruction via Region-Aware Initialization and Geometric Priors
by: Liu, Yang, et al.
Published: (2026)
MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
by: Liu, Shuhang, et al.
Published: (2025)
ChronusOmni: Improving Time Awareness of Omni Large Language Models
by: Chen, Yijing, et al.
Published: (2025)
QuMAB: Query-based Multi-Annotator Behavior Modeling with Reliability under Sparse Labels
by: Zhang, Liyun, et al.
Published: (2025)
EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing
by: Jiang, Diqiong, et al.
Published: (2026)