| Main Authors: | Huang, Jiayang, Li, Lingjie, Zhang, Kang, Yip, David |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2504.04968 |
Similar Items
DreamLLM-3D: Affective Dream Reliving using Large Language Model and 3D Generative AI
by: Liu, Pinyao, et al.
Published: (2025)
Co-Director: Agentic Generative Video Storytelling
by: Song, Yale, et al.
Published: (2026)
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
by: Yu, Jiashuo, et al.
Published: (2025)
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)
Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)
EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
by: Qi, Xingqun, et al.
Published: (2023)
Sensorium Arc: AI Agent System for Oceanic Data Exploration and Interactive Eco-Art
by: Bissell, Noah, et al.
Published: (2025)
Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
by: Chen, Yi-Chun, et al.
Published: (2025)
Memo2496: Expert-Annotated Dataset and Dual-View Adaptive Framework for Music Emotion Recognition
by: Li, Qilin, et al.
Published: (2025)
Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition
by: Xia, Haiying, et al.
Published: (2025)
Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation
by: Shen, Nanhan, et al.
Published: (2026)
ELF: A Family of Encoder-Free ECG-Language Models
by: Han, William, et al.
Published: (2026)
SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)
KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
by: Lin, Yuxiang, et al.
Published: (2025)
CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition
by: Peng, Cheng, et al.
Published: (2023)
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
by: He, Huiguo, et al.
Published: (2024)
AniME: Adaptive Multi-Agent Planning for Long Animation Generation
by: Zhang, Lisai, et al.
Published: (2025)
ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context
by: Zheng, Sixiao, et al.
Published: (2024)
A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects
by: Ruan, Shulan, et al.
Published: (2025)
LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models
by: Chen, Jiangong, et al.
Published: (2025)
QMAVIS: Long Video-Audio Understanding using Fusion of Large Multimodal Models
by: Lin, Zixing, et al.
Published: (2026)
AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
by: Wang, Yunsheng, et al.
Published: (2026)
Multi-Agent System for AI-Assisted Extraction of Narrative Arcs in TV Series
by: Balestri, Roberto, et al.
Published: (2025)
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
by: Li, Lin, et al.
Published: (2024)
MetaDesigner: Advancing Artistic Typography Through AI-Driven, User-Centric, and Multilingual WordArt Synthesis
by: He, Jun-Yan, et al.
Published: (2024)
Pedagogical Reflections on the Holistic Cognitive Development (HCD) Framework and AI-Augmented Learning in Creative Computing
by: Bhojan, Anand
Published: (2025)
Livia: An Emotion-Aware AR Companion Powered by Modular AI Agents and Progressive Memory Compression
by: Xi, Rui, et al.
Published: (2025)
Vlogger: Make Your Dream A Vlog
by: Zhuang, Shaobin, et al.
Published: (2024)
ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents
by: Yang, Tianyu, et al.
Published: (2025)
FakeParts: a New Family of AI-Generated DeepFakes
by: Liu, Ziyi, et al.
Published: (2025)
MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core
by: Liao, Callie C., et al.
Published: (2025)
HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning
by: Zheng, Chuhang, et al.
Published: (2025)
PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis
by: Xie, Heng, et al.
Published: (2025)
Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models
by: Tan, Rui Yang, et al.
Published: (2026)
Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation
by: Huang, Zikai, et al.
Published: (2025)
Modeling Human Responses to Multimodal AI Content
by: Shen, Zhiqi, et al.
Published: (2025)
DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis
by: Hong, Fa-Ting, et al.
Published: (2024)
HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection
by: Jung, Juho, et al.
Published: (2024)
MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation
by: Farseev, Aleksandr, et al.
Published: (2025)