Library Catalog

Bibliographic Details
Main Authors:	Huang, Jiayang; Li, Lingjie; Zhang, Kang; Yip, David
Format:	Preprint
Published:	2025
Subjects:	Multimedia; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.04968

Similar Items

DreamLLM-3D: Affective Dream Reliving using Large Language Model and 3D Generative AI
by: Liu, Pinyao, et al.
Published: (2025)

Co-Director: Agentic Generative Video Storytelling
by: Song, Yale, et al.
Published: (2026)

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
by: Yu, Jiashuo, et al.
Published: (2025)

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
by: Cheng, Zebang, et al.
Published: (2024)

Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios
by: Zhang, Yuan, et al.
Published: (2024)

EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
by: Qi, Xingqun, et al.
Published: (2023)

Sensorium Arc: AI Agent System for Oceanic Data Exploration and Interactive Eco-Art
by: Bissell, Noah, et al.
Published: (2025)

Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments
by: Chen, Yi-Chun, et al.
Published: (2025)

Memo2496: Expert-Annotated Dataset and Dual-View Adaptive Framework for Music Emotion Recognition
by: Li, Qilin, et al.
Published: (2025)

Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition
by: Xia, Haiying, et al.
Published: (2025)

Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation
by: Shen, Nanhan, et al.
Published: (2026)

ELF: A Family of Encoder-Free ECG-Language Models
by: Han, William, et al.
Published: (2026)

SEER: Semantic Enhancement and Emotional Reasoning Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)

KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection
by: Zhu, Peican, et al.
Published: (2025)

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
by: Lin, Yuxiang, et al.
Published: (2025)

CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition
by: Peng, Cheng, et al.
Published: (2023)

DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
by: He, Huiguo, et al.
Published: (2024)

AniME: Adaptive Multi-Agent Planning for Long Animation Generation
by: Zhang, Lisai, et al.
Published: (2025)

ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context
by: Zheng, Sixiao, et al.
Published: (2024)

A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects
by: Ruan, Shulan, et al.
Published: (2025)

LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models
by: Chen, Jiangong, et al.
Published: (2025)

QMAVIS: Long Video-Audio Understanding using Fusion of Large Multimodal Models
by: Lin, Zixing, et al.
Published: (2026)

AMB-DSGDN: Adaptive Modality-Balanced Dynamic Semantic Graph Differential Network for Multimodal Emotion Recognition
by: Wang, Yunsheng, et al.
Published: (2026)

Multi-Agent System for AI-Assisted Extraction of Narrative Arcs in TV Series
by: Balestri, Roberto, et al.
Published: (2025)

A Survey on Multimodal Benchmarks: In the Era of Large AI Models
by: Li, Lin, et al.
Published: (2024)

MetaDesigner: Advancing Artistic Typography Through AI-Driven, User-Centric, and Multilingual WordArt Synthesis
by: He, Jun-Yan, et al.
Published: (2024)

Pedagogical Reflections on the Holistic Cognitive Development (HCD) Framework and AI-Augmented Learning in Creative Computing
by: Bhojan, Anand
Published: (2025)

Livia: An Emotion-Aware AR Companion Powered by Modular AI Agents and Progressive Memory Compression
by: Xi, Rui, et al.
Published: (2025)

Vlogger: Make Your Dream A Vlog
by: Zhuang, Shaobin, et al.
Published: (2024)

ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents
by: Yang, Tianyu, et al.
Published: (2025)

FakeParts: a New Family of AI-Generated DeepFakes
by: Liu, Ziyi, et al.
Published: (2025)

MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core
by: Liao, Callie C., et al.
Published: (2025)

HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning
by: Zheng, Chuhang, et al.
Published: (2025)

PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis
by: Xie, Heng, et al.
Published: (2025)

Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models
by: Tan, Rui Yang, et al.
Published: (2026)

Think2Sing: Orchestrating Structured Motion Subtitles for Singing-Driven 3D Head Animation
by: Huang, Zikai, et al.
Published: (2025)

Modeling Human Responses to Multimodal AI Content
by: Shen, Zhiqi, et al.
Published: (2025)

DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis
by: Hong, Fa-Ting, et al.
Published: (2024)

HiQuE: Hierarchical Question Embedding Network for Multimodal Depression Detection
by: Jung, Juho, et al.
Published: (2024)

MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation
by: Farseev, Aleksandr, et al.
Published: (2025)