:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Rui; Yang, Shichun; Chen, Yuyi; Li, Zhuoyang; Tong, Zexiang; Xu, Jianyi; Lu, Jiayi; Feng, Xinjie; Cao, Yaoguang
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Multimedia
Online Access:	https://arxiv.org/abs/2505.11066

Similar Items

Synesthesia of Vehicles: Tactile Data Synthesis from Visual Inputs
by: Wang, Rui, et al.
Published: (2026)

Emotional Cues Extraction and Fusion for Multi-modal Emotion Prediction and Recognition in Conversation
by: Shi, Haoxiang, et al.
Published: (2024)

MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
by: Wang, Sen, et al.
Published: (2024)

Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance
by: Hu, Zexin, et al.
Published: (2023)

Clinical Multi-modal Fusion with Heterogeneous Graph and Disease Correlation Learning for Multi-Disease Prediction
by: Jiang, Yueheng, et al.
Published: (2025)

MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection
by: Lv, Hongzhen, et al.
Published: (2024)

Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
by: Zhang, Zhihao, et al.
Published: (2023)

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward
by: Tang, Yolo Yunlong, et al.
Published: (2022)

MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition
by: Zhang, Haoyang, et al.
Published: (2025)

Multi-modal and Metadata Capture Model for Micro Video Popularity Prediction
by: Lu, Jiacheng, et al.
Published: (2025)

Deep Mamba Multi-modal Learning
by: Zhu, Jian, et al.
Published: (2024)

An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
by: Wang, Jianlu, et al.
Published: (2025)

Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval
by: Xu, Yang, et al.
Published: (2024)

Hierarchical Sub-action Tree for Continuous Sign Language Recognition
by: Yang, Dejie, et al.
Published: (2025)

High-level Codes and Fine-grained Weights for Online Multi-modal Hashing Retrieval
by: Zhan, Yu-Wei, et al.
Published: (2024)

EidetiCom: A Cross-modal Brain-Computer Semantic Communication Paradigm for Decoding Visual Perception
by: Zheng, Linfeng, et al.
Published: (2024)

Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion
by: Zhu, Yixin, et al.
Published: (2026)

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization
by: Chen, Tao, et al.
Published: (2023)

Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
by: Yuan, Bo, et al.
Published: (2024)

A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects
by: Ruan, Shulan, et al.
Published: (2025)

LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)

ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing
by: Wang, Yisu, et al.
Published: (2025)

AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
by: Gao, Jun, et al.
Published: (2024)

LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction
by: Pramov, Aleksandar
Published: (2025)

CDI-DTI: A Strong Cross-domain Interpretable Drug-Target Interaction Prediction Framework Based on Multi-Strategy Fusion
by: Li, Xiangyu, et al.
Published: (2025)

Towards Structure-aware Model for Multi-modal Knowledge Graph Completion
by: Li, Linyu, et al.
Published: (2025)

Generative AI-enabled Mobile Tactical Multimedia Networks: Distribution, Generation, and Perception
by: Xu, Minrui, et al.
Published: (2024)

IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)

Physics-Aware Novel-View Acoustic Synthesis with Vision-Language Priors and 3D Acoustic Environment Modeling
by: Fan, Congyi, et al.
Published: (2026)

Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding
by: Zhou, Zhiyuan, et al.
Published: (2026)

Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models
by: Wu, Qiong, et al.
Published: (2024)

Characterizing Multimedia Information Environment through Multi-modal Clustering of YouTube Videos
by: Yousefi, Niloofar, et al.
Published: (2024)

Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification
by: Cao, Han, et al.
Published: (2024)

Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval
by: Hu, Xiaowan, et al.
Published: (2024)

FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction
by: Shu, Dong, et al.
Published: (2025)

Quantifying and Enhancing Multi-modal Robustness with Modality Preference
by: Yang, Zequn, et al.
Published: (2024)

Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)

Interest-Aware Joint Caching, Computing, and Communication Optimization for Mobile VR Delivery in MEC Networks
by: Fu, Baojie, et al.
Published: (2024)

Perception-Aware Video Semantic Communication
by: Huang, Yinhuan, et al.
Published: (2026)

EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment
by: Gao, Lancheng, et al.
Published: (2025)