| Main Authors: | Wang, Rui, Yang, Shichun, Chen, Yuyi, Li, Zhuoyang, Tong, Zexiang, Xu, Jianyi, Lu, Jiayi, Feng, Xinjie, Cao, Yaoguang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.11066 |
Similar Items
Synesthesia of Vehicles: Tactile Data Synthesis from Visual Inputs
by: Wang, Rui, et al.
Published: (2026)
Emotional Cues Extraction and Fusion for Multi-modal Emotion Prediction and Recognition in Conversation
by: Shi, Haoxiang, et al.
Published: (2024)
MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model
by: Wang, Sen, et al.
Published: (2024)
Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance
by: Hu, Zexin, et al.
Published: (2023)
Clinical Multi-modal Fusion with Heterogeneous Graph and Disease Correlation Learning for Multi-Disease Prediction
by: Jiang, Yueheng, et al.
Published: (2025)
MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection
by: Lv, Hongzhen, et al.
Published: (2024)
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer
by: Zhang, Zhihao, et al.
Published: (2023)
Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward
by: Tang, Yolo Yunlong, et al.
Published: (2022)
MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition
by: Zhang, Haoyang, et al.
Published: (2025)
Multi-modal and Metadata Capture Model for Micro Video Popularity Prediction
by: Lu, Jiacheng, et al.
Published: (2025)
Deep Mamba Multi-modal Learning
by: Zhu, Jian, et al.
Published: (2024)
An Emotion Recognition Framework via Cross-modal Alignment of EEG and Eye Movement Data
by: Wang, Jianlu, et al.
Published: (2025)
Structure-Aware Residual-Center Representation for Self-Supervised Open-Set 3D Cross-Modal Retrieval
by: Xu, Yang, et al.
Published: (2024)
Hierarchical Sub-action Tree for Continuous Sign Language Recognition
by: Yang, Dejie, et al.
Published: (2025)
High-level Codes and Fine-grained Weights for Online Multi-modal Hashing Retrieval
by: Zhan, Yu-Wei, et al.
Published: (2024)
EidetiCom: A Cross-modal Brain-Computer Semantic Communication Paradigm for Decoding Visual Perception
by: Zheng, Linfeng, et al.
Published: (2024)
Interactive Spatial-Frequency Fusion Mamba for Multi-Modal Image Fusion
by: Zhu, Yixin, et al.
Published: (2026)
MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization
by: Chen, Tao, et al.
Published: (2023)
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
by: Yuan, Bo, et al.
Published: (2024)
A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects
by: Ruan, Shulan, et al.
Published: (2025)
LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)
ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing
by: Wang, Yisu, et al.
Published: (2025)
AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
by: Gao, Jun, et al.
Published: (2024)
LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction
by: Pramov, Aleksandar
Published: (2025)
CDI-DTI: A Strong Cross-domain Interpretable Drug-Target Interaction Prediction Framework Based on Multi-Strategy Fusion
by: Li, Xiangyu, et al.
Published: (2025)
Towards Structure-aware Model for Multi-modal Knowledge Graph Completion
by: Li, Linyu, et al.
Published: (2025)
Generative AI-enabled Mobile Tactical Multimedia Networks: Distribution, Generation, and Perception
by: Xu, Minrui, et al.
Published: (2024)
IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment
by: Su, Taoyu, et al.
Published: (2024)
Physics-Aware Novel-View Acoustic Synthesis with Vision-Language Priors and 3D Acoustic Environment Modeling
by: Fan, Congyi, et al.
Published: (2026)
Voices, Faces, and Feelings: Multi-modal Emotion-Cognition Captioning for Mental Health Understanding
by: Zhou, Zhiyuan, et al.
Published: (2026)
Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models
by: Wu, Qiong, et al.
Published: (2024)
Characterizing Multimedia Information Environment through Multi-modal Clustering of YouTube Videos
by: Yousefi, Niloofar, et al.
Published: (2024)
Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification
by: Cao, Han, et al.
Published: (2024)
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval
by: Hu, Xiaowan, et al.
Published: (2024)
FinCall-Surprise: A Large Scale Multi-modal Benchmark for Earning Surprise Prediction
by: Shu, Dong, et al.
Published: (2025)
Quantifying and Enhancing Multi-modal Robustness with Modality Preference
by: Yang, Zequn, et al.
Published: (2024)
Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
by: Wu, Zichen, et al.
Published: (2024)
Interest-Aware Joint Caching, Computing, and Communication Optimization for Mobile VR Delivery in MEC Networks
by: Fu, Baojie, et al.
Published: (2024)
Perception-Aware Video Semantic Communication
by: Huang, Yinhuan, et al.
Published: (2026)
EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment
by: Gao, Lancheng, et al.
Published: (2025)