| Main Authors: | Li, Songtao; Tang, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.17040 |
Similar Items
Multimodal Data Storage and Retrieval for Embodied AI: A Survey
by: Lu, Yihao, et al.
Published: (2025)
TTTFusion: A Test-Time Training-Based Strategy for Multimodal Medical Image Fusion in Surgical Robots
by: Xie, Qinhua, et al.
Published: (2025)
DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis
by: Wen, Yuhua, et al.
Published: (2025)
MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment
by: Wang, Shuo, et al.
Published: (2025)
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
by: Han, Xiaofeng, et al.
Published: (2025)
Multimodal Referring Segmentation: A Survey
by: Ding, Henghui, et al.
Published: (2025)
The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment
by: Koutoupis, Stefanos, et al.
Published: (2025)
Text-to-Image Synthesis: A Decade Survey
by: Zhang, Nonghai, et al.
Published: (2024)
UAVD-Mamba: Deformable Token Fusion Vision Mamba for Multimodal UAV Detection
by: Li, Wei, et al.
Published: (2025)
RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba
by: Lu, Andong, et al.
Published: (2024)
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
by: Li, Teng, et al.
Published: (2025)
SEF-MAP: Subspace-Decomposed Expert Fusion for Robust Multimodal HD Map Prediction
by: Fu, Haoxiang, et al.
Published: (2026)
Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition
by: Li, Qifei, et al.
Published: (2024)
CAD: A General Multimodal Framework for Video Deepfake Detection via Cross-Modal Alignment and Distillation
by: Du, Yuxuan, et al.
Published: (2025)
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
by: Wang, Yaoting, et al.
Published: (2025)
VideoFusion: A Spatio-Temporal Collaborative Network for Multi-modal Video Fusion
by: Tang, Linfeng, et al.
Published: (2025)
STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion
by: Yao, Wei, et al.
Published: (2024)
Hyperbolic Cycle Alignment for Infrared-Visible Image Fusion
by: Li, Timing, et al.
Published: (2025)
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment
by: Jiang, Songtao, et al.
Published: (2024)
Omni Survey for Multimodality Analysis in Visual Object Tracking
by: Tang, Zhangyong, et al.
Published: (2025)
Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification
by: Liu, Hao, et al.
Published: (2025)
LightFusion: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
by: Wang, Zeyu, et al.
Published: (2025)
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
by: Cao, Jianjian, et al.
Published: (2024)
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering
by: Zou, Yuanhao, et al.
Published: (2025)
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
by: Yin, Junbo, et al.
Published: (2024)
Unbiased Dynamic Multimodal Fusion
by: Wei, Shicai, et al.
Published: (2026)
LASFNet: A Lightweight Attention-Guided Self-Modulation Feature Fusion Network for Multimodal Object Detection
by: Hao, Lei, et al.
Published: (2025)
Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content
by: Tong, Ran, et al.
Published: (2025)
AGSP-DSA: An Adaptive Graph Signal Processing Framework for Robust Multimodal Fusion with Dynamic Semantic Alignment
by: Karthikeya, KV, et al.
Published: (2026)
SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging
by: Kong, Lingtong, et al.
Published: (2024)
Semantic Alignment for Multimodal Large Language Models
by: Wu, Tao, et al.
Published: (2024)
Feature Alignment Determines Fusion Strategy: A Comparative Study of Cross-Attention and Concatenation in Multimodal Learning
by: Zhou, Zhiqiang, et al.
Published: (2026)
MMA: Multimodal Memory Agent
by: Lu, Yihao, et al.
Published: (2026)
Unified Multimodal Coherent Field: Synchronous Semantic-Spatial-Vision Fusion for Brain Tumor Segmentation
by: Zhang, Mingda, et al.
Published: (2025)
Interactive Multimodal Fusion with Temporal Modeling
by: Yu, Jun, et al.
Published: (2025)
Replace in Translation: Boost Concept Alignment in Counterfactual Text-to-Image
by: Li, Sifan, et al.
Published: (2025)
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba
by: Xie, Xinyu, et al.
Published: (2024)
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
by: Jia, Ding, et al.
Published: (2024)
The Indra Representation Hypothesis for Multimodal Alignment
by: Lu, Jianglin, et al.
Published: (2026)
MSGFusion: Multimodal Scene Graph-Guided Infrared and Visible Image Fusion
by: Li, Guihui, et al.
Published: (2025)