| Main Authors: | Lou, Haoran, Liu, Ziyan, Fan, Chunxiao, Wu, Yuexin, Ming, Yue, Wu, Hao, Zuo, Kai, Chen, Yibo, Tang, Xu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.13710 |
Similar Items
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
by: Lou, Haoran, et al.
Published: (2025)
MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection
by: Liu, Ziyan, et al.
Published: (2025)
Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs
by: Tong, Jintao, et al.
Published: (2025)
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
by: Wang, Zitian, et al.
Published: (2025)
Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality
by: Qian, Kai, et al.
Published: (2026)
Heterogeneous-Modal Unsupervised Domain Adaptation via Latent Space Bridging
by: Yang, Jiawen, et al.
Published: (2025)
Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization
by: Zhou, Hefeng, et al.
Published: (2026)
Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory
by: Lin, Pengxiao, et al.
Published: (2025)
ROG: Retrieval-Augmented LLM Reasoning for Complex First-Order Queries over Knowledge Graphs
by: Zhang, Ziyan, et al.
Published: (2026)
Transferable Multi-Bit Watermarking Across Frozen Diffusion Models via Latent Consistency Bridges
by: Nguyen-Le, Hong-Hanh, et al.
Published: (2026)
Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain
by: Chao, Lianying, et al.
Published: (2026)
Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads
by: Patel, Shaswat, et al.
Published: (2026)
Towards Convexity in Anomaly Detection: A New Formulation of SSLM with Unique Optimal Solutions
by: Liu, Hongying, et al.
Published: (2024)
Growing Visual Generative Capacity for Pre-Trained MLLMs
by: Wang, Hanyu, et al.
Published: (2025)
Unveiling and Bridging the Functional Perception Gap in MLLMs: Atomic Visual Alignment and Hierarchical Evaluation via PET-Bench
by: Ye, Zanting, et al.
Published: (2026)
CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
by: Zhang, Xu, et al.
Published: (2025)
From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM
by: Wu, Xinyi, et al.
Published: (2025)
Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs
by: Huang, Jincai, et al.
Published: (2026)
BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression
by: Li, Yuankai, et al.
Published: (2024)
Bridging Search and Recommendation through Latent Cross Reasoning
by: Shi, Teng, et al.
Published: (2025)
TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
by: Chen, Yulin, et al.
Published: (2025)
Multi-Masked Querying Network for Robust Emotion Recognition from Incomplete Multi-Modal Physiological Signals
by: Xu, Geng-Xin, et al.
Published: (2025)
Graph Transfer Learning via Shared Latent Geometry: Theory and Applications
by: Wu, Tong, et al.
Published: (2026)
VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs
by: Li, Qiaoru, et al.
Published: (2026)
MLLMs are Deeply Affected by Modality Bias
by: Zheng, Xu, et al.
Published: (2025)
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
by: Huang, Zhe, et al.
Published: (2025)
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries
by: Xue, Wangyu, et al.
Published: (2024)
VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines
by: Huang, Kai, et al.
Published: (2026)
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
by: Zhou, Zhongzhu, et al.
Published: (2026)
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
by: Chen, Guizhen, et al.
Published: (2025)
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
by: Li, Haoxuan, et al.
Published: (2025)
Bridging Queries and Tables through Entities in Table Retrieval
by: Li, Da, et al.
Published: (2025)
Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities
by: Fan, Qi, et al.
Published: (2024)
Universal Skeleton Understanding via Differentiable Rendering and MLLMs
by: Wang, Ziyi, et al.
Published: (2026)
Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
by: Zhang, Bob, et al.
Published: (2025)
MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces
by: E, Shaojun, et al.
Published: (2025)
HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
by: Wu, Hao, et al.
Published: (2026)
CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction
by: Ma, Hao, et al.
Published: (2024)
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)
A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization
by: Xu, Wenyuan, et al.
Published: (2025)