:: Library Catalog

Bibliographic Details
Main Authors:	Lou, Haoran; Liu, Ziyan; Fan, Chunxiao; Wu, Yuexin; Ming, Yue; Wu, Hao; Zuo, Kai; Chen, Yibo; Tang, Xu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.13710

Similar Items

LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
by: Lou, Haoran, et al.
Published: (2025)

MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection
by: Liu, Ziyan, et al.
Published: (2025)

Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs
by: Tong, Jintao, et al.
Published: (2025)

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
by: Wang, Zitian, et al.
Published: (2025)

Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality
by: Qian, Kai, et al.
Published: (2026)

Heterogeneous-Modal Unsupervised Domain Adaptation via Latent Space Bridging
by: Yang, Jiawen, et al.
Published: (2025)

Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization
by: Zhou, Hefeng, et al.
Published: (2026)

Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory
by: Lin, Pengxiao, et al.
Published: (2025)

ROG: Retrieval-Augmented LLM Reasoning for Complex First-Order Queries over Knowledge Graphs
by: Zhang, Ziyan, et al.
Published: (2026)

Transferable Multi-Bit Watermarking Across Frozen Diffusion Models via Latent Consistency Bridges
by: Nguyen-Le, Hong-Hanh, et al.
Published: (2026)

Multi-Modal LLM based Image Captioning in ICT: Bridging the Gap Between General and Industry Domain
by: Chao, Lianying, et al.
Published: (2026)

Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads
by: Patel, Shaswat, et al.
Published: (2026)

Towards Convexity in Anomaly Detection: A New Formulation of SSLM with Unique Optimal Solutions
by: Liu, Hongying, et al.
Published: (2024)

Growing Visual Generative Capacity for Pre-Trained MLLMs
by: Wang, Hanyu, et al.
Published: (2025)

Unveiling and Bridging the Functional Perception Gap in MLLMs: Atomic Visual Alignment and Hierarchical Evaluation via PET-Bench
by: Ye, Zanting, et al.
Published: (2026)

CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks
by: Zhang, Xu, et al.
Published: (2025)

From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM
by: Wu, Xinyi, et al.
Published: (2025)

Towards Unified Surgical Scene Understanding: Bridging Reasoning and Grounding via MLLMs
by: Huang, Jincai, et al.
Published: (2026)

BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression
by: Li, Yuankai, et al.
Published: (2024)

Bridging Search and Recommendation through Latent Cross Reasoning
by: Shi, Teng, et al.
Published: (2025)

TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
by: Chen, Yulin, et al.
Published: (2025)

Multi-Masked Querying Network for Robust Emotion Recognition from Incomplete Multi-Modal Physiological Signals
by: Xu, Geng-Xin, et al.
Published: (2025)

Graph Transfer Learning via Shared Latent Geometry: Theory and Applications
by: Wu, Tong, et al.
Published: (2026)

VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs
by: Li, Qiaoru, et al.
Published: (2026)

MLLMs are Deeply Affected by Modality Bias
by: Zheng, Xu, et al.
Published: (2025)

Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
by: Huang, Zhe, et al.
Published: (2025)

ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries
by: Xue, Wangyu, et al.
Published: (2024)

VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines
by: Huang, Kai, et al.
Published: (2026)

CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention
by: Zhou, Zhongzhu, et al.
Published: (2026)

GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
by: Chen, Guizhen, et al.
Published: (2025)

SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
by: Li, Haoxuan, et al.
Published: (2025)

Bridging Queries and Tables through Entities in Table Retrieval
by: Li, Da, et al.
Published: (2025)

Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities
by: Fan, Qi, et al.
Published: (2024)

Universal Skeleton Understanding via Differentiable Rendering and MLLMs
by: Wang, Ziyi, et al.
Published: (2026)

Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
by: Zhang, Bob, et al.
Published: (2025)

MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces
by: E, Shaojun, et al.
Published: (2025)

HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
by: Wu, Hao, et al.
Published: (2026)

CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction
by: Ma, Hao, et al.
Published: (2024)

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning
by: Dai, Yifan, et al.
Published: (2026)

A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization
by: Xu, Wenyuan, et al.
Published: (2025)