| Main Authors: | Lin, Yijie, Ding, Guofeng, Zhou, Haochen, Li, Haobin, Yang, Mouxing, Peng, Xi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.09839 |
Similar Items
Reliable Thinking with Images
by: Li, Haobin, et al.
Published: (2026)
LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
by: Lu, Yiding, et al.
Published: (2025)
Decoupled Contrastive Multi-View Clustering with High-Order Random Walks
by: Lu, Yiding, et al.
Published: (2023)
A Survey on Deep Clustering: From the Prior Perspective
by: Lu, Yiding, et al.
Published: (2024)
An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model
by: Tian, Yuxin, et al.
Published: (2024)
AffectAgent: Collaborative Multi-Agent Reasoning for Retrieval-Augmented Multimodal Emotion Recognition
by: Wang, Zeheng, et al.
Published: (2026)
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
by: Tan, Cheng, et al.
Published: (2024)
Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
by: Li, Haobin, et al.
Published: (2025)
V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval
by: Chen, Dongyang, et al.
Published: (2026)
MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents
by: Xu, Dannong, et al.
Published: (2026)
Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
by: Zhang, Jianing, et al.
Published: (2026)
Proactive Reasoning-with-Retrieval Framework for Medical Multimodal Large Language Models
by: Wang, Lehan, et al.
Published: (2025)
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
by: Liu, Chunxu, et al.
Published: (2025)
CrossVid: A Comprehensive Benchmark for Evaluating Cross-Video Reasoning in Multimodal Large Language Models
by: Li, Jingyao, et al.
Published: (2025)
Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval
by: Dong, Jianfeng, et al.
Published: (2025)
Toward Robust and Harmonious Adaptation for Cross-modal Retrieval
by: Li, Haobin, et al.
Published: (2025)
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
by: Suo, Yucheng, et al.
Published: (2024)
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
by: Sun, Yuxuan, et al.
Published: (2024)
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering
by: Li, Peize, et al.
Published: (2024)
EscapeCraft: A 3D Room Escape Environment for Benchmarking Complex Multimodal Reasoning Ability
by: Wang, Ziyue, et al.
Published: (2025)
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
by: Ma, Wufei, et al.
Published: (2024)
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
by: Hong, Yuyang, et al.
Published: (2025)
Multi-granularity Correspondence Learning from Long-term Noisy Videos
by: Lin, Yijie, et al.
Published: (2024)
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding
by: Li, Jiaang, et al.
Published: (2025)
CREM: Compression-Driven Representation Enhancement for Multimodal Retrieval and Comprehension
by: Liu, Lihao, et al.
Published: (2026)
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
by: Wang, Zhikai, et al.
Published: (2025)
TRACE: Task-Adaptive Reasoning and Representation Learning for Universal Multimodal Retrieval
by: Hao, Xiangzhao, et al.
Published: (2026)
See the Forest and the Trees: A Synergistic Reasoning Framework for Knowledge-Based Visual Question Answering
by: Wang, Junjie, et al.
Published: (2025)
DUDE: Diffusion-Based Unsupervised Cross-Domain Image Retrieval
by: Yang, Ruohong, et al.
Published: (2025)
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
by: Yuan, Haobo, et al.
Published: (2025)
A Benchmark and Knowledge-Grounded Framework for Advanced Multimodal Personalization Study
by: Hu, Xia, et al.
Published: (2026)
MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments
by: Wang, Han, et al.
Published: (2026)
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
by: Zhang, Yuanhan, et al.
Published: (2024)
AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process
by: Zhang, Xintong, et al.
Published: (2026)
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
by: Xu, Zhaopan, et al.
Published: (2025)
SOLAR: Self-supervised Joint Learning for Symmetric Multimodal Retrieval
by: Yang, Wenjie, et al.
Published: (2026)
MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles
by: Ji, Yuheng, et al.
Published: (2025)
Multimodal Fusion SLAM with Fourier Attention
by: Zhou, Youjie, et al.
Published: (2025)
Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration
by: Zhou, Yue, et al.
Published: (2025)
Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models
by: Kim, Jeonghwan, et al.
Published: (2026)