Library Catalog

Bibliographic Details
Main Authors:	Li, Yang; Chen, Xing; Liu, Yutao; Qi, Gege; Bi, Yanxian; Wang, Zizhe; Zhang, Yunjian; Zhu, Yao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.09337

Similar Items

Towards Understanding How Knowledge Evolves in Large Vision-Language Models
by: Wang, Sudong, et al.
Published: (2025)

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
by: Xu, Peiran, et al.
Published: (2025)

RS-OOD: A Vision-Language Augmented Framework for Out-of-Distribution Detection in Remote Sensing
by: Wang, Chenhao, et al.
Published: (2025)

SOPSeg: Prompt-based Small Object Instance Segmentation in Remote Sensing Imagery
by: Wang, Chenhao, et al.
Published: (2025)

An Efficient Framework for Enhancing Discriminative Models via Diffusion Techniques
by: Li, Chunxiao, et al.
Published: (2024)

CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making
by: Jiang, Songtao, et al.
Published: (2025)

Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection
by: Zhu, Jiaqi, et al.
Published: (2024)

Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
by: Chen, Mingrui, et al.
Published: (2026)

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
by: Miao, Boming, et al.
Published: (2024)

Exploring Decision-Making Capabilities of LLM Agents: An Experimental Study on Jump-Jump Game
by: Li, Juwu
Published: (2025)

Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs
by: Ding, Xuan, et al.
Published: (2025)

Angle of Arrival Estimation with Transformer: A Sparse and Gridless Method with Zero-Shot Capability
by: Zhu, Zhaoxuan, et al.
Published: (2024)

AdvLogo: Adversarial Patch Attack against Object Detectors based on Diffusion Models
by: Miao, Boming, et al.
Published: (2024)

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
by: Zhou, Qinhong, et al.
Published: (2024)

Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games
by: He, Yidong, et al.
Published: (2026)

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
by: Shangguan, Ziyao, et al.
Published: (2024)

Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering
by: Li, Qiming, et al.
Published: (2025)

D2-Mamba: Dual-Scale Fusion and Dual-Path Scanning with SSMs for Shadow Removal
by: Li, Linhao, et al.
Published: (2025)

Making Large Language Models Better Planners with Reasoning-Decision Alignment
by: Huang, Zhijian, et al.
Published: (2024)

Unsupervised Domain Adaptive Lane Detection via Contextual Contrast and Aggregation
by: Zhou, Kunyang, et al.
Published: (2024)

Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation
by: Kuprashevich, Maksim, et al.
Published: (2024)

UCAgents: Unidirectional Convergence for Visual Evidence Anchored Multi-Agent Medical Decision-Making
by: Feng, Qianhan, et al.
Published: (2025)

Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
by: Qiao, Yanyuan, et al.
Published: (2024)

RemoteZero: Geospatial Reasoning with Zero Human Annotations
by: Yao, Liang, et al.
Published: (2026)

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs
by: Kancheti, Sai Srinivas, et al.
Published: (2026)

V-Zero: Self-Improving Multimodal Reasoning with Zero Annotation
by: Wang, Han, et al.
Published: (2026)

DoubleTake: Contrastive Reasoning for Faithful Decision-Making in Medical Imaging
by: Patel, Daivik, et al.
Published: (2026)

Decoding Decision Reasoning: A Counterfactual-Powered Model for Knowledge Discovery
by: Fang, Yingying, et al.
Published: (2024)

AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs
by: Wang, Diwei, et al.
Published: (2025)

ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
by: Guo, Zichun, et al.
Published: (2026)

Quantum Conflict Measurement in Decision Making for Out-of-Distribution Detection
by: Dong, Yilin, et al.
Published: (2025)

Bridging the Gap Between Ideal and Real-world Evaluation: Benchmarking AI-Generated Image Detection in Challenging Scenarios
by: Li, Chunxiao, et al.
Published: (2025)

CEOs, Information, and Decision Making: Scanning the Environment for Strategic Advantage
by: Auster, Ethel, et al.
Published: (1994)

Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
by: Yang, Zhihe, et al.
Published: (2025)

SGDM: Static-Guided Dynamic Module Make Stronger Visual Models
by: Xing, Wenjie, et al.
Published: (2024)

RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather
by: Wang, Yuran, et al.
Published: (2025)

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
by: Qiao, Yuxuan, et al.
Published: (2024)

Object Navigation with Structure-Semantic Reasoning-Based Multi-level Map and Multimodal Decision-Making LLM
by: Yan, Chongshang, et al.
Published: (2025)

Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments
by: Taourirte, Aya, et al.
Published: (2025)

Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs
by: Zhu, Rui, et al.
Published: (2026)