Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Yu, Chung-En Johnny, Jalaian, Brian, Bastian, Nathaniel D.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition Artificial Intelligence Multiagent Systems
Online Access:	https://arxiv.org/abs/2509.16343
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Building robust vision systems for high-stakes domains such as remote sensing requires stronger visual reasoning than what single-pass inference typically provides; yet, retraining large models is often computationally expensive and data intensive. We present Visual Reasoning Agent (VRA), a training-free agentic visual reasoning framework that orchestrates off-the-shelf large vision-language models (LVLMs) with a large reasoning model (LRM) through an iterative Think-Critique-Act loop for cross-model verification, self-critique, and recursive refinement. On the remote sensing benchmark VRSBench VQA dataset, VRA consistently outperforms multiple standalone LVLM baselines and achieves up to 40.67\% improvement on challenging question types spanning both perception and reasoning tasks. In addition, integrating three LVLMs with VRA improves the overall accuracy of the standalone LVLMs from 52.8% to 78.8%, demonstrating the effectiveness of agentic reasoning with increased inference-time compute.

Similar Items