Library Catalog

Bibliographic Details
Main Authors:	Ma, Yan; Zhang, Weiyu; Li, Tianle; Du, Linge; Shen, Xuyang; Liu, Pengfei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01334

Similar Items

One RL to See Them All: Visual Triple Unified Reinforcement Learning
by: Ma, Yan, et al.
Published: (2025)

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
by: Jiang, Dongfu, et al.
Published: (2025)

CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
by: Carvalho, Miguel, et al.
Published: (2025)

Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use
by: Zhang, Yabo, et al.
Published: (2025)

ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
by: Yang, Zuhao, et al.
Published: (2026)

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
by: Ma, Yan, et al.
Published: (2025)

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in
by: Shen, Xiaoqian, et al.
Published: (2025)

OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis
by: Fan, Yuxuan, et al.
Published: (2026)

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
by: Huang, Zeyi, et al.
Published: (2025)

Visual Reasoning through Tool-supervised Reinforcement Learning
by: Dong, Qihua, et al.
Published: (2026)

Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains
by: Guo, Garvin, et al.
Published: (2026)

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
by: Yue, Yang, et al.
Published: (2025)

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
by: Wang, Chenyu, et al.
Published: (2024)

Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
by: Zhang, Haoji, et al.
Published: (2025)

MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom
by: Li, Yifan, et al.
Published: (2025)

Reinforced Visual Perception with Tools
by: Zhou, Zetong, et al.
Published: (2025)

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
by: Su, Zhaochen, et al.
Published: (2025)

What Really Matters for Learning-based LiDAR-Camera Calibration
by: Huang, Shujuan, et al.
Published: (2025)

AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
by: Su, Jiaming, et al.
Published: (2026)

VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things
by: Zhong, Yaoyao, et al.
Published: (2023)

Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
by: Geigle, Gregor, et al.
Published: (2024)

Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?
by: Lyu, Yunbo, et al.
Published: (2025)

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)

CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains
by: Wang, Wenhan, et al.
Published: (2026)

ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)

Deep Learning at the Intersection: Certified Robustness as a Tool for 3D Vision
by: S, Gabriel Pérez, et al.
Published: (2024)

Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models
by: Han, Yuhang, et al.
Published: (2026)

EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision
by: Dong, Yiting, et al.
Published: (2024)

OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
by: Jia, Hongrui, et al.
Published: (2025)

Reliable Disentanglement Multi-view Learning Against View Adversarial Attacks
by: Wang, Xuyang, et al.
Published: (2025)

VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning
by: Wang, Yuji, et al.
Published: (2025)

A Calibration Tool for Refractive Underwater Vision
by: Seegräber, Felix, et al.
Published: (2024)

Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification
by: Qiu, Kunpeng, et al.
Published: (2024)

Does the Skeleton-Recall Loss Really Work?
by: Arora, Devansh, et al.
Published: (2025)

Cropper: Vision-Language Model for Image Cropping through In-Context Learning
by: Lee, Seung Hyun, et al.
Published: (2024)

Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
by: Kumar, Sunil, et al.
Published: (2025)

On the Global Photometric Alignment for Low-Level Vision
by: Li, Mingjia, et al.
Published: (2026)

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
by: Shen, Haozhan, et al.
Published: (2024)

PyVision: Agentic Vision with Dynamic Tooling
by: Zhao, Shitian, et al.
Published: (2025)

Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation
by: Cheng, Ruoxi, et al.
Published: (2026)