| Main Authors: | Ma, Yan; Zhang, Weiyu; Li, Tianle; Du, Linge; Shen, Xuyang; Liu, Pengfei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.01334 |
Similar Items
One RL to See Them All: Visual Triple Unified Reinforcement Learning
by: Ma, Yan, et al.
Published: (2025)
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
by: Jiang, Dongfu, et al.
Published: (2025)
CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
by: Carvalho, Miguel, et al.
Published: (2025)
Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use
by: Zhang, Yabo, et al.
Published: (2025)
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
by: Yang, Zuhao, et al.
Published: (2026)
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
by: Ma, Yan, et al.
Published: (2025)
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in
by: Shen, Xiaoqian, et al.
Published: (2025)
OralGPT-Plus: Learning to Use Visual Tools via Reinforcement Learning for Panoramic X-ray Analysis
by: Fan, Yuxuan, et al.
Published: (2026)
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
by: Huang, Zeyi, et al.
Published: (2025)
Visual Reasoning through Tool-supervised Reinforcement Learning
by: Dong, Qihua, et al.
Published: (2026)
Do Multimodal Agents Really Benefit from Tool Use? A Systematic Study of Capability Gains
by: Guo, Garvin, et al.
Published: (2026)
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
by: Yue, Yang, et al.
Published: (2025)
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
by: Wang, Chenyu, et al.
Published: (2024)
Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning
by: Zhang, Haoji, et al.
Published: (2025)
MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom
by: Li, Yifan, et al.
Published: (2025)
Reinforced Visual Perception with Tools
by: Zhou, Zetong, et al.
Published: (2025)
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
by: Su, Zhaochen, et al.
Published: (2025)
What Really Matters for Learning-based LiDAR-Camera Calibration
by: Huang, Shujuan, et al.
Published: (2025)
AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
by: Su, Jiaming, et al.
Published: (2026)
VIoTGPT: Learning to Schedule Vision Tools in LLMs towards Intelligent Video Internet of Things
by: Zhong, Yaoyao, et al.
Published: (2023)
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
by: Geigle, Gregor, et al.
Published: (2024)
Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?
by: Lyu, Yunbo, et al.
Published: (2025)
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
by: Lu, Meng, et al.
Published: (2025)
CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains
by: Wang, Wenhan, et al.
Published: (2026)
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
by: Ding, Shengyuan, et al.
Published: (2025)
Deep Learning at the Intersection: Certified Robustness as a Tool for 3D Vision
by: S, Gabriel Pérez, et al.
Published: (2024)
Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models
by: Han, Yuhang, et al.
Published: (2026)
EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision
by: Dong, Yiting, et al.
Published: (2024)
OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents
by: Jia, Hongrui, et al.
Published: (2025)
Reliable Disentanglement Multi-view Learning Against View Adversarial Attacks
by: Wang, Xuyang, et al.
Published: (2025)
VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning
by: Wang, Yuji, et al.
Published: (2025)
A Calibration Tool for Refractive Underwater Vision
by: Seegräber, Felix, et al.
Published: (2024)
Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification
by: Qiu, Kunpeng, et al.
Published: (2024)
Does the Skeleton-Recall Loss Really Work?
by: Arora, Devansh, et al.
Published: (2025)
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
by: Lee, Seung Hyun, et al.
Published: (2024)
Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
by: Kumar, Sunil, et al.
Published: (2025)
On the Global Photometric Alignment for Low-Level Vision
by: Li, Mingjia, et al.
Published: (2026)
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
by: Shen, Haozhan, et al.
Published: (2024)
PyVision: Agentic Vision with Dynamic Tooling
by: Zhao, Shitian, et al.
Published: (2025)
Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation
by: Cheng, Ruoxi, et al.
Published: (2026)