:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Yang; Shen, Yufan; Huang, Wenxuan; Zhou, Sheng; Lin, Qunshu; Cai, Xinyu; Yu, Zhi; Bu, Jiajun; Shi, Botian; Qiao, Yu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.20766

Similar Items

WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
by: Shao, Zirui, et al.
Published: (2024)

UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images
by: Li, Siqi, et al.
Published: (2025)

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
by: Chen, Yang, et al.
Published: (2025)

One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning
by: Zhou, Chunpeng, et al.
Published: (2025)

GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
by: Li, Siqi, et al.
Published: (2025)

Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
by: Liang, Guotao, et al.
Published: (2026)

Visual Acuity Consistent Foveated Rendering towards Retinal Resolution
by: Zhang, Zhi, et al.
Published: (2025)

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
by: Yang, Cheng, et al.
Published: (2025)

Less is More: A Closer Look at Semantic-based Few-Shot Learning
by: Zhou, Chunpeng, et al.
Published: (2024)

REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment
by: Ye, Kai, et al.
Published: (2026)

Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)

Learning GUI Grounding with Spatial Reasoning from Visual Feedback
by: Zhao, Yu, et al.
Published: (2025)

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
by: Huang, Xinyu, et al.
Published: (2025)

Visual Reasoning through Tool-supervised Reinforcement Learning
by: Dong, Qihua, et al.
Published: (2026)

Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
by: Chen, Jiali, et al.
Published: (2024)

Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
by: Zeng, Yu, et al.
Published: (2025)

Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning
by: Mo, Ye, et al.
Published: (2025)

MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
by: Meng, Fanqing, et al.
Published: (2025)

MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
by: Mao, Xianwei, et al.
Published: (2026)

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
by: Shen, Yufan, et al.
Published: (2024)

Grounded Reinforcement Learning for Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2025)

VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
by: Liu, Yuqi, et al.
Published: (2025)

RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection
by: Fu, Daocheng, et al.
Published: (2025)

Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
by: Shen, Yiqing, et al.
Published: (2025)

Automated Visualization Code Synthesis via Multi-Path Reasoning and Feedback-Driven Optimization
by: Seo, Wonduk, et al.
Published: (2025)

Visual Planning: Let's Think Only with Images
by: Xu, Yi, et al.
Published: (2025)

Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation
by: Li, Yaqi, et al.
Published: (2025)

Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
by: Wang, Sheng, et al.
Published: (2025)

StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding
by: Xia, Renqiu, et al.
Published: (2023)

OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control
by: Xu, Botian, et al.
Published: (2023)

Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
by: Zhao, Shijie, et al.
Published: (2025)

GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
by: Yu, Kelin, et al.
Published: (2025)

Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
by: Yang, Chuanguang, et al.
Published: (2025)

Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning
by: Wu, Yuhang, et al.
Published: (2026)

Reinforcing Multimodal Reasoning Against Visual Degradation
by: Liu, Rui, et al.
Published: (2026)

Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
by: Wang, Yizhou, et al.
Published: (2025)

KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
by: Ma, Xinyu, et al.
Published: (2025)

Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)

Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy
by: Gao, Shujian, et al.
Published: (2026)

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
by: Lin, Weifeng, et al.
Published: (2024)