| Main Authors: | Chen, Yang; Shen, Yufan; Huang, Wenxuan; Zhou, Sheng; Lin, Qunshu; Cai, Xinyu; Yu, Zhi; Bu, Jiajun; Shi, Botian; Qiao, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.20766 |
Similar Items
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
by: Shao, Zirui, et al.
Published: (2024)
UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images
by: Li, Siqi, et al.
Published: (2025)
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
by: Chen, Yang, et al.
Published: (2025)
One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning
by: Zhou, Chunpeng, et al.
Published: (2025)
GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling
by: Li, Siqi, et al.
Published: (2025)
Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
by: Liang, Guotao, et al.
Published: (2026)
Visual Acuity Consistent Foveated Rendering towards Retinal Resolution
by: Zhang, Zhi, et al.
Published: (2025)
Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
by: Yang, Cheng, et al.
Published: (2025)
Less is More: A Closer Look at Semantic-based Few-Shot Learning
by: Zhou, Chunpeng, et al.
Published: (2024)
REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment
by: Ye, Kai, et al.
Published: (2026)
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
by: Wang, Yifan, et al.
Published: (2026)
Learning GUI Grounding with Spatial Reasoning from Visual Feedback
by: Zhao, Yu, et al.
Published: (2025)
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
by: Huang, Xinyu, et al.
Published: (2025)
Visual Reasoning through Tool-supervised Reinforcement Learning
by: Dong, Qihua, et al.
Published: (2026)
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
by: Chen, Jiali, et al.
Published: (2024)
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
by: Zeng, Yu, et al.
Published: (2025)
Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning
by: Mo, Ye, et al.
Published: (2025)
MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
by: Meng, Fanqing, et al.
Published: (2025)
MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
by: Mao, Xianwei, et al.
Published: (2026)
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
by: Shen, Yufan, et al.
Published: (2024)
Grounded Reinforcement Learning for Visual Reasoning
by: Sarch, Gabriel, et al.
Published: (2025)
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
by: Liu, Yuqi, et al.
Published: (2025)
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection
by: Fu, Daocheng, et al.
Published: (2025)
Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
by: Shen, Yiqing, et al.
Published: (2025)
Automated Visualization Code Synthesis via Multi-Path Reasoning and Feedback-Driven Optimization
by: Seo, Wonduk, et al.
Published: (2025)
Visual Planning: Let's Think Only with Images
by: Xu, Yi, et al.
Published: (2025)
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation
by: Li, Yaqi, et al.
Published: (2025)
Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
by: Wang, Sheng, et al.
Published: (2025)
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding
by: Xia, Renqiu, et al.
Published: (2023)
OmniDrones: An Efficient and Flexible Platform for Reinforcement Learning in Drone Control
by: Xu, Botian, et al.
Published: (2023)
Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
by: Zhao, Shijie, et al.
Published: (2025)
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
by: Yu, Kelin, et al.
Published: (2025)
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
by: Yang, Chuanguang, et al.
Published: (2025)
Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning
by: Wu, Yuhang, et al.
Published: (2026)
Reinforcing Multimodal Reasoning Against Visual Degradation
by: Liu, Rui, et al.
Published: (2026)
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
by: Wang, Yizhou, et al.
Published: (2025)
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
by: Ma, Xinyu, et al.
Published: (2025)
Audio Spatially-Guided Fusion for Audio-Visual Navigation
by: Zhou, Xinyu, et al.
Published: (2026)
Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy
by: Gao, Shujian, et al.
Published: (2026)
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
by: Lin, Weifeng, et al.
Published: (2024)