:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Wenkai; Li, Xiyun; Guo, Hongcan; Yu, Wenhao; Fang, Tianqing; Mi, Haitao; Yu, Dong; Zhang, Shengyu
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.21268

Similar Items

Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems
by: Tang, Fei, et al.
Published: (2025)

A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models
by: Wang, Wenkai, et al.
Published: (2025)

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)

WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms
by: Zhang, Zhisong, et al.
Published: (2025)

WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model
by: Fang, Tianqing, et al.
Published: (2025)

Guided Self-Evolving LLMs with Minimal Human Supervision
by: Yu, Wenhao, et al.
Published: (2025)

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
by: Cheng, Kanzhi, et al.
Published: (2024)

Verified Critical Step Optimization for LLM Agents
by: Li, Mukai, et al.
Published: (2026)

World-Model-Augmented Web Agents with Action Correction
by: Shen, Zhouzhou, et al.
Published: (2026)

Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
by: Ma, Junyu, et al.
Published: (2025)

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
by: Zhang, Yan, et al.
Published: (2026)

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)

WinClick: GUI Grounding with Multimodal Large Language Models
by: Hui, Zheng, et al.
Published: (2025)

Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
by: Wan, Yuxuan, et al.
Published: (2026)

WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback
by: Hu, Minda, et al.
Published: (2025)

“Measure Twice Cut Once” to Avoid Conduction System Injury and Eliminate Parahisian PVCs
by: Marchlinski, Francis E., et al.
Published: (2025)

Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging
by: Liao, Zeyi, et al.
Published: (2025)

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)

Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding
by: Jiang, Zhiyuan, et al.
Published: (2025)

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction
by: Li, Hongxin, et al.
Published: (2026)

Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
by: Yuan, Xinbin, et al.
Published: (2025)

Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection
by: Phan, Hoang, et al.
Published: (2025)

POINTS-GUI-G: GUI-Grounding Journey
by: Zhao, Zhongyin, et al.
Published: (2026)

GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
by: Lei, Bin, et al.
Published: (2025)

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)

Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
by: Pang, Zongshang, et al.
Published: (2025)

GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
by: Kang, Weitai, et al.
Published: (2025)

Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
by: Fang, Tianqing, et al.
Published: (2025)

UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
by: Deng, Chenlong, et al.
Published: (2025)

InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing
by: Li, Shuaiyi, et al.
Published: (2025)

CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
by: Nong, Songqin, et al.
Published: (2025)

OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
by: He, Hongliang, et al.
Published: (2024)

DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding
by: Liu, Yichao, et al.
Published: (2026)

Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
by: Wan, Xu, et al.
Published: (2025)

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
by: Liu, Shuming, et al.
Published: (2026)

A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals
by: Yin, Pingping, et al.
Published: (2025)

HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
by: Yao, Wenlin, et al.
Published: (2024)