Bibliographic Details
Main Authors:	Zhang, Shaojie, Fu, Pei, Zhang, Ruoceng, Yang, Jiahui, Du, Anan, Xi, Xiuwen, Wang, Shaokang, Huang, Ying, Qin, Bin, Luo, Zhenbo, Luan, Jian
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.27266

Table of Contents:

Autonomous graphical user interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement learning (RL), often provide confidence signals that are poorly aligned with actual grounding correctness, leading to overconfident and unreliable predictions. To address this, we propose HyperClick, a novel framework that enhances trustworthy GUI grounding through self-critiqued reinforcement learning (SCRL). HyperClick combines a correctness reward and a confidence alignment reward, training the policy model to output both a click prediction and an explicit confidence estimate. This approach jointly optimizes grounding accuracy and confidence reliability through confidence-based self-assessment. Extensive experiments on challenging benchmarks show that HyperClick maintains strong grounding performance while providing better-aligned confidence estimates. By exposing uncertainty alongside GUI actions, HyperClick supports confidence-based abstention in GUI automation. Code will be released.
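
The abstract names two reward terms but gives no formulas, so the following is only an illustrative sketch of how a correctness reward and a confidence alignment reward might be combined, and how the resulting confidence could drive abstention. Everything in it is an assumption made for illustration: the binary point-in-box correctness check, the Brier-style alignment term, the `weight` and `threshold` parameters, and all function and class names are hypothetical and are not taken from HyperClick.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of the target GUI element (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, point: Point) -> bool:
        x, y = point
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def correctness_reward(click: Point, target: Box) -> float:
    """Assumed binary reward: 1.0 if the click lands inside the target, else 0.0."""
    return 1.0 if target.contains(click) else 0.0


def confidence_alignment_reward(confidence: float, correctness: float) -> float:
    """Assumed Brier-style term: highest when confidence matches correctness."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return 1.0 - (confidence - correctness) ** 2


def total_reward(click: Point, confidence: float, target: Box,
                 weight: float = 1.0) -> float:
    """Sum of the two terms; `weight` is a hypothetical balancing factor."""
    correct = correctness_reward(click, target)
    return correct + weight * confidence_alignment_reward(confidence, correct)


def act_or_abstain(click: Point, confidence: float,
                   threshold: float = 0.5) -> Optional[Point]:
    """Return the click if confidence clears the threshold, else None (abstain)."""
    return click if confidence >= threshold else None


if __name__ == "__main__":
    button = Box(left=0, top=0, right=10, bottom=10)
    print(total_reward((5, 5), 0.9, button))    # confident and correct
    print(total_reward((50, 50), 0.9, button))  # confident but wrong
    print(act_or_abstain((5, 5), 0.3))          # low confidence: abstain
```

Under these assumptions, an overconfident miss scores far lower than a confident hit, which is the kind of pressure toward aligned confidence that the abstract describes; the actual reward design in the paper may differ.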

Similar Items