Bibliographic Details
Main Authors: Zhang, Shaojie; Fu, Pei; Zhang, Ruoceng; Yang, Jiahui; Du, Anan; Xi, Xiuwen; Wang, Shaokang; Huang, Ying; Qin, Bin; Luo, Zhenbo; Luan, Jian
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2510.27266
Table of Contents:
  • Autonomous graphical user interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement learning (RL), often produce confidence signals that are poorly aligned with actual grounding correctness, leading to overconfident and unreliable predictions. To address this, we propose HyperClick, a novel framework that makes GUI grounding more trustworthy through self-critiqued reinforcement learning (SCRL). HyperClick combines a correctness reward with a confidence alignment reward, training the policy model to output both a click prediction and an explicit confidence estimate. This approach jointly optimizes grounding accuracy and confidence reliability through confidence-based self-assessment. Extensive experiments on challenging benchmarks show that HyperClick maintains strong grounding performance while providing better-aligned confidence estimates. By exposing uncertainty alongside GUI actions, HyperClick supports confidence-based abstention in GUI automation. Code will be released.
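The reward combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the reward formulas, so everything here is an assumption for illustration only: the `Box` type, the point-in-box correctness test, the Brier-style alignment term, the `weight` parameter, and the abstention `threshold` are all hypothetical and may differ from what HyperClick actually uses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Box:
    """Target element bounds in pixels (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def correctness_reward(x: float, y: float, target: Box) -> float:
    """Assumed binary reward: 1.0 if the click lands inside the target."""
    return 1.0 if target.contains(x, y) else 0.0


def confidence_alignment_reward(confidence: float, correct: float) -> float:
    """Assumed Brier-style term: highest when confidence matches correctness."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return 1.0 - (confidence - correct) ** 2


def total_reward(x: float, y: float, confidence: float,
                 target: Box, weight: float = 1.0) -> float:
    """Sum of both terms; `weight` is a hypothetical balancing factor."""
    correct = correctness_reward(x, y, target)
    return correct + weight * confidence_alignment_reward(confidence, correct)


def act_or_abstain(x: float, y: float, confidence: float,
                   threshold: float = 0.5) -> Optional[Tuple[float, float]]:
    """Return the click only when confidence clears the assumed threshold."""
    return (x, y) if confidence >= threshold else None
```

Under these assumptions, a confident click that misses the target earns far less than a confident click that hits it, which is the kind of pressure toward calibrated confidence the abstract describes. A low-confidence prediction passed to `act_or_abstain` yields `None`, which a GUI agent could treat as a signal to defer the action.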