| Main Authors: | Zhang, Shaojie, Fu, Pei, Zhang, Ruoceng, Yang, Jiahui, Du, Anan, Xi, Xiuwen, Wang, Shaokang, Huang, Ying, Qin, Bin, Luo, Zhenbo, Luan, Jian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.27266 |
Similar Items
Beyond Binary: Reframing GUI Critique as Continuous Semantic Alignment
by: Sun, Yuchen, et al.
Published: (2026)
GAIA: A Data Flywheel System for Training GUI Test-Time Scaling Critic Models
by: Wang, Shaokang, et al.
Published: (2026)
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
by: Zhang, Shaojie, et al.
Published: (2025)
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs
by: Zhang, Shaojie, et al.
Published: (2025)
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models
by: Xu, Longwei, et al.
Published: (2026)
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
by: Gao, Longxi, et al.
Published: (2025)
IMTBench: A Multi-Scenario Cross-Modal Collaborative Evaluation Benchmark for In-Image Machine Translation
by: Lyu, Jiahao, et al.
Published: (2026)
PatchCue: Enhancing Vision-Language Model Reasoning with Patch-Based Visual Cues
by: Qi, Yukun, et al.
Published: (2026)
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
by: Yuan, Xinbin, et al.
Published: (2025)
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
by: Fang, Yiyang, et al.
Published: (2026)
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
by: Tang, Liujian, et al.
Published: (2025)
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)
BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
by: Wu, Qinzhuo, et al.
Published: (2025)
GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
by: Lei, Bin, et al.
Published: (2025)
GUI-ARP: Enhancing Grounding with Adaptive Region Perception for GUI Agents
by: Ye, Xianhang, et al.
Published: (2025)
AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement
by: Pei, Siqi, et al.
Published: (2026)
GUI-PRA: Process Reward Agent for GUI Tasks
by: Xiong, Tao, et al.
Published: (2025)
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
by: Chen, Liangyu, et al.
Published: (2025)
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
by: Xu, Boshen, et al.
Published: (2025)
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning
by: Lin, Jianbo, et al.
Published: (2026)
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle
by: Zhu, Linghao, et al.
Published: (2025)
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
by: Du, Yong, et al.
Published: (2025)
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning
by: Ruan, Chi, et al.
Published: (2025)
MEGA-GUI: Multi-stage Enhanced Grounding Agents for GUI Elements
by: Kwak, SeokJoo, et al.
Published: (2025)
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique
by: Li, Yansi, et al.
Published: (2025)
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
by: Kang, Weitai, et al.
Published: (2025)
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)
Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation
by: Li, Jiaze, et al.
Published: (2026)
VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction
by: Yan, Xiaoyang, et al.
Published: (2026)
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
by: Hu, Junan, et al.
Published: (2026)
Doc-V*: Coarse-to-Fine Interactive Visual Reasoning for Multi-Page Document VQA
by: Zheng, Yuanlei, et al.
Published: (2026)
Multiple q-zeta values and traces
by: Qin, Zhenbo
Published: (2025)
Phi-Ground Tech Report: Advancing Perception in GUI Grounding
by: Zhang, Miaosen, et al.
Published: (2025)
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
by: Tan, Wenhui, et al.
Published: (2025)
Towards Trustworthy GUI Agents: A Survey
by: Shi, Yucheng, et al.
Published: (2025)
MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments
by: Yang, Xiao, et al.
Published: (2025)
GUI-C$^2$: Coarse-to-Fine GUI Grounding via Difficulty-Aware Reinforcement Learning
by: Li, Junlong, et al.
Published: (2026)
Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
by: Zhang, Yan, et al.
Published: (2026)
GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
by: Tang, Fei, et al.
Published: (2025)