| Main Authors: | Wang, Wenkai, Li, Xiyun, Guo, Hongcan, Yu, Wenhao, Fang, Tianqing, Mi, Haitao, Yu, Dong, Zhang, Shengyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.21268 |
Similar Items
Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems
by: Tang, Fei, et al.
Published: (2025)
A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models
by: Wang, Wenkai, et al.
Published: (2025)
MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)
WebRollback: Enhancing Web Agents with Explicit Rollback Mechanisms
by: Zhang, Zhisong, et al.
Published: (2025)
WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model
by: Fang, Tianqing, et al.
Published: (2025)
Guided Self-Evolving LLMs with Minimal Human Supervision
by: Yu, Wenhao, et al.
Published: (2025)
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
by: Cheng, Kanzhi, et al.
Published: (2024)
Verified Critical Step Optimization for LLM Agents
by: Li, Mukai, et al.
Published: (2026)
World-Model-Augmented Web Agents with Action Correction
by: Shen, Zhouzhou, et al.
Published: (2026)
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning
by: Panaganti, Kishan, et al.
Published: (2026)
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
by: Ma, Junyu, et al.
Published: (2025)
Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
by: Zhang, Yan, et al.
Published: (2026)
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)
WinClick: GUI Grounding with Multimodal Large Language Models
by: Hui, Zheng, et al.
Published: (2025)
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
by: Wan, Yuxuan, et al.
Published: (2026)
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback
by: Hu, Minda, et al.
Published: (2025)
“Measure Twice Cut Once” to Avoid Conduction System Injury and Eliminate Parahisian PVCs
by: Francis E. Marchlinski, et al.
Published: (2025)
Beyond Clicking: A Step Towards Generalist GUI Grounding via Text Dragging
by: Liao, Zeyi, et al.
Published: (2025)
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
by: Tian, Ye, et al.
Published: (2024)
Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding
by: Jiang, Zhiyuan, et al.
Published: (2025)
GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction
by: Li, Hongxin, et al.
Published: (2026)
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
by: Yuan, Xinbin, et al.
Published: (2025)
Think Twice, Generate Once: Safeguarding by Progressive Self-Reflection
by: Phan, Hoang, et al.
Published: (2025)
POINTS-GUI-G: GUI-Grounding Journey
by: Zhao, Zhongyin, et al.
Published: (2026)
GUI-Spotlight: Adaptive Iterative Focus Refinement for Enhanced GUI Visual Grounding
by: Lei, Bin, et al.
Published: (2025)
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
by: Shi, Yucheng, et al.
Published: (2026)
Measure Twice, Cut Once: A Semantic-Oriented Approach to Video Temporal Localization with Video LLMs
by: Pang, Zongshang, et al.
Published: (2025)
GuirlVG: Incentivize GUI Visual Grounding via Empirical Exploration on Reinforcement Learning
by: Kang, Weitai, et al.
Published: (2025)
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
by: Fang, Tianqing, et al.
Published: (2025)
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
by: Deng, Chenlong, et al.
Published: (2025)
InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing
by: Li, Shuaiyi, et al.
Published: (2025)
CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
by: Nong, Songqin, et al.
Published: (2025)
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
by: He, Hongliang, et al.
Published: (2024)
DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding
by: Liu, Yichao, et al.
Published: (2026)
Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
by: Wan, Xu, et al.
Published: (2025)
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
by: Liu, Shuming, et al.
Published: (2026)
A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals
by: Yin, Pingping, et al.
Published: (2025)
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows
by: Yao, Wenlin, et al.
Published: (2024)