| Main Authors: | Yan, Haolong, Shen, Yeqing, Huang, Xin, Wang, Jia, Tan, Kaijun, Liang, Zhixuan, Li, Hongxin, Ge, Zheng, Yoshie, Osamu, Li, Si, Zhang, Xiangyu, Jiang, Daxin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.02423 |
Similar Items
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
by: Yan, Haolong, et al.
Published: (2025)
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
by: Wang, Haoming, et al.
Published: (2025)
GUI Agents for Continual Game Generation
by: Huang, Yixu, et al.
Published: (2026)
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
by: Sun, Yuchen, et al.
Published: (2025)
UIPro: Unleashing Superior Interaction Capability For GUI Agents
by: Li, Hongxin, et al.
Published: (2025)
EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration
by: Li, Runze, et al.
Published: (2025)
GEBench: Benchmarking Image Generation Models as GUI Environments
by: Li, Haodong, et al.
Published: (2026)
AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
by: Li, Hongxin, et al.
Published: (2025)
Step-GUI Technical Report
by: Yan, Haolong, et al.
Published: (2025)
CLIP-driven rain perception: Adaptive deraining with pattern-aware network routing and mask-guided cross-attention
by: Guan, Cong, et al.
Published: (2025)
GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
by: Xie, Bin, et al.
Published: (2025)
GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction
by: Li, Hongxin, et al.
Published: (2026)
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
by: Fan, Yue, et al.
Published: (2025)
AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark
by: Li, Hongxin, et al.
Published: (2026)
Chain-of-Memory: Enhancing GUI Agents for Cross-Application Navigation
by: Gao, Xinzge, et al.
Published: (2025)
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)
CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
by: Nong, Songqin, et al.
Published: (2025)
TinyClick: Single-Turn Agent for Empowering GUI Automation
by: Pawlowski, Pawel, et al.
Published: (2024)
FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing
by: Cai, Mingshu, et al.
Published: (2025)
API Agents vs. GUI Agents: Divergence and Convergence
by: Zhang, Chaoyun, et al.
Published: (2025)
LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
by: Wu, Yubin, et al.
Published: (2026)
GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation
by: Liu, Tao, et al.
Published: (2025)
Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model
by: Cai, Mingshu, et al.
Published: (2025)
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
by: Gao, Longxi, et al.
Published: (2025)
GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning
by: Lin, Musen, et al.
Published: (2025)
Slow Perception: Let's Perceive Geometric Figures Step-by-step
by: Wei, Haoran, et al.
Published: (2024)
Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning
by: Jiang, Zuoyou, et al.
Published: (2025)
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)
Restoring Real-World Degraded Events Improves Deblurring Quality
by: Shen, Yeqing, et al.
Published: (2024)
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
by: Zhang, Zhong, et al.
Published: (2025)
MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
by: Hu, Junan, et al.
Published: (2026)
A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning
by: Li, Jiahao, et al.
Published: (2025)
Where, What, Why: Toward Explainable 3D-GS Watermarking
by: Cai, Mingshu, et al.
Published: (2026)
ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
by: Jia, Haitao, et al.
Published: (2025)
GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions
by: Feng, Liang, et al.
Published: (2024)
Environmental Injection Attacks against GUI Agents in Realistic Dynamic Environments
by: Zhang, Yitong, et al.
Published: (2025)
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
by: Liu, Jihao, et al.
Published: (2024)
GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent
by: Zhao, Kangjia, et al.
Published: (2024)
Turning the Ratchet: Dynamic Screening with Multiple Agents
by: Ekmekci, Mehmet, et al.
Published: (2024)