Library Catalog

Bibliographic Details
Main Authors:	Yan, Haolong; Shen, Yeqing; Huang, Xin; Wang, Jia; Tan, Kaijun; Liang, Zhixuan; Li, Hongxin; Ge, Zheng; Yoshie, Osamu; Li, Si; Zhang, Xiangyu; Jiang, Daxin
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.02423

Similar Items

M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
by: Yan, Haolong, et al.
Published: (2025)

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
by: Wang, Haoming, et al.
Published: (2025)

GUI Agents for Continual Game Generation
by: Huang, Yixu, et al.
Published: (2026)

GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
by: Sun, Yuchen, et al.
Published: (2025)

UIPro: Unleashing Superior Interaction Capability For GUI Agents
by: Li, Hongxin, et al.
Published: (2025)

EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration
by: Li, Runze, et al.
Published: (2025)

GEBench: Benchmarking Image Generation Models as GUI Environments
by: Li, Haodong, et al.
Published: (2026)

AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs
by: Li, Hongxin, et al.
Published: (2025)

Step-GUI Technical Report
by: Yan, Haolong, et al.
Published: (2025)

CLIP-driven rain perception: Adaptive deraining with pattern-aware network routing and mask-guided cross-attention
by: Guan, Cong, et al.
Published: (2025)

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent
by: Xie, Bin, et al.
Published: (2025)

GoClick: Lightweight Element Grounding Model for Autonomous GUI Interaction
by: Li, Hongxin, et al.
Published: (2026)

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
by: Fan, Yue, et al.
Published: (2025)

AutoGUI-v2: A Comprehensive Multi-Modal GUI Functionality Understanding Benchmark
by: Li, Hongxin, et al.
Published: (2026)

Chain-of-Memory: Enhancing GUI Agents for Cross-Application Navigation
by: Gao, Xinzge, et al.
Published: (2025)

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)

CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
by: Nong, Songqin, et al.
Published: (2025)

TinyClick: Single-Turn Agent for Empowering GUI Automation
by: Pawlowski, Pawel, et al.
Published: (2024)

FluencyVE: Marrying Temporal-Aware Mamba with Bypass Attention for Video Editing
by: Cai, Mingshu, et al.
Published: (2025)

API Agents vs. GUI Agents: Divergence and Convergence
by: Zhang, Chaoyun, et al.
Published: (2025)

LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
by: Wu, Yubin, et al.
Published: (2026)

GUI-Rise: Structured Reasoning and History Summarization for GUI Navigation
by: Liu, Tao, et al.
Published: (2025)

Multi-Attribute guided Thermal Face Image Translation based on Latent Diffusion Model
by: Cai, Mingshu, et al.
Published: (2025)

GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
by: Gao, Longxi, et al.
Published: (2025)

GUI-ReWalk: Massive Data Generation for GUI Agent via Stochastic Exploration and Intent-Aware Reasoning
by: Lin, Musen, et al.
Published: (2025)

Slow Perception: Let's Perceive Geometric Figures Step-by-step
by: Wei, Haoran, et al.
Published: (2024)

Alpha-R1: Alpha Screening with LLM Reasoning via Reinforcement Learning
by: Jiang, Zuoyou, et al.
Published: (2025)

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)

Restoring Real-World Degraded Events Improves Deblurring Quality
by: Shen, Yeqing, et al.
Published: (2024)

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
by: Zhang, Zhong, et al.
Published: (2025)

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
by: Hu, Junan, et al.
Published: (2026)

A Survey on GUI Agents with Foundation Models Enhanced by Reinforcement Learning
by: Li, Jiahao, et al.
Published: (2025)

Where, What, Why: Toward Explainable 3D-GS Watermarking
by: Cai, Mingshu, et al.
Published: (2026)

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation
by: Jia, Haitao, et al.
Published: (2025)

GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions
by: Feng, Liang, et al.
Published: (2024)

Environmental Injection Attacks against GUI Agents in Realistic Dynamic Environments
by: Zhang, Yitong, et al.
Published: (2025)

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
by: Liu, Jihao, et al.
Published: (2024)

GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent
by: Zhao, Kangjia, et al.
Published: (2024)

Turning the Ratchet: Dynamic Screening with Multiple Agents
by: Ekmekci, Mehmet, et al.
Published: (2024)