Library Catalog

Bibliographic Details
Main Authors:	Xu, Yibin; Yang, Liang; Chen, Hao; Wang, Hua; Chen, Zhi; Tang, Yaohua
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.11170

Similar Items

URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
by: Lu, Songshuo, et al.
Published: (2025)

SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models
by: Cai, Zicheng, et al.
Published: (2025)

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
by: Nayak, Shravan, et al.
Published: (2025)

TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
by: Lu, Songshuo, et al.
Published: (2024)

LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
by: Wu, Yubin, et al.
Published: (2026)

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments
by: Zhao, Haoren, et al.
Published: (2026)

StableGS: A Floater-Free Framework for 3D Gaussian Splatting
by: Wang, Luchao, et al.
Published: (2025)

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
by: Xiong, Weimin, et al.
Published: (2026)

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models
by: Chen, Xinlong, et al.
Published: (2026)

History-Aware Reasoning for GUI Agents
by: Wang, Ziwei, et al.
Published: (2025)

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
by: Xu, Yiheng, et al.
Published: (2024)

GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
by: Luo, Run, et al.
Published: (2025)

Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference
by: Tang, Yaohua, et al.
Published: (2025)

Understanding GUI Agent Localization Biases through Logit Sharpness
by: Tao, Xingjian, et al.
Published: (2025)

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
by: Sun, Liangtai, et al.
Published: (2022)

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)

MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
by: Liu, Yuhang, et al.
Published: (2025)

GUICourse: From General Vision Language Models to Versatile GUI Agents
by: Chen, Wentong, et al.
Published: (2024)

OmniParser for Pure Vision Based GUI Agent
by: Lu, Yadong, et al.
Published: (2024)

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
by: Yang, Rui, et al.
Published: (2026)

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
by: Wang, Haoming, et al.
Published: (2025)

RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
by: Chen, Zhenyuan, et al.
Published: (2025)

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
by: Wu, Zhiyong, et al.
Published: (2024)

VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
by: Li, Lei, et al.
Published: (2024)

UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
by: Liu, Xinyi, et al.
Published: (2025)

Purging the Gray Zone: Latent-Geometric Denoising for Precise Knowledge Boundary Awareness
by: An, Hao, et al.
Published: (2026)

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
by: Liu, Yuhang, et al.
Published: (2025)

GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
by: Wu, Zheng, et al.
Published: (2026)

Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents
by: Bu, Tianpeng, et al.
Published: (2026)

ShowUI: One Vision-Language-Action Model for GUI Visual Agent
by: Lin, Kevin Qinghong, et al.
Published: (2024)

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)

SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models
by: Liu, Zheng, et al.
Published: (2024)

A Prompt-driven Task Planning Method for Multi-drones based on Large Language Model
by: Liu, Yaohua
Published: (2024)

A Survey on (M)LLM-Based GUI Agents
by: Tang, Fei, et al.
Published: (2025)

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
by: Xu, Haiyang, et al.
Published: (2026)

LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
by: Han, Feng, et al.
Published: (2026)

LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts
by: Chen, Junhao, et al.
Published: (2025)

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)

Android in the Zoo: Chain-of-Action-Thought for GUI Agents
by: Zhang, Jiwen, et al.
Published: (2024)