| Main Authors: | Xu, Yibin, Yang, Liang, Chen, Hao, Wang, Hua, Chen, Zhi, Tang, Yaohua |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.11170 |
Similar Items
URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
by: Lu, Songshuo, et al.
Published: (2025)
SEKI: Self-Evolution and Knowledge Inspiration based Neural Architecture Search via Large Language Models
by: Cai, Zicheng, et al.
Published: (2025)
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
by: Nayak, Shravan, et al.
Published: (2025)
TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
by: Lu, Songshuo, et al.
Published: (2024)
LiteGUI: Distilling Compact GUI Agents with Reinforcement Learning
by: Wu, Yubin, et al.
Published: (2026)
WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments
by: Zhao, Haoren, et al.
Published: (2026)
StableGS: A Floater-Free Framework for 3D Gaussian Splatting
by: Wang, Luchao, et al.
Published: (2025)
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
by: Xiong, Weimin, et al.
Published: (2026)
DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models
by: Chen, Xinlong, et al.
Published: (2026)
History-Aware Reasoning for GUI Agents
by: Wang, Ziwei, et al.
Published: (2025)
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
by: Xu, Yiheng, et al.
Published: (2024)
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
by: Luo, Run, et al.
Published: (2025)
Round Attention: A Novel Round-Level Attention Mechanism to Accelerate LLM Inference
by: Tang, Yaohua, et al.
Published: (2025)
Understanding GUI Agent Localization Biases through Logit Sharpness
by: Tao, Xingjian, et al.
Published: (2025)
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
by: Sun, Liangtai, et al.
Published: (2022)
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
by: Wang, Xuehui, et al.
Published: (2025)
MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
by: Liu, Yuhang, et al.
Published: (2025)
GUICourse: From General Vision Language Models to Versatile GUI Agents
by: Chen, Wentong, et al.
Published: (2024)
OmniParser for Pure Vision Based GUI Agent
by: Lu, Yadong, et al.
Published: (2024)
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
by: Yang, Rui, et al.
Published: (2026)
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
by: Wang, Haoming, et al.
Published: (2025)
RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events
by: Chen, Zhenyuan, et al.
Published: (2025)
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
by: Wu, Zhiyong, et al.
Published: (2024)
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
by: Li, Lei, et al.
Published: (2024)
UI-E2I-Synth: Advancing GUI Grounding with Large-Scale Instruction Synthesis
by: Liu, Xinyi, et al.
Published: (2025)
Purging the Gray Zone: Latent-Geometric Denoising for Precise Knowledge Boundary Awareness
by: An, Hao, et al.
Published: (2026)
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
by: Liu, Yuhang, et al.
Published: (2025)
GUI-CIDER: Mid-training GUI Agents via Causal Internalization and Density-aware Exemplar Reselection
by: Wu, Zheng, et al.
Published: (2026)
Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents
by: Bu, Tianpeng, et al.
Published: (2026)
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
by: Lin, Kevin Qinghong, et al.
Published: (2024)
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)
SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models
by: Liu, Zheng, et al.
Published: (2024)
A Prompt-driven Task Planning Method for Multi-drones based on Large Language Model
by: Liu, Yaohua
Published: (2024)
A Survey on (M)LLM-Based GUI Agents
by: Tang, Fei, et al.
Published: (2025)
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
by: Xu, Haiyang, et al.
Published: (2026)
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
by: Han, Feng, et al.
Published: (2026)
LLMsPark: A Benchmark for Evaluating Large Language Models in Strategic Gaming Contexts
by: Chen, Junhao, et al.
Published: (2025)
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
by: Zhang, Jiwen, et al.
Published: (2024)