| Main Authors: | Chen, Sen; Zhao, Tong; Bin, Yi; Ma, Fei; Shao, Wenqi; Wang, Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.16590 |
Similar Items
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
by: Yang, Jingqi, et al.
Published: (2025)
MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment
by: Wu, Qinzhuo, et al.
Published: (2026)
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments
by: Li, Jinchao, et al.
Published: (2026)
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
by: Tang, Xiangru, et al.
Published: (2025)
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)
RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction
by: Bian, Haonan, et al.
Published: (2026)
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
by: Hu, Mengkang, et al.
Published: (2025)
Adaptive Milestone Reward for GUI Agents
by: Zheng, Congmin, et al.
Published: (2026)
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
by: Sun, Liangtai, et al.
Published: (2022)
A Survey on (M)LLM-Based GUI Agents
by: Tang, Fei, et al.
Published: (2025)
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)
RISK: A Framework for GUI Agents in E-commerce Risk Management
by: Chen, Renqi, et al.
Published: (2025)
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
by: Liu, Yuhang, et al.
Published: (2025)
C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts
by: Qing, Chenxi, et al.
Published: (2026)
UITron-Speech: Towards Automated GUI Agents Based on Speech Instructions
by: Han, Wenkang, et al.
Published: (2025)
MCPVerse: An Expansive, Real-World Benchmark for Agentic Tool Use
by: Lei, Fei, et al.
Published: (2025)
Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents
by: Zhao, Yuan, et al.
Published: (2025)
Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents
by: Men, Tianyi, et al.
Published: (2025)
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
by: Yang, Rui, et al.
Published: (2026)
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
by: Dong, Guanting, et al.
Published: (2026)
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
by: Gou, Boyu, et al.
Published: (2024)
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)
Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization
by: Chi, Yizhe, et al.
Published: (2026)
Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
by: Xu, Haiyang, et al.
Published: (2026)
LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability
by: Xiao, Zikai, et al.
Published: (2025)
Retrieval-augmented GUI Agents with Generative Guidelines
by: Xu, Ran, et al.
Published: (2025)
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
by: Hu, Mengkang, et al.
Published: (2024)
$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
by: Yao, Shunyu, et al.
Published: (2024)
SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding
by: Hou, Shuyang, et al.
Published: (2026)
ProgRM: Build Better GUI Agents with Progress Rewards
by: Zhang, Danyang, et al.
Published: (2025)
A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism
by: Hu, Chuanbo, et al.
Published: (2026)
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
by: Yan, Weixiang, et al.
Published: (2024)
Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors
by: Zhao, Yi, et al.
Published: (2025)
BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
by: Wu, Qinzhuo, et al.
Published: (2025)
BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
by: Zhang, Zehua, et al.
Published: (2025)
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
by: Han, Qijun, et al.
Published: (2026)
PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice
by: Shi, Yuzhen, et al.
Published: (2026)
RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking
by: Yang, Shuo, et al.
Published: (2025)
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
by: Liu, Yuhang, et al.
Published: (2025)
Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems
by: Tang, Fei, et al.
Published: (2025)