Library Catalog

Bibliographic Details
Main Authors:	Chen, Sen; Zhao, Tong; Bin, Yi; Ma, Fei; Shao, Wenqi; Wang, Zheng
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2511.16590

Similar Items

GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
by: Yang, Jingqi, et al.
Published: (2025)

MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment
by: Wu, Qinzhuo, et al.
Published: (2026)

WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments
by: Li, Jinchao, et al.
Published: (2026)

MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
by: Tang, Xiangru, et al.
Published: (2025)

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction
by: Bian, Haonan, et al.
Published: (2026)

Text2World: Benchmarking Large Language Models for Symbolic World Model Generation
by: Hu, Mengkang, et al.
Published: (2025)

Adaptive Milestone Reward for GUI Agents
by: Zheng, Congmin, et al.
Published: (2026)

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
by: Sun, Liangtai, et al.
Published: (2022)

A Survey on (M)LLM-Based GUI Agents
by: Tang, Fei, et al.
Published: (2025)

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
by: Tang, Fei, et al.
Published: (2026)

RISK: A Framework for GUI Agents in E-commerce Risk Management
by: Chen, Renqi, et al.
Published: (2025)

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
by: Liu, Yuhang, et al.
Published: (2025)

C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts
by: Qing, Chenxi, et al.
Published: (2026)

UITron-Speech: Towards Automated GUI Agents Based on Speech Instructions
by: Han, Wenkang, et al.
Published: (2025)

MCPVerse: An Expansive, Real-World Benchmark for Agentic Tool Use
by: Lei, Fei, et al.
Published: (2025)

Co-EPG: A Framework for Co-Evolution of Planning and Grounding in Autonomous GUI Agents
by: Zhao, Yuan, et al.
Published: (2025)

Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents
by: Men, Tianyi, et al.
Published: (2025)

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
by: Yang, Rui, et al.
Published: (2026)

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
by: Dong, Guanting, et al.
Published: (2026)

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
by: Gou, Boyu, et al.
Published: (2024)

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
by: Wu, Qianhui, et al.
Published: (2025)

Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization
by: Chi, Yizhe, et al.
Published: (2026)

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
by: Xu, Haiyang, et al.
Published: (2026)

LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability
by: Xiao, Zikai, et al.
Published: (2025)

Retrieval-augmented GUI Agents with Generative Guidelines
by: Xu, Ran, et al.
Published: (2025)

HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
by: Hu, Mengkang, et al.
Published: (2024)

$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
by: Yao, Shunyu, et al.
Published: (2024)

SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding
by: Hou, Shuyang, et al.
Published: (2026)

ProgRM: Build Better GUI Agents with Progress Rewards
by: Zhang, Danyang, et al.
Published: (2025)

A Proactive Multi-Agent Dialogue Framework for Assessing Social Language Disorder Traits in Autism
by: Hu, Chuanbo, et al.
Published: (2026)

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World
by: Yan, Weixiang, et al.
Published: (2024)

Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors
by: Zhao, Yi, et al.
Published: (2025)

BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
by: Wu, Qinzhuo, et al.
Published: (2025)

BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
by: Zhang, Zehua, et al.
Published: (2025)

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
by: Han, Qijun, et al.
Published: (2026)

PLawBench: A Rubric-Based Benchmark for Evaluating LLMs in Real-World Legal Practice
by: Shi, Yuzhen, et al.
Published: (2026)

RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking
by: Yang, Shuo, et al.
Published: (2025)

InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
by: Liu, Yuhang, et al.
Published: (2025)

Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems
by: Tang, Fei, et al.
Published: (2025)