:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Renqi; Tao, Zeyin; Guo, Jianming; Wang, Jing; Xu, Zezhou; Zhu, Jingzhe; Sun, Qingqing; Zhang, Tianyi; Chen, Shuai
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2604.13531

Similar Items

RISK: A Framework for GUI Agents in E-commerce Risk Management
by: Chen, Renqi, et al.
Published: (2025)

InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
by: Zhang, Ziyun, et al.
Published: (2026)

StressWeb: A Diagnostic Benchmark for Web Agent Robustness under Realistic Interaction Variability
by: Bai, Haoyue, et al.
Published: (2026)

Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents
by: Zhang, Boxuan, et al.
Published: (2025)

Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks
by: Jang, Lawrence Keunho, et al.
Published: (2026)

RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents
by: Yang, Jingyi, et al.
Published: (2025)

EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents
by: Mo, Ying, et al.
Published: (2026)

OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments
by: Henry, Felix, et al.
Published: (2026)

META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
by: Sun, Liangtai, et al.
Published: (2022)

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)

MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents
by: Chen, Ruihan, et al.
Published: (2025)

X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System
by: Wang, Peng, et al.
Published: (2025)

A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains
by: Zhang, Xianren, et al.
Published: (2025)

Integrating Building Thermal Flexibility Into Distribution System: A Privacy-Preserved Dispatch Approach
by: Lu, Shuai, et al.
Published: (2025)

Temac: Multi-Agent Collaboration for Automated Web GUI Testing
by: Liu, Chenxu, et al.
Published: (2025)

MGA: Memory-Driven GUI Agent for Observation-Centric Interaction
by: Cheng, Weihua, et al.
Published: (2025)

CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
by: Nong, Songqin, et al.
Published: (2025)

EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments
by: Liu, Zefang, et al.
Published: (2025)

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
by: Yang, Pei, et al.
Published: (2025)

WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments
by: Zhao, Haoren, et al.
Published: (2026)

From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation
by: Wang, Zezhou, et al.
Published: (2026)

UIPro: Unleashing Superior Interaction Capability For GUI Agents
by: Li, Hongxin, et al.
Published: (2025)

NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation
by: Chen, Renqi, et al.
Published: (2025)

GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
by: Yang, Jingqi, et al.
Published: (2025)

D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies
by: Chen, Sen, et al.
Published: (2025)

ScaleTrack: Scaling and back-tracking Automated GUI Agents
by: Huang, Jing, et al.
Published: (2025)

Continual GUI Agents
by: Liu, Ziwei, et al.
Published: (2026)

MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments
by: Yang, Xiao, et al.
Published: (2025)

GUI-PRA: Process Reward Agent for GUI Tasks
by: Xiong, Tao, et al.
Published: (2025)

Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
by: McGregor, Sean, et al.
Published: (2025)

Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
by: Hua, Wenyue, et al.
Published: (2026)

Characterizing Unintended Consequences in Human-GUI Agent Collaboration for Web Browsing
by: Zhang, Shuning, et al.
Published: (2025)

GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent
by: Zhao, Kangjia, et al.
Published: (2024)

LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
by: Liu, Guangyi, et al.
Published: (2025)

CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation
by: Feng, Yushi, et al.
Published: (2026)

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
by: Liu, Guangyi, et al.
Published: (2026)

ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
by: Savadikar, Chinmay, et al.
Published: (2026)

Environmental Injection Attacks against GUI Agents in Realistic Dynamic Environments
by: Zhang, Yitong, et al.
Published: (2025)

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
by: Shao, Shuai, et al.
Published: (2025)

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
by: Tang, Jiaqi, et al.
Published: (2025)