| Main Authors: | Chen, Renqi, Tao, Zeyin, Guo, Jianming, Wang, Jing, Xu, Zezhou, Zhu, Jingzhe, Sun, Qingqing, Zhang, Tianyi, Chen, Shuai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.13531 |
Similar Items
RISK: A Framework for GUI Agents in E-commerce Risk Management
by: Chen, Renqi, et al.
Published: (2025)
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
by: Zhang, Ziyun, et al.
Published: (2026)
StressWeb: A Diagnostic Benchmark for Web Agent Robustness under Realistic Interaction Variability
by: Bai, Haoyue, et al.
Published: (2026)
Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents
by: Zhang, Boxuan, et al.
Published: (2025)
Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks
by: Jang, Lawrence Keunho, et al.
Published: (2026)
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents
by: Yang, Jingyi, et al.
Published: (2025)
EntWorld: A Holistic Environment and Benchmark for Verifiable Enterprise GUI Agents
by: Mo, Ying, et al.
Published: (2026)
OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments
by: Henry, Felix, et al.
Published: (2026)
META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI
by: Sun, Liangtai, et al.
Published: (2022)
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
by: Chen, Dongping, et al.
Published: (2024)
MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents
by: Chen, Ruihan, et al.
Published: (2025)
X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System
by: Wang, Peng, et al.
Published: (2025)
A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains
by: Zhang, Xianren, et al.
Published: (2025)
Integrating Building Thermal Flexibility Into Distribution System: A Privacy-Preserved Dispatch Approach
by: Lu, Shuai, et al.
Published: (2025)
Temac: Multi-Agent Collaboration for Automated Web GUI Testing
by: Liu, Chenxu, et al.
Published: (2025)
MGA: Memory-Driven GUI Agent for Observation-Centric Interaction
by: Cheng, Weihua, et al.
Published: (2025)
CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
by: Nong, Songqin, et al.
Published: (2025)
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments
by: Liu, Zefang, et al.
Published: (2025)
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
by: Yang, Pei, et al.
Published: (2025)
WinDeskGround: A Benchmark for Robust GUI Grounding in Complex Multi-Window Desktop Environments
by: Zhao, Haoren, et al.
Published: (2026)
From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation
by: Wang, Zezhou, et al.
Published: (2026)
UIPro: Unleashing Superior Interaction Capability For GUI Agents
by: Li, Hongxin, et al.
Published: (2025)
NAS-LoRA: Empowering Parameter-Efficient Fine-Tuning for Visual Foundation Models with Searchable Adaptation
by: Chen, Renqi, et al.
Published: (2025)
GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies
by: Yang, Jingqi, et al.
Published: (2025)
D-GARA: A Dynamic Benchmarking Framework for GUI Agent Robustness in Real-World Anomalies
by: Chen, Sen, et al.
Published: (2025)
ScaleTrack: Scaling and back-tracking Automated GUI Agents
by: Huang, Jing, et al.
Published: (2025)
Continual GUI Agents
by: Liu, Ziwei, et al.
Published: (2026)
MLA-Trust: Benchmarking Trustworthiness of Multimodal LLM Agents in GUI Environments
by: Yang, Xiao, et al.
Published: (2025)
GUI-PRA: Process Reward Agent for GUI Tasks
by: Xiong, Tao, et al.
Published: (2025)
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
by: McGregor, Sean, et al.
Published: (2025)
Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
by: Hua, Wenyue, et al.
Published: (2026)
Characterizing Unintended Consequences in Human-GUI Agent Collaboration for Web Browsing
by: Zhang, Shuning, et al.
Published: (2025)
GUI Testing Arena: A Unified Benchmark for Advancing Autonomous GUI Testing Agent
by: Zhao, Kangjia, et al.
Published: (2024)
LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
by: Liu, Guangyi, et al.
Published: (2025)
CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation
by: Feng, Yushi, et al.
Published: (2026)
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
by: Liu, Guangyi, et al.
Published: (2026)
ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
by: Savadikar, Chinmay, et al.
Published: (2026)
Environmental Injection Attacks against GUI Agents in Realistic Dynamic Environments
by: Zhang, Yitong, et al.
Published: (2025)
Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
by: Shao, Shuai, et al.
Published: (2025)
LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
by: Tang, Jiaqi, et al.
Published: (2025)