| Main Authors: | Xue, Tianci, Liao, Zeyi, Shi, Tianneng, Wang, Zilu, Zhang, Kai, Song, Dawn, Su, Yu, Sun, Huan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10356 |
Similar Items
An Illusion of Progress? Assessing the Current State of Web Agents
by: Xue, Tianci, et al.
Published: (2025)
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
by: Liao, Zeyi, et al.
Published: (2025)
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
by: Cai, Will, et al.
Published: (2025)
AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
by: Liao, Zeyi, et al.
Published: (2024)
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
by: Mo, Lingbo, et al.
Published: (2024)
WebGuard: Building a Generalizable Guardrail for Web Agents
by: Zheng, Boyuan, et al.
Published: (2025)
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
by: Sun, Zeyi, et al.
Published: (2025)
When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents
by: Jones, Jaylen, et al.
Published: (2026)
AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents
by: Xie, Jingxu, et al.
Published: (2025)
Improving LLM Safety Alignment with Dual-Objective Optimization
by: Zhao, Xuandong, et al.
Published: (2025)
Can LLMs Ask Good Questions?
by: Zhang, Yueheng, et al.
Published: (2025)
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
by: Xie, Jian, et al.
Published: (2026)
AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
by: Kumar, Vishal, et al.
Published: (2024)
AdvAgent: Controllable Blackbox Red-teaming on Web Agents
by: Xu, Chejian, et al.
Published: (2024)
AttributionBench: How Hard is Automatic Attribution Evaluation?
by: Li, Yifei, et al.
Published: (2024)
AGENTCL: Toward Rigorous Evaluation of Continual Learning in Language Agents
by: Shu, Yiheng, et al.
Published: (2026)
Agent Learning via Early Experience
by: Zhang, Kai, et al.
Published: (2025)
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
by: Gou, Boyu, et al.
Published: (2025)
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
by: Wang, Zhun, et al.
Published: (2025)
SafePred: A Predictive Guardrail for Computer-Using Agents via World Models
by: Chen, Yurun, et al.
Published: (2026)
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
by: Liao, Zeyi, et al.
Published: (2024)
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
by: Chen, Ziru, et al.
Published: (2024)
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
by: Ning, Yuting, et al.
Published: (2026)
Progent: Securing AI Agents with Privilege Control
by: Shi, Tianneng, et al.
Published: (2025)
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
by: Liu, Jiayu, et al.
Published: (2025)
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents
by: Li, Xu, et al.
Published: (2026)
Video-Based Reward Modeling for Computer-Use Agents
by: Song, Linxin, et al.
Published: (2026)
Ask Now, Use Later: Benchmarking the Proactivity Gap in Long-Lived LLM Agents
by: Wu, Bin, et al.
Published: (2026)
Multi-Agent Computer Use
by: Koh, Jing Yu, et al.
Published: (2026)
DeServe: Towards Affordable Offline LLM Inference via Decentralization
by: Wu, Linyu, et al.
Published: (2025)
DrugAgent: Multi-Agent Large Language Model-Based Reasoning for Drug-Target Interaction Prediction
by: Inoue, Yoshitaka, et al.
Published: (2024)
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation
by: Kapoor, Sayash, et al.
Published: (2025)
Adaptive Vision-Language Model Routing for Computer Use Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
HeartAgent: An Autonomous Agent System for Explainable Differential Diagnosis in Cardiology
by: Zhou, Shuang, et al.
Published: (2026)
MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
by: Kong, Quyu, et al.
Published: (2025)
Mistake Notebook Learning: Batch-Clustered Failures for Training-Free Agent Adaptation
by: Su, Xuanbo, et al.
Published: (2025)
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
by: Zhang, Kai, et al.
Published: (2023)
AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
by: Liao, Yusheng, et al.
Published: (2026)
MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment
by: Shi, Yucheng, et al.
Published: (2025)
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
by: Zhang, Jiayi, et al.
Published: (2025)