:: Library Catalog

Bibliographic Details
Main Authors:	Jia, Hangyi; Qian, Yuxi; Tong, Hanwen; Wu, Xinhui; Chen, Lin; Wei, Feng
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.09321

Similar Items

I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search
by: Liang, Zujie, et al.
Published: (2025)

The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
by: Chung, Jae-Won, et al.
Published: (2025)

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)

WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
by: Liu, Chenxu, et al.
Published: (2026)

WIPI: A New Web Threat for LLM-Driven Web Agents
by: Wu, Fangzhou, et al.
Published: (2024)

Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
by: Shen, Judy Hanwen, et al.
Published: (2024)

DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
by: Huang, Enhao, et al.
Published: (2025)

Towards Stepwise Domain Knowledge-Driven Reasoning Optimization and Reflection Improvement
by: Liu, Chengyuan, et al.
Published: (2025)

Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion
by: Li, Minghan, et al.
Published: (2026)

HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization
by: Chen, Yurun, et al.
Published: (2025)

MAGNET: Towards Adaptive GUI Agents with Memory-Driven Knowledge Evolution
by: Sun, Libo, et al.
Published: (2026)

A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains
by: Zhang, Xianren, et al.
Published: (2025)

WebCanvas: Benchmarking Web Agents in Online Environments
by: Pan, Yichen, et al.
Published: (2024)

A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
by: Ning, Liangbo, et al.
Published: (2025)

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
by: Liu, Shuochen, et al.
Published: (2026)

Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption
by: Krupp, Lars, et al.
Published: (2025)

Mango: Multi-Agent Web Navigation via Global-View Optimization
by: Tong, Weixi, et al.
Published: (2026)

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents
by: Guo, JunJia, et al.
Published: (2026)

Measuring Information Distortion in Hierarchical Ultra long Novel Reconstruction: The Optimal Expansion Ratio
by: Shen, Hanwen, et al.
Published: (2025)

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
by: Liu, Yinuo, et al.
Published: (2025)

Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
by: Xu, Kai, et al.
Published: (2025)

ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
by: Levy, Ido, et al.
Published: (2024)

AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents
by: Feng, Zhaopeng, et al.
Published: (2026)

Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios
by: Xia, Defei, et al.
Published: (2026)

Constructing Industrial-Scale Optimization Modeling Benchmark
by: Li, Zhong, et al.
Published: (2026)

Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs
by: Chen, Yurun, et al.
Published: (2025)

Code Driven Planning with Domain-Adaptive Critic
by: Tian, Zikang, et al.
Published: (2025)

DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
by: Zhu, Qiming, et al.
Published: (2024)

RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
by: Atinafu, Yonas, et al.
Published: (2026)

ReCreate: Reasoning and Creating Domain Agents Driven by Experience
by: Hao, Zhezheng, et al.
Published: (2026)

Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization
by: Xiao, Quanjia, et al.
Published: (2026)

Web Fraud Attacks Against LLM-Driven Multi-Agent Systems
by: Kong, Dezhang, et al.
Published: (2025)

Cognitive Duality for Adaptive Web Agents
by: Liu, Jiarun, et al.
Published: (2025)

IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining
by: Feng, Dawei, et al.
Published: (2024)

StressWeb: A Diagnostic Benchmark for Web Agent Robustness under Realistic Interaction Variability
by: Bai, Haoyue, et al.
Published: (2026)

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
by: Yu, Shoubin, et al.
Published: (2026)

BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction
by: Xia, Tian, et al.
Published: (2025)

SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs
by: Ye, Anbang, et al.
Published: (2025)

Bilateral Trade Under Heavy-Tailed Valuations: Minimax Regret with Infinite Variance
by: Zhao, Hangyi
Published: (2026)

Insider Purchase Signals in Microcap Equities: Gradient Boosting Detection of Abnormal Returns
by: Zhao, Hangyi
Published: (2026)