| Main Authors: | Jia, Hangyi, Qian, Yuxi, Tong, Hanwen, Wu, Xinhui, Chen, Lin, Wei, Feng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.09321 |
Similar Items
I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search
by: Liang, Zujie, et al.
Published: (2025)
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
by: Chung, Jae-Won, et al.
Published: (2025)
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
by: Yang, Rui, et al.
Published: (2026)
WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
by: Liu, Chenxu, et al.
Published: (2026)
WIPI: A New Web Threat for LLM-Driven Web Agents
by: Wu, Fangzhou, et al.
Published: (2024)
Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
by: Shen, Judy Hanwen, et al.
Published: (2024)
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain
by: Huang, Enhao, et al.
Published: (2025)
Towards Stepwise Domain Knowledge-Driven Reasoning Optimization and Reflection Improvement
by: Liu, Chengyuan, et al.
Published: (2025)
Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion
by: Li, Minghan, et al.
Published: (2026)
HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization
by: Chen, Yurun, et al.
Published: (2025)
MAGNET: Towards Adaptive GUI Agents with Memory-Driven Knowledge Evolution
by: Sun, Libo, et al.
Published: (2026)
A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains
by: Zhang, Xianren, et al.
Published: (2025)
WebCanvas: Benchmarking Web Agents in Online Environments
by: Pan, Yichen, et al.
Published: (2024)
A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models
by: Ning, Liangbo, et al.
Published: (2025)
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments
by: Liu, Shuochen, et al.
Published: (2026)
Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption
by: Krupp, Lars, et al.
Published: (2025)
Mango: Multi-Agent Web Navigation via Global-View Optimization
by: Tong, Weixi, et al.
Published: (2026)
VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents
by: Guo, JunJia, et al.
Published: (2026)
Measuring Information Distortion in Hierarchical Ultra long Novel Reconstruction: The Optimal Expansion Ratio
by: Shen, Hanwen, et al.
Published: (2025)
WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
by: Liu, Yinuo, et al.
Published: (2025)
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
by: Xu, Kai, et al.
Published: (2025)
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
by: Levy, Ido, et al.
Published: (2024)
AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents
by: Feng, Zhaopeng, et al.
Published: (2026)
Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios
by: Xia, Defei, et al.
Published: (2026)
Constructing Industrial-Scale Optimization Modeling Benchmark
by: Li, Zhong, et al.
Published: (2026)
Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs
by: Chen, Yurun, et al.
Published: (2025)
Code Driven Planning with Domain-Adaptive Critic
by: Tian, Zikang, et al.
Published: (2025)
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
by: Zhu, Qiming, et al.
Published: (2024)
RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
by: Atinafu, Yonas, et al.
Published: (2026)
ReCreate: Reasoning and Creating Domain Agents Driven by Experience
by: Hao, Zhezheng, et al.
Published: (2026)
Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization
by: Xiao, Quanjia, et al.
Published: (2026)
Web Fraud Attacks Against LLM-Driven Multi-Agent Systems
by: Kong, Dezhang, et al.
Published: (2025)
Cognitive Duality for Adaptive Web Agents
by: Liu, Jiarun, et al.
Published: (2025)
IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining
by: Feng, Dawei, et al.
Published: (2024)
StressWeb: A Diagnostic Benchmark for Web Agent Robustness under Realistic Interaction Variability
by: Bai, Haoyue, et al.
Published: (2026)
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
by: Yu, Shoubin, et al.
Published: (2026)
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction
by: Xia, Tian, et al.
Published: (2025)
SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs
by: Ye, Anbang, et al.
Published: (2025)
Bilateral Trade Under Heavy-Tailed Valuations: Minimax Regret with Infinite Variance
by: Zhao, Hangyi
Published: (2026)
Insider Purchase Signals in Microcap Equities: Gradient Boosting Detection of Abnormal Returns
by: Zhao, Hangyi
Published: (2026)