Staff View: OpAgent: Operator Agent for Web Navigation :: Library Catalog

Bibliographic Details
Main Authors:	Guo, Yuyu; Yang, Wenjie; Yang, Siyuan; Liu, Ziyang; Chen, Cheng; Wei, Yuan; Hu, Yun; Huang, Yang; Hao, Guoliang; Yuan, Dongsheng; Wang, Jianming; Chen, Xin; Yu, Hang; Lei, Lei; Di, Peng
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.13559

_version_	1866910178871345152
author	Guo, Yuyu; Yang, Wenjie; Yang, Siyuan; Liu, Ziyang; Chen, Cheng; Wei, Yuan; Hu, Yun; Huang, Yang; Hao, Guoliang; Yuan, Dongsheng; Wang, Jianming; Chen, Xin; Yu, Hang; Lei, Lei; Di, Peng
author_facet	Guo, Yuyu; Yang, Wenjie; Yang, Siyuan; Liu, Ziyang; Chen, Cheng; Wei, Yuan; Hu, Yun; Huang, Yang; Hao, Guoliang; Yuan, Dongsheng; Wang, Jianming; Chen, Xin; Yu, Hang; Lei, Lei; Di, Peng
contents	To fulfill user instructions, autonomous web agents must contend with the inherent complexity and volatile nature of real-world websites. Conventional paradigms predominantly rely on Supervised Fine-Tuning (SFT) or Offline Reinforcement Learning (RL) using static datasets. However, these methods suffer from severe distributional shifts, as offline trajectories fail to capture the stochastic state transitions and real-time feedback of unconstrained wide web environments. In this paper, we propose a robust Online Reinforcement Learning WebAgent, designed to optimize its policy through direct, iterative interactions with unconstrained wide websites. Our approach comprises three core innovations: 1) Hierarchical Multi-Task Fine-tuning: We curate a comprehensive mixture of datasets categorized by functional primitives -- Planning, Acting, and Grounding -- establishing a Vision-Language Model (VLM) with strong instruction-following capabilities for Web GUI tasks. 2) Online Agentic RL in the Wild: We develop an online interaction environment and fine-tune the VLM using a specialized RL pipeline. We introduce a Hybrid Reward Mechanism that combines a ground-truth-agnostic WebJudge for holistic outcome assessment with a Rule-based Decision Tree (RDT) for progress reward. This system effectively mitigates the credit assignment challenge in long-horizon navigation. Notably, our RL-enhanced model achieves a 38.1% success rate (pass@5) on WebArena, outperforming all existing monolithic baselines. 3) Operator Agent: We introduce a modular agentic framework, namely OpAgent, orchestrating a Planner, Grounder, Reflector, and Summarizer. This synergy enables robust error recovery and self-correction, elevating the agent's performance to a new State-of-the-Art (SOTA) success rate of 71.6%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_13559
institution	arXiv
publishDate	2026
record_format	arxiv
title	OpAgent: Operator Agent for Web Navigation
topic	Artificial Intelligence
url	https://arxiv.org/abs/2602.13559

Similar Items