Staff View: Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization :: Library Catalog

Bibliographic Details
Main Authors:	Yao, Yihang; Cen, Zhepeng; Lin, Haohong; Liu, Shiqi; Liu, Zuxin; Zhu, Jiacheng; Hong, Zhang-Wei; Shi, Laixi; Zhao, Ding
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2602.11351

_version_	1866915794105925632
author	Yao, Yihang; Cen, Zhepeng; Lin, Haohong; Liu, Shiqi; Liu, Zuxin; Zhu, Jiacheng; Hong, Zhang-Wei; Shi, Laixi; Zhao, Ding
author_facet	Yao, Yihang; Cen, Zhepeng; Lin, Haohong; Liu, Shiqi; Liu, Zuxin; Zhu, Jiacheng; Hong, Zhang-Wei; Shi, Laixi; Zhao, Ding
contents	Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns, enabling efficient task completion beyond passive instruction following and making them essential for real-world, user-centric applications. Agentic reinforcement learning (RL) has recently emerged as a promising solution for training such agents in multi-turn settings, allowing interaction strategies to be learned from feedback. However, existing pipelines face a critical challenge in balancing task performance with user engagement: passive agents cannot efficiently adapt to users' intentions, while overuse of human feedback reduces user satisfaction. To address this trade-off, we propose BAO, an agentic RL framework that combines behavior enhancement, which enriches proactive reasoning and information-gathering capabilities, with behavior regularization, which suppresses inefficient or redundant interactions and aligns agent behavior with user expectations. We evaluate BAO on multiple tasks from the UserRL benchmark suite and demonstrate that it substantially outperforms proactive agentic RL baselines while achieving comparable or even superior performance to commercial LLM agents, highlighting its effectiveness for training proactive, user-aligned LLM agents in complex multi-turn scenarios. Our website: https://proactive-agentic-rl.github.io/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_11351
institution	arXiv
publishDate	2026
record_format	arxiv
title	Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2602.11351

Similar Items