| Main Authors: | Yu, Yajie, Feng, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.15313 |
Similar Items
SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
by: Feng, Xinshun, et al.
Published: (2026)
APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents
by: Li, Yibo, et al.
Published: (2026)
ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies
by: Sequeira, Pedro, et al.
Published: (2025)
PolicyEvolve: Evolving Programmatic Policies by LLMs for multi-player games via Population-Based Training
by: Lv, Mingrui, et al.
Published: (2025)
One-Way Policy Optimization for Self-Evolving LLMs
by: Yang, Shuo, et al.
Published: (2026)
Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization
by: Zhou, Huilin, et al.
Published: (2026)
Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization
by: Wang, Junzhe, et al.
Published: (2026)
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
by: Zhang, Haozhen, et al.
Published: (2026)
InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees
by: Huang, Chenyu, et al.
Published: (2026)
EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents
by: Liu, Jiaqi, et al.
Published: (2026)
Group-in-Group Policy Optimization for LLM Agent Training
by: Feng, Lang, et al.
Published: (2025)
Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization
by: Li, Yu, et al.
Published: (2026)
PolicyLong: Towards On-Policy Context Extension
by: Jia, Junlong, et al.
Published: (2026)
Policy Dispersion in Non-Markovian Environment
by: Qu, Bohao, et al.
Published: (2023)
Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization
by: Liu, Zongkai, et al.
Published: (2024)
Reinforcing Language Agents via Policy Optimization with Action Decomposition
by: Wen, Muning, et al.
Published: (2024)
AgentEvolver: Towards Efficient Self-Evolving Agent System
by: Zhai, Yunpeng, et al.
Published: (2025)
UCPO: Uncertainty-Aware Policy Optimization
by: Zeng, Xianzhou, et al.
Published: (2026)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
by: Luo, Haipeng, et al.
Published: (2023)
Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing
by: Khan, Azal Ahmad, et al.
Published: (2026)
COPR: Continual Human Preference Learning via Optimal Policy Regularization
by: Zhang, Han, et al.
Published: (2024)
Provable and Practical In-Context Policy Optimization for Self-Improvement
by: Yu, Tianrun, et al.
Published: (2026)
Intrinsic Reward Policy Optimization for Sparse-Reward Environments
by: Cho, Minjae, et al.
Published: (2026)
PA3: Policy-Aware Agent Alignment through Chain-of-Thought
by: Dipta, Shubhashis Roy, et al.
Published: (2026)
HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation
by: Ding, Ken
Published: (2026)
Calibration-Aware Policy Optimization for Reasoning LLMs
by: Wang, Ziqi, et al.
Published: (2026)
Fine-tuning Pocket-Aware Diffusion Models via Denoising Policy Optimization
by: Xue, Yuan, et al.
Published: (2026)
POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles
by: Menet, Nicolas, et al.
Published: (2026)
Symbolic Learning Enables Self-Evolving Agents
by: Zhou, Wangchunshu, et al.
Published: (2024)
IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck
by: Deng, Huilin, et al.
Published: (2026)
Adaptive Social Learning via Mode Policy Optimization for Language Agents
by: Wang, Minzheng, et al.
Published: (2025)
ClawArena: Benchmarking AI Agents in Evolving Information Environments
by: Ji, Haonian, et al.
Published: (2026)
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
by: Liu, Zeyuan, et al.
Published: (2026)
Decision Making in Non-Stationary Environments with Policy-Augmented Search
by: Pettet, Ava, et al.
Published: (2024)
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
by: Li, Gengsheng, et al.
Published: (2026)
TRACE: Distilling Where It Matters via Token-Routed Self On-Policy Alignment
by: Wang, Jiaxuan, et al.
Published: (2026)
Towards Flash Thinking via Decoupled Advantage Policy Optimization
by: Tan, Zezhong, et al.
Published: (2025)
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
by: Jiang, Yuhua, et al.
Published: (2025)
Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning
by: Lepel, Olivier, et al.
Published: (2024)
Self-Evolving LLMs via Continual Instruction Tuning
by: Kang, Jiazheng, et al.
Published: (2025)