| Main Authors: | Wang, Sai; Wu, Yu; Xu, Zhongwen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.25052 |
Similar Items
Understanding Tool-Integrated Reasoning
by: Lin, Heng, et al.
Published: (2025)
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
by: Xu, Zelai, et al.
Published: (2023)
Single-stream Policy Optimization
by: Xu, Zhongwen, et al.
Published: (2025)
Learning to Optimize for Reinforcement Learning
by: Lan, Qingfeng, et al.
Published: (2023)
Learning Game-Playing Agents with Generative Code Optimization
by: Kuang, Zhiyi, et al.
Published: (2025)
Mutual Information Regularized Offline Reinforcement Learning
by: Ma, Xiao, et al.
Published: (2022)
Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis
by: Li, Xinyi, et al.
Published: (2025)
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
by: Fan, Chao, et al.
Published: (2024)
Spatial Reasoning and Planning for Deep Embodied Agents
by: Ishida, Shu
Published: (2024)
Learning Robust Reasoning through Guided Adversarial Self-Play
by: Li, Shuozhe, et al.
Published: (2026)
SmartPlay: A Benchmark for LLMs as Intelligent Agents
by: Wu, Yue, et al.
Published: (2023)
Reinforced Reasoning for Embodied Planning
by: Wu, Di, et al.
Published: (2025)
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
by: Liu, Bo, et al.
Published: (2025)
Learning to play: A Multimodal Agent for 3D Game-Play
by: Yue, Yuguang, et al.
Published: (2025)
LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
by: Jain, Ojas, et al.
Published: (2026)
Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents
by: Wang, Zehong, et al.
Published: (2026)
Self-Improving AI Agents through Self-Play
by: Chojecki, Przemyslaw
Published: (2025)
TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
by: Tan, Zhewen, et al.
Published: (2026)
Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning
by: Qian, Yilue, et al.
Published: (2023)
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
by: Wang, Tianlong, et al.
Published: (2024)
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
by: Putta, Pranav, et al.
Published: (2024)
WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement
by: Li, Fangyuan, et al.
Published: (2026)
Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints
by: Qiao, Dan, et al.
Published: (2024)
Play Style Identification Using Low-Level Representations of Play Traces in MicroRTS
by: Xia, Ruizhe Yu, et al.
Published: (2025)
OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning
by: Wu, Hao, et al.
Published: (2026)
TextAtari: 100K Frames Game Playing with Language Agents
by: Li, Wenhao, et al.
Published: (2025)
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
by: Xu, Ran, et al.
Published: (2025)
ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning
by: Yang, Chengcao
Published: (2026)
Better LLM Reasoning via Dual-Play
by: Zhang, Zhengxin, et al.
Published: (2025)
Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization
by: Li, Yu, et al.
Published: (2026)
CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
by: Cheng, Hanyin, et al.
Published: (2026)
MASP: Scalable GNN-based Planning for Multi-Agent Navigation
by: Yang, Xinyi, et al.
Published: (2023)
Differentially Private Reinforcement Learning with Self-Play
by: Qiao, Dan, et al.
Published: (2024)
Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
by: Zhang, Yifei, et al.
Published: (2026)
Demystifying MuZero Planning: Interpreting the Learned Model
by: Guei, Hung, et al.
Published: (2024)
Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
by: Winston, Caleb, et al.
Published: (2026)
Learning to Play Blackjack: A Curriculum Learning Perspective
by: Alasti, Amirreza, et al.
Published: (2026)
Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions
by: Long, Weifan, et al.
Published: (2024)
Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy
by: Gao, Shujian, et al.
Published: (2026)
DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
by: Cao, Qi, et al.
Published: (2025)