:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Sai; Wu, Yu; Xu, Zhongwen
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2509.25052

Similar Items

Understanding Tool-Integrated Reasoning
by: Lin, Heng, et al.
Published: (2025)

Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
by: Xu, Zelai, et al.
Published: (2023)

Single-stream Policy Optimization
by: Xu, Zhongwen, et al.
Published: (2025)

Learning to Optimize for Reinforcement Learning
by: Lan, Qingfeng, et al.
Published: (2023)

Learning Game-Playing Agents with Generative Code Optimization
by: Kuang, Zhiyi, et al.
Published: (2025)

Mutual Information Regularized Offline Reinforcement Learning
by: Ma, Xiao, et al.
Published: (2022)

Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis
by: Li, Xinyi, et al.
Published: (2025)

ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers
by: Fan, Chao, et al.
Published: (2024)

Spatial Reasoning and Planning for Deep Embodied Agents
by: Ishida, Shu
Published: (2024)

Learning Robust Reasoning through Guided Adversarial Self-Play
by: Li, Shuozhe, et al.
Published: (2026)

SmartPlay: A Benchmark for LLMs as Intelligent Agents
by: Wu, Yue, et al.
Published: (2023)

Reinforced Reasoning for Embodied Planning
by: Wu, Di, et al.
Published: (2025)

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
by: Liu, Bo, et al.
Published: (2025)

Learning to play: A Multimodal Agent for 3D Game-Play
by: Yue, Yuguang, et al.
Published: (2025)

LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
by: Jain, Ojas, et al.
Published: (2026)

Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents
by: Wang, Zehong, et al.
Published: (2026)

Self-Improving AI Agents through Self-Play
by: Chojecki, Przemyslaw
Published: (2025)

TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
by: Tan, Zhewen, et al.
Published: (2026)

Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning
by: Qian, Yilue, et al.
Published: (2023)

CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks
by: Wang, Tianlong, et al.
Published: (2024)

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
by: Putta, Pranav, et al.
Published: (2024)

WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement
by: Li, Fangyuan, et al.
Published: (2026)

Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints
by: Qiao, Dan, et al.
Published: (2024)

Play Style Identification Using Low-Level Representations of Play Traces in MicroRTS
by: Xia, Ruizhe Yu, et al.
Published: (2025)

OMNIFLOW: A Physics-Grounded Multimodal Agent for Generalized Scientific Reasoning
by: Wu, Hao, et al.
Published: (2026)

TextAtari: 100K Frames Game Playing with Language Agents
by: Li, Wenhao, et al.
Published: (2025)

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
by: Xu, Ran, et al.
Published: (2025)

ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning
by: Yang, Chengcao
Published: (2026)

Better LLM Reasoning via Dual-Play
by: Zhang, Zhengxin, et al.
Published: (2025)

Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization
by: Li, Yu, et al.
Published: (2026)

CoRA: Boosting Time Series Foundation Models for Multivariate Forecasting through Correlation-aware Adapter
by: Cheng, Hanyin, et al.
Published: (2026)

MASP: Scalable GNN-based Planning for Multi-Agent Navigation
by: Yang, Xinyi, et al.
Published: (2023)

Differentially Private Reinforcement Learning with Self-Play
by: Qiao, Dan, et al.
Published: (2024)

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
by: Zhang, Yifei, et al.
Published: (2026)

Demystifying MuZero Planning: Interpreting the Learned Model
by: Guei, Hung, et al.
Published: (2024)

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
by: Winston, Caleb, et al.
Published: (2026)

Learning to Play Blackjack: A Curriculum Learning Perspective
by: Alasti, Amirreza, et al.
Published: (2026)

Role Play: Learning Adaptive Role-Specific Strategies in Multi-Agent Interactions
by: Long, Weifan, et al.
Published: (2024)

Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy
by: Gao, Shujian, et al.
Published: (2026)

DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
by: Cao, Qi, et al.
Published: (2025)