| Main Authors: | Dionisopoulos, Lucas; Majamaki, Nicklas; Ammanabrolu, Prithviraj |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.05134 |
Similar Items
Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning
by: Kim, Bosung, et al.
Published: (2025)
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning
by: Wang, Ruiyi, et al.
Published: (2025)
Preference-Based Learning in Audio Applications: A Systematic Analysis
by: Broukhim, Aaron, et al.
Published: (2025)
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
by: Shen, Yiran, et al.
Published: (2025)
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
by: Kang, Haoqiang, et al.
Published: (2025)
Introspective X Training: Feedback Conditioning Improves Scaling Across all LLM Training Stages
by: Cui, Brandon, et al.
Published: (2026)
VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study
by: Zhang, Zhicheng, et al.
Published: (2026)
Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight
by: Cui, Christopher Z., et al.
Published: (2026)
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
by: Tan, Zelin, et al.
Published: (2025)
Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess
by: Hwang, Dongyoon, et al.
Published: (2025)
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
by: Liu, Jincheng, et al.
Published: (2025)
Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training
by: Matsutani, Kohsei, et al.
Published: (2026)
ChessQA: Evaluating Large Language Models for Chess Understanding
by: Wen, Qianfeng, et al.
Published: (2025)
Data-Efficient Training by Evolved Sampling
by: Cheng, Ziheng, et al.
Published: (2025)
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data
by: Akter, Syeda Nahida, et al.
Published: (2025)
Generating Creative Chess Puzzles
by: Feng, Xidong, et al.
Published: (2025)
Implicit Search via Discrete Diffusion: A Study on Chess
by: Ye, Jiacheng, et al.
Published: (2025)
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
by: Li, Ming, et al.
Published: (2025)
Liquid Reasoning Transformers: A Sudoku-Based Prototype for Chess-Scale Algorithmic Tasks
by: Sahni, Shivansh, et al.
Published: (2025)
Complete Chess Games Enable LLM Become A Chess Master
by: Zhang, Yinqi, et al.
Published: (2025)
Amortized Planning with Large-Scale Transformers: A Case Study on Chess
by: Ruoss, Anian, et al.
Published: (2024)
Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models
by: Javanmard, Adel, et al.
Published: (2026)
Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
by: Hu, Pingbang, et al.
Published: (2026)
Human-aligned Chess with a Bit of Search
by: Zhang, Yiming, et al.
Published: (2024)
Enhancing Chess Reinforcement Learning with Graph Representation
by: Rigaux, Tomas, et al.
Published: (2024)
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
by: Chen, Zui, et al.
Published: (2024)
Towards Piece-by-Piece Explanations for Chess Positions with SHAP
by: Spinnato, Francesco
Published: (2025)
Iterative Inference in a Chess-Playing Neural Network
by: Sandmann, Elias, et al.
Published: (2025)
Diversifying AI: Towards Creative Chess with AlphaZero
by: Zahavy, Tom, et al.
Published: (2023)
Mastering Chinese Chess AI (Xiangqi) Without Search
by: Chen, Yu, et al.
Published: (2024)
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
by: Nepal, Aadim, et al.
Published: (2025)
Oracle-Guided Soft Shielding for Safe Move Prediction in Chess
by: Rajendran, Prajit T, et al.
Published: (2026)
Mixture of Masters: Sparse Chess Language Models with Player Routing
by: Frisoni, Giacomo, et al.
Published: (2026)
Learning to Reason Efficiently with A* Post-Training
by: Opedal, Andreas, et al.
Published: (2026)
Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions
by: Veeriah, Vivek, et al.
Published: (2025)
Self-Evolving Curriculum for LLM Reasoning
by: Chen, Xiaoyin, et al.
Published: (2025)
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
by: Li, Qi, et al.
Published: (2025)
RewardHarness: Self-Evolving Agentic Post-Training
by: Zhang, Yuxuan, et al.
Published: (2026)
R-Zero: Self-Evolving Reasoning LLM from Zero Data
by: Huang, Chengsong, et al.
Published: (2025)