| Main Author: | Tomihari, Akiyoshi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.04670 |
Similar Items
Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
by: Tomihari, Akiyoshi, et al.
Published: (2026)
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
by: Tomihari, Akiyoshi, et al.
Published: (2024)
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
by: Tomihari, Akiyoshi, et al.
Published: (2025)
Understanding Transformer Optimization via Gradient Heterogeneity
by: Tomihari, Akiyoshi, et al.
Published: (2025)
DiRL: An Efficient Post-Training Framework for Diffusion Language Models
by: Zhu, Ying, et al.
Published: (2025)
JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training
by: Hu, Zhengding, et al.
Published: (2026)
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
by: Ahmadi, Saba, et al.
Published: (2026)
SCOPE-RL: Stable and Quantitative Control of Policy Entropy in RL Post-Training
by: Wang, Chen, et al.
Published: (2025)
Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
by: Cui, Peng, et al.
Published: (2026)
Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models
by: Zhang, Hongyin, et al.
Published: (2025)
Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
by: Wu, Runzhe, et al.
Published: (2025)
On the Optimal Reasoning Length for RL-Trained Language Models
by: Nohara, Daisuke, et al.
Published: (2026)
On the Plasticity and Stability for Post-Training Large Language Models
by: Qiang, Wenwen, et al.
Published: (2026)
Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models
by: Bergmeister, Andreas, et al.
Published: (2026)
Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models
by: Wang, Liangyu, et al.
Published: (2025)
Training Dynamics Impact Post-Training Quantization Robustness
by: Catalan-Tatjer, Albert, et al.
Published: (2025)
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
by: Iso, Hayate, et al.
Published: (2026)
Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
by: Fakoor, Rasool, et al.
Published: (2026)
GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training
by: Hu, Yuelin, et al.
Published: (2026)
CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation
by: Chen, Yutong, et al.
Published: (2026)
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
by: Han, Zhenyu, et al.
Published: (2025)
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
by: Amico, Jeffrey, et al.
Published: (2025)
Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order
by: Gupta, Prakhar, et al.
Published: (2025)
Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models
by: Qu, Yun, et al.
Published: (2026)
Apriel-1.5-OpenReasoner: RL Post-Training for General-Purpose and Efficient Reasoning
by: Pardinas, Rafael, et al.
Published: (2026)
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
by: Zhou, Jin Peng, et al.
Published: (2025)
Laminar: A Scalable Asynchronous RL Post-Training Framework
by: Sheng, Guangming, et al.
Published: (2025)
SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning
by: Zhang, Ruiqi, et al.
Published: (2025)
Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models
by: Zou, Jade, et al.
Published: (2026)
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
by: Ding, Zheng, et al.
Published: (2025)
Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models
by: McAllister, David, et al.
Published: (2026)
Role-Based Fault Tolerance System for LLM RL Post-Training
by: Chen, Zhenqian, et al.
Published: (2025)
Group Causal Policy Optimization for Post-Training Large Language Models
by: Gu, Ziyin, et al.
Published: (2025)
LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
by: Gwak, Minju, et al.
Published: (2026)
Understanding Post-Training Structural Changes in Large Language Models
by: He, Xinyu, et al.
Published: (2025)
Scaling Laws for Post Training Quantized Large Language Models
by: Xu, Zifei, et al.
Published: (2024)
Post Training Quantization of Large Language Models with Microscaling Formats
by: Sharify, Sayeh, et al.
Published: (2024)
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
by: Xu, Charles, et al.
Published: (2026)
On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View
by: Guo, Tao, et al.
Published: (2025)
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
by: Gao, Wei, et al.
Published: (2025)