:: Library Catalog

Bibliographic Details
Main Author:	Tomihari, Akiyoshi
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.04670

Similar Items

Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation
by: Tomihari, Akiyoshi, et al.
Published: (2026)

Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective
by: Tomihari, Akiyoshi, et al.
Published: (2024)

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
by: Tomihari, Akiyoshi, et al.
Published: (2025)

Understanding Transformer Optimization via Gradient Heterogeneity
by: Tomihari, Akiyoshi, et al.
Published: (2025)

DiRL: An Efficient Post-Training Framework for Diffusion Language Models
by: Zhu, Ying, et al.
Published: (2025)

JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training
by: Hu, Zhengding, et al.
Published: (2026)

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
by: Ahmadi, Saba, et al.
Published: (2026)

SCOPE-RL: Stable and Quantitative Control of Policy Entropy in RL Post-Training
by: Wang, Chen, et al.
Published: (2025)

Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
by: Cui, Peng, et al.
Published: (2026)

Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models
by: Zhang, Hongyin, et al.
Published: (2025)

Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
by: Wu, Runzhe, et al.
Published: (2025)

On the Optimal Reasoning Length for RL-Trained Language Models
by: Nohara, Daisuke, et al.
Published: (2026)

On the Plasticity and Stability for Post-Training Large Language Models
by: Qiang, Wenwen, et al.
Published: (2026)

Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models
by: Bergmeister, Andreas, et al.
Published: (2026)

Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models
by: Wang, Liangyu, et al.
Published: (2025)

Training Dynamics Impact Post-Training Quantization Robustness
by: Catalan-Tatjer, Albert, et al.
Published: (2025)

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
by: Iso, Hayate, et al.
Published: (2026)

Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
by: Fakoor, Rasool, et al.
Published: (2026)

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training
by: Hu, Yuelin, et al.
Published: (2026)

CoScale-RL: Efficient Post-Training by Co-Scaling Data and Computation
by: Chen, Yutong, et al.
Published: (2026)

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
by: Han, Zhenyu, et al.
Published: (2025)

Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
by: Amico, Jeffrey, et al.
Published: (2025)

Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order
by: Gupta, Prakhar, et al.
Published: (2025)

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models
by: Qu, Yun, et al.
Published: (2026)

Apriel-1.5-OpenReasoner: RL Post-Training for General-Purpose and Efficient Reasoning
by: Pardinas, Rafael, et al.
Published: (2026)

$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
by: Zhou, Jin Peng, et al.
Published: (2025)

Laminar: A Scalable Asynchronous RL Post-Training Framework
by: Sheng, Guangming, et al.
Published: (2025)

SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning
by: Zhang, Ruiqi, et al.
Published: (2025)

Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models
by: Zou, Jade, et al.
Published: (2026)

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
by: Ding, Zheng, et al.
Published: (2025)

Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models
by: McAllister, David, et al.
Published: (2026)

Role-Based Fault Tolerance System for LLM RL Post-Training
by: Chen, Zhenqian, et al.
Published: (2025)

Group Causal Policy Optimization for Post-Training Large Language Models
by: Gu, Ziyin, et al.
Published: (2025)

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
by: Gwak, Minju, et al.
Published: (2026)

Understanding Post-Training Structural Changes in Large Language Models
by: He, Xinyu, et al.
Published: (2025)

Scaling Laws for Post Training Quantized Large Language Models
by: Xu, Zifei, et al.
Published: (2024)

Post Training Quantization of Large Language Models with Microscaling Formats
by: Sharify, Sayeh, et al.
Published: (2024)

RL Token: Bootstrapping Online RL with Vision-Language-Action Models
by: Xu, Charles, et al.
Published: (2026)

On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View
by: Guo, Tao, et al.
Published: (2025)

RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
by: Gao, Wei, et al.
Published: (2025)