| Main Author: | Shen, Han |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.03493 |
Similar Items
Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
by: Zhan, Guojian, et al.
Published: (2025)
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
by: Vanlioglu, Abdullah
Published: (2025)
FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
by: Xue, Jun, et al.
Published: (2026)
Curiosity & Entropy Driven Unsupervised RL in Multiple Environments
by: Dewan, Shaurya, et al.
Published: (2024)
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
by: Agarwal, Shivam, et al.
Published: (2025)
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)
A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
by: Choi, Wonhyeok, et al.
Published: (2026)
Token-Efficient RL for LLM Reasoning
by: Lee, Alan, et al.
Published: (2025)
Scalable Policy-Based RL Algorithms for POMDPs
by: Anjarlekar, Ameya, et al.
Published: (2025)
Accelerating Goal-Conditioned RL Algorithms and Research
by: Bortkiewicz, Michał, et al.
Published: (2024)
Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)
EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
by: Shi, Jiahe, et al.
Published: (2025)
FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
by: Lv, Lei, et al.
Published: (2026)
LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning
by: Kang, Haoqiang, et al.
Published: (2026)
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
by: Brantley, Kianté, et al.
Published: (2025)
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
by: Han, Zhenyu, et al.
Published: (2025)
Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail
by: Li, Yingru, et al.
Published: (2025)
rePIRL: Learn PRM with Inverse RL for LLM Reasoning
by: Wu, Xian, et al.
Published: (2026)
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)
ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
by: Liu, Jia, et al.
Published: (2025)
How Can LLM Guide RL? A Value-Based Approach
by: Zhang, Shenao, et al.
Published: (2024)
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
by: Ye, Chenlu, et al.
Published: (2026)
IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
by: Cheng, Zhoujun, et al.
Published: (2026)
The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
by: Li, Yingru, et al.
Published: (2026)
Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL
by: Choi, Yunseon, et al.
Published: (2024)
Novel RL approach for efficient Elevator Group Control Systems
by: Vaartjes, Nathan, et al.
Published: (2025)
Deep RL With Information Constrained Policies: Generalization in Continuous Control
by: Malloy, Tailia, et al.
Published: (2020)
The Ends Justify the Thoughts: RL-Induced Motivated Reasoning in LLM CoTs
by: Howe, Nikolaus, et al.
Published: (2025)
PickLLM: Context-Aware RL-Assisted Large Language Model Routing
by: Sikeridis, Dimitrios, et al.
Published: (2024)
Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters
by: Kong, Lingxiao, et al.
Published: (2026)
Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
by: Zhao, Anhao, et al.
Published: (2026)
RL³: Boosting Meta Reinforcement Learning via RL inside RL²
by: Bhatia, Abhinav, et al.
Published: (2023)
Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
by: Nekoei, Hadi, et al.
Published: (2025)
Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method
by: Choi, Kyuwon, et al.
Published: (2024)
Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS
by: Jin, Can, et al.
Published: (2025)
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
by: Ged, François, et al.
Published: (2023)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)
It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL
by: Dwyer, Madeleine, et al.
Published: (2025)
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
by: Rutherford, Alexander, et al.
Published: (2023)
STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order
by: Gu, Chengyang, et al.
Published: (2026)