Library Catalog

Bibliographic Details
Main Author:	Shen, Han
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.03493

Similar Items

Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
by: Zhan, Guojian, et al.
Published: (2025)

Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
by: Vanlioglu, Abdullah
Published: (2025)

FastDSAC: Unlocking the Potential of Maximum Entropy RL in High-Dimensional Humanoid Control
by: Xue, Jun, et al.
Published: (2026)

Curiosity & Entropy Driven Unsupervised RL in Multiple Environments
by: Dewan, Shaurya, et al.
Published: (2024)

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
by: Agarwal, Shivam, et al.
Published: (2025)

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)

A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
by: Choi, Wonhyeok, et al.
Published: (2026)

Token-Efficient RL for LLM Reasoning
by: Lee, Alan, et al.
Published: (2025)

Scalable Policy-Based RL Algorithms for POMDPs
by: Anjarlekar, Ameya, et al.
Published: (2025)

Accelerating Goal-Conditioned RL Algorithms and Research
by: Bortkiewicz, Michał, et al.
Published: (2024)

Combining LLM decision and RL action selection to improve RL policy for adaptive interventions
by: Karine, Karine, et al.
Published: (2025)

EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
by: Shi, Jiahe, et al.
Published: (2025)

FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
by: Lv, Lei, et al.
Published: (2026)

LaDi-RL: Latent Diffusion Reasoning Prevents Entropy Collapse in Reinforcement Learning
by: Kang, Haoqiang, et al.
Published: (2026)

Accelerating RL for LLM Reasoning with Optimal Advantage Regression
by: Brantley, Kianté, et al.
Published: (2025)

AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
by: Han, Zhenyu, et al.
Published: (2025)

Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail
by: Li, Yingru, et al.
Published: (2025)

rePIRL: Learn PRM with Inverse RL for LLM Reasoning
by: Wu, Xian, et al.
Published: (2026)

SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)

ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
by: Liu, Jia, et al.
Published: (2025)

How Can LLM Guide RL? A Value-Based Approach
by: Zhang, Shenao, et al.
Published: (2024)

Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
by: Ye, Chenlu, et al.
Published: (2026)

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
by: Cheng, Zhoujun, et al.
Published: (2026)

The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
by: Li, Yingru, et al.
Published: (2026)

Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL
by: Choi, Yunseon, et al.
Published: (2024)

Novel RL approach for efficient Elevator Group Control Systems
by: Vaartjes, Nathan, et al.
Published: (2025)

Deep RL With Information Constrained Policies: Generalization in Continuous Control
by: Malloy, Tailia, et al.
Published: (2020)

The Ends Justify the Thoughts: RL-Induced Motivated Reasoning in LLM CoTs
by: Howe, Nikolaus, et al.
Published: (2025)

PickLLM: Context-Aware RL-Assisted Large Language Model Routing
by: Sikeridis, Dimitrios, et al.
Published: (2024)

Enhancing RL Generalizability in Robotics through SHAP Analysis of Algorithms and Hyperparameters
by: Kong, Lingxiao, et al.
Published: (2026)

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
by: Zhao, Anhao, et al.
Published: (2026)

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)

Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
by: Nekoei, Hadi, et al.
Published: (2025)

Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method
by: Choi, Kyuwon, et al.
Published: (2024)

Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS
by: Jin, Can, et al.
Published: (2025)

Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
by: Ged, François, et al.
Published: (2023)

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)

It's Not You, It's Clipping: A Soft Trust-Region via Probability Smoothing for LLM RL
by: Dwyer, Madeleine, et al.
Published: (2025)

JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
by: Rutherford, Alexander, et al.
Published: (2023)

STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order
by: Gu, Chengyang, et al.
Published: (2026)