| Main Authors: | Hu, Ruike, Wu, Shulei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.00319 |
Similar Items
StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning
by: Nowak, Ivo
Published: (2026)
StructSynth: Leveraging LLMs for Structure-Aware Tabular Data Synthesis in Low-Data Regimes
by: Liu, Siyi, et al.
Published: (2025)
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
by: Yao, Zhiyuan, et al.
Published: (2026)
Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation
by: Li, Huanyu, et al.
Published: (2026)
Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
by: Wu, Xiefeng, et al.
Published: (2025)
StructMem: Structured Memory for Long-Horizon Behavior in LLMs
by: Xu, Buqiang, et al.
Published: (2026)
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
by: Chang, Fu-Chieh, et al.
Published: (2024)
EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
by: Shi, Jiahe, et al.
Published: (2025)
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
by: Samineni, Soumya Rani, et al.
Published: (2025)
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
by: Liu, Shaoteng, et al.
Published: (2024)
RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
by: Jin, Hangzhan, et al.
Published: (2025)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)
LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training
by: Wu, Bo, et al.
Published: (2025)
CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts
by: Zhang, Rui, et al.
Published: (2026)
VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills
by: Lintunen, Erik M.
Published: (2025)
StructPrune: Structured Global Pruning asymptotics with $\mathcal{O}(\sqrt{N})$ GPU Memory
by: Song, Xinyuan, et al.
Published: (2025)
OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas
by: Huang, James Y., et al.
Published: (2025)
FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning
by: Zhang, Xikai, et al.
Published: (2026)
RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations
by: Marchesini, Enrico, et al.
Published: (2025)
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
by: Berto, Federico, et al.
Published: (2023)
EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)
Reinforcement Learning Enhanced LLMs: A Survey
by: Wang, Shuhe, et al.
Published: (2024)
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
by: Dong, Yihong, et al.
Published: (2025)
A Systematic Investigation of The RL-Jailbreaker in LLMs
by: Mohammedalamen, Montaser, et al.
Published: (2026)
SafeRL-Lite: A Lightweight, Explainable, and Constrained Reinforcement Learning Library
by: Mishra, Satyam, et al.
Published: (2025)
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
by: Bolton, William James, et al.
Published: (2024)
When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models
by: Galeone, Cosimo, et al.
Published: (2026)
Introducing PetriRL: An Innovative Framework for JSSP Resolution Integrating Petri nets and Event-based Reinforcement Learning
by: Lassoued, Sofiene, et al.
Published: (2024)
Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)
How Good Are LLMs at Processing Tool Outputs?
by: Kate, Kiran, et al.
Published: (2025)
LiteInception: A Lightweight and Interpretable Deep Learning Framework for General Aviation Fault Diagnosis
by: Wei, Zhihuan, et al.
Published: (2026)
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
by: Cao, Boxi, et al.
Published: (2024)
SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning
by: Tahmid, Tokey, et al.
Published: (2025)
A Reliable Cryptographic Framework for Empirical Machine Unlearning Evaluation
by: Tu, Yiwen, et al.
Published: (2024)
R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning
by: Farhi, Nadir
Published: (2025)
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)
SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning
by: Wang, Jichao, et al.
Published: (2026)
ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
by: Romeo, Carlo, et al.
Published: (2026)