:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Ruike; Wu, Shulei
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2512.00319

Similar Items

StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning
by: Nowak, Ivo
Published: (2026)

StructSynth: Leveraging LLMs for Structure-Aware Tabular Data Synthesis in Low-Data Regimes
by: Liu, Siyi, et al.
Published: (2025)

CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
by: Yao, Zhiyuan, et al.
Published: (2026)

Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation
by: Li, Huanyu, et al.
Published: (2026)

Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
by: Wu, Xiefeng, et al.
Published: (2025)

StructMem: Structured Memory for Long-Horizon Behavior in LLMs
by: Xu, Buqiang, et al.
Published: (2026)

RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
by: Chang, Fu-Chieh, et al.
Published: (2024)

EARL: Entropy-Aware RL Alignment of LLMs for Reliable RTL Code Generation
by: Shi, Jiahe, et al.
Published: (2025)

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)

RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
by: Samineni, Soumya Rani, et al.
Published: (2025)

RL-GPT: Integrating Reinforcement Learning and Code-as-policy
by: Liu, Shaoteng, et al.
Published: (2024)

RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
by: Jin, Hangzhan, et al.
Published: (2025)

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)

LlamaRL: A Distributed Asynchronous Reinforcement Learning Framework for Efficient Large-scale LLM Training
by: Wu, Bo, et al.
Published: (2025)

CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts
by: Zhang, Rui, et al.
Published: (2026)

VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills
by: Lintunen, Erik M.
Published: (2025)

StructPrune: Structured Global Pruning asymptotics with $\mathcal{O}(\sqrt{N})$ GPU Memory
by: Song, Xinyuan, et al.
Published: (2025)

OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas
by: Huang, James Y., et al.
Published: (2025)

FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning
by: Zhang, Xikai, et al.
Published: (2026)

RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations
by: Marchesini, Enrico, et al.
Published: (2025)

RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
by: Berto, Federico, et al.
Published: (2023)

EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)

Reinforcement Learning Enhanced LLMs: A Survey
by: Wang, Shuhe, et al.
Published: (2024)

RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
by: Dong, Yihong, et al.
Published: (2025)

A Systematic Investigation of The RL-Jailbreaker in LLMs
by: Mohammedalamen, Montaser, et al.
Published: (2026)

SafeRL-Lite: A Lightweight, Explainable, and Constrained Reinforcement Learning Library
by: Mishra, Satyam, et al.
Published: (2025)

RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
by: Bolton, William James, et al.
Published: (2024)

When Correct Isn't Usable: Improving Structured Output Reliability in Small Language Models
by: Galeone, Cosimo, et al.
Published: (2026)

Introducing PetriRL: An Innovative Framework for JSSP Resolution Integrating Petri nets and Event-based Reinforcement Learning
by: Lassoued, Sofiene, et al.
Published: (2024)

Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)

How Good Are LLMs at Processing Tool Outputs?
by: Kate, Kiran, et al.
Published: (2025)

LiteInception: A Lightweight and Interpretable Deep Learning Framework for General Aviation Fault Diagnosis
by: Wei, Zhihuan, et al.
Published: (2026)

StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation
by: Cao, Boxi, et al.
Published: (2024)

SpikeRL: A Scalable and Energy-efficient Framework for Deep Spiking Reinforcement Learning
by: Tahmid, Tokey, et al.
Published: (2025)

A Reliable Cryptographic Framework for Empirical Machine Unlearning Evaluation
by: Tu, Yiwen, et al.
Published: (2024)

R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning
by: Farhi, Nadir
Published: (2025)

MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)

SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning
by: Wang, Jichao, et al.
Published: (2026)

ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders
by: Romeo, Carlo, et al.
Published: (2026)