| Main Authors: | Qiu, Ruiyu, Wang, Rui, Yang, Guanghui, Li, Xiang, Shao, Zhijiang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.08339 |
Similar Items
Thresholded Lexicographic Ordered Multiobjective Reinforcement Learning
by: Tercan, Alperen, et al.
Published: (2024)
Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying
by: Nishimori, Soichiro, et al.
Published: (2026)
ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
by: Hou, Hongru, et al.
Published: (2026)
From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning
by: Jiang, Xitai, et al.
Published: (2026)
Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)
SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
by: Zhang, Wenhao, et al.
Published: (2025)
Metric-Gradient Projection for Stable Multi-Agent Policy Learning
by: Zhang, Zuyuan, et al.
Published: (2026)
TreeRPO: Tree Relative Policy Optimization
by: Yang, Zhicheng, et al.
Published: (2025)
Policy Gradient Methods for Non-Markovian Reinforcement Learning
by: Kar, Avik, et al.
Published: (2026)
Policy Gradients for Cumulative Prospect Theory in Reinforcement Learning
by: Lepel, Olivier, et al.
Published: (2024)
SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
by: Samadi, Amir, et al.
Published: (2024)
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
by: Yang, Zhicheng, et al.
Published: (2025)
$K$-Level Policy Gradients for Multi-Agent Reinforcement Learning
by: Reddi, Aryaman, et al.
Published: (2025)
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
by: Batra, Sumeet, et al.
Published: (2023)
CARE-RL: Capability-Aware Reinforcement Learning for Mitigating Cross-Domain Conflicts
by: Zhang, Rui, et al.
Published: (2026)
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
by: Feng, Xidong, et al.
Published: (2021)
Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space
by: Zhang, Xinyu, et al.
Published: (2025)
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
by: Kiyohara, Haruka, et al.
Published: (2023)
Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning
by: Yang, Tong, et al.
Published: (2023)
Guardian: Decoupling Exploration from Safety in Reinforcement Learning
by: Cai, Kaitong, et al.
Published: (2025)
Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs
by: Shakerinava, Mehran, et al.
Published: (2025)
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning
by: Liang, Zhenwen, et al.
Published: (2025)
EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)
EFRame: Deeper Reasoning via Exploration-Filter-Replay Reinforcement Learning Framework
by: Wang, Chen, et al.
Published: (2025)
Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning
by: Shitanda, Naoki, et al.
Published: (2026)
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
by: Melo, Luckeciano C., et al.
Published: (2025)
On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning
by: Barakat, Anas, et al.
Published: (2024)
PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods
by: Jeon, WooJae, et al.
Published: (2024)
Unbiased Gradient Low-Rank Projection
by: Pan, Rui, et al.
Published: (2025)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
by: Madhow, Sunil, et al.
Published: (2023)
Reinforcement Learning by Guided Safe Exploration
by: Yang, Qisong, et al.
Published: (2023)
Prune-OPD: Efficient and Reliable On-Policy Distillation for Long-Horizon Reasoning
by: Yang, Zhicheng, et al.
Published: (2026)
Performative Policy Gradient: Optimality in Performative Reinforcement Learning
by: Basu, Debabrota, et al.
Published: (2025)
ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
by: Liu, Jia, et al.
Published: (2025)
LLM-Explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven by Large Language Models
by: Hao, Qianyue, et al.
Published: (2025)
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
by: Lehmann, Matthias
Published: (2024)
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
by: Wei, Zhepei, et al.
Published: (2025)
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
by: Liu, Shaoteng, et al.
Published: (2024)