| Main Authors: | Chaudhary, Gaurav, Behera, Laxmidhar, Mondal, Washim Uddin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.27515 |
Similar Items
From Novelty to Imitation: Self-Distilled Rewards for Offline Reinforcement Learning
by: Chaudhary, Gaurav, et al.
Published: (2025)
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning
by: Chaudhary, Gaurav, et al.
Published: (2025)
TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning
by: Chaudhary, Gaurav, et al.
Published: (2025)
Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs
by: Mondal, Washim Uddin, et al.
Published: (2024)
Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
by: Mondal, Washim Uddin, et al.
Published: (2023)
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
by: Xu, Yang, et al.
Published: (2025)
Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm
by: Bai, Qinbo, et al.
Published: (2024)
Order-Optimal Regret with Novel Policy Gradient Approaches in Infinite-Horizon Average Reward MDPs
by: Ganesh, Swetha, et al.
Published: (2024)
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
by: Bai, Qinbo, et al.
Published: (2023)
Sample-Efficient Constrained Reinforcement Learning with General Parameterization
by: Mondal, Washim Uddin, et al.
Published: (2024)
A Sharper Global Convergence Analysis for Average Reward Reinforcement Learning via an Actor-Critic Approach
by: Ganesh, Swetha, et al.
Published: (2024)
Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization
by: Satheesh, Anirudh, et al.
Published: (2026)
Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms
by: Aggarwal, Vaneet, et al.
Published: (2024)
Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)
by: Mondal, Washim Uddin, et al.
Published: (2022)
Hindsight Experience Replay Accelerates Proximal Policy Optimization
by: Crowder, Douglas C., et al.
Published: (2024)
Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
by: Xu, Yang, et al.
Published: (2025)
Dynamic Hand Gesture Recognition for Robot Manipulator Tasks
by: Sharma, Dharmendra, et al.
Published: (2026)
Governance-as-a-Service: A Multi-Agent Framework for AI System Compliance and Policy Enforcement
by: Gaurav, Suyash, et al.
Published: (2025)
Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning
by: Yoo, Jason, et al.
Published: (2024)
Variance Reduction Based Experience Replay for Policy Optimization
by: Zheng, Hua, et al.
Published: (2026)
RePO: Replay-Enhanced Policy Optimization
by: Li, Siheng, et al.
Published: (2025)
On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization
by: Shankar, Kaaustaaub, et al.
Published: (2025)
Central Path Proximal Policy Optimization
by: Milosevic, Nikola, et al.
Published: (2025)
Reparameterization Proximal Policy Optimization
by: Zhong, Hai, et al.
Published: (2025)
Combined Peak Reduction and Self-Consumption Using Proximal Policy Optimization
by: Peirelinck, Thijs, et al.
Published: (2022)
Diffusion Policy through Conditional Proximal Policy Optimization
by: Liu, Ben, et al.
Published: (2026)
Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay
by: Marek, Martin, et al.
Published: (2026)
Actor-Critic Pretraining for Proximal Policy Optimization
by: Kernbach, Andreas, et al.
Published: (2026)
Transductive Off-policy Proximal Policy Optimization
by: Gan, Yaozhong, et al.
Published: (2024)
Deep Gaussian Process Proximal Policy Optimization
by: van der Lende, Matthijs, et al.
Published: (2025)
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
by: Batra, Sumeet, et al.
Published: (2023)
Complexity-Regularized Proximal Policy Optimization
by: Serfilippi, Luca, et al.
Published: (2025)
Beyond the Boundaries of Proximal Policy Optimization
by: Tan, Charlie B., et al.
Published: (2024)
Proximal Policy Optimization with Adaptive Exploration
by: Lixandru, Andrei
Published: (2024)
Token-level Proximal Policy Optimization for Query Generation
by: Ouyang, Yichen, et al.
Published: (2024)
ESPO: Early-Stopping Proximal Policy Optimization
by: Li, Zihang, et al.
Published: (2026)
Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning
by: Chiang, Chia-Cheng, et al.
Published: (2024)
KIPPO: Koopman-Inspired Proximal Policy Optimization
by: Cozma, Andrei, et al.
Published: (2025)
Learning Branching Policies for MILPs with Proximal Policy Optimization
by: Mhamed, Abdelouahed Ben, et al.
Published: (2025)
ERPPO: Entropy Regularization-based Proximal Policy Optimization
by: Lee, Changha, et al.
Published: (2026)