| Main Authors: | Gan, Yaozhong, Yan, Renye, Tan, Xiaoyang, Wu, Zhe, Xing, Junliang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.03894 |
Similar Items
Reflective Policy Optimization
by: Gan, Yaozhong, et al.
Published: (2024)
AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning
by: Yan, Renye, et al.
Published: (2024)
The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective
by: Yan, Renye, et al.
Published: (2024)
ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm
by: Wang, Hanyong, et al.
Published: (2026)
MARPO: A Reflective Policy Optimization for Multi Agent Reinforcement Learning
by: Wu, Cuiling, et al.
Published: (2025)
Beyond the Boundaries of Proximal Policy Optimization
by: Tan, Charlie B., et al.
Published: (2024)
ESPO: Early-Stopping Proximal Policy Optimization
by: Li, Zihang, et al.
Published: (2026)
On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization
by: Shankar, Kaaustaaub, et al.
Published: (2025)
Central Path Proximal Policy Optimization
by: Milosevic, Nikola, et al.
Published: (2025)
Reparameterization Proximal Policy Optimization
by: Zhong, Hai, et al.
Published: (2025)
Diffusion Policy through Conditional Proximal Policy Optimization
by: Liu, Ben, et al.
Published: (2026)
Deep Gaussian Process Proximal Policy Optimization
by: van der Lende, Matthijs, et al.
Published: (2025)
Actor-Critic Pretraining for Proximal Policy Optimization
by: Kernbach, Andreas, et al.
Published: (2026)
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
by: Shen, Guobin, et al.
Published: (2026)
Proximal Policy Optimization with Adaptive Exploration
by: Lixandru, Andrei
Published: (2024)
Complexity-Regularized Proximal Policy Optimization
by: Serfilippi, Luca, et al.
Published: (2025)
RePO: Bridging On-Policy Learning and Off-Policy Knowledge through Rephrasing Policy Optimization
by: Xia, Linxuan, et al.
Published: (2026)
Hindsight Experience Replay Accelerates Proximal Policy Optimization
by: Crowder, Douglas C., et al.
Published: (2024)
Token-level Proximal Policy Optimization for Query Generation
by: Ouyang, Yichen, et al.
Published: (2024)
Match or Replay: Self Imitating Proximal Policy Optimization
by: Chaudhary, Gaurav, et al.
Published: (2026)
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
by: Zhang, Yuheng, et al.
Published: (2025)
KIPPO: Koopman-Inspired Proximal Policy Optimization
by: Cozma, Andrei, et al.
Published: (2025)
Revisiting Group Relative Policy Optimization: Insights into On-Policy and Off-Policy Training
by: Mroueh, Youssef, et al.
Published: (2025)
BodyGen: Advancing Towards Efficient Embodiment Co-Design
by: Lu, Haofei, et al.
Published: (2025)
Learning Branching Policies for MILPs with Proximal Policy Optimization
by: Mhamed, Abdelouahed Ben, et al.
Published: (2025)
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction
by: Kiyohara, Haruka, et al.
Published: (2024)
ERPPO: Entropy Regularization-based Proximal Policy Optimization
by: Lee, Changha, et al.
Published: (2026)
Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization
by: Akgül, Abdullah, et al.
Published: (2025)
BiBLDR: Bidirectional Behavior Learning for Drug Repositioning
by: Zhang, Renye, et al.
Published: (2025)
PAC Off-Policy Prediction of Contextual Bandits
by: Wan, Yilong, et al.
Published: (2025)
Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)
On the Reuse Bias in Off-Policy Reinforcement Learning
by: Ying, Chengyang, et al.
Published: (2022)
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off
by: Li, Xiaofan, et al.
Published: (2026)
Wasserstein Proximal Policy Gradient
by: Zhu, Zhaoyu, et al.
Published: (2026)
Proximal Policy Distillation
by: Spigler, Giacomo
Published: (2024)
Fairness Aware Reinforcement Learning via Proximal Policy Optimization
by: La Malfa, Gabriele, et al.
Published: (2025)
Pessimistic Off-Policy Optimization for Learning to Rank
by: Cief, Matej, et al.
Published: (2022)
ISOPO: Proximal policy gradients without pi-old
by: Abrahamsen, Nilin
Published: (2025)
Eval-PPO: Building an Efficient Threat Evaluator Using Proximal Policy Optimization
by: Sun, Wuzhou, et al.
Published: (2025)
Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning
by: Behnamnia, Armin, et al.
Published: (2025)