| Main Author: | Hu, Ranting |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.15654 |
Similar Items
GAGPO: Generalized Advantage Grouped Policy Optimization
by: Zhu, Siyuan, et al.
Published: (2026)
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting
by: Cetin, Edoardo, et al.
Published: (2024)
Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation
by: He, Xixiang, et al.
Published: (2026)
FAWAC: Feasibility Informed Advantage Weighted Regression for Persistent Safety in Offline Reinforcement Learning
by: Koirala, Prajwal, et al.
Published: (2024)
Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning
by: Chen, Xiaocong, et al.
Published: (2024)
REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization
by: Hu, Jian, et al.
Published: (2025)
Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents
by: Lee, Jane H., et al.
Published: (2025)
Skip-Connected Policy Optimization for Implicit Advantage
by: Teng, Fengwei, et al.
Published: (2026)
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
by: Ye, Chenlu, et al.
Published: (2022)
Path Learning with Trajectory Advantage Regression
by: Miyaguchi, Kohei
Published: (2025)
Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets
by: Pandian, Shriram Karpoora Sundara, et al.
Published: (2025)
Smooth Gate Functions for Soft Advantage Policy Optimization
by: Denisov, Egor, et al.
Published: (2026)
Policy Optimization via Adv2: Adversarial Learning on Advantage Functions
by: Jonckheere, Matthieu, et al.
Published: (2023)
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
by: He, Longxiang, et al.
Published: (2025)
Towards Flash Thinking via Decoupled Advantage Policy Optimization
by: Tan, Zezhong, et al.
Published: (2025)
Cascading Bandits Robust to Adversarial Corruptions
by: Xie, Jize, et al.
Published: (2025)
Robust Bayesian Optimisation with Unbounded Corruptions
by: Ezzerg, Abdelhamid, et al.
Published: (2025)
On Corruption-Robustness in Performative Reinforcement Learning
by: Pollatos, Vasilis, et al.
Published: (2025)
Corruption-Robust Lipschitz Contextual Search
by: Zuo, Shiliang
Published: (2023)
How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization
by: Fang, Yangyi, et al.
Published: (2026)
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk
by: Hau, Jia Lin, et al.
Published: (2022)
Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience
by: Hu, Zicheng, et al.
Published: (2025)
Risk-Averse Total-Reward Reinforcement Learning
by: Su, Xihong, et al.
Published: (2025)
Online Bayesian Risk-Averse Reinforcement Learning
by: Wang, Yuhao, et al.
Published: (2025)
Risk-Averse Certification of Bayesian Neural Networks
by: Zhang, Xiyue, et al.
Published: (2024)
RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback
by: Vatsa, Amitesh, et al.
Published: (2026)
Assessing Quantum Advantage for Gaussian Process Regression
by: Lowe, Dominic, et al.
Published: (2025)
A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond
by: Hu, Zicheng, et al.
Published: (2025)
Linear Regression under Missing or Corrupted Coordinates
by: Diakonikolas, Ilias, et al.
Published: (2025)
Sparse Offline Reinforcement Learning with Corruption Robustness
by: Tran, Nam Phuong, et al.
Published: (2025)
On the Global Convergence of Risk-Averse Natural Policy Gradient Methods with Expected Conditional Risk Measures
by: Yu, Xian, et al.
Published: (2023)
Risk-Averse Reinforcement Learning with Itakura-Saito Loss
by: Udovichenko, Igor, et al.
Published: (2025)
Density-Based Algorithms for Corruption-Robust Contextual Search and Convex Optimization
by: Leme, Renato Paes, et al.
Published: (2022)
AM-PPO: (Advantage) Alpha-Modulation with Proximal Policy Optimization
by: Sane, Soham
Published: (2025)
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
by: Brantley, Kianté, et al.
Published: (2025)
Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions
by: Ghaffari, Fatemeh, et al.
Published: (2024)
Robust Distribution Learning with Local and Global Adversarial Corruptions
by: Nietert, Sloan, et al.
Published: (2024)
Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)
Robust Kernel Hypothesis Testing under Data Corruption
by: Schrab, Antonin, et al.
Published: (2024)
Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
by: Xue, Shuchen, et al.
Published: (2025)