:: Library Catalog

Bibliographic Details
Main Author:	Hu, Ranting
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.15654

Similar Items

GAGPO: Generalized Advantage Grouped Policy Optimization
by: Zhu, Siyuan, et al.
Published: (2026)

Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting
by: Cetin, Edoardo, et al.
Published: (2024)

Advantage Collapse in Group Relative Policy Optimization: Diagnosis and Mitigation
by: He, Xixiang, et al.
Published: (2026)

FAWAC: Feasibility Informed Advantage Weighted Regression for Persistent Safety in Offline Reinforcement Learning
by: Koirala, Prajwal, et al.
Published: (2024)

Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning
by: Chen, Xiaocong, et al.
Published: (2024)

REINFORCE++: Stabilizing Critic-Free Policy Optimization with Global Advantage Normalization
by: Hu, Jian, et al.
Published: (2025)

Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents
by: Lee, Jane H., et al.
Published: (2025)

Skip-Connected Policy Optimization for Implicit Advantage
by: Teng, Fengwei, et al.
Published: (2026)

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
by: Ye, Chenlu, et al.
Published: (2022)

Path Learning with Trajectory Advantage Regression
by: Miyaguchi, Kohei
Published: (2025)

Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets
by: Pandian, Shriram Karpoora Sundara, et al.
Published: (2025)

Smooth Gate Functions for Soft Advantage Policy Optimization
by: Denisov, Egor, et al.
Published: (2026)

Policy Optimization via Adv2: Adversarial Learning on Advantage Functions
by: Jonckheere, Matthieu, et al.
Published: (2023)

Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
by: He, Longxiang, et al.
Published: (2025)

Towards Flash Thinking via Decoupled Advantage Policy Optimization
by: Tan, Zezhong, et al.
Published: (2025)

Cascading Bandits Robust to Adversarial Corruptions
by: Xie, Jize, et al.
Published: (2025)

Robust Bayesian Optimisation with Unbounded Corruptions
by: Ezzerg, Abdelhamid, et al.
Published: (2025)

On Corruption-Robustness in Performative Reinforcement Learning
by: Pollatos, Vasilis, et al.
Published: (2025)

Corruption-Robust Lipschitz Contextual Search
by: Zuo, Shiliang
Published: (2023)

How to Allocate, How to Learn? Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization
by: Fang, Yangyi, et al.
Published: (2026)

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk
by: Hau, Jia Lin, et al.
Published: (2022)

Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience
by: Hu, Zicheng, et al.
Published: (2025)

Risk-Averse Total-Reward Reinforcement Learning
by: Su, Xihong, et al.
Published: (2025)

Online Bayesian Risk-Averse Reinforcement Learning
by: Wang, Yuhao, et al.
Published: (2025)

Risk-Averse Certification of Bayesian Neural Networks
by: Zhang, Xiyue, et al.
Published: (2024)

RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback
by: Vatsa, Amitesh, et al.
Published: (2026)

Assessing Quantum Advantage for Gaussian Process Regression
by: Lowe, Dominic, et al.
Published: (2025)

A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond
by: Hu, Zicheng, et al.
Published: (2025)

Linear Regression under Missing or Corrupted Coordinates
by: Diakonikolas, Ilias, et al.
Published: (2025)

Sparse Offline Reinforcement Learning with Corruption Robustness
by: Tran, Nam Phuong, et al.
Published: (2025)

On the Global Convergence of Risk-Averse Natural Policy Gradient Methods with Expected Conditional Risk Measures
by: Yu, Xian, et al.
Published: (2023)

Risk-Averse Reinforcement Learning with Itakura-Saito Loss
by: Udovichenko, Igor, et al.
Published: (2025)

Density-Based Algorithms for Corruption-Robust Contextual Search and Convex Optimization
by: Leme, Renato Paes, et al.
Published: (2022)

AM-PPO: (Advantage) Alpha-Modulation with Proximal Policy Optimization
by: Sane, Soham
Published: (2025)

Accelerating RL for LLM Reasoning with Optimal Advantage Regression
by: Brantley, Kianté, et al.
Published: (2025)

Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions
by: Ghaffari, Fatemeh, et al.
Published: (2024)

Robust Distribution Learning with Local and Global Adversarial Corruptions
by: Nietert, Sloan, et al.
Published: (2024)

Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)

Robust Kernel Hypothesis Testing under Data Corruption
by: Schrab, Antonin, et al.
Published: (2024)

Advantage Weighted Matching: Aligning RL with Pretraining in Diffusion Models
by: Xue, Shuchen, et al.
Published: (2025)