Library Catalog

Bibliographic Details
Main Authors:	Hao, Ruijie, Zhang, Longfei, Dai, Yang, Ma, Yang, Liang, Xingxing, Cheng, Guangquan
Format:	Preprint
Published:	2026
Subjects:	Machine Learning, Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.00977

Similar Items

Single-Trajectory Distributionally Robust Reinforcement Learning
by: Liang, Zhipeng, et al.
Published: (2023)

Offline Trajectory Optimization for Offline Reinforcement Learning
by: Zhao, Ziqi, et al.
Published: (2024)

Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization
by: Liu, Shixuan, et al.
Published: (2024)

Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
by: Dai, Yang, et al.
Published: (2024)

Policy-Based Trajectory Clustering in Offline Reinforcement Learning
by: Hu, Hao, et al.
Published: (2025)

A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points
by: Liu, Zihe, et al.
Published: (2024)

Path-Coupled Bellman Flows for Distributional Reinforcement Learning
by: Xu, Boyang, et al.
Published: (2026)

Flow-Based Policy for Online Reinforcement Learning
by: Lv, Lei, et al.
Published: (2025)

Offline Reinforcement Learning with Generative Trajectory Policies
by: Feng, Xinsong, et al.
Published: (2025)

Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning
by: Ma, Guoqing, et al.
Published: (2025)

Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning
by: Gao, Chen-Xiao, et al.
Published: (2025)

Learning Robust Spectral Dynamics for Temporal Domain Generalization
by: Yu, En, et al.
Published: (2025)

Drift-aware Collaborative Assistance Mixture of Experts for Heterogeneous Multistream Learning
by: Yu, En, et al.
Published: (2025)

Generalized Incremental Learning under Concept Drift across Evolving Data Streams
by: Yu, En, et al.
Published: (2025)

Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning
by: Ma, Yunchang, et al.
Published: (2025)

Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective
by: Duan, Tianyang, et al.
Published: (2025)

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
by: Kang, Hyungkyu, et al.
Published: (2025)

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
by: Hu, Jifeng, et al.
Published: (2025)

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
by: Liang, Hao, et al.
Published: (2022)

DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
by: Xie, Yaqi, et al.
Published: (2026)

Online Boosting Adaptive Learning under Concept Drift for Multistream Classification
by: Yu, En, et al.
Published: (2023)

StaRPO: Stability-Augmented Reinforcement Policy Optimization
by: Zhang, Jinghan, et al.
Published: (2026)

Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
by: Zhan, Simon Sinong, et al.
Published: (2025)

Maximum Entropy Reinforcement Learning with Diffusion Policy
by: Dong, Xiaoyi, et al.
Published: (2025)

A Variance-Reduced Cubic-Regularized Newton for Policy Optimization
by: Sun, Cheng, et al.
Published: (2025)

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy
by: Doo, JaeHyeok, et al.
Published: (2026)

GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
by: Zhang, Han, et al.
Published: (2025)

IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization
by: Wang, Shuai, et al.
Published: (2026)

Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
by: Dai, Juntao, et al.
Published: (2024)

Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy
by: Cai, Ruichu, et al.
Published: (2024)

DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
by: Zhu, Dingwei, et al.
Published: (2025)

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
by: Tu, Songjun, et al.
Published: (2024)

Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner
by: Ma, Hao, et al.
Published: (2026)

Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
by: Liu, Jinyi, et al.
Published: (2023)

PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment
by: Li, Jiawei, et al.
Published: (2024)

Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization
by: Liu, Zongkai, et al.
Published: (2024)

Quantile Geometry Regularization for Distributional Reinforcement Learning
by: Zhang, Zhaofan, et al.
Published: (2026)

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
by: Liu, Tenglong, et al.
Published: (2024)

Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
by: Yao, Yihang, et al.
Published: (2023)

Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies
by: Rietz, Finn, et al.
Published: (2024)