| Main Authors: | Hao, Ruijie, Zhang, Longfei, Dai, Yang, Ma, Yang, Liang, Xingxing, Cheng, Guangquan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.00977 |
Similar Items
Single-Trajectory Distributionally Robust Reinforcement Learning
by: Liang, Zhipeng, et al.
Published: (2023)
Offline Trajectory Optimization for Offline Reinforcement Learning
by: Zhao, Ziqi, et al.
Published: (2024)
Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization
by: Liu, Shixuan, et al.
Published: (2024)
Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?
by: Dai, Yang, et al.
Published: (2024)
Policy-Based Trajectory Clustering in Offline Reinforcement Learning
by: Hu, Hao, et al.
Published: (2025)
A Behavior-Aware Approach for Deep Reinforcement Learning in Non-stationary Environments without Known Change Points
by: Liu, Zihe, et al.
Published: (2024)
Path-Coupled Bellman Flows for Distributional Reinforcement Learning
by: Xu, Boyang, et al.
Published: (2026)
Flow-Based Policy for Online Reinforcement Learning
by: Lv, Lei, et al.
Published: (2025)
Offline Reinforcement Learning with Generative Trajectory Policies
by: Feng, Xinsong, et al.
Published: (2025)
Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning
by: Ma, Guoqing, et al.
Published: (2025)
Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning
by: Gao, Chen-Xiao, et al.
Published: (2025)
Learning Robust Spectral Dynamics for Temporal Domain Generalization
by: Yu, En, et al.
Published: (2025)
Drift-aware Collaborative Assistance Mixture of Experts for Heterogeneous Multistream Learning
by: Yu, En, et al.
Published: (2025)
Generalized Incremental Learning under Concept Drift across Evolving Data Streams
by: Yu, En, et al.
Published: (2025)
Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning
by: Ma, Yunchang, et al.
Published: (2025)
Rethinking Adversarial Attacks in Reinforcement Learning from Policy Distribution Perspective
by: Duan, Tianyang, et al.
Published: (2025)
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
by: Kang, Hyungkyu, et al.
Published: (2025)
Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning
by: Hu, Jifeng, et al.
Published: (2025)
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
by: Liang, Hao, et al.
Published: (2022)
DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
by: Xie, Yaqi, et al.
Published: (2026)
Online Boosting Adaptive Learning under Concept Drift for Multistream Classification
by: Yu, En, et al.
Published: (2023)
StaRPO: Stability-Augmented Reinforcement Policy Optimization
by: Zhang, Jinghan, et al.
Published: (2026)
Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
by: Zhan, Simon Sinong, et al.
Published: (2025)
Maximum Entropy Reinforcement Learning with Diffusion Policy
by: Dong, Xiaoyi, et al.
Published: (2025)
A Variance-Reduced Cubic-Regularized Newton for Policy Optimization
by: Sun, Cheng, et al.
Published: (2025)
Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy
by: Doo, JaeHyeok, et al.
Published: (2026)
GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning
by: Zhang, Han, et al.
Published: (2025)
IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization
by: Wang, Shuai, et al.
Published: (2026)
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
by: Dai, Juntao, et al.
Published: (2024)
Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy
by: Cai, Ruichu, et al.
Published: (2024)
DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
by: Zhu, Dingwei, et al.
Published: (2025)
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
by: Tu, Songjun, et al.
Published: (2024)
Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner
by: Ma, Hao, et al.
Published: (2026)
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
by: Liu, Jinyi, et al.
Published: (2023)
PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment
by: Li, Jiawei, et al.
Published: (2024)
Offline Multi-Agent Reinforcement Learning via In-Sample Sequential Policy Optimization
by: Liu, Zongkai, et al.
Published: (2024)
Quantile Geometry Regularization for Distributional Reinforcement Learning
by: Zhang, Zhaofan, et al.
Published: (2026)
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
by: Liu, Tenglong, et al.
Published: (2024)
Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
by: Yao, Yihang, et al.
Published: (2023)
Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies
by: Rietz, Finn, et al.
Published: (2024)