| Main Authors: | Chen, Xingguo, He, Zhiang, Shen, Yuchen, Yang, Shangdong, Li, Chao, Yang, Guang, Wang, Wenhao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.28855 |
Similar Items
Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction
by: Chen, Xingguo, et al.
Published: (2026)
A Variance Minimization Approach to Temporal-Difference Learning
by: Chen, Xingguo, et al.
Published: (2024)
Regularized Centered Emphatic Temporal Difference Learning
by: Chen, Xingguo, et al.
Published: (2026)
Bellman Error Centering
by: Chen, Xingguo, et al.
Published: (2025)
Bitboard version of Tetris AI
by: Chen, Xingguo, et al.
Published: (2026)
OpenGuanDan: A Large-Scale Imperfect Information Game Benchmark
by: Li, Chao, et al.
Published: (2026)
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
by: Zhou, Hongyi, et al.
Published: (2025)
Multi-Agent Reinforcement Learning with Communication-Constrained Priors
by: Yang, Guang, et al.
Published: (2025)
Knowledge is Power: Harnessing Large Language Models for Enhanced Cognitive Diagnosis
by: Dong, Zhiang, et al.
Published: (2025)
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
by: Zhang, Wenhao, et al.
Published: (2025)
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
by: Meng, Wenjia, et al.
Published: (2024)
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
by: Ye, Chenlu, et al.
Published: (2026)
TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
by: Wang, Jiaqi, et al.
Published: (2026)
ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm
by: Wang, Hanyong, et al.
Published: (2026)
Recursive Learning-Based Virtual Buffering for Analytical Global Placement
by: Kahng, Andrew B., et al.
Published: (2025)
Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation
by: Cao, Hongye, et al.
Published: (2025)
Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models
by: Panchal, Utsav, et al.
Published: (2025)
Pessimistic Auxiliary Policy for Offline Reinforcement Learning
by: Zhang, Fan, et al.
Published: (2026)
TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction
by: Wang, Chao, et al.
Published: (2025)
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
by: Chen, Minghan, et al.
Published: (2025)
Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction
by: Liu, Zhanwen, et al.
Published: (2024)
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
by: Yao, Chaorui, et al.
Published: (2025)
Off-Policy Correction For Multi-Agent Reinforcement Learning
by: Zawalski, Michał, et al.
Published: (2021)
ROCKET: Residual-Oriented Multi-Layer Alignment for Spatially-Aware Vision-Language-Action Models
by: Sun, Guoheng, et al.
Published: (2026)
ORFS-agent: Tool-Using Agents for Chip Design Optimization
by: Ghose, Amur, et al.
Published: (2025)
Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights
by: Liu, Yuchen, et al.
Published: (2025)
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
by: Liu, Zeyuan, et al.
Published: (2026)
ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization
by: Zhan, Yang, et al.
Published: (2026)
Uncertainty-Aware Crime Prediction With Spatial Temporal Multivariate Graph Neural Networks
by: Wang, Zepu, et al.
Published: (2024)
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)
Global Spatio-Temporal Fusion-based Traffic Prediction Algorithm with Anomaly Aware
by: Liu, Chaoqun, et al.
Published: (2024)
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
by: Shen, Guobin, et al.
Published: (2026)
CABTO: Context-Aware Behavior Tree Grounding for Robot Manipulation
by: Cai, Yishuai, et al.
Published: (2026)
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
by: Zheng, Ruijie, et al.
Published: (2024)
Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation
by: Kuang, Yiling, et al.
Published: (2024)
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)
ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
by: Ji, Tianying, et al.
Published: (2024)
Richer Representations for Neural Algorithmic Reasoning via Auxiliary Reconstruction
by: Huang, Jiafu, et al.
Published: (2026)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Two Birds with One Stone: Enhancing Uncertainty Quantification and Interpretability with Graph Functional Neural Process
by: Kong, Lingkai, et al.
Published: (2025)