| Main Authors: | Wu, Xiefeng, Hu, Mingyu, Zhang, Shu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.15761 |
Similar Items
Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
by: Wu, Xiefeng, et al.
Published: (2025)
Enhancing Q-Learning with Large Language Model Heuristics
by: Wu, Xiefeng
Published: (2024)
From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge
by: Wu, Xiefeng
Published: (2024)
ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
by: Ji, Tianying, et al.
Published: (2024)
Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization
by: Sen, Sayambhu, et al.
Published: (2025)
Revisiting Mixture Policies in Entropy-Regularized Actor-Critic
by: He, Jiamin, et al.
Published: (2026)
Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation
by: Choe, Jean Seong Bjorn, et al.
Published: (2024)
Functional Critics Are Essential for Actor-Critic: From Off-Policy Stability to Efficient Exploration
by: Bai, Qinxun, et al.
Published: (2025)
Diffusion Actor-Critic with Entropy Regulator
by: Wang, Yinuo, et al.
Published: (2024)
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
by: Ji, Tianying, et al.
Published: (2023)
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
by: Luo, Yu, et al.
Published: (2024)
Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences
by: Singh, Nikhil Kumar, et al.
Published: (2024)
Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation
by: Nakanishi, Kosuke, et al.
Published: (2025)
Distributional Soft Actor-Critic with Diffusion Policy
by: Liu, Tong, et al.
Published: (2025)
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
by: Zhai, Shaopeng, et al.
Published: (2025)
Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
by: Kohler, Hector, et al.
Published: (2023)
Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic
by: Vo, Thanh Vinh, et al.
Published: (2025)
Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation
by: Georgiev, Ignat, et al.
Published: (2024)
Relative Importance Sampling for off-Policy Actor-Critic in Deep Reinforcement Learning
by: Humayoo, Mahammad, et al.
Published: (2018)
Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning
by: Zhuang, Yuan, et al.
Published: (2026)
Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic
by: Lee, Jeong Woon, et al.
Published: (2026)
Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning
by: Yang, Tong, et al.
Published: (2023)
CTSAC: Curriculum-Based Transformer Soft Actor-Critic for Goal-Oriented Robot Exploration
by: Yang, Chunyu, et al.
Published: (2025)
Flow Actor-Critic for Offline Reinforcement Learning
by: Chae, Jongseong, et al.
Published: (2026)
Heavy-Ball Momentum Accelerated Actor-Critic With Function Approximation
by: Dong, Yanjie, et al.
Published: (2024)
Learning Multi-Robot Coordination through Locality-Based Factorized Multi-Agent Actor-Critic Algorithm
by: Shek, Chak Lam, et al.
Published: (2025)
Automated Design of Linear Bounding Functions for Sigmoidal Nonlinearities in Neural Networks
by: König, Matthias, et al.
Published: (2024)
Multi-Robot Multi-Queue Control via Exhaustive Assignment Actor-Critic Learning
by: Merati, Mohammad, et al.
Published: (2026)
AdaWorldPolicy: World-Model-Driven Diffusion Policy with Online Adaptive Learning for Robotic Manipulation
by: Yuan, Ge, et al.
Published: (2026)
Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic
by: Wang, Leo Muxing, et al.
Published: (2026)
Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound
by: Fiskus, Tal, et al.
Published: (2025)
Finite-time Convergence Analysis of Actor-Critic with Evolving Reward
by: Hu, Rui, et al.
Published: (2025)
Quantum Advantage Actor-Critic for Reinforcement Learning
by: Kölle, Michael, et al.
Published: (2024)
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition
by: Manivannan, Sanjeev, et al.
Published: (2026)
Offline Actor-Critic Reinforcement Learning Scales to Large Models
by: Springenberg, Jost Tobias, et al.
Published: (2024)
AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World
by: Zhou, Zhiyuan, et al.
Published: (2025)
Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
by: Yang, Jiarui, et al.
Published: (2025)
Asymmetric Actor-Critic for Multi-turn LLM Agents
by: Jiang, Shuli, et al.
Published: (2026)
Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function
by: Zheng, Hongye, et al.
Published: (2024)