| Main Authors: | Wang, Guojian, Wu, Faguo, Zhang, Xiao, Liu, Jianxiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.04539 |
Similar Items
Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations
by: Wang, Guojian, et al.
Published: (2023)
Trajectory-Oriented Policy Optimization with Sparse Rewards
by: Wang, Guojian, et al.
Published: (2024)
Offline RL with Smooth OOD Generalization in Convex Hull and its Neighborhood
by: Yao, Qingmao, et al.
Published: (2025)
Preference-Guided Reinforcement Learning for Efficient Exploration
by: Wang, Guojian, et al.
Published: (2024)
Adaptive trajectory-constrained exploration strategy for deep reinforcement learning
by: Wang, Guojian, et al.
Published: (2023)
Data Fusion-Enhanced Decision Transformer for Stable Cross-Domain Generalization
by: Wang, Guojian, et al.
Published: (2025)
Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation
by: Zhan, Guojian, et al.
Published: (2026)
FFHFlow: Diverse and Uncertainty-Aware Dexterous Grasp Generation via Flow Variational Inference
by: Feng, Qian, et al.
Published: (2024)
Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios
by: Zhang, Feihong, et al.
Published: (2025)
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning
by: Chen, Xuyang, et al.
Published: (2025)
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
Reinforcement Learning with Curriculum-inspired Adaptive Direct Policy Guidance for Truck Dispatching
by: Meng, Shi, et al.
Published: (2025)
Distributional Soft Actor-Critic with Diffusion Policy
by: Liu, Tong, et al.
Published: (2025)
When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning
by: Liu, Xiaogeng, et al.
Published: (2026)
Self-Pro: A Self-Prompt and Tuning Framework for Graph Neural Networks
by: Gong, Chenghua, et al.
Published: (2023)
Frequency-Forcing: From Scaling-as-Time to Soft Frequency Guidance
by: Du, Weitao
Published: (2026)
Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance
by: He, Jinmin, et al.
Published: (2025)
More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration
by: Yuan, Xiaoyang, et al.
Published: (2025)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Soft Sequence Policy Optimization
by: Glazyrina, Svetlana, et al.
Published: (2026)
Learning Policy Committees for Effective Personalization in MDPs with Diverse Tasks
by: Ge, Luise, et al.
Published: (2025)
Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)
Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
by: He, Yufei, et al.
Published: (2025)
Behavior-Regularized Diffusion Policy Optimization for Offline Reinforcement Learning
by: Gao, Chen-Xiao, et al.
Published: (2025)
Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic
by: Vo, Thanh Vinh, et al.
Published: (2025)
Precision over Diversity: High-Precision Reward Generalizes to Robust Instruction Following
by: Zeng, Yirong, et al.
Published: (2026)
FedRD: Reducing Divergences for Generalized Federated Learning via Heterogeneity-aware Parameter Guidance
by: Wang, Kaile, et al.
Published: (2026)
A Lightweight Framework for Trigger-Guided LoRA-Based Self-Adaptation in LLMs
by: Wei, Jiacheng, et al.
Published: (2025)
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
by: Zhang, Junan, et al.
Published: (2025)
GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
by: Liu, Ziru, et al.
Published: (2025)
Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning
by: Yang, Hanlin, et al.
Published: (2024)
Soft Adaptive Policy Optimization
by: Gao, Chang, et al.
Published: (2025)
Adaptive Guidance for Local Training in Heterogeneous Federated Learning
by: Zhang, Jianqing, et al.
Published: (2024)
Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
by: Ren, Yiming, et al.
Published: (2026)
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
by: Batra, Sumeet, et al.
Published: (2023)
Soft Deterministic Policy Gradient with Gaussian Smoothing
by: Na, Hyunjun, et al.
Published: (2026)
Deconstructing Generative Diversity: An Information Bottleneck Analysis of Discrete Latent Generative Models
by: Wu, Yudi, et al.
Published: (2025)
LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies
by: Sun, Ximan, et al.
Published: (2025)
GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning
by: Lee, Jaewoo, et al.
Published: (2024)
Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning
by: Reizinger, Patrik, et al.
Published: (2025)