| Main Authors: | Xue, Wanqi, An, Bo, Yan, Shuicheng, Xu, Zhongwen |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2301.11774 |
Similar Items
Learning to Optimize for Reinforcement Learning
by: Lan, Qingfeng, et al.
Published: (2023)
Mutual Information Regularized Offline Reinforcement Learning
by: Ma, Xiao, et al.
Published: (2022)
Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning
by: Xue, Zhenghai, et al.
Published: (2025)
SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation
by: Xu, Xiangyu, et al.
Published: (2024)
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
by: Zhai, Yuanzhao, et al.
Published: (2023)
Multi-turn Reinforcement Learning from Preference Human Feedback
by: Shani, Lior, et al.
Published: (2024)
CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
by: Mu, Ni, et al.
Published: (2025)
Multi-Type Preference Learning: Empowering Preference-Based Reinforcement Learning with Equal Preferences
by: Liu, Ziang, et al.
Published: (2024)
Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
by: Wang, Sai, et al.
Published: (2025)
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
by: Jin, Peng, et al.
Published: (2024)
Single-stream Policy Optimization
by: Xu, Zhongwen, et al.
Published: (2025)
Understanding Tool-Integrated Reasoning
by: Lin, Heng, et al.
Published: (2025)
STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
by: Luan, Yao, et al.
Published: (2025)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
by: Hong, Ilgee, et al.
Published: (2024)
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
by: Ye, Chenlu, et al.
Published: (2024)
OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
by: Yang, Yiqin, et al.
Published: (2026)
Swap-guided Preference Learning for Personalized Reinforcement Learning from Human Feedback
by: Kim, Gihoon, et al.
Published: (2026)
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
by: Poddar, Sriyash, et al.
Published: (2024)
MoH: Multi-Head Attention as Mixture-of-Head Attention
by: Jin, Peng, et al.
Published: (2024)
Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference
by: Cercola, Matteo, et al.
Published: (2025)
Preference Conditioned Multi-Objective Reinforcement Learning: Decomposed, Diversity-Driven Policy Optimization
by: Ambadkar, Tanmay, et al.
Published: (2026)
Combinatorial Reinforcement Learning with Preference Feedback
by: Lee, Joongkyu, et al.
Published: (2025)
Hindsight Preference Learning for Offline Preference-based Reinforcement Learning
by: Gao, Chen-Xiao, et al.
Published: (2024)
Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation
by: Bai, Fengshuo, et al.
Published: (2024)
General Preference Reinforcement Learning
by: Umer, Muhammad, et al.
Published: (2026)
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback
by: Zhong, Huiying, et al.
Published: (2024)
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
by: Ghosh, Udita, et al.
Published: (2025)
Reinforcement Learning from Adversarial Preferences in Tabular MDPs
by: Tsuchiya, Taira, et al.
Published: (2025)
MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)
Preference-based Multi-Objective Reinforcement Learning
by: Mu, Ni, et al.
Published: (2025)
Preference-Guided Reinforcement Learning for Efficient Exploration
by: Wang, Guojian, et al.
Published: (2024)
Binary Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning
by: Xu, Yinglun, et al.
Published: (2024)
Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations
by: Woo, Jiin, et al.
Published: (2025)
Preference Elicitation for Offline Reinforcement Learning
by: Pace, Alizée, et al.
Published: (2024)
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
by: Liu, Wenpu, et al.
Published: (2026)
Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks
by: Xu, Ziping, et al.
Published: (2024)
From Reward-Free Representations to Preferences: Rethinking Offline Preference-Based Reinforcement Learning
by: Yang, Jun-Jie, et al.
Published: (2026)
Discriminative Entropy Clustering and its Relation to K-means and SVM
by: Zhang, Zhongwen, et al.
Published: (2023)
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
by: Cheng, Jie, et al.
Published: (2024)
Query-Policy Misalignment in Preference-Based Reinforcement Learning
by: Hu, Xiao, et al.
Published: (2023)