| Main Authors: | Liu, Ziang, Xu, Junjie, Wu, Xingjiao, Yang, Jing, He, Liang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.07268 |
Similar Items
Hindsight Preference Learning for Offline Preference-based Reinforcement Learning
by: Gao, Chen-Xiao, et al.
Published: (2024)
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
by: Ghosh, Udita, et al.
Published: (2025)
From Reward-Free Representations to Preferences: Rethinking Offline Preference-Based Reinforcement Learning
by: Yang, Jun-Jie, et al.
Published: (2026)
Hindsight Preference Replay Improves Preference-Conditioned Multi-Objective Reinforcement Learning
by: Shianifar, Jonaid, et al.
Published: (2026)
Preference-based Multi-Objective Reinforcement Learning
by: Mu, Ni, et al.
Published: (2025)
Preference-Guided Reinforcement Learning for Efficient Exploration
by: Wang, Guojian, et al.
Published: (2024)
Reinforcement Learning from Diverse Human Preferences
by: Xue, Wanqi, et al.
Published: (2023)
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
by: Cheng, Jie, et al.
Published: (2024)
General Preference Reinforcement Learning
by: Umer, Muhammad, et al.
Published: (2026)
Regret Bounds for Reinforcement Learning from Multi-Source Imperfect Preferences
by: Shi, Ming, et al.
Published: (2026)
Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models
by: Liu, Yi, et al.
Published: (2023)
Combinatorial Reinforcement Learning with Preference Feedback
by: Lee, Joongkyu, et al.
Published: (2025)
Query-Policy Misalignment in Preference-Based Reinforcement Learning
by: Hu, Xiao, et al.
Published: (2023)
OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
by: Yang, Yiqin, et al.
Published: (2026)
PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning
by: Driss, Brahim, et al.
Published: (2025)
CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries
by: Mu, Ni, et al.
Published: (2025)
POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization
by: Wang, Ziqing, et al.
Published: (2025)
Deep Reinforcement Learning from Hierarchical Preference Design
by: Bukharin, Alexander, et al.
Published: (2023)
Preference Elicitation for Offline Reinforcement Learning
by: Pace, Alizée, et al.
Published: (2024)
Multi-turn Reinforcement Learning from Preference Human Feedback
by: Shani, Lior, et al.
Published: (2024)
Knowledge Gradient for Preference Learning
by: Wu, Kaiwen, et al.
Published: (2026)
Boosting Robustness in Preference-Based Reinforcement Learning with Dynamic Sparsity
by: Muslimani, Calarina, et al.
Published: (2024)
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
by: Zhou, Zhanhui, et al.
Published: (2023)
Binary Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning
by: Xu, Yinglun, et al.
Published: (2024)
Preference Learning Algorithms Do Not Learn Preference Rankings
by: Chen, Angelica, et al.
Published: (2024)
Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning
by: Bui, The Viet, et al.
Published: (2025)
Provable Reward-Agnostic Preference-Based Reinforcement Learning
by: Zhan, Wenhao, et al.
Published: (2023)
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
by: Cho, Taehyun, et al.
Published: (2025)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
by: Hong, Ilgee, et al.
Published: (2024)
Behavior Preference Regression for Offline Reinforcement Learning
by: Srinivasan, Padmanaba, et al.
Published: (2025)
Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning
by: Zhou, Qin, et al.
Published: (2026)
Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions
by: Xu, Yinglun, et al.
Published: (2023)
Fusing Rewards and Preferences in Reinforcement Learning
by: Khorasani, Sadegh, et al.
Published: (2025)
STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
by: Luan, Yao, et al.
Published: (2025)
Reinforcement Learning from Adversarial Preferences in Tabular MDPs
by: Tsuchiya, Taira, et al.
Published: (2025)
Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
by: Gao, Xiancheng, et al.
Published: (2025)
Diffusion Classifier-Driven Reward for Offline Preference-based Reinforcement Learning
by: Pang, Teng, et al.
Published: (2025)
Debiasing Online Preference Learning via Preference Feature Preservation
by: Kim, Dongyoung, et al.
Published: (2025)
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning
by: Zhang, Tianle, et al.
Published: (2024)
Active Learning for Direct Preference Optimization
by: Kveton, Branislav, et al.
Published: (2025)