| Main Authors: | Lee, Seong Jin, Sun, Will Wei, Liu, Yufeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.19436 |
Similar Items
Low-Rank Online Dynamic Assortment with Dual Contextual Information
by: Lee, Seong Jin, et al.
Published: (2024)
Dual Active Learning for Reinforcement Learning from Human Feedback
by: Liu, Pangpang, et al.
Published: (2024)
Reinforcement Learning from Human Feedback: A Statistical Perspective
by: Liu, Pangpang, et al.
Published: (2026)
Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback
by: Liu, Pangpang, et al.
Published: (2025)
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
by: Lu, Nan, et al.
Published: (2025)
Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling
by: Cho, Young Hyun, et al.
Published: (2026)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Contextually Guided Transformers via Low-Rank Adaptation
by: Zhmoginov, Andrey, et al.
Published: (2025)
Reinforcement Learning from Human Feedback
by: Lambert, Nathan
Published: (2025)
Efficient Generalized Low-Rank Tensor Contextual Bandits
by: Yi, Qianxin, et al.
Published: (2023)
Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback
by: Liu, Qiang, et al.
Published: (2026)
Dense Reward for Free in Reinforcement Learning from Human Feedback
by: Chan, Alex J., et al.
Published: (2024)
TIC-GRPO: Provable and Efficient Optimization for Reinforcement Learning from Human Feedback
by: Pang, Lei, et al.
Published: (2025)
Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback
by: Liu, Haolin, et al.
Published: (2024)
Strategyproof Reinforcement Learning from Human Feedback
by: Buening, Thomas Kleine, et al.
Published: (2025)
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
by: Shen, Wei, et al.
Published: (2025)
Towards Federated Low-Rank Adaptation of Language Models with Rank Heterogeneity
by: Byun, Yuji, et al.
Published: (2024)
Parameter Efficient Reinforcement Learning from Human Feedback
by: Sidahmed, Hakim, et al.
Published: (2024)
ILoRA: Federated Learning with Low-Rank Adaptation for Heterogeneous Client Aggregation
by: Zhou, Junchao, et al.
Published: (2025)
Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
by: Ye, Chenlu, et al.
Published: (2024)
Combinatorial Reinforcement Learning with Preference Feedback
by: Lee, Joongkyu, et al.
Published: (2025)
Generalized Low-Rank Matrix Contextual Bandits with Graph Information
by: Wang, Yao, et al.
Published: (2025)
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
by: Swamy, Gokul, et al.
Published: (2024)
Multi-turn Reinforcement Learning from Preference Human Feedback
by: Shani, Lior, et al.
Published: (2024)
Reinforcement Learning from Multi-level and Episodic Human Feedback
by: Elahi, Muhammad Qasim, et al.
Published: (2025)
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
by: Lee, Harrison, et al.
Published: (2023)
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
by: Ye, Kai, et al.
Published: (2025)
Federated Reinforcement Learning with Constraint Heterogeneity
by: Jin, Hao, et al.
Published: (2024)
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
by: Chen, Ruitao, et al.
Published: (2024)
A Survey of Reinforcement Learning from Human Feedback
by: Kaufmann, Timo, et al.
Published: (2023)
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback
by: Zhang, Zhen-Yu, et al.
Published: (2026)
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
by: Lambert, Nathan, et al.
Published: (2023)
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
by: Zhang, Qining, et al.
Published: (2025)
Active Human Feedback Collection via Neural Contextual Dueling Bandits
by: Verma, Arun, et al.
Published: (2025)
Reinforcing Human Behavior Simulation via Verbal Feedback
by: Sun, Weiwei, et al.
Published: (2026)
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
by: Zhang, Qining, et al.
Published: (2024)
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
by: Zhang, Shun, et al.
Published: (2024)
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
by: Peng, Xiyue, et al.
Published: (2024)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
by: Hong, Ilgee, et al.
Published: (2024)