| Main Authors: | Liu, Pangpang, Shi, Chengchun, Sun, Will Wei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.02507 |
Similar Items
Dual Active Learning for Reinforcement Learning from Human Feedback
by: Liu, Pangpang, et al.
Published: (2024)
Uncertainty Quantification for Large Language Model Reward Learning under Heterogeneous Human Feedback
by: Liu, Pangpang, et al.
Published: (2025)
Statistical Inference in Reinforcement Learning: A Selective Survey
by: Shi, Chengchun
Published: (2025)
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
by: Ye, Kai, et al.
Published: (2025)
Fairness-aware Contextual Dynamic Pricing with Strategic Buyers
by: Liu, Pangpang, et al.
Published: (2025)
Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data
by: Wang, Danyang, et al.
Published: (2024)
Counterfactually Safe Reinforcement Learning
by: Li, Jingyi, et al.
Published: (2026)
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback
by: Lee, Seong Jin, et al.
Published: (2024)
Testing Stationarity and Change Point Detection in Reinforcement Learning
by: Li, Mengbing, et al.
Published: (2022)
Contextual Dynamic Pricing with Strategic Buyers
by: Liu, Pangpang, et al.
Published: (2023)
Doubly Inhomogeneous Reinforcement Learning
by: Hu, Liyuan, et al.
Published: (2022)
Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling
by: Cho, Young Hyun, et al.
Published: (2026)
Reinforcement Learning from Human Feedback
by: Lambert, Nathan
Published: (2025)
Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback
by: Liu, Qiang, et al.
Published: (2026)
Dense Reward for Free in Reinforcement Learning from Human Feedback
by: Chan, Alex J., et al.
Published: (2024)
Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning
by: Wu, Xiangkun, et al.
Published: (2026)
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning
by: Yu, Shuguang, et al.
Published: (2024)
Strategyproof Reinforcement Learning from Human Feedback
by: Buening, Thomas Kleine, et al.
Published: (2025)
Sequential Knockoffs for Variable Selection in Reinforcement Learning
by: Ma, Tao, et al.
Published: (2023)
Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
by: Gong, Shijin, et al.
Published: (2026)
From Authors to Reviewers: Leveraging Rankings to Improve Peer Review
by: Wang, Weichen, et al.
Published: (2025)
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
by: Shen, Wei, et al.
Published: (2025)
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
by: Swamy, Gokul, et al.
Published: (2024)
Semi-pessimistic Reinforcement Learning
by: Zhu, Jin, et al.
Published: (2025)
Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)
Robust Offline Reinforcement learning with Heavy-Tailed Rewards
by: Zhu, Jin, et al.
Published: (2023)
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
by: Ye, Chenlu, et al.
Published: (2024)
Infrared Spectra Prediction for Diazo Groups Utilizing a Machine Learning Approach with Structural Attention Mechanism
by: Liu, Chengchun, et al.
Published: (2024)
Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making
by: Bian, Zeyu, et al.
Published: (2026)
A Survey of Reinforcement Learning from Human Feedback
by: Kaufmann, Timo, et al.
Published: (2023)
Reinforcement Learning from Multi-level and Episodic Human Feedback
by: Elahi, Muhammad Qasim, et al.
Published: (2025)
Multi-turn Reinforcement Learning from Preference Human Feedback
by: Shani, Lior, et al.
Published: (2024)
Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing
by: Wang, Jitao, et al.
Published: (2025)
Parameter Efficient Reinforcement Learning from Human Feedback
by: Sidahmed, Hakim, et al.
Published: (2024)
Demystifying Group Relative Policy Optimization: Its Policy Gradient is a U-Statistic
by: Zhou, Hongyi, et al.
Published: (2026)
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
by: Chen, Ruitao, et al.
Published: (2024)
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback
by: Zhang, Zhen-Yu, et al.
Published: (2026)
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
by: Lambert, Nathan, et al.
Published: (2023)
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
by: Zhang, Qining, et al.
Published: (2025)
Reinforcing Human Behavior Simulation via Verbal Feedback
by: Sun, Weiwei, et al.
Published: (2026)