| Main Authors: | Xu, Tianxiang; Zhu, Xiaoyan; Lai, Xin; Wang, Jiayin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.17458 |
Similar Items
Do Papers Tell the Whole Story? A Benchmark and Framework for Uncovering Hidden Implementation Gaps in Bioinformatics
by: Xu, Tianxiang, et al.
Published: (2026)
Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models
by: Hu, Wentao, et al.
Published: (2025)
Raising the ClaSS of Streaming Time Series Segmentation
by: Ermshaus, Arik, et al.
Published: (2023)
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
by: Chen, Ruitao, et al.
Published: (2024)
M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality
by: Wang, Ziyan, et al.
Published: (2025)
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
by: Chakraborty, Souradip, et al.
Published: (2023)
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
by: Singh, Ashish, et al.
Published: (2023)
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
by: Shen, Wei, et al.
Published: (2025)
TimeHF: Billion-Scale Time Series Models Guided by Human Feedback
by: Qi, Yongzhi, et al.
Published: (2025)
Reinforcement Learning from Human Feedback
by: Lambert, Nathan
Published: (2025)
Towards User-level Private Reinforcement Learning with Human Feedback
by: Zhang, Jiaming, et al.
Published: (2025)
Learning to Schedule Online Tasks with Bandit Feedback
by: Xu, Yongxin, et al.
Published: (2024)
Strategyproof Reinforcement Learning from Human Feedback
by: Buening, Thomas Kleine, et al.
Published: (2025)
Learning Personalized Driving Styles via Reinforcement Learning from Human Feedback
by: Li, Derun, et al.
Published: (2025)
Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework
by: Metz, Yannick, et al.
Published: (2024)
Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off
by: Zhao, Mingkuan, et al.
Published: (2025)
Reinforcement Learning from Human Feedback: A Statistical Perspective
by: Liu, Pangpang, et al.
Published: (2026)
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
by: Swamy, Gokul, et al.
Published: (2024)
BioDefect: The First Dataset for Defect Detection in Bioinformatics Software
by: Xu, Tianxiang, et al.
Published: (2026)
Robust Reinforcement Learning from Corrupted Human Feedback
by: Bukharin, Alexander, et al.
Published: (2024)
Dual Active Learning for Reinforcement Learning from Human Feedback
by: Liu, Pangpang, et al.
Published: (2024)
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
by: Zhang, Shun, et al.
Published: (2024)
Bresa: Bio-inspired Reflexive Safe Reinforcement Learning for Contact-Rich Robotic Tasks
by: Zhang, Heng, et al.
Published: (2025)
A Survey of Reinforcement Learning from Human Feedback
by: Kaufmann, Timo, et al.
Published: (2023)
Reinforcement Learning from Multi-level and Episodic Human Feedback
by: Elahi, Muhammad Qasim, et al.
Published: (2025)
Dense Reward for Free in Reinforcement Learning from Human Feedback
by: Chan, Alex J., et al.
Published: (2024)
Multi-turn Reinforcement Learning from Preference Human Feedback
by: Shani, Lior, et al.
Published: (2024)
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
by: Zhai, Yuanzhao, et al.
Published: (2023)
Towards Off-Policy Reinforcement Learning for Ranking Policies with Human Feedback
by: Xiao, Teng, et al.
Published: (2024)
Parameter Efficient Reinforcement Learning from Human Feedback
by: Sidahmed, Hakim, et al.
Published: (2024)
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
by: Hong, Ilgee, et al.
Published: (2024)
Corruption Robust Offline Reinforcement Learning with Human Feedback
by: Mandal, Debmalya, et al.
Published: (2024)
Distributionally Robust Reinforcement Learning with Human Feedback
by: Mandal, Debmalya, et al.
Published: (2025)
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
by: Ye, Kai, et al.
Published: (2025)
Data-dependent Exploration for Online Reinforcement Learning from Human Feedback
by: Zhang, Zhen-Yu, et al.
Published: (2026)
Low-Rank Contextual Reinforcement Learning from Heterogeneous Human Feedback
by: Lee, Seong Jin, et al.
Published: (2024)
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
by: Lambert, Nathan, et al.
Published: (2023)
Provable Reinforcement Learning from Human Feedback with an Unknown Link Function
by: Zhang, Qining, et al.
Published: (2025)
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
by: Peng, Xiyue, et al.
Published: (2024)
Quantum-inspired Reinforcement Learning for Synthesizable Drug Design
by: Wang, Dannong, et al.
Published: (2024)