| Main Authors: | Xu, Yifan, Ye, Xichen, Chen, Yifan, Zhang, Qiaosheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.00709 |
Similar Items
Influencing Humans to Conform to Preference Models for RLHF
by: Hatgis-Kessell, Stephane, et al.
Published: (2025)
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
by: Xiong, Wei, et al.
Published: (2023)
MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)
Single Agent Robust Deep Reinforcement Learning for Bus Fleet Control
by: Zhang, Yifan
Published: (2025)
WPO: Enhancing RLHF with Weighted Preference Optimization
by: Zhou, Wenxuan, et al.
Published: (2024)
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
by: Ji, Jiaming, et al.
Published: (2024)
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
by: Zhou, Jiayi, et al.
Published: (2025)
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
by: Cho, Taehyun, et al.
Published: (2025)
RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning
by: Wei, Zeming, et al.
Published: (2026)
How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors
by: Xu, Yifan, et al.
Published: (2026)
Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)
Active Negative Loss: A Robust Framework for Learning with Noisy Labels
by: Ye, Xichen, et al.
Published: (2024)
Avoiding $\mathbf{exp(R_{max})}$ scaling in RLHF through Preference-based Exploration
by: Chen, Mingyu, et al.
Published: (2025)
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
by: Park, Chanwoo, et al.
Published: (2024)
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)
A Rational Model of Dimension-reduced Human Categorization
by: Hong, Yifan, et al.
Published: (2023)
On Symmetric Losses for Robust Policy Optimization with Noisy Preferences
by: Nishimori, Soichiro, et al.
Published: (2025)
Distributionally Robust Token Optimization in RLHF
by: Jin, Yeping, et al.
Published: (2026)
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
by: Belakaria, Syrine, et al.
Published: (2025)
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
by: Li, Aaron J., et al.
Published: (2024)
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
by: Siththaranjan, Anand, et al.
Published: (2023)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)
Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation
by: Yu, Chengzhi, et al.
Published: (2025)
Trust, Don't Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback
by: Hosseini, Seyed Amir, et al.
Published: (2026)
Democratic Preference Alignment via Sortition-Weighted RLHF
by: Sana, Suvadip, et al.
Published: (2026)
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
by: Zhang, Yifan, et al.
Published: (2024)
APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs
by: Srewa, Mahmoud, et al.
Published: (2026)
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
by: Liu, Zhihan, et al.
Published: (2024)
A Single-Point Measurement Framework for Robust Cyber-Attack Diagnosis in Smart Microgrids Using Dual Fractional-Order Feature Analysis
by: Wang, Yifan
Published: (2025)
Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks
by: Sun, Ruopei, et al.
Published: (2025)
MicroNAS: Zero-Shot Neural Architecture Search for MCUs
by: Qiao, Ye, et al.
Published: (2024)
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
by: Hu, Rui, et al.
Published: (2024)
A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs
by: Srewa, Mahmoud, et al.
Published: (2025)
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
by: Cen, Shicong, et al.
Published: (2024)
When Truthful Representations Flip Under Deceptive Instructions?
by: Long, Xianxuan, et al.
Published: (2025)
Towards Robust Influence Functions with Flat Validation Minima
by: Ye, Xichen, et al.
Published: (2025)
SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition
by: Liu, Yunbo, et al.
Published: (2025)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)
A Descriptive and Normative Theory of Human Beliefs in RLHF
by: Dandekar, Sylee, et al.
Published: (2025)