:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Yifan; Ye, Xichen; Chen, Yifan; Zhang, Qiaosheng
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.00709

Similar Items

Influencing Humans to Conform to Preference Models for RLHF
by: Hatgis-Kessell, Stephane, et al.
Published: (2025)

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
by: Xiong, Wei, et al.
Published: (2023)

MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)

Single Agent Robust Deep Reinforcement Learning for Bus Fleet Control
by: Zhang, Yifan
Published: (2025)

WPO: Enhancing RLHF with Weighted Preference Optimization
by: Zhou, Wenxuan, et al.
Published: (2024)

PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
by: Ji, Jiaming, et al.
Published: (2024)

Generative RLHF-V: Learning Principles from Multi-modal Human Preference
by: Zhou, Jiayi, et al.
Published: (2025)

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
by: Cho, Taehyun, et al.
Published: (2025)

RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning
by: Wei, Zeming, et al.
Published: (2026)

How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors
by: Xu, Yifan, et al.
Published: (2026)

Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)

Active Negative Loss: A Robust Framework for Learning with Noisy Labels
by: Ye, Xichen, et al.
Published: (2024)

Avoiding $\mathbf{exp(R_{max})}$ scaling in RLHF through Preference-based Exploration
by: Chen, Mingyu, et al.
Published: (2025)

RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
by: Park, Chanwoo, et al.
Published: (2024)

Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)

Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)

A Rational Model of Dimension-reduced Human Categorization
by: Hong, Yifan, et al.
Published: (2023)

On Symmetric Losses for Robust Policy Optimization with Noisy Preferences
by: Nishimori, Soichiro, et al.
Published: (2025)

Distributionally Robust Token Optimization in RLHF
by: Jin, Yeping, et al.
Published: (2026)

Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
by: Belakaria, Syrine, et al.
Published: (2025)

More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
by: Li, Aaron J., et al.
Published: (2024)

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
by: Siththaranjan, Anand, et al.
Published: (2023)

The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)

Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation
by: Yu, Chengzhi, et al.
Published: (2025)

Trust, Don't Trust, or Flip: Robust Preference-Based Reinforcement Learning with Multi-Expert Feedback
by: Hosseini, Seyed Amir, et al.
Published: (2026)

Democratic Preference Alignment via Sortition-Weighted RLHF
by: Sana, Suvadip, et al.
Published: (2026)

Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
by: Zhang, Yifan, et al.
Published: (2024)

APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs
by: Srewa, Mahmoud, et al.
Published: (2026)

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
by: Liu, Zhihan, et al.
Published: (2024)

A Single-Point Measurement Framework for Robust Cyber-Attack Diagnosis in Smart Microgrids Using Dual Fractional-Order Feature Analysis
by: Wang, Yifan
Published: (2025)

Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks
by: Sun, Ruopei, et al.
Published: (2025)

MicroNAS: Zero-Shot Neural Architecture Search for MCUs
by: Qiao, Ye, et al.
Published: (2024)

Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
by: Hu, Rui, et al.
Published: (2024)

A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs
by: Srewa, Mahmoud, et al.
Published: (2025)

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
by: Cen, Shicong, et al.
Published: (2024)

When Truthful Representations Flip Under Deceptive Instructions?
by: Long, Xianxuan, et al.
Published: (2025)

Towards Robust Influence Functions with Flat Validation Minima
by: Ye, Xichen, et al.
Published: (2025)

SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition
by: Liu, Yunbo, et al.
Published: (2025)

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)

A Descriptive and Normative Theory of Human Beliefs in RLHF
by: Dandekar, Sylee, et al.
Published: (2025)