| Main Authors: | Chen, Zhuotong, Liu, Fang, Zhu, Xuan, Qi, Yanjun, Ghavamzadeh, Mohammad |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.04567 |
Similar Items
Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization
by: Chen, Zhuotong, et al.
Published: (2024)
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
by: Afsharrad, Amirhossein, et al.
Published: (2026)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
by: Rafailov, Rafael, et al.
Published: (2023)
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
by: Kim, Kyuyoung, et al.
Published: (2024)
Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
by: Liu, Ning, et al.
Published: (2026)
Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model
by: Kallus, Nathan
Published: (2025)
Diffusion Blend: Inference-Time Multi-Preference Alignment for Diffusion Models
by: Cheng, Min, et al.
Published: (2025)
C2-DPO: Constrained Controlled Direct Preference Optimization
by: Asadi, Kavosh, et al.
Published: (2025)
Selective Preference Optimization via Token-Level Reward Function Estimation
by: Yang, Kailai, et al.
Published: (2024)
Conservative Contextual Bandits: Beyond Linear Representations
by: Deb, Rohan, et al.
Published: (2024)
Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors
by: Fang, Hao, et al.
Published: (2025)
CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English
by: Rueda, Andrew, et al.
Published: (2024)
Secrets of RLHF in Large Language Models Part II: Reward Modeling
by: Wang, Binghai, et al.
Published: (2024)
Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
by: Qi, Xuan, et al.
Published: (2025)
PerPO: Perceptual Preference Optimization via Discriminative Rewarding
by: Zhu, Zining, et al.
Published: (2025)
AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models
by: Liu, Qi, et al.
Published: (2025)
APO: Alpha-Divergence Preference Optimization
by: Zixian, Wang
Published: (2025)
Cross-Modal Content Optimization for Steering Web Agent Preferences
by: Jiang, Tanqiu, et al.
Published: (2025)
Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor
by: Ma, Guoxin, et al.
Published: (2026)
Visual Preference Optimization with Rubric Rewards
by: Yu, Ya-Qi, et al.
Published: (2026)
$ξ$-DPO: Direct Preference Optimization via Ratio Reward Margin
by: Fan, Zhengyuan, et al.
Published: (2026)
Aligning Crowd Feedback via Distributional Preference Reward Modeling
by: Li, Dexun, et al.
Published: (2024)
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
by: Li, Jian, et al.
Published: (2024)
Reward Learning From Preference With Ties
by: Liu, Jinsong, et al.
Published: (2024)
Preference as Reward, Maximum Preference Optimization with Importance Sampling
by: Jiang, Zaifan, et al.
Published: (2023)
Towards Comprehensive Preference Data Collection for Reward Modeling
by: Hu, Yulan, et al.
Published: (2024)
GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
by: Chen, Yanjun, et al.
Published: (2024)
CRPO: Confidence-Reward Driven Preference Optimization for Machine Translation
by: Cui, Guofeng, et al.
Published: (2025)
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
by: Gu, Yu, et al.
Published: (2024)
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
by: Hau, Jia Lin, et al.
Published: (2023)
RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk
by: Hau, Jia Lin, et al.
Published: (2022)
Your Language Model Secretly Contains Personality Subnetworks
by: Ye, Ruimeng, et al.
Published: (2026)
Your Transformer is Secretly Linear
by: Razzhigaev, Anton, et al.
Published: (2024)
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
by: Zhang, Xuan, et al.
Published: (2024)
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
by: Jin, Zhuoran, et al.
Published: (2025)
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
by: Zhu, Mingkang, et al.
Published: (2025)
Bayesian Pseudo-Coresets via Contrastive Divergence
by: Tiwary, Piyush, et al.
Published: (2023)
Interpreting Language Reward Models via Contrastive Explanations
by: Jiang, Junqi, et al.
Published: (2024)
SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence
by: Liu, Zhining, et al.
Published: (2025)