| Main Authors: | Shi, Ruizhe, Zhou, Runlong, Du, Simon S. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.19605 |
Similar Items
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
by: Shi, Ruizhe, et al.
Published: (2025)
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
by: Zhou, Runlong, et al.
Published: (2024)
Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing
by: Qi, Biqing, et al.
Published: (2024)
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
by: Zhou, Runlong, et al.
Published: (2025)
Direct Multi-Turn Preference Optimization for Language Agents
by: Shi, Wentao, et al.
Published: (2024)
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
by: Zhou, Runlong, et al.
Published: (2025)
Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)
On the Role of Preference Variance in Preference Optimization
by: Guo, Jiacheng, et al.
Published: (2025)
Understanding Reference Policies in Direct Preference Optimization
by: Liu, Yixin, et al.
Published: (2024)
Accelerating Direct Preference Optimization with Prefix Sharing
by: Wang, Franklin, et al.
Published: (2024)
Filtered Direct Preference Optimization
by: Morimura, Tetsuro, et al.
Published: (2024)
Direct Preference Optimization with an Offset
by: Amini, Afra, et al.
Published: (2024)
Disentangling Length from Quality in Direct Preference Optimization
by: Park, Ryan, et al.
Published: (2024)
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
by: Zhou, Zhenglin, et al.
Published: (2025)
Entropy Controllable Direct Preference Optimization
by: Omura, Motoki, et al.
Published: (2024)
Orthogonal Finetuning for Direct Preference Optimization
by: Yang, Chenxu, et al.
Published: (2024)
Attention-Based Sampler for Diffusion Language Models
by: Zhou, Yuyan, et al.
Published: (2026)
Enhancing LLM Safety via Constrained Direct Preference Optimization
by: Liu, Zixuan, et al.
Published: (2024)
OPTune: Efficient Online Preference Tuning
by: Chen, Lichang, et al.
Published: (2024)
Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
by: Gallego, Víctor
Published: (2024)
Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment
by: Xiao, Teng, et al.
Published: (2024)
DOS: Dependency-Oriented Sampler for Masked Diffusion Language Models
by: Zhou, Xueyu, et al.
Published: (2026)
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
by: Lyu, Kaifeng, et al.
Published: (2024)
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization
by: Lin, Yong, et al.
Published: (2024)
AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates
by: Chen, Shaolong, et al.
Published: (2026)
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
by: Razin, Noam, et al.
Published: (2024)
Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
by: Banerjee, Oishi, et al.
Published: (2024)
Transformers are Efficient Compilers, Provably
by: Zhai, Xiyu, et al.
Published: (2024)
FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment
by: Zhu, Kewen, et al.
Published: (2026)
A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications
by: Xiao, Wenyi, et al.
Published: (2024)
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
by: Zhang, Xuan, et al.
Published: (2024)
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)
Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization
by: Spadea, Fernando, et al.
Published: (2025)
Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
by: Li, Xintong, et al.
Published: (2025)
Inducing Robustness in a 2 Dimensional Direct Preference Optimization Paradigm
by: Shashidhar, Sarvesh, et al.
Published: (2025)
Differential Information Distribution: A Bayesian Perspective on Direct Preference Optimization
by: Won, Yunjae, et al.
Published: (2025)
VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization
by: Liu, Weixin, et al.
Published: (2026)
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
by: Jung, Sunghee, et al.
Published: (2025)
Robust Multi-Objective Preference Alignment with Online DPO
by: Gupta, Raghav, et al.
Published: (2025)
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
by: Rosset, Corby, et al.
Published: (2024)