| Main Authors: | Li, Gengxu, Xia, Tingyu, Chang, Yi, Wu, Yuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.14643 |
Similar Items
Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models
by: Li, Jinzhe, et al.
Published: (2025)
Large Language Model Evaluation via Matrix Nuclear-Norm
by: Li, Yahan, et al.
Published: (2024)
A Survey of RWKV
by: Li, Zhiyuan, et al.
Published: (2024)
Language Models can Evaluate Themselves via Probability Discrepancy
by: Xia, Tingyu, et al.
Published: (2024)
ORPO: Monolithic Preference Optimization without Reference Model
by: Hong, Jiwoo, et al.
Published: (2024)
DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization
by: Deng, Mengyi, et al.
Published: (2026)
Robust Preference Optimization via Dynamic Target Margins
by: Sun, Jie, et al.
Published: (2025)
Rethinking Data Selection at Scale: Random Selection is Almost All You Need
by: Xia, Tingyu, et al.
Published: (2024)
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback
by: Kim, Kyuyoung, et al.
Published: (2024)
Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)
SimPO: Simple Preference Optimization with a Reference-Free Reward
by: Meng, Yu, et al.
Published: (2024)
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
by: Hong, Jiwoo, et al.
Published: (2024)
Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization
by: Chang, Yupeng, et al.
Published: (2025)
Asymmetric Co-Training for Source-Free Few-Shot Domain Adaptation
by: Li, Gengxu, et al.
Published: (2025)
Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability
by: Yang, Haiqi, et al.
Published: (2025)
AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)
AMaPO: Adaptive Margin-attached Preference Optimization for Language Model Alignment
by: Deng, Ruibo, et al.
Published: (2025)
Explaining Length Bias in LLM-Based Preference Evaluations
by: Hu, Zhengyu, et al.
Published: (2024)
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
by: Zhao, Shuai, et al.
Published: (2025)
Multi-Reference Preference Optimization for Large Language Models
by: Le, Hung, et al.
Published: (2024)
BPO: Revisiting Preference Modeling in Direct Preference Optimization
by: Sun, Lin, et al.
Published: (2025)
Disentangling Length from Quality in Direct Preference Optimization
by: Park, Ryan, et al.
Published: (2024)
Offline Preference Optimization via Maximum Marginal Likelihood Estimation
by: Najafi, Saeed, et al.
Published: (2025)
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
by: Gou, Qi, et al.
Published: (2024)
Understanding Reference Policies in Direct Preference Optimization
by: Liu, Yixin, et al.
Published: (2024)
Plan-and-Write: Structure-Guided Length Control for LLMs without Model Retraining
by: Akinfaderin, Adewale, et al.
Published: (2025)
Model-based Preference Optimization in Abstractive Summarization without Human Feedback
by: Choi, Jaepill, et al.
Published: (2024)
XTRUST: On the Multilingual Trustworthiness of Large Language Models
by: Li, Yahan, et al.
Published: (2024)
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models
by: Li, Zhiyuan, et al.
Published: (2025)
Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
by: Lu, Junru, et al.
Published: (2024)
Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling
by: Cai, Jianfeng, et al.
Published: (2025)
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization
by: Jin, Zhensheng, et al.
Published: (2025)
References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
by: Kim, Doyoung, et al.
Published: (2025)
LoRA-MGPO: Mitigating Double Descent in Low-Rank Adaptation via Momentum-Guided Perturbation Optimization
by: Chang, Yupeng, et al.
Published: (2025)
Length Generalization of Causal Transformers without Position Encoding
by: Wang, Jie, et al.
Published: (2024)
Larger or Smaller Reward Margins to Select Preferences for Alignment?
by: Huang, Kexin, et al.
Published: (2025)
Towards Understanding the Influence of Reward Margin on Preference Model Performance
by: Qin, Bowen, et al.
Published: (2024)
Multiplayer Nash Preference Optimization
by: Wu, Fang, et al.
Published: (2025)
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
by: Liu, Fanfan, et al.
Published: (2026)
Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)