| Main Authors: | Zhao, Shuai, Xu, Yunqiu, Zhu, Linchao, Yang, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.09895 |
Similar Items
RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models
by: Lin, Tianqianjin, et al.
Published: (2025)
Self-Play Preference Optimization for Language Model Alignment
by: Wu, Yue, et al.
Published: (2024)
Group Preference Optimization: Few-Shot Alignment of Large Language Models
by: Zhao, Siyan, et al.
Published: (2023)
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
by: Zhang, Yifan, et al.
Published: (2024)
Binary Classifier Optimization for Large Language Model Alignment
by: Jung, Seungjae, et al.
Published: (2024)
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
by: Wang, Zhilin, et al.
Published: (2025)
Accelerated Preference Optimization for Large Language Model Alignment
by: He, Jiafan, et al.
Published: (2024)
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
by: Wei, Jiaheng, et al.
Published: (2024)
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
by: Yang, Junming, et al.
Published: (2025)
Can Brain Signals Reveal Inner Alignment with Human Languages?
by: Han, William, et al.
Published: (2022)
MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)
LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization
by: Li, Junsong, et al.
Published: (2025)
Rewarding Intellectual Humility Learning When Not To Answer In Large Language Models
by: Jha, Abha, et al.
Published: (2026)
Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
by: Gao, Mingqi, et al.
Published: (2024)
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
by: Wang, Haoxiang, et al.
Published: (2024)
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
by: Gu, Alex, et al.
Published: (2025)
DSPA: Dynamic SAE Steering for Data-Efficient Preference Alignment
by: Wedgwood, James, et al.
Published: (2026)
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
by: Huang, Audrey, et al.
Published: (2024)
Auto-ICL: In-Context Learning without Human Supervision
by: Yang, Jinghan, et al.
Published: (2023)
Evolutionary Contrastive Distillation for Language Model Alignment
by: Katz-Samuels, Julian, et al.
Published: (2024)
Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training
by: Zhao, Shuai, et al.
Published: (2024)
Course-Correction: Safety Alignment Using Synthetic Preferences
by: Xu, Rongwu, et al.
Published: (2024)
Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs
by: Sun, Hao, et al.
Published: (2025)
Active Preference Learning for Large Language Models
by: Muldrew, William, et al.
Published: (2024)
MixDPO: Modeling Preference Strength for Pluralistic Alignment
by: Imai, Saki, et al.
Published: (2026)
Self-Supervised Visual Preference Alignment
by: Zhu, Ke, et al.
Published: (2024)
Less is More: Improving LLM Alignment via Preference Data Selection
by: Deng, Xun, et al.
Published: (2025)
Teaching Your Models to Understand Code via Focal Preference Alignment
by: Wu, Jie, et al.
Published: (2025)
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
by: Yang, Rui, et al.
Published: (2024)
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
by: Liu, Yinhong, et al.
Published: (2024)
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
by: Hadji-Kyriacou, Avelina Asada, et al.
Published: (2024)
Answer Matching Outperforms Multiple Choice for Language Model Evaluation
by: Chandak, Nikhil, et al.
Published: (2025)
Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences
by: Pattnaik, Pulkit, et al.
Published: (2024)
AdapThink: Adaptive Thinking Preferences for Reasoning Language Model
by: Wan, Xu, et al.
Published: (2025)
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
by: Kim, Dongyoung, et al.
Published: (2024)
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
by: Liu, Chris Yuhao, et al.
Published: (2025)
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
by: D'Oosterlinck, Karel, et al.
Published: (2024)
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit
by: Gong, Ruihao, et al.
Published: (2024)
From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification
by: Wang, Fei, et al.
Published: (2024)