| Main Authors: | Bell, Henry; Schertel, Lara Neubauer da Costa; Ding, Bochu; Fain, Brandon |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.18760 |
Similar Items
Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale
by: Bell, Henry, et al.
Published: (2026)
Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition
by: Bao, Yuwei, et al.
Published: (2023)
ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference
by: Cai, Tianchi, et al.
Published: (2023)
MaxMin-RLHF: Alignment with Diverse Human Preferences
by: Chakraborty, Souradip, et al.
Published: (2024)
Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input
by: Peng, Andi, et al.
Published: (2024)
Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
by: Gao, Mingqi, et al.
Published: (2024)
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
by: Zhang, Yifan, et al.
Published: (2024)
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
by: Zhao, Shuai, et al.
Published: (2025)
Larger or Smaller Reward Margins to Select Preferences for Alignment?
by: Huang, Kexin, et al.
Published: (2025)
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
by: Sun, Zhiqing, et al.
Published: (2024)
Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment
by: Lin, Zhiyu, et al.
Published: (2025)
Value Alignment from Unstructured Text
by: Padhi, Inkit, et al.
Published: (2024)
Transfer Q Star: Principled Decoding for LLM Alignment
by: Chakraborty, Souradip, et al.
Published: (2024)
Robust Multi-Objective Preference Alignment with Online DPO
by: Gupta, Raghav, et al.
Published: (2025)
Alignment is Localized: A Causal Probe into Preference Layers
by: Chaudhury, Archie
Published: (2025)
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
by: Le, Quang-Hung, et al.
Published: (2024)
PORT: Preference Optimization on Reasoning Traces
by: Lahlou, Salem, et al.
Published: (2024)
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
by: Yu, Dian, et al.
Published: (2025)
COPR: Continual Learning Human Preference through Optimal Policy Regularization
by: Zhang, Han, et al.
Published: (2023)
Beyond Correctness: Learning Robust Reasoning via Transfer
by: Lee, Hyunseok, et al.
Published: (2026)
Holistic Utility Preference Learning for Listwise Alignment
by: Zhou, Jiacong, et al.
Published: (2024)
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
by: Hong, Yuzhong, et al.
Published: (2024)
Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return
by: Peng, Nianli, et al.
Published: (2023)
Value Drifts: Tracing Value Alignment During LLM Post-Training
by: Bhatia, Mehar, et al.
Published: (2025)
Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences
by: Pattnaik, Pulkit, et al.
Published: (2024)
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
by: Kim, Dongyoung, et al.
Published: (2024)
Advancing LLM Reasoning Generalists with Preference Trees
by: Yuan, Lifan, et al.
Published: (2024)
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
by: Wu, Junkang, et al.
Published: (2024)
Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
by: Gallego, Víctor
Published: (2024)
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
by: Xiao, Teng, et al.
Published: (2025)
Cal-DPO: Calibrated Direct Preference Optimization for Language Model Alignment
by: Xiao, Teng, et al.
Published: (2024)
Reasoning Boosts Opinion Alignment in LLMs
by: Berdoz, Frédéric, et al.
Published: (2026)
GeoReasoner: Reasoning On Geospatially Grounded Context For Natural Language Understanding
by: Yan, Yibo, et al.
Published: (2024)
PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning
by: Holk, Simon, et al.
Published: (2024)
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
by: Yin, Yueqin, et al.
Published: (2024)
Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment
by: Sun, Shengyang, et al.
Published: (2025)
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
by: Kabir, Muhammad Rafsan, et al.
Published: (2024)
Value Profiles for Encoding Human Variation
by: Sorensen, Taylor, et al.
Published: (2025)
How Many Human Judgments Are Enough? Feasibility Limits of Human Preference Evaluation
by: Lee, Wilson Y.
Published: (2026)
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)