:: Library Catalog

Bibliographic Details
Main Authors:	Nan, Tianlong; Li, Xiaopeng; Kroer, Christian; Lin, Tianyi
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2606.01382

Similar Items

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
by: Zhang, Yuheng, et al.
Published: (2024)

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
by: Chen, Peter, et al.
Published: (2025)

Stepwise Guided Policy Optimization: Coloring your Incorrect Reasoning in GRPO
by: Chen, Peter, et al.
Published: (2025)

Reward-free Alignment for Conflicting Objectives
by: Chen, Peter, et al.
Published: (2026)

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
by: Rosset, Corby, et al.
Published: (2024)

Sample Efficient Preference Alignment in LLMs via Active Exploration
by: Mehta, Viraj, et al.
Published: (2023)

No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!
by: Stradi, Francesco Emanuele, et al.
Published: (2025)

LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning
by: Lin, Xiaotian, et al.
Published: (2025)

ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
by: Lin, Xiaoqiang, et al.
Published: (2025)

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
by: Xiong, Wei, et al.
Published: (2023)

Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training
by: Moya, Christian, et al.
Published: (2026)

ComPO: Preference Alignment via Comparison Oracles
by: Chen, Peter, et al.
Published: (2025)

Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design
by: Schlaginhaufen, Andreas, et al.
Published: (2025)

MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning
by: Lin, Yunze
Published: (2025)

Refining Alignment Framework for Diffusion Models with Intermediate-Step Preference Ranking
by: Ren, Jie, et al.
Published: (2025)

Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
by: Ye, Yaowen, et al.
Published: (2025)

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning
by: Zhang, Tianle, et al.
Published: (2024)

Preference Guided Iterated Pareto Referent Optimisation for Accessible Route Planning
by: Speziali, Paolo, et al.
Published: (2026)

Iterative Refinement Neural Operators are Learned Fixed-Point Solvers: A Principled Approach to Spectral Bias Mitigation
by: Liu, Xiaotian, et al.
Published: (2026)

Automated Skill Discovery for Language Agents through Exploration and Iterative Feedback
by: Yang, Yongjin, et al.
Published: (2025)

Active Preference Optimization for Sample Efficient RLHF
by: Das, Nirjhar, et al.
Published: (2024)

PB$^2$: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning
by: Driss, Brahim, et al.
Published: (2025)

FraPPE: Fast and Efficient Preference-based Pure Exploration
by: Das, Udvas, et al.
Published: (2025)

On the Role of Preference Variance in Preference Optimization
by: Guo, Jiacheng, et al.
Published: (2025)

Thinking Preference Optimization
by: Yang, Wang, et al.
Published: (2025)

Aligning CodeLLMs with Direct Preference Optimization
by: Miao, Yibo, et al.
Published: (2024)

Risk-aware Direct Preference Optimization under Nested Risk Measure
by: Zhang, Lijun, et al.
Published: (2025)

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
by: Xie, Yuxi, et al.
Published: (2024)

Preference as Reward, Maximum Preference Optimization with Importance Sampling
by: Jiang, Zaifan, et al.
Published: (2023)

Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
by: Luo, Haocheng, et al.
Published: (2026)

Efficient Exploration at Scale
by: Asghari, Seyed Mohammad, et al.
Published: (2026)

Confidence-Controlled Exploration: Efficient Sparse-Reward Policy Learning for Robot Navigation
by: Patel, Bhrij, et al.
Published: (2023)

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
by: Cheng, Pengyu, et al.
Published: (2023)

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
by: Yang, Yiqin, et al.
Published: (2026)

Nash CoT: Multi-Path Inference with Preference Equilibrium
by: Zhang, Ziqi, et al.
Published: (2024)

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
by: Yan, Tianyi Lorena, et al.
Published: (2025)

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
by: Verma, Arun, et al.
Published: (2024)

The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
by: Zhang, Ruichen, et al.
Published: (2025)

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
by: Yue, Bo, et al.
Published: (2024)

Graph Unlearning Meets Influence-aware Negative Preference Optimization
by: Chen, Qiang, et al.
Published: (2025)