:: Library Catalog

Bibliographic Details
Main Authors:	Bose, Avinandan; Xiong, Zhihan; Saha, Aadirupa; Du, Simon Shaolei; Fazel, Maryam
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2412.10616

Similar Items

Offline Multi-task Transfer RL with Representational Penalization
by: Bose, Avinandan, et al.
Published: (2024)

LoRe: Personalizing LLMs via Low-Rank Reward Modeling
by: Bose, Avinandan, et al.
Published: (2025)

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
by: Zhou, Runlong, et al.
Published: (2025)

DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
by: Saha, Aadirupa, et al.
Published: (2024)

Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback
by: Chen, Shulun, et al.
Published: (2025)

Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
by: Bose, Avinandan, et al.
Published: (2025)

Offline congestion games: How feedback type affects data coverage requirement
by: Jiang, Haozhe, et al.
Published: (2022)

Dual Approximation Policy Optimization
by: Xiong, Zhihan, et al.
Published: (2024)

Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
by: Zhang, Yiran, et al.
Published: (2025)

Global Convergence of Four-Layer Matrix Factorization under Random Initialization
by: Luo, Minrui, et al.
Published: (2025)

Cold-Start Personalization via Training-Free Priors from Structured World Models
by: Bose, Avinandan, et al.
Published: (2026)

Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models
by: Xu, Weihang, et al.
Published: (2024)

Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization
by: Saha, Aadirupa, et al.
Published: (2024)

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
by: Shi, Ruizhe, et al.
Published: (2025)

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
by: Jiang, Haozhe, et al.
Published: (2023)

Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures
by: Zhou, Mo, et al.
Published: (2025)

On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
by: Maynard-Zhang, Leo, et al.
Published: (2026)

Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains
by: Viano, Luca, et al.
Published: (2026)

Initializing Services in Interactive ML Systems for Diverse Users
by: Bose, Avinandan, et al.
Published: (2023)

The Crucial Role of Samplers in Online Direct Preference Optimization
by: Shi, Ruizhe, et al.
Published: (2024)

Generalized Preference Optimization: A Unified Approach to Offline Alignment
by: Tang, Yunhao, et al.
Published: (2024)

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
by: Xiong, Zhihan, et al.
Published: (2023)

Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)

Online Policy Learning from Offline Preferences
by: Zhang, Guoxi, et al.
Published: (2024)

Online Preference Alignment for Language Models via Count-based Exploration
by: Bai, Chenjia, et al.
Published: (2025)

Policy-Based Trajectory Clustering in Offline Reinforcement Learning
by: Hu, Hao, et al.
Published: (2025)

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
by: Zhang, Shenao, et al.
Published: (2024)

One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise
by: Bhat, Amith, et al.
Published: (2026)

LLM-as-Judge on a Budget
by: Saha, Aadirupa, et al.
Published: (2026)

Tracking the Best Expert Privately
by: Saha, Aadirupa, et al.
Published: (2025)

Transformers are Efficient Compilers, Provably
by: Zhai, Xiyu, et al.
Published: (2024)

Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences
by: Ferbach, Damien, et al.
Published: (2024)

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
by: Cen, Shicong, et al.
Published: (2024)

Online Bandit Learning with Offline Preference Data for Improved RLHF
by: Agnihotri, Akhil, et al.
Published: (2024)

PrefDisco: Benchmarking Proactive Personalized Reasoning
by: Li, Shuyue Stella, et al.
Published: (2025)

Latent Adversarial Regularization for Offline Preference Optimization
by: Jiang, Enyi, et al.
Published: (2026)

Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
by: Hong, Yuzhong, et al.
Published: (2024)

Provably Convergent Primal-Dual DPO for Constrained LLM Alignment
by: Du, Yihan, et al.
Published: (2025)

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
by: Yang, Yiqin, et al.
Published: (2026)

Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
by: Gölz, Paul, et al.
Published: (2025)