| Main Authors: | Bose, Avinandan, Xiong, Zhihan, Saha, Aadirupa, Du, Simon Shaolei, Fazel, Maryam |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.10616 |
Similar Items
Offline Multi-task Transfer RL with Representational Penalization
by: Bose, Avinandan, et al.
Published: (2024)
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
by: Bose, Avinandan, et al.
Published: (2025)
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
by: Zhou, Runlong, et al.
Published: (2025)
DP-Dueling: Learning from Preference Feedback without Compromising User Privacy
by: Saha, Aadirupa, et al.
Published: (2024)
Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback
by: Chen, Shulun, et al.
Published: (2025)
Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
by: Bose, Avinandan, et al.
Published: (2025)
Offline congestion games: How feedback type affects data coverage requirement
by: Jiang, Haozhe, et al.
Published: (2022)
Dual Approximation Policy Optimization
by: Xiong, Zhihan, et al.
Published: (2024)
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
by: Zhang, Yiran, et al.
Published: (2025)
Global Convergence of Four-Layer Matrix Factorization under Random Initialization
by: Luo, Minrui, et al.
Published: (2025)
Cold-Start Personalization via Training-Free Priors from Structured World Models
by: Bose, Avinandan, et al.
Published: (2026)
Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models
by: Xu, Weihang, et al.
Published: (2024)
Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization
by: Saha, Aadirupa, et al.
Published: (2024)
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
by: Shi, Ruizhe, et al.
Published: (2025)
A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
by: Jiang, Haozhe, et al.
Published: (2023)
Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures
by: Zhou, Mo, et al.
Published: (2025)
On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
by: Maynard-Zhang, Leo, et al.
Published: (2026)
Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains
by: Viano, Luca, et al.
Published: (2026)
Initializing Services in Interactive ML Systems for Diverse Users
by: Bose, Avinandan, et al.
Published: (2023)
The Crucial Role of Samplers in Online Direct Preference Optimization
by: Shi, Ruizhe, et al.
Published: (2024)
Generalized Preference Optimization: A Unified Approach to Offline Alignment
by: Tang, Yunhao, et al.
Published: (2024)
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
by: Xiong, Zhihan, et al.
Published: (2023)
Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)
Online Policy Learning from Offline Preferences
by: Zhang, Guoxi, et al.
Published: (2024)
Online Preference Alignment for Language Models via Count-based Exploration
by: Bai, Chenjia, et al.
Published: (2025)
Policy-Based Trajectory Clustering in Offline Reinforcement Learning
by: Hu, Hao, et al.
Published: (2025)
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
by: Zhang, Shenao, et al.
Published: (2024)
One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise
by: Bhat, Amith, et al.
Published: (2026)
LLM-as-Judge on a Budget
by: Saha, Aadirupa, et al.
Published: (2026)
Tracking the Best Expert Privately
by: Saha, Aadirupa, et al.
Published: (2025)
Transformers are Efficient Compilers, Provably
by: Zhai, Xiyu, et al.
Published: (2024)
Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences
by: Ferbach, Damien, et al.
Published: (2024)
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
by: Cen, Shicong, et al.
Published: (2024)
Online Bandit Learning with Offline Preference Data for Improved RLHF
by: Agnihotri, Akhil, et al.
Published: (2024)
PrefDisco: Benchmarking Proactive Personalized Reasoning
by: Li, Shuyue Stella, et al.
Published: (2025)
Latent Adversarial Regularization for Offline Preference Optimization
by: Jiang, Enyi, et al.
Published: (2026)
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
by: Hong, Yuzhong, et al.
Published: (2024)
Provably Convergent Primal-Dual DPO for Constrained LLM Alignment
by: Du, Yihan, et al.
Published: (2025)
OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration
by: Yang, Yiqin, et al.
Published: (2026)
Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
by: Gölz, Paul, et al.
Published: (2025)