Library Catalog

Bibliographic Details
Main Authors:	Xiong, Zhihan; Fazel, Maryam; Xiao, Lin
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2410.01249

Similar Items

Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
by: Bose, Avinandan, et al.
Published: (2024)

On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
by: Maynard-Zhang, Leo, et al.
Published: (2026)

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
by: Xiong, Zhihan, et al.
Published: (2023)

LoRe: Personalizing LLMs via Low-Rank Reward Modeling
by: Bose, Avinandan, et al.
Published: (2025)

Offline congestion games: How feedback type affects data coverage requirement
by: Jiang, Haozhe, et al.
Published: (2022)

A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
by: Jiang, Haozhe, et al.
Published: (2023)

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
by: Zhou, Runlong, et al.
Published: (2025)

Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing
by: Arasteh, Fazel, et al.
Published: (2025)

Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models
by: Xu, Weihang, et al.
Published: (2024)

Local linear convergence of gradient methods for overparameterized Gaussian mixtures
by: Wang, Jingxing, et al.
Published: (2026)

Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures
by: Zhou, Mo, et al.
Published: (2025)

Offline Multi-task Transfer RL with Representational Penalization
by: Bose, Avinandan, et al.
Published: (2024)

Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
by: Bose, Avinandan, et al.
Published: (2025)

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models
by: Zhu, Libin, et al.
Published: (2026)

Learning Optimal Tax Design in Nonatomic Congestion Games
by: Cui, Qiwen, et al.
Published: (2024)

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs
by: Chen, Shulun, et al.
Published: (2025)

Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback
by: Chen, Shulun, et al.
Published: (2025)

Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates
by: Roulet, Vincent, et al.
Published: (2022)

Dynamics of Learning under User Choice: Overspecialization and Peer-Model Probing
by: Narang, Adhyyan, et al.
Published: (2026)

Iteratively reweighted kernel machines efficiently learn sparse functions
by: Zhu, Libin, et al.
Published: (2025)

Finite Sample Identification of Partially Observed Bilinear Dynamical Systems
by: Sattar, Yahya, et al.
Published: (2025)

Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback
by: Narang, Adhyyan, et al.
Published: (2022)

Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
by: Zhang, Yiran, et al.
Published: (2025)

High-dimensional Limit of SGD for Diagonal Linear Networks
by: Malaxechebarría, Begoña García, et al.
Published: (2026)

Optimization and generalization analysis for two-layer physics-informed neural networks without over-parametrization
by: Zeng, Zhihan, et al.
Published: (2025)

Global Convergence of Four-Layer Matrix Factorization under Random Initialization
by: Luo, Minrui, et al.
Published: (2025)

Explore-then-Commit for Nonstationary Linear Bandits with Latent Dynamics
by: Choi, Sunmook, et al.
Published: (2025)

Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations
by: Sattar, Yahya, et al.
Published: (2025)

AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
by: Hu, Miaobo, et al.
Published: (2026)

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
by: Shi, Ruizhe, et al.
Published: (2025)

Emergent specialization from participation dynamics and multi-learner retraining
by: Dean, Sarah, et al.
Published: (2022)

Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine
by: Fazel, Reza E., et al.
Published: (2026)

Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)

Divergence-Augmented Policy Optimization
by: Wang, Qing, et al.
Published: (2025)

Federated Offline Policy Optimization with Dual Regularization
by: Yue, Sheng, et al.
Published: (2024)

Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses
by: Yu, Kihyun, et al.
Published: (2026)

Universal Approximation of Operators with Transformers and Neural Integral Operators
by: Zappala, Emanuele, et al.
Published: (2024)

Near-Optimal Regret for Policy Optimization in Contextual MDPs with General Offline Function Approximation
by: Levy, Orin, et al.
Published: (2026)

Revisiting Zeroth-Order Hessian Approximation: A Single-Step Policy Optimization Lens
by: Qiu, Junbin, et al.
Published: (2026)

Soft Adaptive Policy Optimization
by: Gao, Chang, et al.
Published: (2025)