| Main Authors: | Xiong, Zhihan, Fazel, Maryam, Xiao, Lin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.01249 |
Similar Items
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
by: Bose, Avinandan, et al.
Published: (2024)
On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
by: Maynard-Zhang, Leo, et al.
Published: (2026)
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
by: Xiong, Zhihan, et al.
Published: (2023)
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
by: Bose, Avinandan, et al.
Published: (2025)
Offline congestion games: How feedback type affects data coverage requirement
by: Jiang, Haozhe, et al.
Published: (2022)
A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
by: Jiang, Haozhe, et al.
Published: (2023)
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
by: Zhou, Runlong, et al.
Published: (2025)
Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing
by: Arasteh, Fazel, et al.
Published: (2025)
Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models
by: Xu, Weihang, et al.
Published: (2024)
Local linear convergence of gradient methods for overparameterized Gaussian mixtures
by: Wang, Jingxing, et al.
Published: (2026)
Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures
by: Zhou, Mo, et al.
Published: (2025)
Offline Multi-task Transfer RL with Representational Penalization
by: Bose, Avinandan, et al.
Published: (2024)
Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
by: Bose, Avinandan, et al.
Published: (2025)
Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models
by: Zhu, Libin, et al.
Published: (2026)
Learning Optimal Tax Design in Nonatomic Congestion Games
by: Cui, Qiwen, et al.
Published: (2024)
Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs
by: Chen, Shulun, et al.
Published: (2025)
Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback
by: Chen, Shulun, et al.
Published: (2025)
Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates
by: Roulet, Vincent, et al.
Published: (2022)
Dynamics of Learning under User Choice: Overspecialization and Peer-Model Probing
by: Narang, Adhyyan, et al.
Published: (2026)
Iteratively reweighted kernel machines efficiently learn sparse functions
by: Zhu, Libin, et al.
Published: (2025)
Finite Sample Identification of Partially Observed Bilinear Dynamical Systems
by: Sattar, Yahya, et al.
Published: (2025)
Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback
by: Narang, Adhyyan, et al.
Published: (2022)
Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian
by: Zhang, Yiran, et al.
Published: (2025)
High-dimensional Limit of SGD for Diagonal Linear Networks
by: Malaxechebarría, Begoña García, et al.
Published: (2026)
Optimization and generalization analysis for two-layer physics-informed neural networks without over-parametrization
by: Zeng, Zhihan, et al.
Published: (2025)
Global Convergence of Four-Layer Matrix Factorization under Random Initialization
by: Luo, Minrui, et al.
Published: (2025)
Explore-then-Commit for Nonstationary Linear Bandits with Latent Dynamics
by: Choi, Sunmook, et al.
Published: (2025)
Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations
by: Sattar, Yahya, et al.
Published: (2025)
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
by: Hu, Miaobo, et al.
Published: (2026)
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
by: Shi, Ruizhe, et al.
Published: (2025)
Emergent specialization from participation dynamics and multi-learner retraining
by: Dean, Sarah, et al.
Published: (2022)
Improving Credit Card Fraud Detection with an Optimized Explainable Boosting Machine
by: Fazel, Reza E., et al.
Published: (2026)
Self-Consistency Preference Optimization
by: Prasad, Archiki, et al.
Published: (2024)
Divergence-Augmented Policy Optimization
by: Wang, Qing, et al.
Published: (2025)
Federated Offline Policy Optimization with Dual Regularization
by: Yue, Sheng, et al.
Published: (2024)
Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses
by: Yu, Kihyun, et al.
Published: (2026)
Universal Approximation of Operators with Transformers and Neural Integral Operators
by: Zappala, Emanuele, et al.
Published: (2024)
Near-Optimal Regret for Policy Optimization in Contextual MDPs with General Offline Function Approximation
by: Levy, Orin, et al.
Published: (2026)
Revisiting Zeroth-Order Hessian Approximation: A Single-Step Policy Optimization Lens
by: Qiu, Junbin, et al.
Published: (2026)
Soft Adaptive Policy Optimization
by: Gao, Chang, et al.
Published: (2025)