| Main Authors: | Yu, Kihyun, Baek, Beomhan, Lee, Dabeen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.11586 |
Similar Items
Near-Optimal Primal-Dual Algorithm for Learning Linear Mixture CMDPs with Adversarial Rewards
by: Yu, Kihyun, et al.
Published: (2026)
Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
by: Yu, Kihyun, et al.
Published: (2024)
Provably Efficient Infinite-Horizon Average-Reward Reinforcement Learning with Linear Function Approximation
by: Chae, Woojin, et al.
Published: (2024)
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
by: Chae, Woojin, et al.
Published: (2024)
Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
by: Baek, Beomhan, et al.
Published: (2025)
Primal-Dual Policy Optimization for Linear CMDPs with Adversarial Losses
by: Yu, Kihyun, et al.
Published: (2026)
On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes
by: Wan, Yi, et al.
Published: (2024)
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
by: Song, Minhak, et al.
Published: (2025)
Parameter-Free Algorithms for Performative Regret Minimization under Decision-Dependent Distributions
by: Park, Sungwoo, et al.
Published: (2024)
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
by: Boone, Victor, et al.
Published: (2024)
Stochastic-Constrained Stochastic Optimization with Markovian Data
by: Kim, Yeongjong, et al.
Published: (2023)
Infinite-Horizon Reinforcement Learning with Multinomial Logistic Function Approximation
by: Park, Jaehyun, et al.
Published: (2024)
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
by: Zurek, Matthew, et al.
Published: (2024)
Chebyshev Center-Based Direction Selection for Multi-Objective Optimization and Training PINNs
by: Yoon, Hoyeol, et al.
Published: (2026)
Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning
by: Chen, Zijun, et al.
Published: (2025)
Quotient-Categorical Representations for Bellman-Compatible Average-Reward Distributional Reinforcement Learning
by: Kaya, Ege C., et al.
Published: (2026)
Optimal Sample Complexity for Average Reward Markov Decision Processes
by: Wang, Shengbo, et al.
Published: (2023)
Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values
by: Wang, Shengbo, et al.
Published: (2026)
Tail Distribution of Regret in Optimistic Reinforcement Learning
by: Khodadadian, Sajad, et al.
Published: (2025)
Reinforcement Learning and Regret Bounds for Admission Control
by: Weber, Lucas, et al.
Published: (2024)
Bellman Optimality of Average-Reward Robust Markov Decision Processes with a Constant Gain
by: Wang, Shengbo, et al.
Published: (2025)
Learning Decentralized Linear Quadratic Regulators with $\sqrt{T}$ Regret
by: Ye, Lintao, et al.
Published: (2022)
Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs
by: Omidi, Saber, et al.
Published: (2025)
Span-Based Optimal Sample Complexity for Average Reward MDPs
by: Zurek, Matthew, et al.
Published: (2023)
Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems
by: Ziemann, Ingvar, et al.
Published: (2022)
Adaptivity and Universality: Problem-dependent Universal Regret for Online Convex Optimization
by: Zhao, Peng, et al.
Published: (2025)
Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback
by: Ba, Wenjia, et al.
Published: (2021)
Regret of exploratory policy improvement and $q$-learning
by: Tang, Wenpin, et al.
Published: (2024)
Lagrangian Index Policy for Restless Bandits with Average Reward
by: Avrachenkov, Konstantin, et al.
Published: (2024)
Robust Regression over Averaged Uncertainty
by: Bertsimas, Dimitris, et al.
Published: (2023)
Regret Analysis: a control perspective
by: Gibson, Travis E., et al.
Published: (2025)
Wasserstein Distributionally Robust Regret Optimization
by: Fiechtner, Lukas-Benedikt, et al.
Published: (2025)
Rethinking PCA Through Duality
by: Quan, Jan, et al.
Published: (2025)
Planning and Learning in Average Risk-aware MDPs
by: Wang, Weikai, et al.
Published: (2025)
Kernel Mean Embedding Topology: Weak and Strong Forms for Stochastic Kernels and Implications for Model Learning
by: Saldi, Naci, et al.
Published: (2025)
The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis
by: Zurek, Matthew, et al.
Published: (2024)
Span-Agnostic Optimal Sample Complexity and Oracle Inequalities for Average-Reward RL
by: Zurek, Matthew, et al.
Published: (2025)
Almost Surely $\sqrt{T}$ Regret for Adaptive LQR
by: Lu, Yiwen, et al.
Published: (2023)
Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems
by: Huang, Yilie, et al.
Published: (2024)
Rate-Optimal Regret for the Safe Learning-based Control of the Constrained Linear Quadratic Regulator
by: Hutchinson, Spencer, et al.
Published: (2026)