| Main Authors: | Lee, Jongmin, Ryu, Ernest K. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.17391 |
Similar Items
Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs
by: Lee, Jongmin, et al.
Published: (2025)
Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes
by: Lee, Jongmin, et al.
Published: (2025)
Deflated Dynamics Value Iteration
by: Lee, Jongmin, et al.
Published: (2024)
Policy Gradient Algorithms in Average-Reward Multichain MDPs
by: Lee, Jongmin, et al.
Published: (2026)
A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration
by: Haussmann, Manuel, et al.
Published: (2026)
From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
by: Chen, Zaiwei, et al.
Published: (2025)
Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning
by: Manda, Kausthubh, et al.
Published: (2025)
Generalized Fitted Q-Iteration with Clustered Data
by: Hu, Liyuan, et al.
Published: (2025)
Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games
by: Jeong, Narim, et al.
Published: (2026)
Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases
by: Mustafin, Arsenii, et al.
Published: (2025)
Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
by: van der Laan, Lars, et al.
Published: (2025)
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
by: Chae, Woojin, et al.
Published: (2024)
LoRA Training in the NTK Regime has No Spurious Local Minima
by: Jang, Uijeong, et al.
Published: (2024)
Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
by: Wei, Yukuan, et al.
Published: (2025)
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
by: Xu, Yang, et al.
Published: (2025)
Finite-Time Analysis of Simultaneous Double Q-learning
by: Na, Hyunjun, et al.
Published: (2024)
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
by: Bruns-Smith, David, et al.
Published: (2023)
Fitted Q-Iteration via Max-Plus-Linear Approximation
by: Liu, Y., et al.
Published: (2024)
LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail)
by: Kim, Junsu, et al.
Published: (2025)
On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes
by: Wan, Yi, et al.
Published: (2024)
Achieving $\varepsilon^{-2}$ Dependence for Average-Reward Q-Learning with a New Contraction Principle
by: Chen, Zijun, et al.
Published: (2026)
Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning
by: Jiao, Yuchen, et al.
Published: (2026)
Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach
by: Jeong, Narim, et al.
Published: (2024)
Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
by: Xu, Yang, et al.
Published: (2025)
Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
by: Choi, Moonseok, et al.
Published: (2023)
A Finite Sample Complexity Bound for Distributionally Robust Q-learning
by: Wang, Shengbo, et al.
Published: (2023)
Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs
by: Hong, Kihyuk, et al.
Published: (2024)
Bandit Simulation for Average Reward Inference
by: Praharaj, Samya, et al.
Published: (2026)
Convergence Analyses of Davis-Yin Splitting via Scaled Relative Graphs
by: Lee, Jongmin, et al.
Published: (2022)
Finite-Time Error Bounds for Greedy-GQ
by: Wang, Yue, et al.
Published: (2022)
Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ
by: Lim, Han-Dong, et al.
Published: (2024)
Sequential Flow Straightening for Generative Modeling
by: Yoon, Jongmin, et al.
Published: (2024)
Exponential Convergence Guarantees for Iterative Markovian Fitting
by: Silveri, Marta Gentiloni, et al.
Published: (2025)
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
by: Lee, Jongmin, et al.
Published: (2025)
Sharpness-Aware Minimization Can Hallucinate Minimizers
by: Park, Chanwoong, et al.
Published: (2025)
Average-Reward Soft Actor-Critic
by: Adamczyk, Jacob, et al.
Published: (2025)
Finite-Time Logarithmic Bayes Regret Upper Bounds
by: Atsidakou, Alexia, et al.
Published: (2023)
Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret
by: Yu, Kihyun, et al.
Published: (2026)
Exponential convergence rate for Iterative Markovian Fitting
by: Sokolov, Kirill, et al.
Published: (2025)
Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
by: Patel, Bhrij, et al.
Published: (2024)