Library Catalog

Bibliographic Details
Main Authors:	Lee, Jongmin; Ryu, Ernest K.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2510.17391

Similar Items

Why Policy Gradient Algorithms Work for Undiscounted Total-Reward MDPs
by: Lee, Jongmin, et al.
Published: (2025)

Optimal Non-Asymptotic Rates of Value Iteration for Average-Reward Markov Decision Processes
by: Lee, Jongmin, et al.
Published: (2025)

Deflated Dynamics Value Iteration
by: Lee, Jongmin, et al.
Published: (2024)

Policy Gradient Algorithms in Average-Reward Multichain MDPs
by: Lee, Jongmin, et al.
Published: (2026)

A Measure-Theoretic Finite-Sample Theory for Adaptive-Data Fitted Q-Iteration
by: Haussmann, Manuel, et al.
Published: (2026)

From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
by: Chen, Zaiwei, et al.
Published: (2025)

Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning
by: Manda, Kausthubh, et al.
Published: (2025)

Generalized Fitted Q-Iteration with Clustered Data
by: Hu, Liyuan, et al.
Published: (2025)

Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games
by: Jeong, Narim, et al.
Published: (2026)

Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases
by: Mustafin, Arsenii, et al.
Published: (2025)

Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
by: van der Laan, Lars, et al.
Published: (2025)

Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
by: Chae, Woojin, et al.
Published: (2024)

LoRA Training in the NTK Regime has No Spurious Local Minima
by: Jang, Uijeong, et al.
Published: (2024)

Near-Optimal Sample Complexity Bounds for Constrained Average-Reward MDPs
by: Wei, Yukuan, et al.
Published: (2025)

Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
by: Xu, Yang, et al.
Published: (2025)

Finite-Time Analysis of Simultaneous Double Q-learning
by: Na, Hyunjun, et al.
Published: (2024)

Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
by: Bruns-Smith, David, et al.
Published: (2023)

Fitted Q-Iteration via Max-Plus-Linear Approximation
by: Liu, Y., et al.
Published: (2024)

LoRA Training Provably Converges to a Low-Rank Global Minimum or It Fails Loudly (But it Probably Won't Fail)
by: Kim, Junsu, et al.
Published: (2025)

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes
by: Wan, Yi, et al.
Published: (2024)

Achieving $\varepsilon^{-2}$ Dependence for Average-Reward Q-Learning with a New Contraction Principle
by: Chen, Zijun, et al.
Published: (2026)

Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning
by: Jiao, Yuchen, et al.
Published: (2026)

Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach
by: Jeong, Narim, et al.
Published: (2024)

Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
by: Xu, Yang, et al.
Published: (2025)

Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
by: Choi, Moonseok, et al.
Published: (2023)

A Finite Sample Complexity Bound for Distributionally Robust Q-learning
by: Wang, Shengbo, et al.
Published: (2023)

Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs
by: Hong, Kihyuk, et al.
Published: (2024)

Bandit Simulation for Average Reward Inference
by: Praharaj, Samya, et al.
Published: (2026)

Convergence Analyses of Davis-Yin Splitting via Scaled Relative Graphs
by: Lee, Jongmin, et al.
Published: (2022)

Finite-Time Error Bounds for Greedy-GQ
by: Wang, Yue, et al.
Published: (2022)

Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ
by: Lim, Han-Dong, et al.
Published: (2024)

Sequential Flow Straightening for Generative Modeling
by: Yoon, Jongmin, et al.
Published: (2024)

Exponential Convergence Guarantees for Iterative Markovian Fitting
by: Silveri, Marta Gentiloni, et al.
Published: (2025)

SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
by: Lee, Jongmin, et al.
Published: (2025)

Sharpness-Aware Minimization Can Hallucinate Minimizers
by: Park, Chanwoong, et al.
Published: (2025)

Average-Reward Soft Actor-Critic
by: Adamczyk, Jacob, et al.
Published: (2025)

Finite-Time Logarithmic Bayes Regret Upper Bounds
by: Atsidakou, Alexia, et al.
Published: (2023)

Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret
by: Yu, Kihyun, et al.
Published: (2026)

Exponential convergence rate for Iterative Markovian Fitting
by: Sokolov, Kirill, et al.
Published: (2025)

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles
by: Patel, Bhrij, et al.
Published: (2024)