| Main Authors: | Tian, Haoxing, Chen, Zaiwei, Paschalidis, Ioannis Ch., Olshevsky, Alex |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.02103 |
Similar Items
One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling
by: Tian, Haoxing, et al.
Published: (2024)
Closing the gap between SVRG and TD-SVRG with Gradient Splitting
by: Mustafin, Arsenii, et al.
Published: (2022)
On Value Iteration Convergence in Connected MDPs
by: Mustafin, Arsenii, et al.
Published: (2024)
Analysis of Value Iteration Through Absolute Probability Sequences
by: Mustafin, Arsenii, et al.
Published: (2025)
Geometric Re-Analysis of Classical MDP Solving Algorithms
by: Mustafin, Arsenii, et al.
Published: (2025)
MDP Geometry, Normalization and Reward Balancing Solvers
by: Mustafin, Arsenii, et al.
Published: (2024)
Distributionally Robust Learning in Survival Analysis
by: Jin, Yeping, et al.
Published: (2025)
Adversarial Imitation Learning from Visual Observations using Latent Information
by: Giammarino, Vittorio, et al.
Published: (2023)
Visually Robust Adversarial Imitation Learning from Videos with Contrastive Learning
by: Giammarino, Vittorio, et al.
Published: (2024)
Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees
by: Chen, Yilei, et al.
Published: (2024)
Multiple-policy Evaluation via Density Estimation
by: Chen, Yilei, et al.
Published: (2024)
Improving Adaptive Online Learning Using Refined Discretization
by: Zhang, Zhiyu, et al.
Published: (2023)
From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
by: Chen, Zaiwei, et al.
Published: (2025)
Distributionally Robust Token Optimization in RLHF
by: Jin, Yeping, et al.
Published: (2026)
DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation
by: Hu, Jiaming, et al.
Published: (2025)
A Model-Based Approach for Improving Reinforcement Learning Efficiency Leveraging Expert Observations
by: Ozcan, Erhan Can, et al.
Published: (2024)
Network Epidemic Control via Model Predictive Control: Extended Version
by: Talaei, Mahtab, et al.
Published: (2026)
Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse
by: Queeney, James, et al.
Published: (2022)
Achieving $\varepsilon^{-2}$ Dependence for Average-Reward Q-Learning with a New Contraction Principle
by: Chen, Zijun, et al.
Published: (2026)
Optimal Transport Perturbations for Safe Reinforcement Learning with Robustness Guarantees
by: Queeney, James, et al.
Published: (2023)
Convex SGD: Generalization Without Early Stopping
by: Hendrickx, Julien, et al.
Published: (2024)
Smooth Ranking SVM via Cutting-Plane Method
by: Ozcan, Erhan Can, et al.
Published: (2024)
Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning
by: Yin, Shuyu, et al.
Published: (2024)
Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor
by: Grand-Clément, Julien, et al.
Published: (2023)
A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies
by: Nanda, Phalguni, et al.
Published: (2025)
Towards General Preference Alignment: Diffusion Models at Nash Equilibrium
by: Hu, Jiaming, et al.
Published: (2026)
Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens
by: Moghaddam, Amirreza Neshaei, et al.
Published: (2024)
Network-Based Epidemic Control Through Optimal Travel and Quarantine Management
by: Talaei, Mahtab, et al.
Published: (2024)
Learning to Reason Efficiently with Discounted Reinforcement Learning
by: Ayoub, Alex, et al.
Published: (2025)
Achieving $ε^{-2}$ Sample Complexity for Single-Loop Actor-Critic under Minimal Assumptions
by: Hamza, Ishaq, et al.
Published: (2026)
Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework
by: Nanda, Phalguni, et al.
Published: (2026)
ASPEST: Bridging the Gap Between Active Learning and Selective Prediction
by: Chen, Jiefeng, et al.
Published: (2023)
Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts
by: Xiong, Xinlei, et al.
Published: (2025)
Closing the Gap between TD Learning and Supervised Learning with $Q$-Conditioned Maximization
by: Lei, Xing, et al.
Published: (2025)
Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework
by: Chen, Zaiwei, et al.
Published: (2026)
Sample Complexity of Linear Quadratic Regulator Without Initial Stability
by: Moghaddam, Amirreza Neshaei, et al.
Published: (2025)
Personalized Multi-Agent Average Reward TD-Learning via Joint Linear Approximation
by: Wang, Leo Muxing, et al.
Published: (2026)
Data Deletion Can Help in Adaptive RL
by: Budhraja, Param, et al.
Published: (2026)
Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases
by: Mustafin, Arsenii, et al.
Published: (2025)
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View
by: Ghugare, Raj, et al.
Published: (2024)