| Main Author: | Lattimore, Tor |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.26547 |
Similar Items
A Diffusion Analysis of Policy Gradient for Stochastic Bandits
by: Lattimore, Tor
Published: (2026)
Bandit Convex Optimisation
by: Lattimore, Tor
Published: (2024)
Refined Detection for Gumbel Watermarking
by: Lattimore, Tor
Published: (2026)
Online Newton Method for Bandit Convex Optimisation
by: Fokkema, Hidde, et al.
Published: (2024)
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
by: Klein, Sara, et al.
Published: (2023)
Beyond Softmax: A New Perspective on Gradient Bandits
by: Melo, Emerson, et al.
Published: (2025)
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
by: György, András, et al.
Published: (2025)
Logit Dynamics in Softmax Policy Gradient Methods
by: Li, Yingru
Published: (2025)
Stochastic Gradient Succeeds for Bandits
by: Mei, Jincheng, et al.
Published: (2024)
Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation
by: Lin, Max Qiushi, et al.
Published: (2025)
On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
by: Labbi, Safwan, et al.
Published: (2025)
Annealed Softmax Greedy in Many-Armed Bayesian Bandits
by: Overman, William, et al.
Published: (2026)
First-Order Softmax Weighted Switching Gradient Method for Distributed Stochastic Minimax Optimization with Stochastic Constraints
by: Luo, Zhankun, et al.
Published: (2026)
Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization
by: Labbi, Safwan, et al.
Published: (2026)
Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs
by: Lu, Michael, et al.
Published: (2024)
Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent
by: Chang, Xiangyu, et al.
Published: (2022)
Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss
by: Pu, Yuanhao, et al.
Published: (2026)
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
by: Montenegro, Alessandro, et al.
Published: (2024)
Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits
by: Liu, Jiayuan, et al.
Published: (2025)
Fast Convergence of Softmax Policy Mirror Ascent
by: Asad, Reza, et al.
Published: (2024)
Adaptive Sparse Softmax: An Effective and Efficient Softmax Variant
by: Lv, Qi, et al.
Published: (2025)
A Concise Lyapunov Analysis of Nesterov's Accelerated Gradient Method
by: Liu, Jun
Published: (2025)
Efficient Clustering in Stochastic Bandits
by: Chandran, G Dhinesh, et al.
Published: (2026)
Stochastic Bandits for Egalitarian Assignment
by: Lim, Eugene, et al.
Published: (2024)
Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits
by: Huang, Ziyi, et al.
Published: (2024)
Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability
by: Comte, Céline, et al.
Published: (2023)
Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
by: Varre, Aditya, et al.
Published: (2026)
HyperArm Bandit Optimization: A Novel approach to Hyperparameter Optimization and an Analysis of Bandit Algorithms in Stochastic and Adversarial Settings
by: Karroum, Samih, et al.
Published: (2025)
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
by: Zhang, Shangtong, et al.
Published: (2021)
Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
by: Guan, Changkun, et al.
Published: (2026)
Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning
by: Zhang, Haobin, et al.
Published: (2024)
Batched Stochastic Bandit for Nondegenerate Functions
by: Liu, Yu, et al.
Published: (2024)
Stochastic Bandits Robust to Adversarial Attacks
by: Wang, Xuchuang, et al.
Published: (2024)
Lipschitz Bandits with Stochastic Delayed Feedback
by: Liu, Zhongxuan, et al.
Published: (2025)
A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits
by: Simchi-Levi, David, et al.
Published: (2022)
Stochastic $k$-Submodular Bandits with Full Bandit Feedback
by: Nie, Guanyu, et al.
Published: (2024)
An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability
by: Soni, Ashutosh, et al.
Published: (2026)
Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization
by: Ding, Yuhao, et al.
Published: (2021)
Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis
by: Jose, Sharu Theresa, et al.
Published: (2024)