:: Library Catalog

Bibliographic Details
Main Author:	Lattimore, Tor
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.26547

Similar Items

A Diffusion Analysis of Policy Gradient for Stochastic Bandits
by: Lattimore, Tor
Published: (2026)

Bandit Convex Optimisation
by: Lattimore, Tor
Published: (2024)

Refined Detection for Gumbel Watermarking
by: Lattimore, Tor
Published: (2026)

Online Newton Method for Bandit Convex Optimisation
by: Fokkema, Hidde, et al.
Published: (2024)

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
by: Klein, Sara, et al.
Published: (2023)

Beyond Softmax: A New Perspective on Gradient Bandits
by: Melo, Emerson, et al.
Published: (2025)

Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
by: György, András, et al.
Published: (2025)

Logit Dynamics in Softmax Policy Gradient Methods
by: Li, Yingru
Published: (2025)

Stochastic Gradient Succeeds for Bandits
by: Mei, Jincheng, et al.
Published: (2024)

Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation
by: Lin, Max Qiushi, et al.
Published: (2025)

On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
by: Labbi, Safwan, et al.
Published: (2025)

Annealed Softmax Greedy in Many-Armed Bayesian Bandits
by: Overman, William, et al.
Published: (2026)

First-Order Softmax Weighted Switching Gradient Method for Distributed Stochastic Minimax Optimization with Stochastic Constraints
by: Luo, Zhankun, et al.
Published: (2026)

Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization
by: Labbi, Safwan, et al.
Published: (2026)

Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs
by: Lu, Michael, et al.
Published: (2024)

Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent
by: Chang, Xiangyu, et al.
Published: (2022)

Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss
by: Pu, Yuanhao, et al.
Published: (2026)

Learning Optimal Deterministic Policies with Stochastic Policy Gradients
by: Montenegro, Alessandro, et al.
Published: (2024)

Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits
by: Liu, Jiayuan, et al.
Published: (2025)

Fast Convergence of Softmax Policy Mirror Ascent
by: Asad, Reza, et al.
Published: (2024)

Adaptive Sparse Softmax: An Effective and Efficient Softmax Variant
by: Lv, Qi, et al.
Published: (2025)

A Concise Lyapunov Analysis of Nesterov's Accelerated Gradient Method
by: Liu, Jun
Published: (2025)

Efficient Clustering in Stochastic Bandits
by: Chandran, G Dhinesh, et al.
Published: (2026)

Stochastic Bandits for Egalitarian Assignment
by: Lim, Eugene, et al.
Published: (2024)

Bayesian Bandit Algorithms with Approximate Inference in Stochastic Linear Bandits
by: Huang, Ziyi, et al.
Published: (2024)

Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability
by: Comte, Céline, et al.
Published: (2023)

Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
by: Varre, Aditya, et al.
Published: (2026)

HyperArm Bandit Optimization: A Novel approach to Hyperparameter Optimization and an Analysis of Bandit Algorithms in Stochastic and Adversarial Settings
by: Karroum, Samih, et al.
Published: (2025)

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
by: Sheen, Heejune, et al.
Published: (2024)

Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
by: Zhang, Shangtong, et al.
Published: (2021)

Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?
by: Guan, Changkun, et al.
Published: (2026)

Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning
by: Zhang, Haobin, et al.
Published: (2024)

Batched Stochastic Bandit for Nondegenerate Functions
by: Liu, Yu, et al.
Published: (2024)

Stochastic Bandits Robust to Adversarial Attacks
by: Wang, Xuchuang, et al.
Published: (2024)

Lipschitz Bandits with Stochastic Delayed Feedback
by: Liu, Zhongxuan, et al.
Published: (2025)

A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits
by: Simchi-Levi, David, et al.
Published: (2022)

Stochastic $k$-Submodular Bandits with Full Bandit Feedback
by: Nie, Guanyu, et al.
Published: (2024)

An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability
by: Soni, Ashutosh, et al.
Published: (2026)

Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization
by: Ding, Yuhao, et al.
Published: (2021)

Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis
by: Jose, Sharu Theresa, et al.
Published: (2024)