Bibliographic Details
Main Authors:	Tiapkin, Daniil, Belomestny, Denis, Calandriello, Daniele, Moulines, Eric, Munos, Remi, Naumov, Alexey, Perrault, Pierre, Valko, Michal, Menard, Pierre
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2310.18186

Similar Items

Demonstration-Regularized RL
by: Tiapkin, Daniil, et al.
Published: (2023)

A New Bound on the Cumulant Generating Function of Dirichlet Processes
by: Perrault, Pierre, et al.
Published: (2024)

Proximal Point Nash Learning from Human Feedback
by: Tiapkin, Daniil, et al.
Published: (2025)

Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability
by: Samsonov, Sergey, et al.
Published: (2023)

Rates of convergence for density estimation with generative adversarial networks
by: Puchkin, Nikita, et al.
Published: (2021)

Planning in entropy-regularized Markov decision processes and games
by: Grill, Jean-Bastien, et al.
Published: (2026)

On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
by: Labbi, Safwan, et al.
Published: (2025)

Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization
by: Labbi, Safwan, et al.
Published: (2026)

Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
by: Sheshukova, Marina, et al.
Published: (2024)

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
by: Grill, Jean-Bastien, et al.
Published: (2026)

Statistical analysis of Inverse Entropy-regularized Reinforcement Learning
by: Belomestny, Denis, et al.
Published: (2025)

Bandits attack function optimization
by: Preux, Philippe, et al.
Published: (2026)

Stochastic simultaneous optimistic optimization
by: Valko, Michal, et al.
Published: (2026)

Large-scale semi-supervised learning with online spectral graph sparsification
by: Calandriello, Daniele, et al.
Published: (2026)

Analysis of Nyström method with sequential ridge leverage scores
by: Calandriello, Daniele, et al.
Published: (2026)

Pack only the essentials: Adaptive dictionary learning for kernel ridge regression
by: Calandriello, Daniele, et al.
Published: (2026)

Covariance-adapting algorithm for semi-bandits with application to sparse rewards
by: Perrault, Pierre, et al.
Published: (2026)

Schrödinger bridge problem via empirical risk minimization
by: Belomestny, Denis, et al.
Published: (2026)

UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms
by: Belomestny, Denis, et al.
Published: (2021)

Generative Flow Networks as Entropy-Regularized RL
by: Tiapkin, Daniil, et al.
Published: (2023)

Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
by: Ocello, Antonio, et al.
Published: (2025)

Optimal Design for Reward Modeling in RLHF
by: Scheid, Antoine, et al.
Published: (2024)

Refined Analysis of Entropy-Regularized Actor-Critic
by: Labbi, Safwan, et al.
Published: (2026)

Black-box optimization of noisy functions with unknown smoothness
by: Grill, Jean-Bastien, et al.
Published: (2026)

Tight Bounds for Schrödinger Potential Estimation in Unpaired Data Translation
by: Puchkin, Nikita, et al.
Published: (2025)

Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent
by: Sheshukova, Marina, et al.
Published: (2025)

Generalized Preference Optimization: A Unified Approach to Offline Alignment
by: Tang, Yunhao, et al.
Published: (2024)

Spectral Thompson sampling
by: Kocak, Tomas, et al.
Published: (2026)

VA-learning as a more efficient alternative to Q-learning
by: Tang, Yunhao, et al.
Published: (2023)

Spectral bandits for smooth graph functions
by: Valko, Michal, et al.
Published: (2026)

Efficient learning by implicit exploration in bandit problems with side observations
by: Kocak, Tomas, et al.
Published: (2026)

Improved large-scale graph learning through ridge spectral sparsification
by: Calandriello, Daniele, et al.
Published: (2026)

Sample complexity of Schrödinger potential estimation
by: Puchkin, Nikita, et al.
Published: (2025)

Budgeted Online Influence Maximization
by: Perrault, Pierre, et al.
Published: (2026)

Improving GFlowNets with Monte Carlo Tree Search
by: Morozov, Nikita, et al.
Published: (2024)

Theoretical guarantees for neural control variates in MCMC
by: Belomestny, Denis, et al.
Published: (2023)

Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
by: Labbi, Safwan, et al.
Published: (2024)

A note on concentration inequalities for the overlapped batch mean variance estimators for Markov chains
by: Moulines, Eric, et al.
Published: (2025)

On Teacher Hacking in Language Model Distillation
by: Tiapkin, Daniil, et al.
Published: (2025)

A single algorithm for both restless and rested rotting bandits
by: Seznec, Julien, et al.
Published: (2026)