| Main Authors: | Tiapkin, Daniil, Belomestny, Denis, Calandriello, Daniele, Moulines, Eric, Munos, Remi, Naumov, Alexey, Perrault, Pierre, Valko, Michal, Menard, Pierre |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2310.18186 |
Similar Items
Demonstration-Regularized RL
by: Tiapkin, Daniil, et al.
Published: (2023)
A New Bound on the Cumulant Generating Function of Dirichlet Processes
by: Perrault, Pierre, et al.
Published: (2024)
Proximal Point Nash Learning from Human Feedback
by: Tiapkin, Daniil, et al.
Published: (2025)
Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability
by: Samsonov, Sergey, et al.
Published: (2023)
Rates of convergence for density estimation with generative adversarial networks
by: Puchkin, Nikita, et al.
Published: (2021)
Planning in entropy-regularized Markov decision processes and games
by: Grill, Jean-Bastien, et al.
Published: (2026)
On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
by: Labbi, Safwan, et al.
Published: (2025)
Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with f-SoftArgmax Parameterization & Coupled Regularization
by: Labbi, Safwan, et al.
Published: (2026)
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
by: Sheshukova, Marina, et al.
Published: (2024)
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning
by: Grill, Jean-Bastien, et al.
Published: (2026)
Statistical analysis of Inverse Entropy-regularized Reinforcement Learning
by: Belomestny, Denis, et al.
Published: (2025)
Bandits attack function optimization
by: Preux, Philippe, et al.
Published: (2026)
Stochastic simultaneous optimistic optimization
by: Valko, Michal, et al.
Published: (2026)
Large-scale semi-supervised learning with online spectral graph sparsification
by: Calandriello, Daniele, et al.
Published: (2026)
Analysis of Nystrom method with sequential ridge leverage scores
by: Calandriello, Daniele, et al.
Published: (2026)
Pack only the essentials: Adaptive dictionary learning for kernel ridge regression
by: Calandriello, Daniele, et al.
Published: (2026)
Covariance-adapting algorithm for semi-bandits with application to sparse rewards
by: Perrault, Pierre, et al.
Published: (2026)
Schrödinger bridge problem via empirical risk minimization
by: Belomestny, Denis, et al.
Published: (2026)
UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms
by: Belomestny, Denis, et al.
Published: (2021)
Generative Flow Networks as Entropy-Regularized RL
by: Tiapkin, Daniil, et al.
Published: (2023)
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
by: Ocello, Antonio, et al.
Published: (2025)
Optimal Design for Reward Modeling in RLHF
by: Scheid, Antoine, et al.
Published: (2024)
Refined Analysis of Entropy-Regularized Actor-Critic
by: Labbi, Safwan, et al.
Published: (2026)
Black-box optimization of noisy functions with unknown smoothness
by: Grill, Jean-Bastien, et al.
Published: (2026)
Tight Bounds for Schrödinger Potential Estimation in Unpaired Data Translation
by: Puchkin, Nikita, et al.
Published: (2025)
Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent
by: Sheshukova, Marina, et al.
Published: (2025)
Generalized Preference Optimization: A Unified Approach to Offline Alignment
by: Tang, Yunhao, et al.
Published: (2024)
Spectral Thompson sampling
by: Kocak, Tomas, et al.
Published: (2026)
VA-learning as a more efficient alternative to Q-learning
by: Tang, Yunhao, et al.
Published: (2023)
Spectral bandits for smooth graph functions
by: Valko, Michal, et al.
Published: (2026)
Efficient learning by implicit exploration in bandit problems with side observations
by: Kocak, Tomas, et al.
Published: (2026)
Improved large-scale graph learning through ridge spectral sparsification
by: Calandriello, Daniele, et al.
Published: (2026)
Sample complexity of Schrödinger potential estimation
by: Puchkin, Nikita, et al.
Published: (2025)
Budgeted Online Influence Maximization
by: Perrault, Pierre, et al.
Published: (2026)
Improving GFlowNets with Monte Carlo Tree Search
by: Morozov, Nikita, et al.
Published: (2024)
Theoretical guarantees for neural control variates in MCMC
by: Belomestny, Denis, et al.
Published: (2023)
Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
by: Labbi, Safwan, et al.
Published: (2024)
A note on concentration inequalities for the overlapped batch mean variance estimators for Markov chains
by: Moulines, Eric, et al.
Published: (2025)
On Teacher Hacking in Language Model Distillation
by: Tiapkin, Daniil, et al.
Published: (2025)
A single algorithm for both restless and rested rotting bandits
by: Seznec, Julien, et al.
Published: (2026)