| Main Authors: | Scheid, Antoine, Boursier, Etienne, Durmus, Alain, Jordan, Michael I., Ménard, Pierre, Moulines, Eric, Valko, Michal |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.17055 |
Similar Items
Learning to Mitigate Externalities: the Coase Theorem with Hindsight Rationality
by: Scheid, Antoine, et al.
Published: (2024)
Online Decision-Making in Tree-Like Multi-Agent Games with Transfers
by: Scheid, Antoine, et al.
Published: (2025)
Incentivized Learning in Principal-Agent Bandit Games
by: Scheid, Antoine, et al.
Published: (2024)
Online Decision-Focused Learning
by: Capitaine, Aymeric, et al.
Published: (2025)
Test-then-Punish: A Statistical Approach to Repeated Games
by: Capitaine, Aymeric, et al.
Published: (2026)
Unravelling in Collaborative Learning
by: Capitaine, Aymeric, et al.
Published: (2024)
Prediction-Aware Learning in Multi-Agent Systems
by: Capitaine, Aymeric, et al.
Published: (2025)
Demonstration-Regularized RL
by: Tiapkin, Daniil, et al.
Published: (2023)
Model-free Posterior Sampling via Learning Rate Randomization
by: Tiapkin, Daniil, et al.
Published: (2023)
Proximal Point Nash Learning from Human Feedback
by: Tiapkin, Daniil, et al.
Published: (2025)
Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
by: Fiegel, Côme, et al.
Published: (2026)
A single algorithm for both restless and rested rotting bandits
by: Seznec, Julien, et al.
Published: (2026)
Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians
by: Huix, Tom, et al.
Published: (2024)
Bridging Diffusion Posterior Sampling and Monte Carlo methods: a survey
by: Janati, Yazid, et al.
Published: (2025)
On Sampling with Approximate Transport Maps
by: Grenioux, Louis, et al.
Published: (2023)
Scaffold with Stochastic Gradients: New Analysis with Linear Speed-Up
by: Mangold, Paul, et al.
Published: (2025)
Explaining and Preventing Alignment Collapse in Iterative RLHF
by: Gauthier, Etienne, et al.
Published: (2026)
The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
by: Fiegel, Côme, et al.
Published: (2026)
Piecewise deterministic generative models
by: Bertazzi, Andrea, et al.
Published: (2024)
Divide-and-Conquer Posterior Sampling for Denoising Diffusion Priors
by: Janati, Yazid, et al.
Published: (2024)
Categorical Reparameterization with Denoising Diffusion models
by: Gourevitch, Samson, et al.
Published: (2026)
Refined Analysis of Federated Averaging and Federated Richardson-Romberg
by: Mangold, Paul, et al.
Published: (2024)
Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance
by: Moufad, Badr, et al.
Published: (2025)
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
by: Zhu, Banghua, et al.
Published: (2024)
Planning in entropy-regularized Markov decision processes and games
by: Grill, Jean-Bastien, et al.
Published: (2026)
Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation
by: Gourevitch, Samson, et al.
Published: (2026)
Early alignment in two-layer networks training is a two-edged sword
by: Boursier, Etienne, et al.
Published: (2024)
Simplicity bias and optimization threshold in two-layer ReLU networks
by: Boursier, Etienne, et al.
Published: (2024)
Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective
by: Boursier, Etienne, et al.
Published: (2025)
Penalising the biases in norm regularisation enforces sparsity
by: Boursier, Etienne, et al.
Published: (2023)
A Mixture-Based Framework for Guiding Diffusion Models
by: Janati, Yazid, et al.
Published: (2025)
Bandits on graphs and structures
by: Valko, Michal
Published: (2026)
Adaptive graph-based algorithms for conditional anomaly detection and semi-supervised learning
by: Valko, Michal
Published: (2026)
A survey on multi-player bandits
by: Boursier, Etienne, et al.
Published: (2022)
Variational Diffusion Posterior Sampling with Midpoint Guidance
by: Moufad, Badr, et al.
Published: (2024)
Rosenthal-type inequalities for linear statistics of Markov chains
by: Durmus, Alain, et al.
Published: (2023)
Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation
by: Sheshukova, Marina, et al.
Published: (2024)
Covariance-adapting algorithm for semi-bandits with application to sparse rewards
by: Perrault, Pierre, et al.
Published: (2026)
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
Joint Channel Selection using FedDRL in V2X
by: Mancini, Lorenzo, et al.
Published: (2024)