Library Catalog

Bibliographic Details
Main Authors:	Bohbot, Léa, Letrouit, Cyril, Peyré, Gabriel, Vialard, François-Xavier
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2512.10656

Similar Items

Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime
by: Barboni, Raphaël, et al.
Published: (2025)

Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
by: Barboni, Raphaël, et al.
Published: (2024)

Sliced ReLU attention: Quasi-linear contextual expressivity via sorting
by: Vialard, François-Xavier, et al.
Published: (2025)

Understanding the training of infinitely deep and wide ResNets with conditional optimal transport
by: Barboni, Raphaël, et al.
Published: (2025)

Towards Understanding the Universality of Transformers for Next-Token Prediction
by: Sander, Michael E., et al.
Published: (2024)

How Smooth Is Attention?
by: Castin, Valérie, et al.
Published: (2023)

The emergence of clusters in self-attention dynamics
by: Geshkovski, Borjan, et al.
Published: (2023)

Robust Sublinear Convergence Rates for Iterative Bregman Projections
by: Peyré, Gabriel
Published: (2026)

A mathematical perspective on Transformers
by: Geshkovski, Borjan, et al.
Published: (2023)

Optimal and Diffusion Transports in Machine Learning
by: Peyré, Gabriel
Published: (2025)

Optimal Transport for Machine Learners
by: Peyré, Gabriel
Published: (2025)

Muon Dynamics as a Spectral Wasserstein Flow
by: Peyré, Gabriel
Published: (2026)

Learning from Samples: Inverse Problems over measures via Sharpened Fenchel-Young Losses
by: Andrade, Francisco, et al.
Published: (2025)

Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport
by: Genans, Ferdinand, et al.
Published: (2025)

Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization
by: Genans, Ferdinand, et al.
Published: (2024)

Stochastic Optimization in Semi-Discrete Optimal Transport: Convergence Analysis and Minimax Rate
by: Genans, Ferdinand, et al.
Published: (2025)

Unstable optimal transport maps
by: Letrouit, Cyril
Published: (2025)

Intrinsic training dynamics of deep neural networks
by: Marcotte, Sibylle, et al.
Published: (2025)

Transformative or Conservative? Conservation laws for ResNets and Transformers
by: Marcotte, Sibylle, et al.
Published: (2025)

Geometry-Aware Discretization Error of Diffusion Models
by: Hurault, Samuel, et al.
Published: (2026)

Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
by: Marcotte, Sibylle, et al.
Published: (2023)

On the global convergence of gradient descent for wide shallow models with bounded nonlinearities
by: Petit, Romain, et al.
Published: (2026)

Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
by: Marcotte, Sibylle, et al.
Published: (2024)

Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization
by: Ye, Zhenzhang, et al.
Published: (2024)

Transformers are Universal In-context Learners
by: Furuya, Takashi, et al.
Published: (2024)

Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence
by: Castin, Valérie, et al.
Published: (2026)

From Score Matching to Diffusion: A Fine-Grained Error Analysis in the Gaussian Setting
by: Hurault, Samuel, et al.
Published: (2025)

How do Transformers perform In-Context Autoregressive Learning?
by: Sander, Michael E., et al.
Published: (2024)

Gluing methods for quantitative stability of optimal transport maps
by: Letrouit, Cyril, et al.
Published: (2024)

Delocalized eigenvectors of transitive graphs and beyond
by: Burq, Nicolas, et al.
Published: (2024)

Maximal multiplicity of Laplacian eigenvalues in negatively curved surfaces
by: Letrouit, Cyril, et al.
Published: (2023)

Generic controllability of equivariant systems and applications to particle systems and neural networks
by: Agrachev, Andrei, et al.
Published: (2024)

Training Infinitely Deep and Wide Transformers
by: Barboni, Raphaël, et al.
Published: (2026)

Token Distillation: Attention-aware Input Embeddings For New Tokens
by: Dobler, Konstantin, et al.
Published: (2025)

Benign Overfitting in Token Selection of Attention Mechanism
by: Sakamoto, Keitaro, et al.
Published: (2024)

A Unified Perspective on the Dynamics of Deep Transformers
by: Castin, Valérie, et al.
Published: (2025)

On the global convergence of Wasserstein gradient flow of the Coulomb discrepancy
by: Boufadène, Siwan, et al.
Published: (2023)

NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
by: Liu, Enshu, et al.
Published: (2026)

Stability of optimal transport maps on Riemannian manifolds
by: Kitagawa, Jun, et al.
Published: (2025)

Quantum mixing on large Schreier graphs
by: Bordenave, Charles, et al.
Published: (2026)