| Main Authors: | Bohbot, Léa, Letrouit, Cyril, Peyré, Gabriel, Vialard, François-Xavier |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.10656 |
Similar Items
Ultra-fast feature learning for the training of two-layer neural networks in the two-timescale regime
by: Barboni, Raphaël, et al.
Published: (2025)
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
by: Barboni, Raphaël, et al.
Published: (2024)
Sliced ReLU attention: Quasi-linear contextual expressivity via sorting
by: Vialard, François-Xavier, et al.
Published: (2025)
Understanding the training of infinitely deep and wide ResNets with conditional optimal transport
by: Raphaël Barboni, et al.
Published: (2025)
Towards Understanding the Universality of Transformers for Next-Token Prediction
by: Sander, Michael E., et al.
Published: (2024)
How Smooth Is Attention?
by: Castin, Valérie, et al.
Published: (2023)
The emergence of clusters in self-attention dynamics
by: Geshkovski, Borjan, et al.
Published: (2023)
Robust Sublinear Convergence Rates for Iterative Bregman Projections
by: Peyré, Gabriel
Published: (2026)
A mathematical perspective on Transformers
by: Geshkovski, Borjan, et al.
Published: (2023)
Optimal and Diffusion Transports in Machine Learning
by: Peyré, Gabriel
Published: (2025)
Optimal Transport for Machine Learners
by: Peyré, Gabriel
Published: (2025)
Muon Dynamics as a Spectral Wasserstein Flow
by: Peyré, Gabriel
Published: (2026)
Learning from Samples: Inverse Problems over measures via Sharpened Fenchel-Young Losses
by: Andrade, Francisco, et al.
Published: (2025)
Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport
by: Genans, Ferdinand, et al.
Published: (2025)
Semi-Discrete Optimal Transport: Nearly Minimax Estimation With Stochastic Gradient Descent and Adaptive Entropic Regularization
by: Genans, Ferdinand, et al.
Published: (2024)
Stochastic Optimization in Semi-Discrete Optimal Transport: Convergence Analysis and Minimax Rate
by: Genans, Ferdinand, et al.
Published: (2025)
Unstable optimal transport maps
by: Letrouit, Cyril
Published: (2025)
Intrinsic training dynamics of deep neural networks
by: Marcotte, Sibylle, et al.
Published: (2025)
Transformative or Conservative? Conservation laws for ResNets and Transformers
by: Marcotte, Sibylle, et al.
Published: (2025)
Geometry-Aware Discretization Error of Diffusion Models
by: Hurault, Samuel, et al.
Published: (2026)
Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
by: Marcotte, Sibylle, et al.
Published: (2023)
On the global convergence of gradient descent for wide shallow models with bounded nonlinearities
by: Petit, Romain, et al.
Published: (2026)
Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
by: Marcotte, Sibylle, et al.
Published: (2024)
Enhancing Hypergradients Estimation: A Study of Preconditioning and Reparameterization
by: Ye, Zhenzhang, et al.
Published: (2024)
Transformers are Universal In-context Learners
by: Furuya, Takashi, et al.
Published: (2024)
Balanced LoRA: Removing Parameter Invariance to Accelerate Convergence
by: Castin, Valérie, et al.
Published: (2026)
From Score Matching to Diffusion: A Fine-Grained Error Analysis in the Gaussian Setting
by: Hurault, Samuel, et al.
Published: (2025)
How do Transformers perform In-Context Autoregressive Learning?
by: Sander, Michael E., et al.
Published: (2024)
Gluing methods for quantitative stability of optimal transport maps
by: Letrouit, Cyril, et al.
Published: (2024)
Delocalized eigenvectors of transitive graphs and beyond
by: Burq, Nicolas, et al.
Published: (2024)
Maximal multiplicity of Laplacian eigenvalues in negatively curved surfaces
by: Letrouit, Cyril, et al.
Published: (2023)
Generic controllability of equivariant systems and applications to particle systems and neural networks
by: Agrachev, Andrei, et al.
Published: (2024)
Training Infinitely Deep and Wide Transformers
by: Barboni, Raphaël, et al.
Published: (2026)
Token Distillation: Attention-aware Input Embeddings For New Tokens
by: Dobler, Konstantin, et al.
Published: (2025)
Benign Overfitting in Token Selection of Attention Mechanism
by: Sakamoto, Keitaro, et al.
Published: (2024)
A Unified Perspective on the Dynamics of Deep Transformers
by: Castin, Valérie, et al.
Published: (2025)
On the global convergence of Wasserstein gradient flow of the Coulomb discrepancy
by: Boufadène, Siwan, et al.
Published: (2023)
NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization
by: Liu, Enshu, et al.
Published: (2026)
Stability of optimal transport maps on Riemannian manifolds
by: Kitagawa, Jun, et al.
Published: (2025)
Quantum mixing on large Schreier graphs
by: Bordenave, Charles, et al.
Published: (2026)