| Main Authors: | Vandergrift, Matthew, White, Martha, Polyanskiy, Yury, Rigollet, Philippe, Atanackovic, Lazar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.28075 |
Similar Items
YuriiFormer: A Suite of Nesterov-Accelerated Transformers
by: Zimin, Aleksandr, et al.
Published: (2026)
A mathematical perspective on Transformers
by: Geshkovski, Borjan, et al.
Published: (2023)
Quantitative Clustering in Mean-Field Transformer Models
by: Chen, Shi, et al.
Published: (2025)
Synchronization of mean-field models on the circle
by: Polyanskiy, Yury, et al.
Published: (2025)
The emergence of clusters in self-attention dynamics
by: Geshkovski, Borjan, et al.
Published: (2023)
Scaling Limits of Long-Context Transformers
by: Bruno, Giuseppe, et al.
Published: (2026)
Clustering in Causal Attention Masking
by: Karagodin, Nikita, et al.
Published: (2024)
Dynamic metastability in the self-attention model
by: Geshkovski, Borjan, et al.
Published: (2024)
Critical attention scaling in long-context transformers
by: Chen, Shi, et al.
Published: (2025)
Residual connections provably mitigate oversmoothing in graph neural networks
by: Chen, Ziang, et al.
Published: (2025)
Normalization in Attention Dynamics
by: Karagodin, Nikita, et al.
Published: (2025)
Investigating Generalization Behaviours of Generative Flow Networks
by: Atanackovic, Lazar, et al.
Published: (2024)
Splat Regression Models
by: Daniels, Mara, et al.
Published: (2025)
Measure-to-measure interpolation using Transformers
by: Geshkovski, Borjan, et al.
Published: (2024)
The Mean-Field Dynamics of Transformers
by: Rigollet, Philippe
Published: (2025)
Homogenized Transformers
by: Koubbi, Hugo, et al.
Published: (2026)
Solving Empirical Bayes via Transformers
by: Teh, Anzo, et al.
Published: (2025)
A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
by: Guan, Vincent, et al.
Published: (2026)
The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning
by: Block, Adam, et al.
Published: (2023)
High-Rate Quantized Matrix Multiplication II
by: Ordentlich, Or, et al.
Published: (2026)
Optimal Quantization for Matrix Multiplication
by: Ordentlich, Or, et al.
Published: (2024)
Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior
by: Polyanskiy, Yury, et al.
Published: (2025)
The Superposition of Diffusion Models Using the Itô Density Estimator
by: Skreta, Marta, et al.
Published: (2024)
Price of universality in vector quantization is at most 0.11 bit
by: Harbuzova, Alina, et al.
Published: (2026)
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts
by: Boix-Adsera, Enric, et al.
Published: (2025)
A Computational Framework for Solving Wasserstein Lagrangian Flows
by: Neklyudov, Kirill, et al.
Published: (2023)
Representation Alignment Rests on Linear Structure
by: Bangachev, Kiril, et al.
Published: (2026)
On the Minimax Regret of Sequential Probability Assignment via Square-Root Entropy
by: Jia, Zeyu, et al.
Published: (2025)
A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity
by: Jia, Zeyu, et al.
Published: (2025)
Gaussian mixture layers for neural networks
by: Chewi, Sinho, et al.
Published: (2025)
On the number of modes of Gaussian kernel density estimators
by: Geshkovski, Borjan, et al.
Published: (2024)
The Radio-Frequency Transformer for Signal Separation
by: Lifar, Egor, et al.
Published: (2026)
WaterSIC: information-theoretically (near) optimal linear layer quantization
by: Lifar, Egor, et al.
Published: (2026)
Universal priors: solving empirical Bayes via Bayesian inference and pretraining
by: Cannella, Nick, et al.
Published: (2026)
Statistical optimal transport
by: Chewi, Sinho, et al.
Published: (2024)
NestQuant: Nested Lattice Quantization for Matrix Products and LLMs
by: Savkin, Semyon, et al.
Published: (2025)
Simulation-free Schrödinger bridges via score and flow matching
by: Tong, Alexander, et al.
Published: (2023)
Global Minimizers of Sigmoid Contrastive Loss
by: Bangachev, Kiril, et al.
Published: (2025)
On the Structure of Stationary Solutions to McKean-Vlasov Equations with Applications to Noisy Transformers
by: Balasubramanian, Krishnakumar, et al.
Published: (2025)
Is Dimensionality a Barrier for Retrieval Models?
by: Bangachev, Kiril, et al.
Published: (2026)