Library Catalog

Bibliographic Details
Main Authors:	Vandergrift, Matthew; White, Martha; Polyanskiy, Yury; Rigollet, Philippe; Atanackovic, Lazar
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.28075

Similar Items

YuriiFormer: A Suite of Nesterov-Accelerated Transformers
by: Zimin, Aleksandr, et al.
Published: (2026)

A mathematical perspective on Transformers
by: Geshkovski, Borjan, et al.
Published: (2023)

Quantitative Clustering in Mean-Field Transformer Models
by: Chen, Shi, et al.
Published: (2025)

Synchronization of mean-field models on the circle
by: Polyanskiy, Yury, et al.
Published: (2025)

The emergence of clusters in self-attention dynamics
by: Geshkovski, Borjan, et al.
Published: (2023)

Scaling Limits of Long-Context Transformers
by: Bruno, Giuseppe, et al.
Published: (2026)

Clustering in Causal Attention Masking
by: Karagodin, Nikita, et al.
Published: (2024)

Dynamic metastability in the self-attention model
by: Geshkovski, Borjan, et al.
Published: (2024)

Critical attention scaling in long-context transformers
by: Chen, Shi, et al.
Published: (2025)

Residual connections provably mitigate oversmoothing in graph neural networks
by: Chen, Ziang, et al.
Published: (2025)

Normalization in Attention Dynamics
by: Karagodin, Nikita, et al.
Published: (2025)

Investigating Generalization Behaviours of Generative Flow Networks
by: Atanackovic, Lazar, et al.
Published: (2024)

Splat Regression Models
by: Daniels, Mara, et al.
Published: (2025)

Measure-to-measure interpolation using Transformers
by: Geshkovski, Borjan, et al.
Published: (2024)

The Mean-Field Dynamics of Transformers
by: Rigollet, Philippe
Published: (2025)

Homogenized Transformers
by: Koubbi, Hugo, et al.
Published: (2026)

Solving Empirical Bayes via Transformers
by: Teh, Anzo, et al.
Published: (2025)

A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots
by: Guan, Vincent, et al.
Published: (2026)

The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning
by: Block, Adam, et al.
Published: (2023)

High-Rate Quantized Matrix Multiplication II
by: Ordentlich, Or, et al.
Published: (2026)

Optimal Quantization for Matrix Multiplication
by: Ordentlich, Or, et al.
Published: (2024)

Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior
by: Polyanskiy, Yury, et al.
Published: (2025)

The Superposition of Diffusion Models Using the Itô Density Estimator
by: Skreta, Marta, et al.
Published: (2024)

Price of universality in vector quantization is at most 0.11 bit
by: Harbuzova, Alina, et al.
Published: (2026)

The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts
by: Boix-Adsera, Enric, et al.
Published: (2025)

A Computational Framework for Solving Wasserstein Lagrangian Flows
by: Neklyudov, Kirill, et al.
Published: (2023)

Representation Alignment Rests on Linear Structure
by: Bangachev, Kiril, et al.
Published: (2026)

On the Minimax Regret of Sequential Probability Assignment via Square-Root Entropy
by: Jia, Zeyu, et al.
Published: (2025)

A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity
by: Jia, Zeyu, et al.
Published: (2025)

Gaussian mixture layers for neural networks
by: Chewi, Sinho, et al.
Published: (2025)

On the number of modes of Gaussian kernel density estimators
by: Geshkovski, Borjan, et al.
Published: (2024)

The Radio-Frequency Transformer for Signal Separation
by: Lifar, Egor, et al.
Published: (2026)

WaterSIC: information-theoretically (near) optimal linear layer quantization
by: Lifar, Egor, et al.
Published: (2026)

Universal priors: solving empirical Bayes via Bayesian inference and pretraining
by: Cannella, Nick, et al.
Published: (2026)

Statistical optimal transport
by: Chewi, Sinho, et al.
Published: (2024)

NestQuant: Nested Lattice Quantization for Matrix Products and LLMs
by: Savkin, Semyon, et al.
Published: (2025)

Simulation-free Schrödinger bridges via score and flow matching
by: Tong, Alexander, et al.
Published: (2023)

Global Minimizers of Sigmoid Contrastive Loss
by: Bangachev, Kiril, et al.
Published: (2025)

On the Structure of Stationary Solutions to McKean-Vlasov Equations with Applications to Noisy Transformers
by: Balasubramanian, Krishnakumar, et al.
Published: (2025)

Is Dimensionality a Barrier for Retrieval Models?
by: Bangachev, Kiril, et al.
Published: (2026)