Library Catalog

Bibliographic Details
Main Authors:	Cannella, Nick; Teh, Anzo; Han, Yanjun; Polyanskiy, Yury
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.15136

Similar Items

Solving Empirical Bayes via Transformers
by: Teh, Anzo, et al.
Published: (2025)

Function estimation in the empirical Bayes setting
by: Kang, Benjamin, et al.
Published: (2026)

The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning
by: Block, Adam, et al.
Published: (2023)

High-Rate Quantized Matrix Multiplication II
by: Ordentlich, Or, et al.
Published: (2026)

Optimal Quantization for Matrix Multiplication
by: Ordentlich, Or, et al.
Published: (2024)

Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior
by: Polyanskiy, Yury, et al.
Published: (2025)

On the Minimax Regret of Sequential Probability Assignment via Square-Root Entropy
by: Jia, Zeyu, et al.
Published: (2025)

Price of universality in vector quantization is at most 0.11 bit
by: Harbuzova, Alina, et al.
Published: (2026)

Representation Alignment Rests on Linear Structure
by: Bangachev, Kiril, et al.
Published: (2026)

A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity
by: Jia, Zeyu, et al.
Published: (2025)

Optimal empirical Bayes estimation for the Poisson model via minimum-distance methods
by: Jana, Soham, et al.
Published: (2022)

YuriiFormer: A Suite of Nesterov-Accelerated Transformers
by: Zimin, Aleksandr, et al.
Published: (2026)

WaterSIC: information-theoretically (near) optimal linear layer quantization
by: Lifar, Egor, et al.
Published: (2026)

Synchronization of mean-field models on the circle
by: Polyanskiy, Yury, et al.
Published: (2025)

NestQuant: Nested Lattice Quantization for Matrix Products and LLMs
by: Savkin, Semyon, et al.
Published: (2025)

The emergence of clusters in self-attention dynamics
by: Geshkovski, Borjan, et al.
Published: (2023)

Global Minimizers of Sigmoid Contrastive Loss
by: Bangachev, Kiril, et al.
Published: (2025)

Is Dimensionality a Barrier for Retrieval Models?
by: Bangachev, Kiril, et al.
Published: (2026)

Measure-to-measure Regression with Transformers
by: Vandergrift, Matthew, et al.
Published: (2026)

A mathematical perspective on Transformers
by: Geshkovski, Borjan, et al.
Published: (2023)

Dynamic metastability in the self-attention model
by: Geshkovski, Borjan, et al.
Published: (2024)

Quantitative Clustering in Mean-Field Transformer Models
by: Chen, Shi, et al.
Published: (2025)

Data-driven informative priors for Bayesian inference with quasi-periodic data
by: Lopez-Santiago, Javier, et al.
Published: (2025)

Gradient descent inference in empirical risk minimization
by: Han, Qiyang, et al.
Published: (2024)

Critical attention scaling in long-context transformers
by: Chen, Shi, et al.
Published: (2025)

Weak neural variational inference for solving Bayesian inverse problems without forward models: applications in elastography
by: Scholz, Vincent C., et al.
Published: (2024)

Residual connections provably mitigate oversmoothing in graph neural networks
by: Chen, Ziang, et al.
Published: (2025)

Clustering in Causal Attention Masking
by: Karagodin, Nikita, et al.
Published: (2024)

Optimal score estimation via empirical Bayes smoothing
by: Wibisono, Andre, et al.
Published: (2024)

The Radio-Frequency Transformer for Signal Separation
by: Lifar, Egor, et al.
Published: (2026)

Continuous First, Discrete Later: VQ-VAEs Without Dimensional Collapse
by: Zhao, Xinyu, et al.
Published: (2026)

Scaling Limits of Long-Context Transformers
by: Bruno, Giuseppe, et al.
Published: (2026)

The effectiveness of MAE pre-pretraining for billion-scale pretraining
by: Singh, Mannat, et al.
Published: (2023)

Minimax optimal testing by classification
by: Gerber, Patrik Róbert, et al.
Published: (2023)

Evolution Strategies for Deep RL pretraining
by: Martínez, Adrian, et al.
Published: (2026)

Synthetic continued pretraining
by: Yang, Zitong, et al.
Published: (2024)

Learning to solve Bayesian inverse problems: An amortized variational inference approach using Gaussian and Flow guides
by: Karumuri, Sharmila, et al.
Published: (2023)

Interactive Learning of Single-Index Models via Stochastic Gradient Descent
by: Rajaraman, Nived, et al.
Published: (2026)

On Uniform, Bayesian, and PAC-Bayesian Deep Ensembles
by: Hauptvogel, Nick, et al.
Published: (2024)

Normalization in Attention Dynamics
by: Karagodin, Nikita, et al.
Published: (2025)