| Main Authors: | Cannella, Nick, Teh, Anzo, Han, Yanjun, Polyanskiy, Yury |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.15136 |
Similar Items
Solving Empirical Bayes via Transformers
by: Teh, Anzo, et al.
Published: (2025)
Function estimation in the empirical Bayes setting
by: Kang, Benjamin, et al.
Published: (2026)
The Sample Complexity of Approximate Rejection Sampling with Applications to Smoothed Online Learning
by: Block, Adam, et al.
Published: (2023)
High-Rate Quantized Matrix Multiplication II
by: Ordentlich, Or, et al.
Published: (2026)
Optimal Quantization for Matrix Multiplication
by: Ordentlich, Or, et al.
Published: (2024)
Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior
by: Polyanskiy, Yury, et al.
Published: (2025)
On the Minimax Regret of Sequential Probability Assignment via Square-Root Entropy
by: Jia, Zeyu, et al.
Published: (2025)
Price of universality in vector quantization is at most 0.11 bit
by: Harbuzova, Alina, et al.
Published: (2026)
Representation Alignment Rests on Linear Structure
by: Bangachev, Kiril, et al.
Published: (2026)
A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity
by: Jia, Zeyu, et al.
Published: (2025)
Optimal empirical Bayes estimation for the Poisson model via minimum-distance methods
by: Jana, Soham, et al.
Published: (2022)
YuriiFormer: A Suite of Nesterov-Accelerated Transformers
by: Zimin, Aleksandr, et al.
Published: (2026)
WaterSIC: information-theoretically (near) optimal linear layer quantization
by: Lifar, Egor, et al.
Published: (2026)
Synchronization of mean-field models on the circle
by: Polyanskiy, Yury, et al.
Published: (2025)
NestQuant: Nested Lattice Quantization for Matrix Products and LLMs
by: Savkin, Semyon, et al.
Published: (2025)
The emergence of clusters in self-attention dynamics
by: Geshkovski, Borjan, et al.
Published: (2023)
Global Minimizers of Sigmoid Contrastive Loss
by: Bangachev, Kiril, et al.
Published: (2025)
Is Dimensionality a Barrier for Retrieval Models?
by: Bangachev, Kiril, et al.
Published: (2026)
Measure-to-measure Regression with Transformers
by: Vandergrift, Matthew, et al.
Published: (2026)
A mathematical perspective on Transformers
by: Geshkovski, Borjan, et al.
Published: (2023)
Dynamic metastability in the self-attention model
by: Geshkovski, Borjan, et al.
Published: (2024)
Quantitative Clustering in Mean-Field Transformer Models
by: Chen, Shi, et al.
Published: (2025)
Data-driven informative priors for Bayesian inference with quasi-periodic data
by: Lopez-Santiago, Javier, et al.
Published: (2025)
Gradient descent inference in empirical risk minimization
by: Han, Qiyang, et al.
Published: (2024)
Critical attention scaling in long-context transformers
by: Chen, Shi, et al.
Published: (2025)
Weak neural variational inference for solving Bayesian inverse problems without forward models: applications in elastography
by: Scholz, Vincent C., et al.
Published: (2024)
Residual connections provably mitigate oversmoothing in graph neural networks
by: Chen, Ziang, et al.
Published: (2025)
Clustering in Causal Attention Masking
by: Karagodin, Nikita, et al.
Published: (2024)
Optimal score estimation via empirical Bayes smoothing
by: Wibisono, Andre, et al.
Published: (2024)
The Radio-Frequency Transformer for Signal Separation
by: Lifar, Egor, et al.
Published: (2026)
Continuous First, Discrete Later: VQ-VAEs Without Dimensional Collapse
by: Zhao, Xinyu, et al.
Published: (2026)
Scaling Limits of Long-Context Transformers
by: Bruno, Giuseppe, et al.
Published: (2026)
The effectiveness of MAE pre-pretraining for billion-scale pretraining
by: Singh, Mannat, et al.
Published: (2023)
Minimax optimal testing by classification
by: Gerber, Patrik Róbert, et al.
Published: (2023)
Evolution Strategies for Deep RL pretraining
by: Martínez, Adrian, et al.
Published: (2026)
Synthetic continued pretraining
by: Yang, Zitong, et al.
Published: (2024)
Learning to solve Bayesian inverse problems: An amortized variational inference approach using Gaussian and Flow guides
by: Karumuri, Sharmila, et al.
Published: (2023)
Interactive Learning of Single-Index Models via Stochastic Gradient Descent
by: Rajaraman, Nived, et al.
Published: (2026)
On Uniform, Bayesian, and PAC-Bayesian Deep Ensembles
by: Hauptvogel, Nick, et al.
Published: (2024)
Normalization in Attention Dynamics
by: Karagodin, Nikita, et al.
Published: (2025)