Bibliographic Details
Main Authors: Teh, Anzo, Jabbour, Mark, Polyanskiy, Yury
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2502.09844
_version_ 1866912398583005184
author Teh, Anzo
Jabbour, Mark
Polyanskiy, Yury
author_facet Teh, Anzo
Jabbour, Mark
Polyanskiy, Yury
contents This work applies modern AI tools (transformers) to one of the oldest statistical problems: estimating Poisson means in the empirical Bayes (Poisson-EB) setting. In Poisson-EB, a high-dimensional mean vector $θ$ (with iid coordinates sampled from an unknown prior $π$) is estimated on the basis of $X \sim \mathrm{Poisson}(θ)$. A transformer model is pre-trained on a set of synthetically generated pairs $(X,θ)$ and learns to do in-context learning (ICL) by adapting to the unknown $π$. Theoretically, we show that a sufficiently wide transformer can achieve vanishing regret with respect to an oracle estimator that knows $π$ as the dimension grows to infinity. Practically, we discover that even very small models (100k parameters) are able to outperform the best classical algorithm (non-parametric maximum likelihood, or NPMLE) in both runtime and validation loss, which we compute on out-of-distribution synthetic data as well as real-world datasets (NHL hockey, MLB baseball, BookCorpusOpen). Finally, by using linear probes, we confirm that the transformer's EB estimator appears to internally work differently from both the NPMLE and Robbins' estimator.
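The Poisson-EB setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it uses the classical Robbins estimator, $\hat θ(x) = (x+1)\,N(x+1)/N(x)$, where $N(k)$ counts the observations equal to $k$, and it assumes a Gamma prior purely because the oracle (Bayes) estimator then has the closed form $(x + a)/(b + 1)$ for shape $a$ and rate $b$. The function names and parameter values are hypothetical choices for illustration.

```python
import numpy as np


def robbins_estimator(x):
    """Robbins' estimator: theta_hat(x) = (x + 1) * N(x + 1) / N(x)."""
    x = np.asarray(x, dtype=np.int64)
    # counts[k] = N(k); one extra slot so that counts[x + 1] is always valid.
    counts = np.bincount(x, minlength=int(x.max()) + 2)
    # counts[x] >= 1 for every observed x, so the division is safe.
    return (x + 1) * counts[x + 1] / counts[x]


def gamma_oracle_estimator(x, shape, rate):
    """Posterior mean of theta under an ASSUMED Gamma(shape, rate) prior."""
    return (np.asarray(x) + shape) / (rate + 1.0)


def simulate(n=100_000, shape=2.0, rate=1.0, seed=0):
    """Draw theta ~ pi and X ~ Poisson(theta); return the MSE of each estimator."""
    rng = np.random.default_rng(seed)
    theta = rng.gamma(shape, 1.0 / rate, size=n)  # NumPy uses scale = 1 / rate
    x = rng.poisson(theta)
    return {
        "mle": float(np.mean((x - theta) ** 2)),  # plain estimate theta_hat = X
        "robbins": float(np.mean((robbins_estimator(x) - theta) ** 2)),
        "oracle": float(
            np.mean((gamma_oracle_estimator(x, shape, rate) - theta) ** 2)
        ),
    }


if __name__ == "__main__":
    for name, mse in simulate().items():
        print(f"{name}: {mse:.4f}")
```

In this sketch, the gap between an estimator's mean squared error and the oracle's plays the role of the regret mentioned in the abstract. The oracle uses the prior parameters directly, whereas Robbins' estimator sees only the observed counts.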
format Preprint
id arxiv_https___arxiv_org_abs_2502_09844
institution arXiv
publishDate 2025
record_format arxiv
title Solving Empirical Bayes via Transformers
topic Machine Learning
url https://arxiv.org/abs/2502.09844