| Main Author: | Robertson, Zachary |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Information Theory |
| Online Access: | https://arxiv.org/abs/2601.08929 |
| _version_ | 1866910220330991616 |
|---|---|
| author | Robertson, Zachary |
| contents | Given $n$ random variables, when does the matrix of pairwise $f$-mutual informations define a positive semidefinite (PSD) kernel over the variables? For convex finite generators $f:(0,\infty)\to\mathbb{R}$ with $f(1)=0$ and finite boundary value $f(0)$, we give a closed characterization up to the linear transformation $f\sim f+c(t-1)$, which leaves every $f$-divergence and every $f$-mutual-information matrix unchanged. The matrix $M^{(f)}_{ij}:=I_f(X_i;X_j)$ is PSD for every finite-alphabet family if and only if the normalized representative has a globally convergent expansion $\bar f(t)=\sum_{m\ge2}a_m(t-1)^m$, with $a_m\ge0$, on all of $(0,\infty)$. Sufficiency follows from a replica embedding for monomial generators plus closure under nonnegative mixtures. Necessity first extracts the local Taylor cone at $1$ using biased three-point kernels $H_a$ and the Belton–Guillot–Khare–Putinar (BGKP) low-rank Hankel positivity-preserver theorem, and then bootstraps analyticity to the divergence. This is a kernel characterization problem, not a metric one: positive semidefiniteness of the variable-indexed matrix is distinct from Hilbertian properties of divergences between distributions. The result explains why Shannon MI and Jensen–Shannon fail, why $\chi^2$ succeeds, and why non-analytic divergences such as total variation and ReLU are excluded. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_08929 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | A Global Characterization of $f$-Divergences Yielding PSD Mutual-Information Matrices |
| topic | Information Theory |
| url | https://arxiv.org/abs/2601.08929 |
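
The abstract's contrast between Shannon MI and $\chi^2$ can be checked numerically on a small family. The sketch below is not the author's code. It assumes NumPy, represents each variable as a function on a common finite sample space whose points have probabilities `w`, and uses hypothetical helper names (`joint_pmf`, `f_mutual_information`, `f_mi_matrix`) and a hypothetical four-variable family. It builds $M^{(f)}_{ij}=I_f(X_i;X_j)=D_f(P_{X_iX_j}\,\|\,P_{X_i}\otimes P_{X_j})$ (diagonal included) for $f(t)=t\log t$ and $f(t)=(t-1)^2$, reports the smallest eigenvalue, and checks that adding $c(t-1)$ to the generator leaves the matrix unchanged.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; not the paper's code.


def joint_pmf(x, y, w):
    """Joint pmf table p[a, b] of two discrete variables, plus both marginals.

    x, y: integer labels per sample point; w: probability of each sample point.
    """
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    p = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(p, (xi, yi), w)  # accumulate the mass of each (a, b) cell
    return p, p.sum(axis=1), p.sum(axis=0)


def f_mutual_information(x, y, w, f):
    """I_f(X;Y) = D_f(P_XY || P_X P_Y) = sum_{a,b} q(a,b) f(p(a,b) / q(a,b))."""
    p, px, py = joint_pmf(x, y, w)
    q = np.outer(px, py)
    mask = q > 0  # f(0) is assumed finite, so cells with p = 0 add q * f(0)
    return float(np.sum(q[mask] * f(p[mask] / q[mask])))


def f_mi_matrix(variables, w, f):
    """Symmetric matrix M_ij = I_f(X_i; X_j), diagonal included."""
    n = len(variables)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            m[i, j] = m[j, i] = f_mutual_information(variables[i], variables[j], w, f)
    return m


def kl_generator(t):
    """f(t) = t log t with f(0) = 0, giving Shannon MI in nats."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out


def chi2_generator(t):
    """f(t) = (t - 1)^2: the expansion with a_2 = 1 and all other a_m = 0."""
    return (np.asarray(t, dtype=float) - 1.0) ** 2


# Four equally likely sample points (a, b) in {0, 1}^2.
a = np.array([0, 0, 1, 1])
b = np.array([0, 1, 0, 1])
w = np.full(4, 0.25)
# Hypothetical family: X1 = A, X2 = B, X3 = A xor B, X4 = (A, B) encoded as 2A + B.
family = [a, b, a ^ b, 2 * a + b]

m_kl = f_mi_matrix(family, w, kl_generator)
m_chi2 = f_mi_matrix(family, w, chi2_generator)

# Adding c (t - 1) to the generator should leave the matrix unchanged.
c = 3.7
m_kl_shifted = f_mi_matrix(family, w, lambda t: kl_generator(t) + c * (t - 1.0))

print("min eigenvalue, Shannon MI:", np.linalg.eigvalsh(m_kl).min())
print("min eigenvalue, chi^2 MI:  ", np.linalg.eigvalsh(m_chi2).min())
```

For this family (A and B independent fair bits), the Shannon matrix is $\ln 2$ times `[[1,0,0,1],[0,1,0,1],[0,0,1,1],[1,1,1,2]]`, whose smallest eigenvalue is $\tfrac{3-\sqrt{13}}{2}\ln 2\approx-0.21$, so it is not PSD. The $\chi^2$ matrix is `[[1,0,0,1],[0,1,0,1],[0,0,1,1],[1,1,1,3]]`, with eigenvalues 0, 1, 1, 4. A single example only illustrates the two cases named in the abstract; it does not test the characterization itself.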