Bibliographic Details
Main Authors: Green, Alden; Balakrishnan, Sivaraman; Tibshirani, Ryan J.
Format: Preprint
Published: 2024
Subjects: Statistics Theory
Online Access: https://arxiv.org/abs/2409.15628
author Green, Alden
Balakrishnan, Sivaraman
Tibshirani, Ryan J.
contents We consider a novel multivariate nonparametric two-sample testing problem where, under the alternative, distributions $P$ and $Q$ are separated in an integral probability metric over functions of bounded total variation (TV IPM). We propose a new test, the graph TV test, which uses a graph-based approximation to the TV IPM as its test statistic. We show that this test, computed with an $\varepsilon$-neighborhood graph and calibrated by permutation, is minimax rate-optimal for detecting alternatives separated in the TV IPM. As an important special case, we show that this implies the graph TV test is optimal for detecting spatially localized alternatives, whereas the $\chi^2$ test is provably suboptimal. Our theory is supported with numerical experiments on simulated and real data.
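The abstract names three ingredients: an $\varepsilon$-neighborhood graph on the pooled sample, a graph-based approximation to the TV IPM as the statistic, and permutation calibration. The sketch below shows one way these could fit together; it is not the authors' implementation. Assumptions not taken from this record: the statistic is taken to be $\max_\theta \bar\theta_X - \bar\theta_Y$ subject to $\sum_{(i,j)\in E} |\theta_i - \theta_j| \le$ `tv_budget` and $|\theta_i| \le 1$ (the budget and box constraint are hypothetical normalizations), solved as a linear program with SciPy's HiGHS backend. The function names `epsilon_graph_edges`, `graph_tv_statistic`, and `graph_tv_test` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.sparse import csr_matrix


def epsilon_graph_edges(Z, eps):
    """Edges (i, j), i < j, joining pooled points within Euclidean distance eps."""
    diffs = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(Z), k=1)
    keep = dist[i, j] <= eps
    return i[keep], j[keep]


def graph_tv_statistic(labels, edges, tv_budget=1.0):
    """Hypothetical graph TV IPM: max mean(theta[X]) - mean(theta[Y]) over theta
    with graph TV <= tv_budget and |theta_i| <= 1, as an LP with edge slacks s_e."""
    i, j = edges
    n, n_edges = len(labels), len(i)
    m = labels.sum()
    # Weights +1/m on sample X (label 1) and -1/k on sample Y (label 0).
    w = np.where(labels == 1, 1.0 / m, -1.0 / (n - m))
    c = np.concatenate([-w, np.zeros(n_edges)])  # linprog minimizes, so negate
    e = np.arange(n_edges)
    ones = np.ones(n_edges)
    # Row e:      theta_i - theta_j - s_e <= 0
    # Row E + e:  theta_j - theta_i - s_e <= 0
    # Row 2E:     sum_e s_e <= tv_budget
    rows = np.concatenate([e, e, e, n_edges + e, n_edges + e, n_edges + e,
                           np.full(n_edges, 2 * n_edges)])
    cols = np.concatenate([i, j, n + e, j, i, n + e, n + e])
    vals = np.concatenate([ones, -ones, -ones, ones, -ones, -ones, ones])
    A_ub = csr_matrix((vals, (rows, cols)), shape=(2 * n_edges + 1, n + n_edges))
    b_ub = np.concatenate([np.zeros(2 * n_edges), [tv_budget]])
    bounds = [(-1.0, 1.0)] * n + [(0.0, None)] * n_edges
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return -res.fun


def graph_tv_test(X, Y, eps, tv_budget=1.0, n_perm=200, seed=0):
    """Permutation-calibrated test; returns (observed statistic, p-value)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.ones(len(X), dtype=int), np.zeros(len(Y), dtype=int)])
    # The graph depends only on the pooled sample, so it is built once and reused.
    edges = epsilon_graph_edges(Z, eps)
    t_obs = graph_tv_statistic(labels, edges, tv_budget)
    t_perm = np.array([graph_tv_statistic(rng.permutation(labels), edges, tv_budget)
                       for _ in range(n_perm)])
    p_value = (1 + np.sum(t_perm >= t_obs)) / (1 + n_perm)
    return t_obs, p_value
```

For example, `graph_tv_test(X, Y, eps=0.3, n_perm=99)` on two 2-D samples returns the statistic and a permutation p-value. Because the pooled graph does not change when labels are shuffled, only the weight vector `w` is recomputed per permutation; the test stays valid for any fixed `eps` and `tv_budget`, though those choices affect power.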
format Preprint
id arxiv_https___arxiv_org_abs_2409_15628
institution arXiv
publishDate 2024
record_format arxiv
title Two-Sample Testing with a Graph-Based Total Variation Integral Probability Metric
topic Statistics Theory
url https://arxiv.org/abs/2409.15628