Bibliographic Details
Main Authors: Zhu, Yejiong; Chen, Hao
Format: Preprint
Published: 2023
Subjects: Methodology; Statistics Theory
Online Access: https://arxiv.org/abs/2307.15205
author Zhu, Yejiong
Chen, Hao
contents Dimensionality effects pose major challenges in high-dimensional and non-Euclidean data analysis. Graph-based two-sample tests and change-point detection are particularly attractive in this context, as they make minimal distributional assumptions and perform well across a wide range of scenarios. These methods rely on similarity graphs constructed from data, with $K$-nearest neighbor graphs and $K$-minimum spanning trees among the most effective and widely used. However, in high-dimensional and non-Euclidean regimes such graphs often produce hubs -- nodes with disproportionately high degrees -- to which graph-based methods are especially sensitive. To mitigate these dimensionality effects, we propose a robust graph construction that is far less prone to hub formation. Incorporating this construction substantially improves the power of graph-based methods across diverse settings. We further establish a theoretical foundation by proving its consistency under fixed alternatives in both low- and high-dimensional regimes. The effectiveness of the approach is demonstrated through real-world applications, including comparisons of correlation matrices for brain regions, gene expression profiles of T cells, and temporal changes in New York City taxi travel patterns.
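The abstract does not describe the proposed robust graph construction, so the following is only a hedged sketch of the standard setting it builds on, not the authors' method. It builds a Euclidean K-nearest neighbor graph, reports node in-degrees (nodes with disproportionately high in-degree are the "hubs" mentioned above), and computes a classical edge-count statistic in the style of Friedman and Rafsky (the number of edges joining the two samples) with a permutation p-value. All function names and default parameters (k, n_perm, seed) are hypothetical choices for illustration.

```python
import math
import random
from collections import Counter


def knn_edges(points, k):
    """Directed K-NN edges (i, j): j is one of the k nearest neighbors of i.

    Uses Euclidean distance; ties are broken by index.
    """
    edges = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        edges.extend((i, j) for _, j in ranked[:k])
    return edges


def in_degrees(edges, n):
    """In-degree of every node; nodes with very large in-degree act as hubs."""
    counts = Counter(j for _, j in edges)
    return [counts[v] for v in range(n)]


def between_sample_edges(edges, labels):
    """Edge-count statistic: undirected edges joining points from different samples."""
    undirected = {tuple(sorted(e)) for e in edges}
    return sum(1 for i, j in undirected if labels[i] != labels[j])


def edge_count_test(points, labels, k=3, n_perm=200, seed=0):
    """Permutation test: unusually few between-sample edges suggest the samples differ."""
    edges = knn_edges(points, k)
    observed = between_sample_edges(edges, labels)
    rng = random.Random(seed)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if between_sample_edges(edges, perm) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hubness check: largest in-degree of a 5-NN graph on Gaussian data, low vs high dimension.
    rng = random.Random(1)
    for d in (2, 500):
        pts = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(200)]
        print(d, max(in_degrees(knn_edges(pts, 5), len(pts))))
```

Running the script prints the largest in-degree for d = 2 and d = 500. In the high-dimensional case one would typically expect it to be noticeably larger, which is the hub effect the abstract refers to, although the exact values depend on the random draw. A graph with such hubs lets a few nodes dominate the edge count, which is why graph-based statistics are sensitive to them.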
id arxiv_https___arxiv_org_abs_2307_15205
institution arXiv
record_format arxiv
title Mitigating dimensionality effects with robust graph constructions for testing
topic Methodology
Statistics Theory