| Main Author: | |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.13454 |
Abstract:
- In this paper, we extend distance correlation to categorical data with general encodings, such as one-hot encoding for nominal variables and semicircle encoding for ordinal variables. Unlike existing methods, our approach leverages the spacing information between categories, which enhances the performance of distance correlation. Two estimates, the maximum likelihood estimate and a bias-corrected estimate, are given, together with their limiting distributions under the null and alternative hypotheses. Furthermore, we establish the sure screening property for high-dimensional categorical data under mild conditions. We conduct a simulation study to compare the performance of different encodings and illustrate their practical utility using the 2018 General Social Survey data.
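
To make the idea of encoding-based distance correlation concrete, here is a minimal Python sketch. It is not the paper's estimator: it uses the standard plug-in (V-statistic) sample distance correlation with Euclidean distances between encoded category vectors. The function names (`one_hot`, `semicircle`, `distance_correlation`) are hypothetical, and the semicircle encoding shown, which places the ordinal levels at evenly spaced angles on the upper half of the unit circle, is an assumption about what that encoding could look like, since the record does not define it.

```python
import numpy as np


def one_hot(labels, n_levels):
    """Encode integer labels 0..n_levels-1 as one-hot rows (nominal data)."""
    return np.eye(n_levels)[np.asarray(labels)]


def semicircle(labels, n_levels):
    """Assumed ordinal encoding: evenly spaced points on the upper unit semicircle.

    Level k maps to (cos(theta_k), sin(theta_k)) with theta_k = pi * k / (n_levels - 1),
    so distant levels are farther apart than adjacent ones. Requires n_levels >= 2.
    """
    theta = np.pi * np.asarray(labels) / (n_levels - 1)
    return np.column_stack([np.cos(theta), np.sin(theta)])


def _centered_distances(z):
    """Pairwise Euclidean distance matrix of the rows of z, double-centered."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Plug-in sample distance correlation between encoded samples x and y."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()              # squared sample distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:                      # one variable is constant in the sample
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Usage: a nominal variable and an ordinal variable that depends on it.
rng = np.random.default_rng(0)
x_labels = rng.integers(0, 3, size=200)                     # 3 nominal levels
y_labels = np.clip(x_labels + rng.integers(0, 2, size=200), 0, 3)  # 4 ordinal levels
print(distance_correlation(one_hot(x_labels, 3), semicircle(y_labels, 4)))
```

Swapping `semicircle` for `one_hot` on the ordinal variable discards the ordering of its levels, which is the kind of comparison between encodings that the abstract describes.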