Bibliographic Details
Main Authors: Zhang, Xiaochen, Cai, Yunfeng, Xiong, Haoyi
Format: Preprint
Published: 2025
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2501.17889
author Zhang, Xiaochen
Cai, Yunfeng
Xiong, Haoyi
contents Variable selection plays a crucial role in improving model effectiveness across diverse fields, particularly for high-dimensional datasets with correlated variables. This work introduces Knockoff with over-parameterization (Knoop), an approach that enhances Knockoff filters for variable selection. Specifically, Knoop first generates multiple knockoff variables for each original variable and integrates them with the original variables into an over-parameterized Ridgeless regression model. For each original variable, Knoop evaluates the coefficient distribution of its knockoffs and compares it with the original variable's coefficient to conduct an anomaly-based significance test for robust variable selection. Extensive experiments on both simulated and real-world datasets demonstrate superior performance compared to existing methods. In controlled simulations, Knoop achieves a notably higher Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve for identifying relevant variables against the ground truth, and it shows improved predictive accuracy across diverse regression and classification tasks. Analytical results further back up these observations.
format Preprint
id arxiv_https___arxiv_org_abs_2501_17889
institution arXiv
publishDate 2025
record_format arxiv
title Knoop: Practical Enhancement of Knockoff with Over-Parameterization for Variable Selection
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2501.17889
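
The pipeline summarized in the abstract (several knockoffs per variable, an over-parameterized ridgeless fit, then a per-variable comparison of the original coefficient against its knockoffs' coefficients) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on assumptions that do not come from the record: independently permuted columns stand in for properly constructed knockoffs, the ridgeless fit is taken as the minimum-norm least-squares solution, the anomaly score is a simple z-score, and the function name knoop_style_scores and all parameter values are hypothetical.

```python
import numpy as np


def knoop_style_scores(X, y, n_knockoffs=10, seed=0):
    """Score each column of X against decoy copies of itself.

    Assumption: permuted columns are a crude stand-in for knockoffs,
    not the construction used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Build n_knockoffs decoy blocks; each block permutes every column
    # independently, which breaks any link between the decoy and y.
    decoys = [
        np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
        for _ in range(n_knockoffs)
    ]

    # Augmented design: originals first, then the decoy blocks.
    # With p * (n_knockoffs + 1) > n the model is over-parameterized.
    Z = np.hstack([X] + decoys)

    # Ridgeless regression, taken here as the minimum-norm
    # least-squares solution via the Moore-Penrose pseudoinverse.
    beta = np.linalg.pinv(Z) @ y

    # Row 0 holds the original coefficients, rows 1.. hold the decoys.
    coefs = np.abs(beta).reshape(n_knockoffs + 1, p)
    original, knock = coefs[0], coefs[1:]

    # Assumed anomaly score: how far the original coefficient lies from
    # the empirical distribution of its own decoys (a z-score).
    return (original - knock.mean(axis=0)) / knock.std(axis=0, ddof=1)


# Synthetic data: only the first 3 of 20 variables influence y.
rng = np.random.default_rng(42)
n, p, k = 100, 20, 3
X = rng.standard_normal((n, p))
y = X[:, :k] @ np.full(k, 3.0) + 0.1 * rng.standard_normal(n)

scores = knoop_style_scores(X, y, n_knockoffs=10, seed=0)
print(np.round(scores, 2))
```

Variables with large scores have coefficients that look anomalous relative to their decoys and would be the candidates for selection. The threshold, the knockoff generator, and the exact test statistic are left open here because the record does not specify them.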