Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Deng, Ou, Nishimura, Shoji, Ogihara, Atsushi, Jin, Qun
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.12380
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866914263170285568
author	Deng, Ou Nishimura, Shoji Ogihara, Atsushi Jin, Qun
author_facet	Deng, Ou Nishimura, Shoji Ogihara, Atsushi Jin, Qun
contents	Real-world tabular databases routinely combine continuous measurements and categorical records, yet missing entries are pervasive and can distort downstream analysis. We propose Statistical-Neural Interaction (SNI), an interpretable mixed-type imputation framework that couples correlation-derived statistical priors with neural feature attention through a Controllable-Prior Feature Attention (CPFA) module. CPFA learns head-wise prior-strength coefficients $\{λ_h\}$ that softly regularize attention toward the prior while allowing data-driven deviations when nonlinear patterns appear to be present in the data. Beyond imputation, SNI aggregates attention maps into a directed feature-dependency matrix that summarizes which variables the imputer relied on, without requiring post-hoc explainers. We evaluate SNI against six baselines (Mean/Mode, MICE, KNN, MissForest, GAIN, MIWAE) on six datasets spanning ICU monitoring, population surveys, socio-economic statistics, and engineering applications. Under MCAR/strict-MAR at 30\% missingness, SNI is generally competitive on continuous metrics but is often outperformed by accuracy-first baselines (MissForest, MIWAE) on categorical variables; in return, it provides intrinsic dependency diagnostics and explicit statistical-neural trade-off parameters. We additionally report MNAR stress tests (with a mask-aware variant) and discuss computational cost, limitations -- particularly for severely imbalanced categorical targets -- and deployment scenarios where interpretability may justify the trade-off.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_12380
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Statistical-Neural Interaction Networks for Interpretable Mixed-Type Data Imputation Deng, Ou Nishimura, Shoji Ogihara, Atsushi Jin, Qun Machine Learning Real-world tabular databases routinely combine continuous measurements and categorical records, yet missing entries are pervasive and can distort downstream analysis. We propose Statistical-Neural Interaction (SNI), an interpretable mixed-type imputation framework that couples correlation-derived statistical priors with neural feature attention through a Controllable-Prior Feature Attention (CPFA) module. CPFA learns head-wise prior-strength coefficients $\{λ_h\}$ that softly regularize attention toward the prior while allowing data-driven deviations when nonlinear patterns appear to be present in the data. Beyond imputation, SNI aggregates attention maps into a directed feature-dependency matrix that summarizes which variables the imputer relied on, without requiring post-hoc explainers. We evaluate SNI against six baselines (Mean/Mode, MICE, KNN, MissForest, GAIN, MIWAE) on six datasets spanning ICU monitoring, population surveys, socio-economic statistics, and engineering applications. Under MCAR/strict-MAR at 30\% missingness, SNI is generally competitive on continuous metrics but is often outperformed by accuracy-first baselines (MissForest, MIWAE) on categorical variables; in return, it provides intrinsic dependency diagnostics and explicit statistical-neural trade-off parameters. We additionally report MNAR stress tests (with a mask-aware variant) and discuss computational cost, limitations -- particularly for severely imbalanced categorical targets -- and deployment scenarios where interpretability may justify the trade-off.
title	Statistical-Neural Interaction Networks for Interpretable Mixed-Type Data Imputation
topic	Machine Learning
url	https://arxiv.org/abs/2601.12380

Similar Items