Staff View: Entropy Production in Machine Learning Under Fokker-Planck Probability Flow :: Library Catalog

Bibliographic Details
Main Author:	Shikhman, Lennon
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; 60J60; I.2.6; G.3
Online Access:	https://arxiv.org/abs/2601.00554

_version_	1866909992440823808
author	Shikhman, Lennon
author_facet	Shikhman, Lennon
contents	Machine learning models deployed in nonstationary environments inevitably experience performance degradation due to data drift. While numerous drift detection heuristics exist, most lack a dynamical interpretation and provide limited guidance on how retraining decisions should be balanced against operational cost. In this work, we propose an entropy-based retraining framework grounded in nonequilibrium statistical physics. Interpreting drift as probability flow governed by a Fokker-Planck equation, we quantify model-data mismatch using relative entropy and show that its time derivative admits an entropy-balance decomposition featuring a nonnegative entropy production term driven by probability currents. Guided by this theory, we implement an entropy-triggered retraining policy using an exponentially weighted moving-average (EWMA) control statistic applied to a streaming kernel density estimator of the Kullback-Leibler divergence. We evaluate this approach across multiple nonstationary data streams. In synthetic, financial, and web-traffic domains, entropy-based retraining achieves predictive performance comparable to frequent retraining while reducing retraining frequency by one to two orders of magnitude. However, in a challenging biomedical ECG setting, the entropy-based trigger underperforms the maximum-frequency baseline, highlighting limitations of feature-space entropy monitoring under complex label-conditional drift.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_00554
institution	arXiv
publishDate	2026
record_format	arxiv
title	Entropy Production in Machine Learning Under Fokker-Planck Probability Flow
topic	Machine Learning 60J60 I.2.6; G.3
url	https://arxiv.org/abs/2601.00554
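
Illustrative sketch (not part of the catalog record, and not the author's implementation): the abstract in the contents field describes an entropy-triggered retraining policy in which an EWMA control statistic is applied to a kernel density estimate of the Kullback-Leibler divergence. The Python below shows one way such a trigger might look on a 1-D stream. Everything specific in it is an assumption made for illustration: the function names, the fixed-size windows (a simplification of a truly streaming estimator), the Gaussian kernel, the bandwidth, the EWMA weight alpha, the threshold, and the synthetic mean-shift stream.

```python
import math
import random


def kde_logpdf(x, sample, bandwidth):
    """Log density at x under a 1-D Gaussian KDE built from `sample`."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    # Floor the density so the log stays finite far from every sample point.
    return math.log(max(total / norm, 1e-300))


def kl_estimate(window, reference, bandwidth=0.5):
    """Plug-in estimate of KL(window || reference): mean log-density ratio."""
    ratios = (
        kde_logpdf(x, window, bandwidth) - kde_logpdf(x, reference, bandwidth)
        for x in window
    )
    return sum(ratios) / len(window)


def entropy_triggered_retraining(stream, window_size=100, alpha=0.3, threshold=1.0):
    """Return the stream indices at which the EWMA of the KL estimate fires.

    The first window plays the role of the data the model was trained on.
    All parameter values are hypothetical defaults, not taken from the paper.
    """
    reference = stream[:window_size]
    ewma = 0.0
    retrain_points = []
    for start in range(window_size, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        kl = kl_estimate(window, reference)
        ewma = alpha * kl + (1.0 - alpha) * ewma  # EWMA control statistic
        if ewma > threshold:
            retrain_points.append(start + window_size)  # decision time
            reference = window  # "retrain": adopt the latest window as reference
            ewma = 0.0
    return retrain_points


def make_stream(seed=0, n_before=500, n_after=500, shift=3.0):
    """Synthetic stream whose mean jumps by `shift` after `n_before` samples."""
    rng = random.Random(seed)
    before = [rng.gauss(0.0, 1.0) for _ in range(n_before)]
    after = [rng.gauss(shift, 1.0) for _ in range(n_after)]
    return before + after


if __name__ == "__main__":
    print(entropy_triggered_retraining(make_stream()))
```

On a stream like this the trigger is expected to stay silent while the data are stationary and to fire shortly after the mean shift, so retraining happens only when the estimated divergence builds up rather than at every window. Resetting the EWMA after a retrain is one possible design choice; a real deployment would also need a bandwidth and threshold tuned to the data.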
