Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Chakraborty, Diptarka, Chatterjee, Kushagra, Das, Debarati, Nguyen, Tien-Long
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.11500
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910019963846656
author	Chakraborty, Diptarka Chatterjee, Kushagra Das, Debarati Nguyen, Tien-Long
author_facet	Chakraborty, Diptarka Chatterjee, Kushagra Das, Debarati Nguyen, Tien-Long
contents	Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al, introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_11500
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	A Generic Framework for Fair Consensus Clustering in Streams Chakraborty, Diptarka Chatterjee, Kushagra Das, Debarati Nguyen, Tien-Long Machine Learning Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al, introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
title	A Generic Framework for Fair Consensus Clustering in Streams
topic	Machine Learning
url	https://arxiv.org/abs/2602.11500

Similar Items