Saved in:
Bibliographic Details
Main Authors: Chakraborty, Diptarka, Chatterjee, Kushagra, Das, Debarati, Nguyen, Tien-Long
Format: Preprint
Published: 2026
Subjects:
Online Access:https://arxiv.org/abs/2602.11500
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866910019963846656
author Chakraborty, Diptarka
Chatterjee, Kushagra
Das, Debarati
Nguyen, Tien-Long
author_facet Chakraborty, Diptarka
Chatterjee, Kushagra
Das, Debarati
Nguyen, Tien-Long
contents Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al, introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
format Preprint
id arxiv_https___arxiv_org_abs_2602_11500
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle A Generic Framework for Fair Consensus Clustering in Streams
Chakraborty, Diptarka
Chatterjee, Kushagra
Das, Debarati
Nguyen, Tien-Long
Machine Learning
Consensus clustering seeks to combine multiple clusterings of the same dataset, potentially derived by considering various non-sensitive attributes by different agents in a multi-agent environment, into a single partitioning that best reflects the overall structure of the underlying dataset. Recent work by Chakraborty et al, introduced a fair variant under proportionate fairness and obtained a constant-factor approximation by naively selecting the best closest fair input clustering; however, their offline approach requires storing all input clusterings, which is prohibitively expensive for most large-scale applications. In this paper, we initiate the study of fair consensus clustering in the streaming model, where input clusterings arrive sequentially and memory is limited. We design the first constant-factor algorithm that processes the stream while storing only a logarithmic number of inputs. En route, we introduce a new generic algorithmic framework that integrates closest fair clustering with cluster fitting, yielding improved approximation guarantees not only in the streaming setting but also when revisited offline. Furthermore, the framework is fairness-agnostic: it applies to any fairness definition for which an approximately close fair clustering can be computed efficiently. Finally, we extend our methods to the more general k-median consensus clustering problem.
title A Generic Framework for Fair Consensus Clustering in Streams
topic Machine Learning
url https://arxiv.org/abs/2602.11500