Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Liu, Kaiwen, Villalobos, Seba Daniela, Zhang, Qin
Format:	Preprint
Publié:	2026
Sujets:	Data Structures and Algorithms
Accès en ligne:	https://arxiv.org/abs/2605.07091
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866918489129746432
author	Liu, Kaiwen Villalobos, Seba Daniela Zhang, Qin
author_facet	Liu, Kaiwen Villalobos, Seba Daniela Zhang, Qin
contents	We study the correlation clustering problem in the node-arrival data stream model. Unlike previous work, where the stream consists of the graph's edges, we focus on the setting in which the stream contains only the nodes. This model better reflects many real-world scenarios in which the data stream naturally consists of raw objects (e.g., images, tweets), and the similar/dissimilar edges are derived through a similarity function. We present C$^4$Approx, a streaming algorithm that approximates the cost of correlation clustering using sublinear space in the number of nodes and a constant number of passes. We further complement this result with lower bounds. Experiments on real-world datasets show that by storing only 2% of the nodes, our algorithm achieves performance comparable to the classic Pivot algorithm and the more recent PrunedPivot algorithm, even on sparse graphs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_07091
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Estimating Correlation Clustering Cost in Node-Arrival Stream Liu, Kaiwen Villalobos, Seba Daniela Zhang, Qin Data Structures and Algorithms We study the correlation clustering problem in the node-arrival data stream model. Unlike previous work, where the stream consists of the graph's edges, we focus on the setting in which the stream contains only the nodes. This model better reflects many real-world scenarios in which the data stream naturally consists of raw objects (e.g., images, tweets), and the similar/dissimilar edges are derived through a similarity function. We present C$^4$Approx, a streaming algorithm that approximates the cost of correlation clustering using sublinear space in the number of nodes and a constant number of passes. We further complement this result with lower bounds. Experiments on real-world datasets show that by storing only 2% of the nodes, our algorithm achieves performance comparable to the classic Pivot algorithm and the more recent PrunedPivot algorithm, even on sparse graphs.
title	Estimating Correlation Clustering Cost in Node-Arrival Stream
topic	Data Structures and Algorithms
url	https://arxiv.org/abs/2605.07091

Documents similaires