Bibliographic Details
Main Authors: Ramavarapu, Vikram; Lamy, João Alfredo Cardoso; Dindoost, Mohammad; Bader, David A.
Format: Preprint
Published: 2025
Subjects: Social and Information Networks; Machine Learning
Online Access: https://arxiv.org/abs/2511.19717
author Ramavarapu, Vikram
Lamy, João Alfredo Cardoso
Dindoost, Mohammad
Bader, David A.
contents Community detection, or network clustering, is used to identify latent community structure in networks. Because labeled ground truth is scarce in real-world networks, evaluating these algorithms poses significant challenges. To address this, researchers use synthetic network generators that produce networks with ground-truth community labels. RECCS is one such algorithm: it takes a network and its clustering as input and generates a synthetic network through a modular pipeline. Each generated ground-truth cluster preserves key characteristics of the corresponding input cluster, including connectivity, minimum degree, and degree sequence distribution. The output consists of a synthetically generated network and disjoint ground-truth cluster labels for all nodes. In this paper, we present two enhanced versions: RECCS+ and RECCS++. RECCS+ maintains algorithmic fidelity to the original RECCS while introducing parallelization through an orchestrator that coordinates algorithmic components across multiple processes and employs multithreading. RECCS++ builds upon this foundation with additional algorithmic optimizations to achieve further speedup. Our experimental results demonstrate that RECCS+ and RECCS++ achieve speedups of up to 49x and 139x, respectively, on our benchmark datasets, with RECCS++'s additional performance gains involving a modest accuracy tradeoff. With this performance, RECCS++ can scale to networks with over 100 million nodes and nearly 2 billion edges.
format Preprint
id arxiv_https___arxiv_org_abs_2511_19717
institution arXiv
publishDate 2025
record_format arxiv
title Large Scale Community-Aware Network Generation
topic Social and Information Networks
Machine Learning
url https://arxiv.org/abs/2511.19717