Bibliographic Details
Main Authors: Thuy, Ta Thanh; Zhu, Jiaqi; Liu, Xuan; Shang, Lin; Rabbany, Reihaneh; Rabusseau, Guillaume; Chen, Lihui; Yilun, Zheng; Luan, Sitao
Format: Preprint
Published: 2026
Subjects: Computation and Language; Machine Learning
Online Access: https://arxiv.org/abs/2605.00513
author Thuy, Ta Thanh
Zhu, Jiaqi
Liu, Xuan
Shang, Lin
Rabbany, Reihaneh
Rabusseau, Guillaume
Chen, Lihui
Yilun, Zheng
Luan, Sitao
contents Understanding how people argue across ideological divides online is important for studying political polarization, misinformation, and content moderation. Existing datasets capture only part of this problem: some preserve text but ignore interaction structure, some model structure without rich semantics, and others represent conversations without stable user-level ideological identity. We introduce ControBench, a benchmark for controversial discourse analysis that combines heterogeneous social interaction graphs with rich textual semantics. Built from Reddit discussions on three topics, Trump, abortion, and religion, ControBench contains 7,370 users, 1,783 posts, and 26,525 interactions. The graph contains user and post nodes connected by semantically enriched edges; in particular, user-comment-user edges encode both a reply and the parent comment that it responds to, preserving local argumentative context. User labels are derived from self-declared Reddit flairs, providing a scalable proxy for ideological identity without manual annotation. The resulting datasets exhibit low or negative adjusted homophily (Trump: -0.77, Abortion: 0.06, Religion: 0.04), reflecting the cross-cutting structure of real-world debate. We evaluate graph neural networks, pretrained language models, and large language models on ControBench and observe distinct performance patterns across topics and model families, especially when ideological boundaries are ambiguous. These results position ControBench as a challenging and realistic benchmark for controversial discourse analysis.
format Preprint
id arxiv_https___arxiv_org_abs_2605_00513
institution arXiv
publishDate 2026
record_format arxiv
title ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2605.00513