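Staff View: ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks :: Library Catalog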

Bibliographic Details
Main Authors:	Thuy, Ta Thanh; Zhu, Jiaqi; Liu, Xuan; Shang, Lin; Rabbany, Reihaneh; Rabusseau, Guillaume; Chen, Lihui; Yilun, Zheng; Luan, Sitao
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2605.00513

_version_	1866915972308271104
author	Thuy, Ta Thanh; Zhu, Jiaqi; Liu, Xuan; Shang, Lin; Rabbany, Reihaneh; Rabusseau, Guillaume; Chen, Lihui; Yilun, Zheng; Luan, Sitao
author_facet	Thuy, Ta Thanh; Zhu, Jiaqi; Liu, Xuan; Shang, Lin; Rabbany, Reihaneh; Rabusseau, Guillaume; Chen, Lihui; Yilun, Zheng; Luan, Sitao
contents	Understanding how people argue across ideological divides online is important for studying political polarization, misinformation, and content moderation. Existing datasets capture only part of this problem: some preserve text but ignore interaction structure, some model structure without rich semantics, and others represent conversations without stable user-level ideological identity. We introduce ControBench, a benchmark for controversial discourse analysis that combines heterogeneous social interaction graphs with rich textual semantics. Built from Reddit discussions on three topics (Trump, abortion, and religion), ControBench contains 7,370 users, 1,783 posts, and 26,525 interactions. The graph contains user and post nodes connected by semantically enriched edges; in particular, user-comment-user edges encode both a reply and the parent comment that it responds to, preserving local argumentative context. User labels are derived from self-declared Reddit flairs, providing a scalable proxy for ideological identity without manual annotation. The resulting datasets exhibit low or negative adjusted homophily (Trump: -0.77, Abortion: 0.06, Religion: 0.04), reflecting the cross-cutting structure of real-world debate. We evaluate graph neural networks, pretrained language models, and large language models on ControBench and observe distinct performance patterns across topics and model families, especially when ideological boundaries are ambiguous. These results position ControBench as a challenging and realistic benchmark for controversial discourse analysis.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_00513
institution	arXiv
publishDate	2026
record_format	arxiv
title	ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2605.00513

Similar Items