Bibliographic Details
Main Authors:	Dağ, Arif; Sahin, Simay; Karaköse, Mehmet
Format:	Digital resource
Language:
Published:	Zenodo 2026
Online Access:	https://doi.org/10.5281/zenodo.19103155

Table of Contents:

Crowdsourcing is widely used to collect labels for machine learning, but open participation also allows spammers, colluders, and Sybil-style attackers to create persuasive yet incorrect consensus. This paper studies robust truth inference under such attacks with a label-aware graph neural network (GNN) that represents workers and tasks as a bipartite graph. The proposed framework combines edge-label-aware message passing, an auxiliary worker-trust head, and adaptive use of task-content features. Rather than relying on worker-maliciousness labels during training, the primary model is trained only with task supervision and selects between content-enabled and no-content variants on validation data. Evaluation uses a held-out train/validation/test protocol on simulated cifar_binary, imdb, and newsgroups labeling tasks under realistic and oracle threat models. We compare against majority voting, weighted majority voting, Dawid–Skene, the binary KOS baseline where applicable, MMSR, content-only baselines, and collusion/Sybil defenses adapted from prior work. We also validate on two public real crowdsourcing benchmarks, relevance-2 and relevance-5. On these real benchmarks, the adaptive GNN reaches 81.85% and 90.80% accuracy, respectively, and significantly outperforms the classical and robust aggregation baselines considered in this study. In simulation, the method is competitive with the strongest fair content-aware baseline, improves substantially over a fixed-content GNN on newsgroups, and remains stronger than classical crowd-only aggregation on the attack-sensitive cifar_binary setting. Ablation analysis shows that task content helps on cifar_binary and imdb but hurts on newsgroups, motivating adaptive content selection instead of a fixed multimodal design.
Overall, the results support a qualified claim: graph-based robust aggregation can work without worker-maliciousness labels, but its gains are dataset-dependent and are strongest when relational evidence and task semantics complement each other.
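To make the crowd-only baselines named above concrete, the following Python sketch contrasts majority voting with weighted majority voting whose weights come from a simple agreement-based trust estimate, computed on a toy worker-task bipartite edge list. This is not the authors' label-aware GNN or its trust head: the function names (`majority_vote`, `weighted_vote`, `iterative_trust`), the toy workers `h1` to `h3` and colluders `c1`, `c2`, and the agreement-rate trust update are all hypothetical stand-ins. They only illustrate how worker trust can be inferred from task-level consensus without any worker-maliciousness labels.

```python
from collections import defaultdict

# Each annotation is an edge (worker, task, label) in a worker-task bipartite graph.
Edges = list[tuple[str, str, int]]


def _argmax_label(scores: dict[int, float]) -> int:
    # Highest score wins; ties go to the smallest label so results are deterministic.
    return min(scores, key=lambda label: (-scores[label], label))


def weighted_vote(edges: Edges, trust: dict[str, float]) -> dict[str, int]:
    """Weighted majority voting: each vote counts with its worker's trust (default 1.0)."""
    scores: dict[str, dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for worker, task, label in edges:
        scores[task][label] += trust.get(worker, 1.0)
    return {task: _argmax_label(s) for task, s in scores.items()}


def majority_vote(edges: Edges) -> dict[str, int]:
    """Unweighted majority voting is weighted voting with every trust equal to 1."""
    return weighted_vote(edges, {})


def iterative_trust(edges: Edges, rounds: int = 5):
    """Hypothetical stand-in for a trust head: alternate consensus and agreement rates."""
    trust = {worker: 1.0 for worker, _, _ in edges}
    for _ in range(rounds):
        consensus = weighted_vote(edges, trust)
        agree: dict[str, float] = defaultdict(float)
        total: dict[str, float] = defaultdict(float)
        for worker, task, label in edges:
            total[worker] += 1.0
            agree[worker] += float(label == consensus[task])
        # Trust = fraction of a worker's labels that match the current consensus.
        trust = {worker: agree[worker] / total[worker] for worker in total}
    return weighted_vote(edges, trust), trust


# Toy data: honest h1-h3 label t1-t3 correctly (1); colluders c1, c2 always answer 0;
# t4 is covered by only one honest worker, so the colluders hold a raw majority there.
edges: Edges = [(w, t, 1) for t in ("t1", "t2", "t3") for w in ("h1", "h2", "h3")]
edges += [(c, t, 0) for t in ("t1", "t2", "t3", "t4") for c in ("c1", "c2")]
edges += [("h1", "t4", 1)]

print(majority_vote(edges))    # t4 -> 0 (colluders win the raw count)
consensus, trust = iterative_trust(edges)
print(consensus)               # t4 -> 1 after reweighting
print(trust)                   # c1, c2 -> 0.0; h1, h2, h3 -> 1.0
```

In this toy run, plain majority voting gives `t4` the colluders' label because only one honest worker covers it, while the reweighted vote recovers the honest label once the colluders' agreement rate falls to 0. Because trust here is derived from the same consensus it reweights, this simple scheme can itself be misled when colluders outnumber honest workers on most tasks.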

Similar Items