Bibliographic Details
Main Authors: Dağ, Arif, Sahin, Simay, Karaköse, Mehmet
Format: Digital resource
Language:
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19103155
_version_ 1866901765943721984
author Dağ, Arif
Sahin, Simay
Karaköse, Mehmet
contents <p><span lang="EN">Crowdsourcing is widely used to collect labels for machine learning, but open participation also allows spammers, colluders, and Sybil-style attackers to create a persuasive yet incorrect consensus. This paper studies robust truth inference under such attacks using a label-aware graph neural network (GNN) that represents workers and tasks as a bipartite graph. The proposed framework combines edge-label-aware message passing, an auxiliary worker-trust head, and adaptive use of task-content features. Rather than relying on worker-maliciousness labels during training, the primary model is trained only with task supervision and selects between content-enabled and no-content variants on validation data.</span><span lang="EN"> </span><span lang="EN">Evaluation uses a held-out train/validation/test protocol on simulated cifar_binary, imdb, and newsgroups labeling tasks under realistic and oracle threat models. We compare against majority voting, weighted majority voting, Dawid–Skene, the binary KOS baseline where applicable, MMSR, content-only baselines, and collusion/Sybil defenses adapted from prior work. We also validate on two public real crowdsourcing benchmarks, relevance-2 and relevance-5. On these real benchmarks, the adaptive GNN reaches 81.85% and 90.80% accuracy, respectively, and significantly outperforms the classical and robust aggregation baselines considered in this study. In simulation, the method is competitive with the strongest fair content-aware baseline, improves substantially over a fixed-content GNN on newsgroups, and remains stronger than classical crowd-only aggregation on the attack-sensitive cifar_binary setting. Ablation analysis shows that task content helps on cifar_binary and imdb but hurts on newsgroups, motivating adaptive content selection instead of a fixed multimodal design.
Overall, the results support a qualified claim: graph-based robust aggregation can work without worker-maliciousness labels, but its gains are dataset-dependent and are strongest when relational evidence and task semantics complement each other.</span></p>
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_19103155
institution Zenodo
language
publishDate 2026
publisher Zenodo
record_format zenodo
title Robust Truth Inference in Crowdsourcing under Adversarial Attacks via Graph Neural Networks
url https://doi.org/10.5281/zenodo.19103155