Bibliographic Details
Main Authors: Yan, Yu, Sun, Sheng, Li, Mingfeng, Song, Yunlong, Zhang, Xingzhou, Lu, Linran, Zheng, Zhifei, Liu, Min, Li, Qi
Format: Preprint
Published: 2025
Subjects: Machine Learning, Artificial Intelligence, Computation and Language
Online Access: https://arxiv.org/abs/2505.21184
author Yan, Yu
Sun, Sheng
Li, Mingfeng
Song, Yunlong
Zhang, Xingzhou
Lu, Linran
Zheng, Zhifei
Liu, Min
Li, Qi
contents To prevent the misuse of Large Language Models (LLMs) for malicious purposes, numerous efforts have been made to develop safety alignment mechanisms for LLMs. However, as multiple LLMs become readily accessible through various Model-as-a-Service (MaaS) platforms, attackers can strategically exploit LLMs' heterogeneous safety policies to fulfill malicious information generation tasks in a distributed manner. In this study, we introduce PoisonSwarm to show how attackers can reliably launder malicious tasks via the speculative use of LLM crowdsourcing. Building upon a scheduler orchestrating crowdsourced LLMs, PoisonSwarm maps the given malicious task to a benign analogue to derive a content template, decomposes it into semantic units for crowdsourced unit-wise rewriting, and reassembles the outputs into malicious content. Experiments show its superiority over existing methods in data quality, diversity, and success rates. Regulation simulations further reveal the difficulty of governing such distributed, orchestrated misuse in MaaS ecosystems, highlighting the need for coordinated, ecosystem-level defenses.
format Preprint
id arxiv_https___arxiv_org_abs_2505_21184
institution arXiv
publishDate 2025
record_format arxiv
title Jailbreak-as-a-Service++: Unveiling Distributed AI-Driven Malicious Information Campaigns Powered by LLM Crowdsourcing
topic Machine Learning
Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2505.21184