Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Yan, Yu; Sun, Sheng; Li, Mingfeng; Song, Yunlong; Zhang, Xingzhou; Lu, Linran; Zheng, Zhifei; Liu, Min; Li, Qi
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2505.21184

_version_	1866908784143630336
author	Yan, Yu; Sun, Sheng; Li, Mingfeng; Song, Yunlong; Zhang, Xingzhou; Lu, Linran; Zheng, Zhifei; Liu, Min; Li, Qi
author_facet	Yan, Yu; Sun, Sheng; Li, Mingfeng; Song, Yunlong; Zhang, Xingzhou; Lu, Linran; Zheng, Zhifei; Liu, Min; Li, Qi
contents	To prevent the misuse of Large Language Models (LLMs) for malicious purposes, numerous efforts have been made to develop safety alignment mechanisms for LLMs. However, as multiple LLMs become readily accessible through various Model-as-a-Service (MaaS) platforms, attackers can strategically exploit the heterogeneous safety policies of different LLMs to carry out malicious information generation tasks in a distributed manner. In this study, we introduce PoisonSwarm to show how attackers can reliably launder malicious tasks via the speculative use of LLM crowdsourcing. Building upon a scheduler that orchestrates crowdsourced LLMs, PoisonSwarm maps a given malicious task to a benign analogue to derive a content template, decomposes the template into semantic units for crowdsourced unit-wise rewriting, and reassembles the outputs into malicious content. Experiments show that it outperforms existing methods in data quality, diversity, and success rate. Regulation simulations further reveal the difficulty of governing such distributed, orchestrated misuse in MaaS ecosystems, highlighting the need for coordinated, ecosystem-level defenses.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_21184
institution	arXiv
publishDate	2025
record_format	arxiv
title	Jailbreak-as-a-Service++: Unveiling Distributed AI-Driven Malicious Information Campaigns Powered by LLM Crowdsourcing
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2505.21184

Similar Items