Internal format: :: Library Catalog

Bibliographic Details
Main authors:	Hsu, Yu-Ling; Su, Hsuan; Chen, Shang-Tse
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Cryptography and Security; Machine Learning
Online access:	https://arxiv.org/abs/2502.01154

_version_	1866915134393286656
author	Hsu, Yu-Ling; Su, Hsuan; Chen, Shang-Tse
author_facet	Hsu, Yu-Ling; Su, Hsuan; Chen, Shang-Tse
contents	Large language models (LLMs) have seen rapid development in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. Most prompting techniques focus on optimizing adversarial inputs for individual cases, which results in higher computational costs when dealing with large datasets, while less research has addressed the more general setting of training a universal attacker that can transfer to unseen tasks. In this paper, we introduce JUMP, a prompt-based method designed to jailbreak LLMs using universal multi-prompts. We also adapt our approach for defense, which we term DUMP. Experimental results demonstrate that our method for optimizing universal multi-prompts outperforms existing techniques.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_01154
institution	arXiv
publishDate	2025
record_format	arxiv
title	Jailbreaking with Universal Multi-Prompts
topic	Computation and Language; Artificial Intelligence; Cryptography and Security; Machine Learning
url	https://arxiv.org/abs/2502.01154
