Bibliographic Details
Main Authors: Hsu, Yu-Ling, Su, Hsuan, Chen, Shang-Tse
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2502.01154
_version_ 1866915134393286656
author Hsu, Yu-Ling
Su, Hsuan
Chen, Shang-Tse
author_facet Hsu, Yu-Ling
Su, Hsuan
Chen, Shang-Tse
contents Large language models (LLMs) have seen rapid development in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. Most prompting techniques focus on optimizing adversarial inputs for individual cases, which results in higher computational costs when dealing with large datasets. Less research has addressed the more general setting of training a universal attacker that can transfer to unseen tasks. In this paper, we introduce JUMP, a prompt-based method designed to jailbreak LLMs using universal multi-prompts. We also adapt our approach for defense, which we term DUMP. Experimental results demonstrate that our method for optimizing universal multi-prompts outperforms existing techniques.
format Preprint
id arxiv_https___arxiv_org_abs_2502_01154
institution arXiv
publishDate 2025
record_format arxiv
title Jailbreaking with Universal Multi-Prompts
topic Computation and Language
Artificial Intelligence
Cryptography and Security
Machine Learning
url https://arxiv.org/abs/2502.01154