| Main Authors: | Liao, Zeyi; Sun, Huan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.07921 |
Similar Items
AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
by: Kumar, Vishal, et al.
Published: (2024)
Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?
by: Mu, Junjie, et al.
Published: (2025)
AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
by: Wang, Zijun, et al.
Published: (2024)
Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
by: Liu, Hongfu, et al.
Published: (2024)
Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward
by: Soor, Sampriti, et al.
Published: (2025)
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
by: Ball, Sarah, et al.
Published: (2025)
Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation
by: Kim, Minkyoung, et al.
Published: (2024)
The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)
ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings
by: Wang, Hao, et al.
Published: (2024)
Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation
by: Soor, Sampriti, et al.
Published: (2025)
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
by: Mo, Lingbo, et al.
Published: (2024)
Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models
by: Li, Xiao, et al.
Published: (2024)
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
by: Basani, Advik Raj, et al.
Published: (2024)
AttributionBench: How Hard is Automatic Attribution Evaluation?
by: Li, Yifei, et al.
Published: (2024)
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
by: Liao, Zeyi, et al.
Published: (2025)
Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models
by: Eddoubi, Hicham, et al.
Published: (2026)
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
by: Li, Linbao, et al.
Published: (2025)
GCG Attack On A Diffusion LLM
by: Neyroud, Ruben, et al.
Published: (2025)
Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models
by: Liang, Haoyu, et al.
Published: (2025)
Autonomous Continual Learning for Environment Adaptation of Computer-Use Agents
by: Xue, Tianci, et al.
Published: (2026)
Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages
by: El-Naggar, Nadine, et al.
Published: (2025)
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)
Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
by: Bisconti, Piercosma, et al.
Published: (2025)
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
by: Guo, Xingang, et al.
Published: (2024)
Open Sesame! Universal Black Box Jailbreaking of Large Language Models
by: Lapid, Raz, et al.
Published: (2023)
Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs
by: Li, Xiaoxia, et al.
Published: (2024)
Adaptive Content Restriction for Large Language Models via Suffix Optimization
by: Li, Yige, et al.
Published: (2025)
LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
by: Li, Ran, et al.
Published: (2025)
TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking
by: Du, Mengyao, et al.
Published: (2026)
Suffix-Constrained Greedy Search Algorithms for Causal Language Models
by: Hammal, Ayoub, et al.
Published: (2026)
Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
by: Tang, Haochun, et al.
Published: (2026)
Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs
by: Li, Xiang, et al.
Published: (2025)
Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?
by: Atil, Berk, et al.
Published: (2025)
DPad: Efficient Diffusion Language Models with Suffix Dropout
by: Chen, Xinhua, et al.
Published: (2025)
A Closer Look at Adversarial Suffix Learning for Jailbreaking LLMs: Augmented Adversarial Trigger Learning
by: Wang, Zhe, et al.
Published: (2025)
AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
by: Lee, Yejin, et al.
Published: (2025)
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
by: Han, Seungju, et al.
Published: (2024)
ROSA-Tuning: Enhancing Long-Context Modeling via Suffix Matching
by: Zheng, Yunao, et al.
Published: (2026)
Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation
by: Kim, Hee-Seon, et al.
Published: (2024)
AdvAgent: Controllable Blackbox Red-teaming on Web Agents
by: Xu, Chejian, et al.
Published: (2024)