:: Library Catalog

Bibliographic Details
Main Authors:	Liao, Zeyi, Sun, Huan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2404.07921

Similar Items

AmpleGCG-Plus: A Strong Generative Model of Adversarial Suffixes to Jailbreak LLMs with Higher Success Rates in Fewer Attempts
by: Kumar, Vishal, et al.
Published: (2024)

Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks?
by: Mu, Junjie, et al.
Published: (2025)

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
by: Wang, Zijun, et al.
Published: (2024)

Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
by: Liu, Hongfu, et al.
Published: (2024)

Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward
by: Soor, Sampriti, et al.
Published: (2025)

Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
by: Ball, Sarah, et al.
Published: (2025)

Mitigating Adversarial Attacks in LLMs through Defensive Suffix Generation
by: Kim, Minkyoung, et al.
Published: (2024)

The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)

ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings
by: Wang, Hao, et al.
Published: (2024)

Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation
by: Soor, Sampriti, et al.
Published: (2025)

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
by: Mo, Lingbo, et al.
Published: (2024)

Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models
by: Li, Xiao, et al.
Published: (2024)

GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
by: Basani, Advik Raj, et al.
Published: (2024)

AttributionBench: How Hard is Automatic Attribution Evaluation?
by: Li, Yifei, et al.
Published: (2024)

RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
by: Liao, Zeyi, et al.
Published: (2025)

Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models
by: Eddoubi, Hicham, et al.
Published: (2026)

One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
by: Li, Linbao, et al.
Published: (2025)

GCG Attack On A Diffusion LLM
by: Neyroud, Ruben, et al.
Published: (2025)

Jailbreaking LLMs' Safeguard with Universal Magic Words for Text Embedding Models
by: Liang, Haoyu, et al.
Published: (2025)

Autonomous Continual Learning for Environment Adaptation of Computer-Use Agents
by: Xue, Tianci, et al.
Published: (2026)

Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages
by: El-Naggar, Nadine, et al.
Published: (2025)

Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)

Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models
by: Bisconti, Piercosma, et al.
Published: (2025)

COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
by: Guo, Xingang, et al.
Published: (2024)

Open Sesame! Universal Black Box Jailbreaking of Large Language Models
by: Lapid, Raz, et al.
Published: (2023)

Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs
by: Li, Xiaoxia, et al.
Published: (2024)

Adaptive Content Restriction for Large Language Models via Suffix Optimization
by: Li, Yige, et al.
Published: (2025)

LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
by: Li, Ran, et al.
Published: (2025)

TrapSuffix: Proactive Defense Against Adversarial Suffixes in Jailbreaking
by: Du, Mengyao, et al.
Published: (2026)

Suffix-Constrained Greedy Search Algorithms for Causal Language Models
by: Hammal, Ayoub, et al.
Published: (2026)

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
by: Tang, Haochun, et al.
Published: (2026)

Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs
by: Li, Xiang, et al.
Published: (2025)

Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?
by: Atil, Berk, et al.
Published: (2025)

DPad: Efficient Diffusion Language Models with Suffix Dropout
by: Chen, Xinhua, et al.
Published: (2025)

A Closer Look at Adversarial Suffix Learning for Jailbreaking LLMs: Augmented Adversarial Trigger Learning
by: Wang, Zhe, et al.
Published: (2025)

AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
by: Lee, Yejin, et al.
Published: (2025)

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
by: Han, Seungju, et al.
Published: (2024)

ROSA-Tuning: Enhancing Long-Context Modeling via Suffix Matching
by: Zheng, Yunao, et al.
Published: (2026)

Doubly-Universal Adversarial Perturbations: Deceiving Vision-Language Models Across Both Images and Text with a Single Perturbation
by: Kim, Hee-Seon, et al.
Published: (2024)

AdvAgent: Controllable Blackbox Red-teaming on Web Agents
by: Xu, Chejian, et al.
Published: (2024)