:: Library Catalog

Bibliographic Details
Main Author:	Moss, Robert J.
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2408.08899

Similar Items

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
by: Mehrotra, Anay, et al.
Published: (2023)

PAL: Proxy-Guided Black-Box Attack on Large Language Models
by: Sitawarin, Chawin, et al.
Published: (2024)

TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
by: Gubri, Martin, et al.
Published: (2024)

Exploiting Class Probabilities for Black-box Sentence-level Attacks
by: Moraffah, Raha, et al.
Published: (2024)

BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
by: Wang, Yifei, et al.
Published: (2024)

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2025)

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2026)

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening
by: Zhang, Mohan, et al.
Published: (2026)

Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis
by: Zhang, Haoyu, et al.
Published: (2026)

Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
by: Tang, Haochun, et al.
Published: (2026)

BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
by: Wang, Xinyuan, et al.
Published: (2024)

Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
by: Akbar-Tajari, Mohammad, et al.
Published: (2025)

Effective and Efficient Jailbreaks of Black-Box LLMs with Cross-Behavior Attacks
by: Gohil, Vasudev
Published: (2025)

Membership Inference Attacks on LLM-based Recommender Systems
by: He, Jiajie, et al.
Published: (2025)

Formalizing and Benchmarking Prompt Injection Attacks and Defenses
by: Liu, Yupei, et al.
Published: (2023)

PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)
by: Mancera, Gonzalo, et al.
Published: (2025)

BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints
by: Gill, Waris, et al.
Published: (2025)

The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)

SurvAttack: Black-Box Attack On Survival Models through Ontology-Informed EHR Perturbation
by: Kerdabadi, Mohsen Nayebi, et al.
Published: (2024)

SoK: Pitfalls in Evaluating Black-Box Attacks
by: Suya, Fnu, et al.
Published: (2023)

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
by: Baumgärtner, Tim, et al.
Published: (2024)

JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)

Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)

HSF: Defending against Jailbreak Attacks with Hidden State Filtering
by: Qian, Cheng, et al.
Published: (2024)

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
by: Fang, Zhicheng, et al.
Published: (2026)

Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
by: Struppek, Lukas, et al.
Published: (2026)

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)

On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)

Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
by: Poppi, Samuele, et al.
Published: (2024)

Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation
by: Shahariar, G M, et al.
Published: (2024)

On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks
by: Liu, Zesen, et al.
Published: (2024)

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
by: Liao, Zeyi, et al.
Published: (2024)

Jailbreak Attacks and Defenses Against Large Language Models: A Survey
by: Yi, Sibo, et al.
Published: (2024)

Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks
by: Cheng, Yixin, et al.
Published: (2025)

Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
by: Chen, Taiye, et al.
Published: (2025)

Attack and defense techniques in large language models: A survey and new perspectives
by: Liao, Zhiyu, et al.
Published: (2025)

MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation
by: Jiang, Weisen, et al.
Published: (2025)

Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses
by: Yang, Xiaoxue, et al.
Published: (2025)

Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
by: Chen, Zhuo, et al.
Published: (2024)