| Main Author: | Moss, Robert J. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2408.08899 |
Similar Items
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
by: Mehrotra, Anay, et al.
Published: (2023)
PAL: Proxy-Guided Black-Box Attack on Large Language Models
by: Sitawarin, Chawin, et al.
Published: (2024)
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
by: Gubri, Martin, et al.
Published: (2024)
Exploiting Class Probabilities for Black-box Sentence-level Attacks
by: Moraffah, Raha, et al.
Published: (2024)
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
by: Wang, Yifei, et al.
Published: (2024)
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2025)
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2026)
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)
Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening
by: Zhang, Mohan, et al.
Published: (2026)
Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis
by: Zhang, Haoyu, et al.
Published: (2026)
Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization
by: Tang, Haochun, et al.
Published: (2026)
BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
by: Wang, Xinyuan, et al.
Published: (2024)
Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
by: Akbar-Tajari, Mohammad, et al.
Published: (2025)
Effective and Efficient Jailbreaks of Black-Box LLMs with Cross-Behavior Attacks
by: Gohil, Vasudev
Published: (2025)
Membership Inference Attacks on LLM-based Recommender Systems
by: He, Jiajie, et al.
Published: (2025)
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
by: Liu, Yupei, et al.
Published: (2023)
PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)
by: Mancera, Gonzalo, et al.
Published: (2025)
BinaryShield: Cross-Service Threat Intelligence in LLM Services using Privacy-Preserving Fingerprints
by: Gill, Waris, et al.
Published: (2025)
The Resurgence of GCG Adversarial Attacks on Large Language Models
by: Tan, Yuting, et al.
Published: (2025)
SurvAttack: Black-Box Attack On Survival Models through Ontology-Informed EHR Perturbation
by: Kerdabadi, Mohsen Nayebi, et al.
Published: (2024)
SoK: Pitfalls in Evaluating Black-Box Attacks
by: Suya, Fnu, et al.
Published: (2023)
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
by: Baumgärtner, Tim, et al.
Published: (2024)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)
Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)
HSF: Defending against Jailbreak Attacks with Hidden State Filtering
by: Qian, Cheng, et al.
Published: (2024)
Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking
by: Fang, Zhicheng, et al.
Published: (2026)
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
by: Struppek, Lukas, et al.
Published: (2026)
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)
On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
by: Poppi, Samuele, et al.
Published: (2024)
Adversarial Attacks on Parts of Speech: An Empirical Study in Text-to-Image Generation
by: Shahariar, G M, et al.
Published: (2024)
On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks
by: Liu, Zesen, et al.
Published: (2024)
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
by: Liao, Zeyi, et al.
Published: (2024)
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
by: Yi, Sibo, et al.
Published: (2024)
Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks
by: Cheng, Yixin, et al.
Published: (2025)
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
by: Chen, Taiye, et al.
Published: (2025)
Attack and defense techniques in large language models: A survey and new perspectives
by: Liao, Zhiyu, et al.
Published: (2025)
MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation
by: Jiang, Weisen, et al.
Published: (2025)
Checkpoint-GCG: Auditing and Attacking Fine-Tuning-Based Prompt Injection Defenses
by: Yang, Xiaoxue, et al.
Published: (2025)
Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
by: Chen, Zhuo, et al.
Published: (2024)