| Main Authors: | Ghosh, Shatarupa; Rusert, Jonathan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.10617 |
Similar Items
Black-Box Guardrail Reverse-engineering Attack
by: Yao, Hongwei, et al.
Published: (2025)
Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
by: Akbar-Tajari, Mohammad, et al.
Published: (2025)
Cross-Lingual Summarization as a Black-Box Watermark Removal Attack
by: Ganesan, Gokul
Published: (2025)
FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks
by: Chen, Bocheng, et al.
Published: (2024)
Effective and Efficient Jailbreaks of Black-Box LLMs with Cross-Behavior Attacks
by: Gohil, Vasudev
Published: (2025)
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
by: Mehrotra, Anay, et al.
Published: (2023)
Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
by: Chen, Zhuo, et al.
Published: (2024)
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)
PAL: Proxy-Guided Black-Box Attack on Large Language Models
by: Sitawarin, Chawin, et al.
Published: (2024)
"Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks
by: Wang, Libo
Published: (2024)
EvoDefense: Co-Evolving Black-Box Defense with Large Language Models
by: Li, Yu, et al.
Published: (2026)
PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization
by: Jawad, Huseein, et al.
Published: (2025)
AlienLM: Alienization of Language for API-Boundary Privacy in Black-Box LLMs
by: Kim, Jaehee, et al.
Published: (2026)
Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach
by: Zhang, Chong, et al.
Published: (2025)
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
by: Zhu, Xiaoyuan, et al.
Published: (2025)
A Watermark for Black-Box Language Models
by: Bahri, Dara, et al.
Published: (2024)
RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection
by: Zhu, He, et al.
Published: (2026)
Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework
by: Wang, Zhuoshang, et al.
Published: (2026)
Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search
by: Moss, Robert J.
Published: (2024)
Auto-Tuning Safety Guardrails for Black-Box Large Language Models
by: Abdulkadir, Perry
Published: (2025)
Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
by: Chowdhury, Arijit Ghosh, et al.
Published: (2024)
TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking
by: Yoon, Sung-Hoon, et al.
Published: (2026)
BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models
by: Wu, Zhengxian, et al.
Published: (2025)
Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection
by: Rahman, Md Abdur, et al.
Published: (2024)
Black-Box Adversarial Attacks on LLM-Based Code Completion
by: Jenko, Slobodan, et al.
Published: (2024)
Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models
by: Huang, Chung-ju, et al.
Published: (2026)
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
by: Gubri, Martin, et al.
Published: (2024)
Quantifying the Risk of Transferred Black Box Attacks
by: Cox, Disesdi Susanna, et al.
Published: (2025)
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
by: Na, CheolWon, et al.
Published: (2025)
Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods
by: Dey, Roopkatha, et al.
Published: (2024)
Membership Inference Attacks Against In-Context Learning
by: Wen, Rui, et al.
Published: (2024)
Task-Agnostic Detector for Insertion-Based Backdoor Attacks
by: Lyu, Weimin, et al.
Published: (2024)
Security Attacks on LLM-based Code Completion Tools
by: Cheng, Wen, et al.
Published: (2024)
Prompt Stealing Attacks Against Large Language Models
by: Sha, Zeyang, et al.
Published: (2024)
Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
by: Zhang, Yuanhe, et al.
Published: (2024)
A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models
by: Wu, Yihan, et al.
Published: (2023)
Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)
by: Aqrawi, Alan, et al.
Published: (2024)
Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors
by: Peng, Yuefeng, et al.
Published: (2024)
Denial-of-Service Poisoning Attacks against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2024)
Privacy in Large Language Models: Attacks, Defenses and Future Directions
by: Li, Haoran, et al.
Published: (2023)