Library Catalog

Bibliographic Details
Main Authors:	Ghosh, Shatarupa; Rusert, Jonathan
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security; Computation and Language
Online Access:	https://arxiv.org/abs/2412.10617

Similar Items

Black-Box Guardrail Reverse-engineering Attack
by: Yao, Hongwei, et al.
Published: (2025)

Graph of Attacks: Improved Black-Box and Interpretable Jailbreaks for LLMs
by: Akbar-Tajari, Mohammad, et al.
Published: (2025)

Cross-Lingual Summarization as a Black-Box Watermark Removal Attack
by: Ganesan, Gokul
Published: (2025)

FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks
by: Chen, Bocheng, et al.
Published: (2024)

Effective and Efficient Jailbreaks of Black-Box LLMs with Cross-Behavior Attacks
by: Gohil, Vasudev
Published: (2025)

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
by: Mehrotra, Anay, et al.
Published: (2023)

Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
by: Chen, Zhuo, et al.
Published: (2024)

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)

PAL: Proxy-Guided Black-Box Attack on Large Language Models
by: Sitawarin, Chawin, et al.
Published: (2024)

"Moralized" Multi-Step Jailbreak Prompts: Black-Box Testing of Guardrails in Large Language Models for Verbal Attacks
by: Wang, Libo
Published: (2024)

EvoDefense: Co-Evolving Black-Box Defense with Large Language Models
by: Li, Yu, et al.
Published: (2026)

PSM: Prompt Sensitivity Minimization via LLM-Guided Black-Box Optimization
by: Jawad, Huseein, et al.
Published: (2025)

AlienLM: Alienization of Language for API-Boundary Privacy in Black-Box LLMs
by: Kim, Jaehee, et al.
Published: (2026)

Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach
by: Zhang, Chong, et al.
Published: (2025)

Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
by: Zhu, Xiaoyuan, et al.
Published: (2025)

A Watermark for Black-Box Language Models
by: Bahri, Dara, et al.
Published: (2024)

RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection
by: Zhu, He, et al.
Published: (2026)

Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework
by: Wang, Zhuoshang, et al.
Published: (2026)

Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search
by: Moss, Robert J.
Published: (2024)

Auto-Tuning Safety Guardrails for Black-Box Large Language Models
by: Abdulkadir, Perry
Published: (2025)

Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
by: Chowdhury, Arijit Ghosh, et al.
Published: (2024)

TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking
by: Yoon, Sung-Hoon, et al.
Published: (2026)

BadApex: Backdoor Attack Based on Adaptive Optimization Mechanism of Black-box Large Language Models
by: Wu, Zhengxian, et al.
Published: (2025)

Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection
by: Rahman, Md Abdur, et al.
Published: (2024)

Black-Box Adversarial Attacks on LLM-Based Code Completion
by: Jenko, Slobodan, et al.
Published: (2024)

Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models
by: Huang, Chung-ju, et al.
Published: (2026)

TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
by: Gubri, Martin, et al.
Published: (2024)

Quantifying the Risk of Transferred Black Box Attacks
by: Cox, Disesdi Susanna, et al.
Published: (2025)

Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
by: Na, CheolWon, et al.
Published: (2025)

Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods
by: Dey, Roopkatha, et al.
Published: (2024)

Membership Inference Attacks Against In-Context Learning
by: Wen, Rui, et al.
Published: (2024)

Task-Agnostic Detector for Insertion-Based Backdoor Attacks
by: Lyu, Weimin, et al.
Published: (2024)

Security Attacks on LLM-based Code Completion Tools
by: Cheng, Wen, et al.
Published: (2024)

Prompt Stealing Attacks Against Large Language Models
by: Sha, Zeyang, et al.
Published: (2024)

Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
by: Zhang, Yuanhe, et al.
Published: (2024)

A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models
by: Wu, Yihan, et al.
Published: (2023)

Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA)
by: Aqrawi, Alan, et al.
Published: (2024)

Data Extraction Attacks in Retrieval-Augmented Generation via Backdoors
by: Peng, Yuefeng, et al.
Published: (2024)

Denial-of-Service Poisoning Attacks against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2024)

Privacy in Large Language Models: Attacks, Defenses and Future Directions
by: Li, Haoran, et al.
Published: (2023)