| Main Author: | Lemkin, Benjamin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.04769 |
Similar Items
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)
Low-Resource Languages Jailbreak GPT-4
by: Yong, Zheng-Xin, et al.
Published: (2023)
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
by: Huang, Tiansheng, et al.
Published: (2025)
Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses
by: Ahmed, Mohamed, et al.
Published: (2025)
LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning
by: Spracklen, Joseph, et al.
Published: (2026)
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2025)
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2026)
HSF: Defending against Jailbreak Attacks with Hidden State Filtering
by: Qian, Cheng, et al.
Published: (2024)
Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models
by: Chu, Junjie, et al.
Published: (2024)
Exploiting Novel GPT-4 APIs
by: Pelrine, Kellin, et al.
Published: (2023)
IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems
by: Wu, Yuhao, et al.
Published: (2024)
LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins
by: Iqbal, Umar, et al.
Published: (2023)
A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem
by: Ogundoyin, Sunday Oyinlola, et al.
Published: (2025)
Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models
by: Pelofske, Elijah, et al.
Published: (2024)
PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline
by: Rao, Akshaj Prashanth, et al.
Published: (2025)
LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis
by: Alhazbi, Saeif, et al.
Published: (2025)
Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing
by: Wahréus, Johan, et al.
Published: (2025)
Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures
by: Benjamin, Victoria, et al.
Published: (2024)
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
by: Ramesh, Govind, et al.
Published: (2024)
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
by: Zhu, Sicheng, et al.
Published: (2024)
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
by: Baumgärtner, Tim, et al.
Published: (2024)
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
by: Halawi, Danny, et al.
Published: (2024)
TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
by: Gubri, Martin, et al.
Published: (2024)
Stealing User Prompts from Mixture of Experts
by: Yona, Itay, et al.
Published: (2024)
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
by: Mo, Yichuan, et al.
Published: (2024)
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
by: Hu, Xiaomeng, et al.
Published: (2024)
Detecting Training Data of Large Language Models via Expectation Maximization
by: Kim, Gyuwan, et al.
Published: (2024)
Machine Unlearning of Pre-trained Large Language Models
by: Yao, Jin, et al.
Published: (2024)
Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
by: Miranda, Michele, et al.
Published: (2024)
DP-MemArc: Differential Privacy Transfer Learning for Memory Efficient Language Models
by: Liu, Yanming, et al.
Published: (2024)
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
by: Poppi, Samuele, et al.
Published: (2024)
Rethinking How to Evaluate Language Model Jailbreak
by: Cai, Hongyu, et al.
Published: (2024)
JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)
Textual Unlearning Gives a False Sense of Unlearning
by: Du, Jiacheng, et al.
Published: (2024)
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
by: Wu, Tong, et al.
Published: (2024)
SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains
by: Saiem, Bijoy Ahmed, et al.
Published: (2024)
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
by: Lin, Shi, et al.
Published: (2024)
Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)
SVIP: Towards Verifiable Inference of Open-source Large Language Models
by: Sun, Yifan, et al.
Published: (2024)
Large Language Models in Cybersecurity: State-of-the-Art
by: Motlagh, Farzad Nourmohammadzadeh, et al.
Published: (2024)