Library Catalog

Bibliographic Details
Main Author:	Lemkin, Benjamin
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2403.04769

Similar Items

Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
by: Vega, Jason, et al.
Published: (2023)

Low-Resource Languages Jailbreak GPT-4
by: Yong, Zheng-Xin, et al.
Published: (2023)

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation
by: Huang, Tiansheng, et al.
Published: (2025)

Advancing Jailbreak Strategies: A Hybrid Approach to Exploiting LLM Vulnerabilities and Bypassing Modern Defenses
by: Ahmed, Mohamed, et al.
Published: (2025)

LLM Ghostbusters: Surgical Hallucination Suppression via Adaptive Unlearning
by: Spracklen, Joseph, et al.
Published: (2026)

SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2025)

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
by: Liang, Buyun, et al.
Published: (2026)

HSF: Defending against Jailbreak Attacks with Hidden State Filtering
by: Qian, Cheng, et al.
Published: (2024)

Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models
by: Chu, Junjie, et al.
Published: (2024)

Exploiting Novel GPT-4 APIs
by: Pelrine, Kellin, et al.
Published: (2023)

IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems
by: Wu, Yuhao, et al.
Published: (2024)

LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins
by: Iqbal, Umar, et al.
Published: (2023)

A Large-Scale Empirical Analysis of Custom GPTs' Vulnerabilities in the OpenAI Ecosystem
by: Ogundoyin, Sunday Oyinlola, et al.
Published: (2025)

Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models
by: Pelofske, Elijah, et al.
Published: (2024)

PromptScreen: Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline
by: Rao, Akshaj Prashanth, et al.
Published: (2025)

LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis
by: Alhazbi, Saeif, et al.
Published: (2025)

Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing
by: Wahréus, Johan, et al.
Published: (2025)

Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures
by: Benjamin, Victoria, et al.
Published: (2024)

GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation
by: Ramesh, Govind, et al.
Published: (2024)

AdvPrefix: An Objective for Nuanced LLM Jailbreaks
by: Zhu, Sicheng, et al.
Published: (2024)

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
by: Baumgärtner, Tim, et al.
Published: (2024)

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
by: Halawi, Danny, et al.
Published: (2024)

TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification
by: Gubri, Martin, et al.
Published: (2024)

Stealing User Prompts from Mixture of Experts
by: Yona, Itay, et al.
Published: (2024)

Fight Back Against Jailbreaking via Prompt Adversarial Tuning
by: Mo, Yichuan, et al.
Published: (2024)

Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
by: Hu, Xiaomeng, et al.
Published: (2024)

Detecting Training Data of Large Language Models via Expectation Maximization
by: Kim, Gyuwan, et al.
Published: (2024)

Machine Unlearning of Pre-trained Large Language Models
by: Yao, Jin, et al.
Published: (2024)

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions
by: Miranda, Michele, et al.
Published: (2024)

DP-MemArc: Differential Privacy Transfer Learning for Memory Efficient Language Models
by: Liu, Yanming, et al.
Published: (2024)

Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks
by: Poppi, Samuele, et al.
Published: (2024)

Rethinking How to Evaluate Language Model Jailbreak
by: Cai, Hongyu, et al.
Published: (2024)

JailbreakRadar: Comprehensive Assessment of Jailbreak Attacks Against LLMs
by: Chu, Junjie, et al.
Published: (2024)

Textual Unlearning Gives a False Sense of Unlearning
by: Du, Jiacheng, et al.
Published: (2024)

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
by: Wu, Tong, et al.
Published: (2024)

SequentialBreak: Large Language Models Can be Fooled by Embedding Jailbreak Prompts into Sequential Prompt Chains
by: Saiem, Bijoy Ahmed, et al.
Published: (2024)

LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
by: Lin, Shi, et al.
Published: (2024)

Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment
by: Shao, Zedian, et al.
Published: (2024)

SVIP: Towards Verifiable Inference of Open-source Large Language Models
by: Sun, Yifan, et al.
Published: (2024)

Large Language Models in Cybersecurity: State-of-the-Art
by: Motlagh, Farzad Nourmohammadzadeh, et al.
Published: (2024)