| Main authors: | Rayhan, Naheed; Jahan, Sohely |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2604.21860 |
Similar documents
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
by: Nihal, Ragib Amin, et al.
Published: (2025)
Emerging Vulnerabilities in Frontier Models: Multi-Turn Jailbreak Attacks
by: Gibbs, Tom, et al.
Published: (2024)
HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models
by: Narula, Sidhant, et al.
Published: (2025)
State-Dependent Safety Failures in Multi-Turn Language Model Interaction
by: Li, Pengcheng, et al.
Published: (2026)
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
by: Sahoo, Devanshu, et al.
Published: (2025)
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models
by: Ying, Zonghao, et al.
Published: (2025)
Tempest: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
by: Zhou, Andy, et al.
Published: (2025)
ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models
by: Owiredu-Ashley, Harry
Published: (2026)
The Echo Chamber Multi-Turn LLM Jailbreak
by: Alobaid, Ahmad, et al.
Published: (2026)
A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks
by: Bullwinkel, Blake, et al.
Published: (2025)
ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?
by: Wang, Zhun, et al.
Published: (2026)
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
by: Reddy, Aashray, et al.
Published: (2025)
ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack
by: Lin, Xingwei, et al.
Published: (2026)
Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks
by: You, Doohee
Published: (2026)
Turning Generative Models Degenerate: The Power of Data Poisoning Attacks
by: Jiang, Shuli, et al.
Published: (2024)
One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue
by: Shen, Xinjie, et al.
Published: (2026)
CSC: Turning the Adversary's Poison against Itself
by: Shi, Yuchen, et al.
Published: (2026)
Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks
by: Wu, ChenYu, et al.
Published: (2025)
Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection
by: Kulkarni, Prashant
Published: (2026)
MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks
by: Zhang, Xinkai, et al.
Published: (2026)
Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
by: Russinovich, Mark, et al.
Published: (2024)
NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks
by: Asl, Javad Rafiei, et al.
Published: (2025)
Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
by: Tong, Haibo, et al.
Published: (2025)
Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
by: Ruzzetti, Elena Sofia, et al.
Published: (2025)
MARVEL: Multi-Agent RTL Vulnerability Extraction using Large Language Models
by: Collini, Luca, et al.
Published: (2025)
Beyond Single Bugs: Benchmarking Large Language Models for Multi-Vulnerability Detection
by: Pushkar, Chinmay, et al.
Published: (2025)
VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
by: Liu, Aofan, et al.
Published: (2025)
Evaluation of Prompt Injection Defenses in Large Language Models
by: Deep, Priyal, et al.
Published: (2026)
Atomicity for Agents: Exposing, Exploiting, and Mitigating TOCTOU Vulnerabilities in Browser-Use Agents
by: Jiang, Linxi, et al.
Published: (2026)
Evaluating Adversarial Vulnerabilities in Modern Large Language Models
by: Perel, Tom
Published: (2025)
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
by: Wahed, Muntasir, et al.
Published: (2025)
Exploring Membership Inference Vulnerabilities in Clinical Large Language Models
by: Nemecek, Alexander, et al.
Published: (2025)
Prompt Injection as an Emerging Threat: Evaluating the Resilience of Large Language Models
by: Ganiuly, Daniyal, et al.
Published: (2025)
Leveraging Large Language Models for Command Injection Vulnerability Analysis in Python: An Empirical Study on Popular Open-Source Projects
by: Wang, Yuxuan, et al.
Published: (2025)
Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures
by: Schwarz, Dominik
Published: (2025)
How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition
by: Dziemian, Mateusz, et al.
Published: (2026)
AVIATOR: Towards AI-Agentic Vulnerability Injection Workflow for High-Fidelity, Large-Scale Code Security Dataset
by: Lbath, Amine, et al.
Published: (2025)
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents
by: Li, Xu, et al.
Published: (2026)
A Study of Vulnerability Repair in JavaScript Programs with Large Language Models
by: Le, Tan Khang, et al.
Published: (2024)
Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
by: Park, Junyoung, et al.
Published: (2026)