Library Catalog

Bibliographic Details
Main Authors:	Jenny, Maël, Dentan, Jérémie, Vanier, Sonia, Krajecki, Michaël
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2603.14278

Similar Items

Predicting memorization within Large Language Models fine-tuned for classification
by: Dentan, Jérémie, et al.
Published: (2024)

Reconstructing training data from document understanding models
by: Dentan, Jérémie, et al.
Published: (2024)

RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs
by: Chen, Xuan, et al.
Published: (2024)

Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs
by: Dentan, Jérémie, et al.
Published: (2025)

Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
by: Jaiswal, Piyush, et al.
Published: (2026)

Enhancing Jailbreak Attacks on LLMs via Persona Prompts
by: Zhang, Zheng, et al.
Published: (2025)

PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage
by: Nakka, Krishna Kanth, et al.
Published: (2025)

Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks
by: Xiong, Chen, et al.
Published: (2024)

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
by: Zhang, Chiyu, et al.
Published: (2025)

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
by: Chen, Yunhao, et al.
Published: (2025)

Touch to Pair: Secure and Usable IoT Pairing without Information Loss
by: Wu, Chuxiong, et al.
Published: (2024)

PolyJailbreak: Cross-Modal Jailbreaking Attacks on Black-Box Multimodal LLMs
by: Wang, Xinkai, et al.
Published: (2025)

One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
by: Li, Linbao, et al.
Published: (2025)

JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs
by: Li, Hongyi, et al.
Published: (2024)

GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis
by: Xie, Yueqi, et al.
Published: (2024)

Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs
by: Li, Xiang, et al.
Published: (2025)

White-box Membership Inference Attacks against Diffusion Models
by: Pang, Yan, et al.
Published: (2023)

Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
by: Jeong, Joonhyun, et al.
Published: (2025)

Involuntary Jailbreak: On Self-Prompting Attacks
by: Guo, Yangyang, et al.
Published: (2025)

The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection
by: Yang, Yi, et al.
Published: (2024)

Jailbreaking LLMs & VLMs: Mechanisms, Evaluation, and Unified Defense
by: Chen, Zejian, et al.
Published: (2026)

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
by: Pathade, Chetan
Published: (2025)

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
by: Kirch, Nathalie, et al.
Published: (2024)

MacPrompt: Maraconic-guided Jailbreak against Text-to-Image Models
by: Ye, Xi, et al.
Published: (2026)

Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
by: Hu, Xiaomeng, et al.
Published: (2024)

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
by: Wang, Yanting, et al.
Published: (2025)

Ellipsoid Control: A White-list Jailbreak Defense via Benign Latent Modeling
by: Chen, Luoyu, et al.
Published: (2026)

Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning
by: Ostermann, Simon, et al.
Published: (2024)

Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security
by: Kim, Hajun, et al.
Published: (2025)

Mitigating Jailbreaks with Intent-Aware LLMs
by: Yeo, Wei Jie, et al.
Published: (2025)

The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning
by: Liu, Mingrui, et al.
Published: (2025)

Ariadne: a Privacy-Preserving Communication Protocol
by: Fressancourt, Antoine, et al.
Published: (2024)

Defending Jailbreak Prompts via In-Context Adversarial Game
by: Zhou, Yujun, et al.
Published: (2024)

LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs
by: Jha, Piyush, et al.
Published: (2024)

FlipAttack: Jailbreak LLMs via Flipping
by: Liu, Yue, et al.
Published: (2024)

Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
by: Xie, Zhixin, et al.
Published: (2025)

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
by: Shen, Xinyue, et al.
Published: (2023)

Investigating the Effect of Misalignment on Membership Privacy in the White-box Setting
by: Cretu, Ana-Maria, et al.
Published: (2023)

SEAL: Entangled White-box Watermarks on Low-Rank Adaptation
by: Oh, Giyeong, et al.
Published: (2025)

Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
by: Wang, Zhaoqi, et al.
Published: (2025)