| Main Authors: | Zhang, Yunyi; Cui, Shibo; Liu, Baojun; Yu, Jingkai; Zhang, Min; Shi, Fan; Zheng, Han |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.17874 |
Similar Items
Into the Gray Zone: Domain Contexts Can Blur LLM Safety Boundaries
by: Hung, Ki Sen, et al.
Published: (2026)
You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense
by: Mai, Wuyuao, et al.
Published: (2025)
Unveiling the Resilience of LLM-Enhanced Search Engines against Black-Hat SEO Manipulation
by: Chen, Pei, et al.
Published: (2026)
Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
by: Wu, Yu-Hang, et al.
Published: (2025)
NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models
by: Zhang, Chuhan, et al.
Published: (2025)
Sparse Autoencoders are Capable LLM Jailbreak Mitigators
by: Assogba, Yannick, et al.
Published: (2026)
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs
by: Xiang, Shiyu, et al.
Published: (2025)
Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning
by: Lyu, Xiaoting, et al.
Published: (2024)
Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures
by: Zhou, Yukai, et al.
Published: (2025)
Beyond Fixed and Dynamic Prompts: Embedded Jailbreak Templates for Advancing LLM Security
by: Kim, Hajun, et al.
Published: (2025)
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
by: He, Zeqing, et al.
Published: (2024)
From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem
by: Mao, Yanxu, et al.
Published: (2025)
The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoring
by: Hossain, Ismail, et al.
Published: (2026)
Evolving Skill-Structured Attack Memory Enhances LLM Jailbreaking
by: Zhang, Junke, et al.
Published: (2026)
AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens
by: Lu, Lin, et al.
Published: (2024)
Beyond Jailbreaking: Auditing Contextual Privacy in LLM Agents
by: Das, Saswat, et al.
Published: (2025)
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
by: Guo, Weiyang, et al.
Published: (2026)
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
by: Zhao, Weixiang, et al.
Published: (2025)
SpatialJB: How Text Distribution Art Becomes the "Jailbreak Key" for LLM Guardrails
by: Mou, Zhiyi, et al.
Published: (2026)
Proactive defense against LLM Jailbreak
by: Zhao, Weiliang, et al.
Published: (2025)
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
by: Chen, Xuan, et al.
Published: (2024)
Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
by: Wang, Zhaoqi, et al.
Published: (2025)
Unveiling Privacy Risks in LLM Agent Memory
by: Wang, Bo, et al.
Published: (2025)
Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation
by: Zhang, Jiankun, et al.
Published: (2025)
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
by: Li, Yu, et al.
Published: (2025)
ASTRA: An Automated Framework for Strategy Discovery, Retrieval, and Evolution for Jailbreaking LLMs
by: Liu, Xu, et al.
Published: (2025)
LITMUS: Benchmarking Behavioral Jailbreaks of LLM Agents in Real OS Environments
by: Zhang, Chiyu, et al.
Published: (2026)
Bleeding Pathways: Vanishing Discriminability in LLM Hidden States Fuels Jailbreak Attacks
by: Zhang, Yingjie, et al.
Published: (2025)
Beyond Model Jailbreak: Systematic Dissection of the "Ten Deadly Sins" in Embodied Intelligence
by: Huang, Yuhang, et al.
Published: (2025)
Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents
by: Chen, Jiahao, et al.
Published: (2026)
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning
by: Deng, Gelei, et al.
Published: (2024)
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
by: Zhao, Jiawei, et al.
Published: (2024)
TPM2.0-Supported Runtime Customizable TEE on FPGA-SoC with User-Controllable vTPM
by: Mao, Jingkai, et al.
Published: (2025)
PDRIMA: A Policy-Driven Runtime Integrity Measurement and Attestation Approach for ARM TrustZone-based TEE
by: Mao, Jingkai, et al.
Published: (2025)
How Real is Your Jailbreak? Fine-grained Jailbreak Evaluation with Anchored Reference
by: Liu, Songyang, et al.
Published: (2026)
FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models
by: Yao, Dongyu, et al.
Published: (2023)
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
by: Yu, Miao, et al.
Published: (2024)
The Human-Machine Identity Blur: A Unified Framework for Cybersecurity Risk Management in 2025
by: Janani, Kush
Published: (2025)
Geneshift: Impact of different scenario shift on Jailbreaking LLM
by: Wu, Tianyi, et al.
Published: (2025)
Enhancing Jailbreak Attacks on LLMs via Persona Prompts
by: Zhang, Zheng, et al.
Published: (2025)