| Main Authors: | Lu, Yiyang, He, Jinwen, Zhao, Yue, Chen, Kai, Liang, Ruigang, Hong, Cheng, Zhang, Yingjun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.14340 |
Similar Items
PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty
by: He, Jinwen, et al.
Published: (2025)
Asking Forever: Universal Activations Behind Turn Amplification in Conversational LLMs
by: Coalson, Zachary, et al.
Published: (2026)
MEA-Defender: A Robust Watermark against Model Extraction Attack
by: Lv, Peizhuo, et al.
Published: (2024)
Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
by: Sivapiromrat, Sanhanat, et al.
Published: (2025)
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
by: Reddy, Aashray, et al.
Published: (2025)
Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models
by: Li, Songze, et al.
Published: (2026)
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks
by: Li, Yige, et al.
Published: (2024)
Hardware-Triggered Backdoors
by: Möller, Jonas, et al.
Published: (2026)
ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs
by: Yan, Lu, et al.
Published: (2024)
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
by: Price, Sara, et al.
Published: (2024)
STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling
by: Wang, Kun, et al.
Published: (2026)
A Channel-Triggered Backdoor Attack on Wireless Semantic Image Reconstruction
by: Wan, Jialin, et al.
Published: (2025)
Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
by: Tong, Haibo, et al.
Published: (2025)
Turning Federated Learning Systems Into Covert Channels
by: Costa, Gabriele, et al.
Published: (2021)
Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
by: Wang, Yifei, et al.
Published: (2026)
Hidden Ads: Behavior Triggered Semantic Backdoors for Advertisement Injection in Vision Language Models
by: Yao, Duanyi, et al.
Published: (2026)
CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer
by: Zhao, Yue, et al.
Published: (2026)
A Practical Trigger-Free Backdoor Attack on Neural Networks
by: Wang, Jiahao, et al.
Published: (2024)
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
by: Wahed, Muntasir, et al.
Published: (2025)
Stealthy and Adjustable Text-Guided Backdoor Attacks on Multimodal Pretrained Models
by: Zhang, Yiyang, et al.
Published: (2026)
Backdoors in DRL: Four Environments Focusing on In-distribution Triggers
by: Ashcraft, Chace, et al.
Published: (2025)
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)
Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning
by: Liu, Yige, et al.
Published: (2026)
Turning Black Box into White Box: Dataset Distillation Leaks
by: Chen, Huajie, et al.
Published: (2026)
I Don't Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors
by: Lin, Zijin, et al.
Published: (2024)
Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers
by: Liu, Dongyi, et al.
Published: (2025)
FilterFL: Knowledge Filtering-based Data-Free Backdoor Defense for Federated Learning
by: Yang, Yanxin, et al.
Published: (2023)
Robust Backdoor Removal by Reconstructing Trigger-Activated Changes in Latent Representation
by: Iwahana, Kazuki, et al.
Published: (2025)
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
by: Yang, Xikang, et al.
Published: (2024)
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents
by: Li, Xu, et al.
Published: (2026)
Backdoor Contrastive Learning via Bi-level Trigger Optimization
by: Sun, Weiyu, et al.
Published: (2024)
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
by: Rahman, Salman, et al.
Published: (2025)
WARDEN: Multi-Directional Backdoor Watermarks for Embedding-as-a-Service Copyright Protection
by: Shetty, Anudeex, et al.
Published: (2024)
Securing Multi-turn Conversational Language Models From Distributed Backdoor Triggers
by: Tong, Terry, et al.
Published: (2024)
RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
by: Jiang, Yifan, et al.
Published: (2024)
Instruction Backdoor Attacks Against Customized LLMs
by: Zhang, Rui, et al.
Published: (2024)
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
by: De Muri, Giovanni, et al.
Published: (2025)
Defending against Backdoor Attack on Deep Neural Networks
by: Cheng, Hao, et al.
Published: (2020)
Krait: A Backdoor Attack Against Graph Prompt Tuning
by: Song, Ying, et al.
Published: (2024)