Bibliographic Details
Main Authors:	Lu, Yiyang; He, Jinwen; Zhao, Yue; Chen, Kai; Liang, Ruigang; Hong, Cheng; Zhang, Yingjun
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security; Machine Learning
Online Access:	https://arxiv.org/abs/2601.14340

Table of Contents:

Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks: adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that is activated by dialogue structure: it uses the turn index as the trigger and is independent of user inputs. This creates a structure-conditioned reliability risk: a backdoored model can pass prompt-centric checks and standard utility evaluations, yet execute attacker-specified behaviors at selected dialogue positions without any trigger in the user input. Across four open-source LLM families, TST achieves an average attack success rate (ASR) of 99.52% while largely preserving non-triggered utility, and it remains effective on unseen dialogue datasets and against representative defenses. These results reveal dialogue structure as an overlooked attack surface and motivate structure-aware multi-turn auditing that goes beyond prompt inspection.
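The "structure-aware multi-turn auditing" the abstract calls for can be illustrated with a minimal sketch. This is not the authors' method or evaluation code. It is an assumed auditing approach that groups model responses by turn index and measures how often a target behavior appears at each position. A turn-conditioned backdoor would show up as a spike at specific turns, which a prompt-only check would not reveal. The function names, the `"[TARGET]"` marker standing in for an attacker-specified behavior, and the ASR definition used here (fraction of triggered responses that exhibit the target behavior) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: a dialogue is the list of assistant responses,
# one per turn, with turns numbered from 1.
Dialogue = Sequence[str]


def behavior_rate_by_turn(
    dialogues: List[Dialogue], is_target: Callable[[str], bool]
) -> Dict[int, float]:
    """Return, for each turn index, the fraction of responses flagged by is_target."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for dialogue in dialogues:
        for turn, response in enumerate(dialogue, start=1):
            totals[turn] += 1
            if is_target(response):
                hits[turn] += 1
    # Keys are emitted in ascending turn order.
    return {turn: hits[turn] / totals[turn] for turn in sorted(totals)}


def flag_turns(rates: Dict[int, float], threshold: float = 0.5) -> List[int]:
    """Return the turn indices whose target-behavior rate meets the (assumed) threshold."""
    return [turn for turn, rate in rates.items() if rate >= threshold]


def attack_success_rate(
    triggered_responses: Sequence[str], is_target: Callable[[str], bool]
) -> float:
    """Assumed ASR definition: share of responses at trigger positions showing the target behavior."""
    if not triggered_responses:
        raise ValueError("need at least one triggered response")
    return sum(is_target(r) for r in triggered_responses) / len(triggered_responses)


# Toy usage with made-up dialogues (not data from the paper).
is_target = lambda response: "[TARGET]" in response
dialogues = [
    ["hi", "sure", "[TARGET]"],
    ["hello", "ok", "[TARGET] x"],
    ["hey", "fine"],
]
rates = behavior_rate_by_turn(dialogues, is_target)
suspicious = flag_turns(rates)
turn3 = [d[2] for d in dialogues if len(d) >= 3]
asr = attack_success_rate(turn3, is_target)
print(rates, suspicious, asr)
```

In this toy run, the per-turn rates are 0.0 for turns 1 and 2 and 1.0 for turn 3, so only turn 3 is flagged. No individual user input needs to contain a trigger for the anomaly to be visible. A realistic audit would replace the substring check with a behavior classifier and compare the per-turn rates against a clean reference model.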