Bibliographic Details
Main Authors: Lu, Yiyang; He, Jinwen; Zhao, Yue; Chen, Kai; Liang, Ruigang; Hong, Cheng; Zhang, Yingjun
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2601.14340
Abstract:
  • Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. This creates a structure-conditioned reliability risk: a backdoored model can pass prompt-centric checks and standard utility evaluations, yet execute attacker-specified behaviors at selected dialogue positions without any trigger in the user input. Across four open-source LLM families, TST achieves a 99.52% average attack success rate (ASR) while largely preserving non-triggered utility, and remains effective across unseen dialogue datasets and against representative defenses. These results reveal dialogue structure as an overlooked attack surface and motivate structure-aware multi-turn auditing beyond prompt inspection.