| Main authors: | Sun, Zehan; Chen, Dingfan; Li, Songze |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.17288 |
Similar documents
Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems
by: Banerjee, Sarbartha, et al.
Published: (2026)
When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
by: Chen, Shenyang, et al.
Published: (2026)
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations
by: Li, Xiaohu, et al.
Published: (2025)
Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers
by: Wei, Jiali, et al.
Published: (2026)
Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race?
by: Xin, Yuan, et al.
Published: (2025)
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
by: Xu, Rongwu, et al.
Published: (2024)
Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
by: Wang, Yifei, et al.
Published: (2026)
When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output
by: Zhang, Shuoming, et al.
Published: (2025)
Scam Shield: Multi-Model Voting and Fine-Tuned LLMs Against Adversarial Attacks
by: Chang, Chen-Wei, et al.
Published: (2025)
Backdoor Attack with Invisible Triggers Based on Model Architecture Modification
by: Ma, Yuan, et al.
Published: (2024)
CL-Attack: Textual Backdoor Attacks via Cross-Lingual Triggers
by: Zheng, Jingyi, et al.
Published: (2024)
DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification
by: Kang, Mintong, et al.
Published: (2023)
ACIArena: Toward Unified Evaluation for Agent Cascading Injection
by: An, Hengyu, et al.
Published: (2026)
Persistent Backdoor Attacks under Continual Fine-Tuning of LLMs
by: Cui, Jing, et al.
Published: (2025)
Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs
by: Liu, Fan, et al.
Published: (2024)
Revisiting Training-Inference Trigger Intensity in Backdoor Attacks
by: Lin, Chenhao, et al.
Published: (2025)
Invisible Textual Backdoor Attacks based on Dual-Trigger
by: Hou, Yang, et al.
Published: (2024)
DDSA: Dual-Domain Strategic Attack for Spatial-Temporal Efficiency in Adversarial Robustness Testing
by: Hu, Jinwei, et al.
Published: (2026)
Enhancing Adversarial Resistance in LLMs with Recursion
by: Li, Bryan, et al.
Published: (2024)
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
by: Ma, Jiachen, et al.
Published: (2024)
Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs
by: Fastowski, Alina, et al.
Published: (2025)
ME: Trigger Element Combination Backdoor Attack on Copyright Infringement
by: Yang, Feiyu, et al.
Published: (2025)
CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems
by: Turgut, İpek Abasıkeleş, et al.
Published: (2026)
Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection
by: Fok, Jan Lum, et al.
Published: (2025)
BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models
by: Liu, Shuaitong, et al.
Published: (2025)
AttackSeqBench: Benchmarking the Capabilities of LLMs for Attack Sequences Understanding
by: Ma, Haokai, et al.
Published: (2025)
Re-Triggering Safeguards within LLMs for Jailbreak Detection
by: Lin, Zheng, et al.
Published: (2026)
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
by: Panfilov, Alexander, et al.
Published: (2026)
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
by: Yueh-Han, Chen, et al.
Published: (2025)
Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models
by: Chen, Mingjie, et al.
Published: (2025)
Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models
by: Xue, Xiaoyu, et al.
Published: (2025)
Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems
by: Chang, Jung-Woo, et al.
Published: (2023)
A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers
by: Wu, Zhixiao, et al.
Published: (2025)
Integrated Simulation Framework for Adversarial Attacks on Autonomous Vehicles
by: Anagnostopoulos, Christos, et al.
Published: (2025)
Adversarial Machine Learning: Attacks, Defenses, and Open Challenges
by: Jha, Pranav K
Published: (2025)
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
by: Luo, Mingyu, et al.
Published: (2026)
Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
by: Zhang, Wentao, et al.
Published: (2026)
HauntAttack: When Attack Follows Reasoning as a Shadow
by: Ma, Jingyuan, et al.
Published: (2025)
An Attack Method for Medical Insurance Claim Fraud Detection based on Generative Adversarial Network
by: Pang, Yining, et al.
Published: (2025)
Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents
by: Ding, Renhua, et al.
Published: (2025)