Saved in:
| Main Authors: | Guan, Shaowei, Zhai, Yu, Zhang, Zhengyu, Wang, Yanze, Kwok, Hin Chi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.13771 |
Similar Items
Privacy Challenges and Solutions in Retrieval-Augmented Generation-Enhanced LLMs for Healthcare Chatbots: A Review of Applications, Risks, and Future Directions
by: Guan, Shaowei, et al.
Published: (2025)
BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models
by: Liu, Shuaitong, et al.
Published: (2025)
GuardReasoner: Towards Reasoning-based LLM Safeguards
by: Liu, Yue, et al.
Published: (2025)
Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts
by: Hasan, Md. Mehedi, et al.
Published: (2025)
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
by: Zolkowski, Artur, et al.
Published: (2025)
A Method for Enhancing the Safety of Large Model Generation Based on Multi-dimensional Attack and Defense
by: Zhai, Keke
Published: (2024)
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
by: Liu, Yue, et al.
Published: (2025)
CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models
by: Ren, Zhenzhen, et al.
Published: (2025)
Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought
by: Zeng, Mingfei, et al.
Published: (2025)
FPT-Noise: Dynamic Scene-Aware Counterattack for Test-Time Adversarial Defense in Vision-Language Models
by: Deng, Jia, et al.
Published: (2025)
Imitation Game for Adversarial Disillusion with Chain-of-Thought Reasoning in Generative AI
by: Chang, Ching-Chun, et al.
Published: (2025)
From Thinking to Output: Chain-of-Thought and Text Generation Characteristics in Reasoning Language Models
by: Liu, Junhao, et al.
Published: (2025)
Adversarial Attacks and Defenses on Graph-aware Large Language Models (LLMs)
by: Olatunji, Iyiola E., et al.
Published: (2025)
MCP-Guard: A Multi-Stage Defense-in-Depth Framework for Securing Model Context Protocol in Agentic AI
by: Xing, Wenpeng, et al.
Published: (2025)
Thought Purity: A Defense Framework For Chain-of-Thought Attack
by: Xue, Zihao, et al.
Published: (2025)
Preemptive Answer "Attacks" on Chain-of-Thought Reasoning
by: Xu, Rongwu, et al.
Published: (2024)
Evaluation of Prompt Injection Defenses in Large Language Models
by: Deep, Priyal, et al.
Published: (2026)
Mitigating the Structural Bias in Graph Adversarial Defenses
by: Fang, Junyuan, et al.
Published: (2025)
Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models
by: Truong, Vu Tuan, et al.
Published: (2026)
Adversarial Text Purification: A Large Language Model Approach for Defense
by: Moraffah, Raha, et al.
Published: (2024)
A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory
by: Wei, Qianshan, et al.
Published: (2025)
TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense
by: Liu, Cheng, et al.
Published: (2026)
Recent Advances in Attack and Defense Approaches of Large Language Models
by: Cui, Jing, et al.
Published: (2024)
Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image
by: Wang, Zefeng, et al.
Published: (2024)
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
by: MacDermott, Matt, et al.
Published: (2025)
RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse
by: Liu, Mingrui, et al.
Published: (2026)
Threats, Attacks, and Defenses in Machine Unlearning: A Survey
by: Liu, Ziyao, et al.
Published: (2024)
DistillGuard: Evaluating Defenses Against LLM Knowledge Distillation
by: Jiang, Bo
Published: (2026)
Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
by: Green, Tommaso, et al.
Published: (2025)
DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing
by: Wang, Yi, et al.
Published: (2025)
Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs
by: Lu, Yu-An, et al.
Published: (2026)
SFCoT: Safer Chain-of-Thought via Active Safety Evaluation and Calibration
by: Pan, Yu, et al.
Published: (2026)
Hey GPT-OSS, Looks Like You Got It -- Now Walk Me Through It! An Assessment of the Reasoning Language Models Chain of Thought Mechanism for Digital Forensics
by: Michelet, Gaëtan, et al.
Published: (2025)
Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach
by: Sternak, Tvrtko, et al.
Published: (2025)
Chain-of-Thought Prompting of Large Language Models for Discovering and Fixing Software Vulnerabilities
by: Nong, Yu, et al.
Published: (2024)
R-CoT: A Reasoning-Layer Watermark via Redundant Chain-of-Thought in Large Language Models
by: Zhang, Ziming, et al.
Published: (2026)
Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework
by: Zhou, Jiling, et al.
Published: (2026)
PPMI: Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
by: Bae, Yubeen, et al.
Published: (2025)
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models
by: Xu, Zihao, et al.
Published: (2024)
Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor
by: Miah, Abdullah Arafat, et al.
Published: (2024)