Library Catalog

Bibliographic Details
Main Authors:	Liang, Zhibo; Hu, Tianze; Chen, Zaiye; Tang, Mingjie
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language; Cryptography and Security
Online Access:	https://arxiv.org/abs/2512.06716

Similar Items

AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents
by: Zhang, Yixiang, et al.
Published: (2026)

AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models
by: Zhang, Jinchuan, et al.
Published: (2025)

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)

From Threat Intelligence to Firewall Rules: Semantic Relations in Hybrid AI Agent and Expert System Architectures
by: Bonfanti, Chiara, et al.
Published: (2026)

ForgeDAN: An Evolutionary Framework for Jailbreaking Aligned Large Language Models
by: Cheng, Siyang, et al.
Published: (2025)

AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation
by: Roy, Joyjit, et al.
Published: (2026)

Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models
by: Liu, Xiao, et al.
Published: (2024)

Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection
by: Chen, Jiaqi, et al.
Published: (2024)

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)

CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models
by: Zhang, Yu, et al.
Published: (2025)

Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM
by: Cao, Bochuan, et al.
Published: (2023)

Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs
by: Lau, Gregory Kang Ruey, et al.
Published: (2024)

Towards Understanding the Cognitive Habits of Large Reasoning Models
by: Dong, Jianshuo, et al.
Published: (2025)

Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection
by: Liang, Zi, et al.
Published: (2026)

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
by: Liu, Jinbo, et al.
Published: (2025)

CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models
by: Zhou, Guanghao, et al.
Published: (2025)

SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment
by: Lin, Xixun, et al.
Published: (2026)

AVISE: Framework for Evaluating the Security of AI Systems
by: Lempinen, Mikko, et al.
Published: (2026)

Universal and Context-Independent Triggers for Precise Control of LLM Outputs
by: Liang, Jiashuo, et al.
Published: (2024)

Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation
by: Qiao, Yuxuan, et al.
Published: (2025)

Decoupled Alignment for Robust Plug-and-Play Adaptation
by: Luo, Haozheng, et al.
Published: (2024)

DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection
by: Yan, Yuliang, et al.
Published: (2025)

ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack
by: Li, Hao, et al.
Published: (2026)

Can Reinforcement Learning Unlock the Hidden Dangers in Aligned Large Language Models?
by: Karkevandi, Mohammad Bahrami, et al.
Published: (2024)

MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory
by: Wang, Yuhui, et al.
Published: (2026)

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
by: Yang, Wenkai, et al.
Published: (2024)

Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems
by: Wang, Xiaoqing, et al.
Published: (2025)

MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
by: Wang, Fengxiang, et al.
Published: (2024)

Textual Unlearning Gives a False Sense of Unlearning
by: Du, Jiacheng, et al.
Published: (2024)

RobustKV: Defending Large Language Models against Jailbreak Attacks via KV Eviction
by: Jiang, Tanqiu, et al.
Published: (2024)

PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System
by: Munoz, Gary D. Lopez, et al.
Published: (2024)

NSmark: Null Space Based Black-box Watermarking Defense Framework for Language Models
by: Zhao, Haodong, et al.
Published: (2024)

Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models
by: Yu, Yongcan, et al.
Published: (2025)

SAMark: A Self-Anchored Text Watermarking with Paragraph-Level Paraphrase Robustness
by: Huo, Jiahao, et al.
Published: (2026)

QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language
by: Zou, Qingsong, et al.
Published: (2025)

LATTICE: Evaluating Decision Support Utility of Crypto Agents
by: Chan, Aaron, et al.
Published: (2026)

Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?
by: Liang, Zi, et al.
Published: (2025)

Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
by: Yuan, Hongbang, et al.
Published: (2024)

Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region
by: Leong, Chak Tou, et al.
Published: (2025)

LLMs Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions
by: Hu, Xuhao, et al.
Published: (2025)