Library Catalog

Bibliographic Details
Main Authors:	Wang, Yifei; Li, Tianlin; Zhang, Xiaohan; Zhang, Xiaoyu; Ma, Wei; Cheng, Mingfei; Pan, Li
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Machine Learning; I.2.7; K.6.5
Online Access:	https://arxiv.org/abs/2604.19790

Similar Items

MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization
by: Gu, Yongtong, et al.
Published: (2026)

Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption
by: Morales, Jaime, et al.
Published: (2026)

Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI
by: Qi, Jinhu, et al.
Published: (2026)

Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
by: Wang, Haochuan Kevin, et al.
Published: (2026)

Seeing Is No Longer Believing: Frontier Image Generation Models, Synthetic Visual Evidence, and Real-World Risk
by: Wu, Shuai, et al.
Published: (2026)

Robust Uncertainty Quantification for Factual Generation of Large Language Models
by: Zhang, Yuhao, et al.
Published: (2026)

VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use
by: Santillana, Juan S.
Published: (2026)

Countermind: A Multi-Layered Security Architecture for Large Language Models
by: Schwarz, Dominik
Published: (2025)

Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
by: Khatiwala, Jeel Piyushkumar, et al.
Published: (2026)

Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought
by: Zhang, Yuyi, et al.
Published: (2025)

Before the Last Token: Diagnosing Final-Token Safety Probe Failures
by: Doda, Shravan
Published: (2026)

Measuring Harmfulness of Computer-Using Agents
by: Tian, Aaron Xuxiang, et al.
Published: (2025)

Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers
by: Othman, Refat
Published: (2026)

Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis
by: Zanbaghi, Shahin, et al.
Published: (2025)

JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
by: Chen, Renmiao, et al.
Published: (2025)

Send to which account? Evaluation of an LLM-based Scambaiting System
by: Siadati, Hossein, et al.
Published: (2025)

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
by: Chona, Alankrit, et al.
Published: (2026)

A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts
by: Young, Richard J., et al.
Published: (2026)

Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection
by: Adjei, Yaw Osei, et al.
Published: (2025)

CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments
by: Keppler, Gustav, et al.
Published: (2026)

AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
by: Dawson, Ads, et al.
Published: (2025)

The Automation Advantage in AI Red Teaming
by: Mulla, Rob, et al.
Published: (2025)

Towards Modeling Cybersecurity Behavior of Humans in Organizations
by: Kürtz, Klaas Ole
Published: (2026)

Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)
by: Young, Richard J., et al.
Published: (2026)

RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code
by: Chen, Jiachi, et al.
Published: (2024)

Whisper Leak: a side-channel attack on Large Language Models
by: McDonald, Geoff, et al.
Published: (2025)

DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning
by: Zeng, Fanwei, et al.
Published: (2026)

SALLIE: Safeguarding Against Latent Language & Image Exploits
by: Azov, Guy, et al.
Published: (2026)

Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks
by: Merves, Tyler H., et al.
Published: (2026)

VoiceSHIELD-Small: Real-Time Malicious Speech Detection and Transcription
by: Ranjan, Sumit, et al.
Published: (2026)

Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models
by: Syed, Mohammed Sameer, et al.
Published: (2026)

Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships
by: Oh, Myung Gyo, et al.
Published: (2024)

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification
by: Zhang, Tian, et al.
Published: (2026)

Toward Secure and Compliant AI: Organizational Standards and Protocols for NLP Model Lifecycle Management
by: Arora, Sunil, et al.
Published: (2025)

Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models
by: Ntais, Pavlos
Published: (2025)

δ-STEAL: LLM Stealing Attack with Local Differential Privacy
by: Dang, Kieu, et al.
Published: (2025)

Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults
by: Usman, Rana Muhammad
Published: (2026)

AI Bill of Materials and Beyond: Systematizing Security Assurance through the AI Risk Scanning (AIRS) Framework
by: Nathanson, Samuel, et al.
Published: (2025)

Research on Security Enhancement Methods for Adversarial Robust Large Language Model Intelligent Agents for Medical Decision-Making Tasks
by: Hu, Saisai
Published: (2026)

DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection
by: Wang, Jerry, et al.
Published: (2025)