| Main Authors: | Wang, Yifei, Li, Tianlin, Zhang, Xiaohan, Zhang, Xiaoyu, Ma, Wei, Cheng, Mingfei, Pan, Li |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.19790 |
Similar Items
MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization
by: Gu, Yongtong, et al.
Published: (2026)
Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption
by: Morales, Jaime, et al.
Published: (2026)
Beyond Benchmark Islands: Toward Representative Trustworthiness Evaluation for Agentic AI
by: Qi, Jinhu, et al.
Published: (2026)
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
by: Wang, Haochuan Kevin, et al.
Published: (2026)
Seeing Is No Longer Believing: Frontier Image Generation Models, Synthetic Visual Evidence, and Real-World Risk
by: Wu, Shuai, et al.
Published: (2026)
Robust Uncertainty Quantification for Factual Generation of Large Language Models
by: Zhang, Yuhao, et al.
Published: (2026)
VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use
by: Santillana, Juan S.
Published: (2026)
Countermind: A Multi-Layered Security Architecture for Large Language Models
by: Schwarz, Dominik
Published: (2025)
Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
by: Khatiwala, Jeel Piyushkumar, et al.
Published: (2026)
Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought
by: Zhang, Yuyi, et al.
Published: (2025)
Before the Last Token: Diagnosing Final-Token Safety Probe Failures
by: Doda, Shravan
Published: (2026)
Measuring Harmfulness of Computer-Using Agents
by: Tian, Aaron Xuxiang, et al.
Published: (2025)
Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers
by: Othman, Refat
Published: (2026)
Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis
by: Zanbaghi, Shahin, et al.
Published: (2025)
JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
by: Chen, Renmiao, et al.
Published: (2025)
Send to which account? Evaluation of an LLM-based Scambaiting System
by: Siadati, Hossein, et al.
Published: (2025)
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
by: Chona, Alankrit, et al.
Published: (2026)
A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts
by: Young, Richard J., et al.
Published: (2026)
Semantic Superiority vs. Forensic Efficiency: A Comparative Analysis of Deep Learning and Psycholinguistics for Business Email Compromise Detection
by: Adjei, Yaw Osei, et al.
Published: (2025)
CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments
by: Keppler, Gustav, et al.
Published: (2026)
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
by: Dawson, Ads, et al.
Published: (2025)
The Automation Advantage in AI Red Teaming
by: Mulla, Rob, et al.
Published: (2025)
Towards Modeling Cybersecurity Behavior of Humans in Organizations
by: Kürtz, Klaas Ole
Published: (2026)
Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)
by: Young, Richard J., et al.
Published: (2026)
RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code
by: Chen, Jiachi, et al.
Published: (2024)
Whisper Leak: a side-channel attack on Large Language Models
by: McDonald, Geoff, et al.
Published: (2025)
DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning
by: Zeng, Fanwei, et al.
Published: (2026)
SALLIE: Safeguarding Against Latent Language & Image Exploits
by: Azov, Guy, et al.
Published: (2026)
Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks
by: Merves, Tyler H., et al.
Published: (2026)
VoiceSHIELD-Small: Real-Time Malicious Speech Detection and Transcription
by: Ranjan, Sumit, et al.
Published: (2026)
Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models
by: Syed, Mohammed Sameer, et al.
Published: (2026)
Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships
by: Oh, Myung Gyo, et al.
Published: (2024)
AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification
by: Zhang, Tian, et al.
Published: (2026)
Toward Secure and Compliant AI: Organizational Standards and Protocols for NLP Model Lifecycle Management
by: Arora, Sunil, et al.
Published: (2025)
Jailbreak Mimicry: Automated Discovery of Narrative-Based Jailbreaks for Large Language Models
by: Ntais, Pavlos
Published: (2025)
$δ$-STEAL: LLM Stealing Attack with Local Differential Privacy
by: Dang, Kieu, et al.
Published: (2025)
Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults
by: Usman, Rana Muhammad
Published: (2026)
AI Bill of Materials and Beyond: Systematizing Security Assurance through the AI Risk Scanning (AIRS) Framework
by: Nathanson, Samuel, et al.
Published: (2025)
Research on Security Enhancement Methods for Adversarial Robust Large Language Model Intelligent Agents for Medical Decision-Making Tasks
by: Hu, Saisai
Published: (2026)
DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection
by: Wang, Jerry, et al.
Published: (2025)