Library Catalog

Bibliographic Details
Main Authors:	Nimase, Ojas; Chen, Zhe; Qi, Gengpei; Zhao, Yue; Hu, Xiyang
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.29107

Similar Items

HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation
by: Chen, Qirui, et al.
Published: (2026)

AutoPenBench: Benchmarking Generative Agents for Penetration Testing
by: Gioacchini, Luca, et al.
Published: (2024)

SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code
by: Li, Xinghang, et al.
Published: (2025)

MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks
by: Zhang, Xinkai, et al.
Published: (2026)

DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis
by: Gong, Chen, et al.
Published: (2025)

Secure On-Device Video OOD Detection Without Backpropagation
by: Li, Shawn, et al.
Published: (2025)

AttackSeqBench: Benchmarking the Capabilities of LLMs for Attack Sequences Understanding
by: Ma, Haokai, et al.
Published: (2025)

MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
by: Yang, Yixuan, et al.
Published: (2025)

SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories
by: Shen, Chihao, et al.
Published: (2025)

ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents
by: Lee, Seunghyun, et al.
Published: (2026)

SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors
by: Najt, Elle, et al.
Published: (2026)

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
by: Lim, Taein, et al.
Published: (2026)

ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
by: Liu, Xuxu, et al.
Published: (2025)

AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
by: Alam, Md Tanvirul, et al.
Published: (2025)

SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
by: Jing, Pengfei, et al.
Published: (2024)

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
by: Liu, Jinbo, et al.
Published: (2025)

SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents
by: Yin, Sheng, et al.
Published: (2024)

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
by: Zhang, Hanrong, et al.
Published: (2024)

Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs
by: Li, Yige, et al.
Published: (2026)

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
by: Schwartz, Daniel, et al.
Published: (2025)

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
by: Wang, Hao, et al.
Published: (2026)

MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
by: Zhang, Dongsen, et al.
Published: (2025)

OpenSage: Self-programming Agent Generation Engine
by: Li, Hongwei, et al.
Published: (2026)

AICCE: AI Driven Compliance Checker Engine
by: Rahman, Mohammad Wali Ur, et al.
Published: (2026)

DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy
by: Wang, Erchi, et al.
Published: (2026)

VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
by: Cao, Tri, et al.
Published: (2025)

FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence
by: Yan, Xinyu, et al.
Published: (2026)

SastBench: A Benchmark for Testing Agentic SAST Triage
by: Feiglin, Jake, et al.
Published: (2026)

WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
by: Liu, Yinuo, et al.
Published: (2025)

Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
by: Kharlamova, Arina, et al.
Published: (2025)

SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
by: Zhao, Wei, et al.
Published: (2025)

GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?
by: Chen, Chiyu, et al.
Published: (2025)

Bleeding Pathways: Vanishing Discriminability in LLM Hidden States Fuels Jailbreak Attacks
by: Zhang, Yingjie, et al.
Published: (2025)

SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation
by: Li, Yue, et al.
Published: (2025)

Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks
by: Mu, Yanming, et al.
Published: (2026)

SecCodeBench-V2 Technical Report
by: Chen, Longfei, et al.
Published: (2026)

CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models
by: Wahréus, Johan, et al.
Published: (2025)

TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
by: Zheng, Jingyi, et al.
Published: (2025)

Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing
by: Gao, ZhenZhe, et al.
Published: (2024)

AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports
by: Sajadi, Amirali, et al.
Published: (2026)