| Main Authors: | Nimase, Ojas, Chen, Zhe, Qi, Gengpei, Zhao, Yue, Hu, Xiyang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.29107 |
Similar Items
HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation
by: Chen, Qirui, et al.
Published: (2026)
AutoPenBench: Benchmarking Generative Agents for Penetration Testing
by: Gioacchini, Luca, et al.
Published: (2024)
SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code
by: Li, Xinghang, et al.
Published: (2025)
MT-JailBench: A Modular Benchmark for Understanding Multi-Turn Jailbreak Attacks
by: Zhang, Xinkai, et al.
Published: (2026)
DPImageBench: A Unified Benchmark for Differentially Private Image Synthesis
by: Gong, Chen, et al.
Published: (2025)
Secure On-Device Video OOD Detection Without Backpropagation
by: Li, Shawn, et al.
Published: (2025)
AttackSeqBench: Benchmarking the Capabilities of LLMs for Attack Sequences Understanding
by: Ma, Haokai, et al.
Published: (2025)
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
by: Yang, Yixuan, et al.
Published: (2025)
SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories
by: Shen, Chihao, et al.
Published: (2025)
ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents
by: Lee, Seunghyun, et al.
Published: (2026)
SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors
by: Najt, Elle, et al.
Published: (2026)
CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios
by: Lim, Taein, et al.
Published: (2026)
ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models
by: Liu, Xuxu, et al.
Published: (2025)
AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence
by: Alam, Md Tanvirul, et al.
Published: (2025)
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
by: Jing, Pengfei, et al.
Published: (2024)
Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
by: Liu, Jinbo, et al.
Published: (2025)
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents
by: Yin, Sheng, et al.
Published: (2024)
Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents
by: Zhang, Hanrong, et al.
Published: (2024)
Backdoor4Good: Benchmarking Beneficial Uses of Backdoors in LLMs
by: Li, Yige, et al.
Published: (2026)
Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
by: Schwartz, Daniel, et al.
Published: (2025)
Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack
by: Wang, Hao, et al.
Published: (2026)
MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents
by: Zhang, Dongsen, et al.
Published: (2025)
OpenSage: Self-programming Agent Generation Engine
by: Li, Hongwei, et al.
Published: (2026)
AICCE: AI Driven Compliance Checker Engine
by: Rahman, Mohammad Wali Ur, et al.
Published: (2026)
DPrivBench: Benchmarking LLMs' Reasoning for Differential Privacy
by: Wang, Erchi, et al.
Published: (2026)
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
by: Cao, Tri, et al.
Published: (2025)
FraudBench: A Multimodal Benchmark for Detecting AI-Generated Fraudulent Refund Evidence
by: Yan, Xinyu, et al.
Published: (2026)
SastBench: A Benchmark for Testing Agentic SAST Triage
by: Feiglin, Jake, et al.
Published: (2026)
WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
by: Liu, Yinuo, et al.
Published: (2025)
Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation
by: Kharlamova, Arina, et al.
Published: (2025)
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security
by: Zhao, Wei, et al.
Published: (2025)
GhostEI-Bench: Do Mobile Agents Resilience to Environmental Injection in Dynamic On-Device Environments?
by: Chen, Chiyu, et al.
Published: (2025)
Bleeding Pathways: Vanishing Discriminability in LLM Hidden States Fuels Jailbreak Attacks
by: Zhang, Yingjie, et al.
Published: (2025)
SOLIDO: A Robust Watermarking Method for Speech Synthesis via Low-Rank Adaptation
by: Li, Yue, et al.
Published: (2025)
Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks
by: Mu, Yanming, et al.
Published: (2026)
SecCodeBench-V2 Technical Report
by: Chen, Longfei, et al.
Published: (2026)
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models
by: Wahréus, Johan, et al.
Published: (2025)
TH-Bench: Evaluating Evading Attacks via Humanizing AI Text on Machine-Generated Text Detectors
by: Zheng, Jingyi, et al.
Published: (2025)
Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing
by: Gao, ZhenZhe, et al.
Published: (2024)
AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports
by: Sajadi, Amirali, et al.
Published: (2026)