| Main Author: | Gowda, Ishrith |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.03482 |
Similar Items
Hidden in Memory: Sleeper Memory Poisoning in LLM Agents
by: Pulipaka, Sidharth, et al.
Published: (2026)
Binary-30K: A Heterogeneous Dataset for Deep Learning in Binary Analysis and Malware Detection
by: Bommarito II, Michael J.
Published: (2025)
Attacking interpretable NLP systems
by: Abdukhamidov, Eldor, et al.
Published: (2025)
AI Bill of Materials and Beyond: Systematizing Security Assurance through the AI Risk Scanning (AIRS) Framework
by: Nathanson, Samuel, et al.
Published: (2025)
Predicting Known Vulnerabilities from Attack Descriptions Using Sentence Transformers
by: Othman, Refat
Published: (2026)
Refusal Evaluation in Coding LLMs and Code Agents: A Systematic Review of Thirteen Malicious-Code Prompt Corpora (2023-2025)
by: Young, Richard J., et al.
Published: (2026)
Detecting Prompt Injection Attacks Against Application Using Classifiers
by: Shaheer, Safwan, et al.
Published: (2025)
Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks
by: Shaheer, Safwan, et al.
Published: (2025)
Governance Architecture for Autonomous Agent Systems: Threats, Framework, and Engineering Practice
by: Ge, Yuxu
Published: (2026)
AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification
by: Zhang, Tian, et al.
Published: (2026)
Multilingual AI-Driven Password Strength Estimation with Similarity-Based Detection
by: Palaniappan, Nikitha M., et al.
Published: (2026)
AegisShield: Democratizing Cyber Threat Modeling with Generative AI
by: Grofsky, Matthew
Published: (2025)
Evaluating the Reliability of Digital Forensic Evidence Discovered by Large Language Model: A Case Study
by: Khatiwala, Jeel Piyushkumar, et al.
Published: (2026)
Towards Agentic Investigation of Security Alerts
by: Eilertsen, Even, et al.
Published: (2026)
Privately Fine-Tuned LLMs Preserve Temporal Dynamics in Tabular Data
by: Rosenblatt, Lucas, et al.
Published: (2026)
Enabling Transparent Cyber Threat Intelligence Combining Large Language Models and Domain Ontologies
by: Cotti, Luca, et al.
Published: (2025)
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
by: Williams-King, David, et al.
Published: (2025)
Quantifying Return on Security Controls in LLM Systems
by: Moulton, Richard Helder, et al.
Published: (2025)
Security Considerations for Multi-agent Systems
by: Nguyen, Tam, et al.
Published: (2026)
Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents
by: Sidik, Bronislav, et al.
Published: (2026)
Tatemae: Detecting Alignment Faking via Tool Selection in LLMs
by: Leonesi, Matteo, et al.
Published: (2026)
Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories
by: Bercovich, Ivan, et al.
Published: (2026)
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models
by: Dawson, Ads, et al.
Published: (2025)
The Automation Advantage in AI Red Teaming
by: Mulla, Rob, et al.
Published: (2025)
SAND: A Self-supervised and Adaptive NAS-Driven Framework for Hardware Trojan Detection
by: Pan, Zhixin, et al.
Published: (2025)
Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs
by: Gupta, Aayush
Published: (2025)
Poison in the Well: Feature Embedding Disruption in Backdoor Attacks
by: Feng, Zhou, et al.
Published: (2025)
David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning
by: Nellessen, Samuel, et al.
Published: (2026)
Evaluating Query Efficiency and Accuracy of Transfer Learning-based Model Extraction Attack in Federated Learning
by: Ahamed, Sayyed Farid, et al.
Published: (2025)
Autonomous Penetration Testing: Solving Capture-the-Flag Challenges with LLMs
by: Bakker, Isabelle, et al.
Published: (2025)
A Self-Improving Architecture for Dynamic Safety in Large Language Models
by: Slater, Tyler
Published: (2025)
Retrieval Augmented Classification for Confidential Documents
by: Chang, Yeseul E., et al.
Published: (2026)
SALLIE: Safeguarding Against Latent Language & Image Exploits
by: Azov, Guy, et al.
Published: (2026)
Unlearning at Scale: Implementing the Right to be Forgotten in Large Language Models
by: X, Abdullah
Published: (2025)
Continuous Discovery of Vulnerabilities in LLM Serving Systems with Fuzzing
by: Zhao, Yunze, et al.
Published: (2026)
BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders
by: DeLeeuw, Caleb
Published: (2026)
Provable Repair of Deep Neural Network Defects by Preimage Synthesis and Property Refinement
by: Ma, Jianan, et al.
Published: (2025)
RADEP: A Resilient Adaptive Defense Framework Against Model Extraction Attacks
by: Chakraborty, Amit, et al.
Published: (2025)
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
by: Blanco-Justicia, Alberto, et al.
Published: (2024)
PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models
by: Seddik, Issam, et al.
Published: (2025)