:: Library Catalog

Bibliographic Details
Main Author:	Barnes, Jarrod
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.21083

Similar Items

Surprisal-Guided Selection: Compute-Optimal Test-Time Strategies for Execution-Grounded Code Generation
by: Barnes, Jarrod
Published: (2026)

Continual Learning, Not Training: Online Adaptation For Agents
by: Jaglan, Aman, et al.
Published: (2025)

AIR: Improving Agent Safety through Incident Response
by: Xiao, Zibo, et al.
Published: (2026)

Anticipating Adversary Behavior in DevSecOps Scenarios through Large Language Models
by: Caballero, Mario Marín, et al.
Published: (2026)

SecMate: Multi-Agent Adaptive Cybersecurity Troubleshooting with Tri-Context Personalization
by: Meidan, Yair, et al.
Published: (2026)

SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents
by: Begimher, Daniel, et al.
Published: (2026)

TheraAgent: Multi-Agent Framework with Self-Evolving Memory and Evidence-Calibrated Reasoning for PET Theranostics
by: Chen, Zhihao, et al.
Published: (2026)

Incident Analysis for AI Agents
by: Ezell, Carson, et al.
Published: (2025)

SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories
by: Shen, Chihao, et al.
Published: (2025)

Anomaly Detection for Incident Response at Scale
by: Wang, Hanzhang, et al.
Published: (2024)

Measuring Responsibility in Multi-Agent Systems
by: Mu, Chunyan, et al.
Published: (2024)

EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law
by: Lichkovski, Ilija, et al.
Published: (2025)

In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach
by: Gao, Yiran, et al.
Published: (2026)

Cloud-based XAI Services for Assessing Open Repository Models Under Adversarial Attacks
by: Wang, Zerui, et al.
Published: (2024)

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
by: Becker, Joel, et al.
Published: (2025)

Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique
by: Hariharan, Suhas, et al.
Published: (2024)

OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability
by: Ullrich, Karen, et al.
Published: (2025)

ShIOEnv: A Command Evaluation Environment for Grammar-Constrained Synthesis and Execution Behavior Modeling
by: Ragsdale, Jarrod, et al.
Published: (2025)

AI Loss of Control Incident Management: Response & Resilience
by: Gruetzemacher, Ross
Published: (2026)

AI for DevSecOps: A Landscape and Future Opportunities
by: Fu, Michael, et al.
Published: (2024)

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing
by: Meguimtsop, Almene De Meran, et al.
Published: (2026)

Automated Traffic Incident Response Plans using Generative Artificial Intelligence: Part 1 -- Building the Incident Response Benchmark
by: Grigorev, Artur, et al.
Published: (2025)

OpsAgent: An Evolving Multi-agent System for Incident Management in Microservices
by: Luo, Yu, et al.
Published: (2025)

SecCodeBench-V2 Technical Report
by: Chen, Longfei, et al.
Published: (2026)

The potential of LLM-generated reports in DevSecOps
by: Lykousas, Nikolaos, et al.
Published: (2024)

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG
by: Qian, Pin, et al.
Published: (2026)

SecInfer: Preventing Prompt Injection via Inference-time Scaling
by: Liu, Yupei, et al.
Published: (2025)

TIPS: Threat Actor Informed Prioritization of Applications using SecEncoder
by: Bulut, Muhammed Fatih, et al.
Published: (2024)

Agents of Change: Self-Evolving LLM Agents for Strategic Planning
by: Belle, Nikolas, et al.
Published: (2025)

Chances and Challenges of the Model Context Protocol in Digital Forensics and Incident Response
by: Hilgert, Jan-Niclas, et al.
Published: (2025)

HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation
by: Chen, Qirui, et al.
Published: (2026)

SecPE: Secure Prompt Ensembling for Private and Robust Large Language Models
by: Zhang, Jiawen, et al.
Published: (2025)

SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
by: Jing, Pengfei, et al.
Published: (2024)

A Hybrid Model for Traffic Incident Detection based on Generative Adversarial Networks and Transformer Model
by: Lu, Xinying, et al.
Published: (2024)

SecEncoder: Logs are All You Need in Security
by: Bulut, Muhammed Fatih, et al.
Published: (2024)

SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization
by: Wang, Hao, et al.
Published: (2026)

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
by: Chen, Sizhe, et al.
Published: (2025)

Robust Calibration For Improved Weather Prediction Under Distributional Shift
by: Gilda, Sankalp, et al.
Published: (2024)

ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
by: Chen, Yifei, et al.
Published: (2026)

Development and Evaluation of an Ontology for Non-Invasive Respiratory Support in Acute Care
by: Islam, Md Fantacher, et al.
Published: (2025)