| Main Authors: | Leshin, Jonah; Shah, Manish; Timmis, Ian; Kang, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.19022 |
Similar Items
Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications
by: Rehan, Tzafrir
Published: (2026)
AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows
by: Bhardwaj, Varun Pratap
Published: (2026)
Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation
by: Downing, Mara, et al.
Published: (2025)
Reasoning Provenance for Autonomous AI Agents: Structured Behavioral Analytics Beyond State Checkpoints and Execution Traces
by: Vispute, Neelmani, et al.
Published: (2026)
Tracking the Behavioral Trajectories of Adapting Agents
by: Leshin, Jonah, et al.
Published: (2026)
elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings
by: Osborne, Philip, et al.
Published: (2025)
Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures
by: Oskooei, Amirkia Rafiei, et al.
Published: (2025)
Test Case Features as Hyper-heuristics for Inductive Programming
by: McDaid, Edward, et al.
Published: (2024)
Redacted
by: de Lima, Bruno Rucy Carneiro Alves, et al.
Published: (2023)
Instruction and Solution Probabilities as Heuristics for Inductive Programming
by: McDaid, Edward, et al.
Published: (2025)
LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB
by: Bekmyradov, Vekil, et al.
Published: (2026)
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
by: Lee, Christine, et al.
Published: (2025)
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator
by: Donadel, Denis, et al.
Published: (2024)
Demystifying the Silence of Correctness Bugs in PyTorch Compiler
by: Li, Meiziniu, et al.
Published: (2026)
Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries
by: Li, Meiziniu, et al.
Published: (2024)
COMET: Coverage-guided Model Generation For Deep Learning Library Testing
by: Li, Meiziniu, et al.
Published: (2022)
Semantic Modeling for World-Centered Architectures
by: Mantsivoda, Andrei, et al.
Published: (2026)
Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%
by: Dillon, Drew, et al.
Published: (2026)
Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair
by: Cambronero, José, et al.
Published: (2025)
Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges
by: Muzsai, Lajos, et al.
Published: (2025)
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
by: Muzsai, Lajos, et al.
Published: (2024)
Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
by: Vaithilingam, Priyan, et al.
Published: (2025)
Automated structural testing of LLM-based agents: methods, framework, and case studies
by: Kohl, Jens, et al.
Published: (2026)
BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks
by: Gandhi, Shubham, et al.
Published: (2024)
A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines
by: Du, Gaoyuan, et al.
Published: (2026)
An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
by: More, Riddhi, et al.
Published: (2025)
Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents
by: Khatchadourian, Raffi
Published: (2026)
Towards a Probabilistic Framework for Analyzing and Improving LLM-Enabled Software
by: Baldonado, Juan Manuel, et al.
Published: (2025)
NeuroLog: Reasoning You Can Audit -- Neuro-Symbolic Vulnerability Discovery via LLM Facts, Datalog, and SMT
by: Rawat, Sanjay
Published: (2026)
Federated Learning and AI Regulation in the European Union: Who is Responsible? -- An Interdisciplinary Analysis
by: Woisetschläger, Herbert, et al.
Published: (2024)
Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution
by: Sohail, Sarmad, et al.
Published: (2026)
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
by: Jana, Prithwish, et al.
Published: (2023)
Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries
by: Kurtz, Andrew, et al.
Published: (2026)
LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops
by: Ravi, Ravin, et al.
Published: (2026)
Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories
by: Bercovich, Ivan, et al.
Published: (2026)
Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting
by: Ortigoso, Ana Rita, et al.
Published: (2025)
Robust Agent Compensation (RAC): Teaching AI Agents to Compensate
by: Perera, Srinath, et al.
Published: (2026)
A Grounded Memory System For Smart Personal Assistants
by: Ocker, Felix, et al.
Published: (2025)
Addressing Data Leakage in HumanEval Using Combinatorial Test Design
by: Bradbury, Jeremy S., et al.
Published: (2024)
Experience with GitHub Copilot for Developer Productivity at Zoominfo
by: Bakal, Gal, et al.
Published: (2025)