:: Library Catalog

Bibliographic Details
Main Authors:	Leshin, Jonah; Shah, Manish; Timmis, Ian; Kang, Daniel
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; I.2.1; D.2.5
Online Access:	https://arxiv.org/abs/2603.19022

Similar Items

Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications
by: Rehan, Tzafrir
Published: (2026)

AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows
by: Bhardwaj, Varun Pratap
Published: (2026)

Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation
by: Downing, Mara, et al.
Published: (2025)

Reasoning Provenance for Autonomous AI Agents: Structured Behavioral Analytics Beyond State Checkpoints and Execution Traces
by: Vispute, Neelmani, et al.
Published: (2026)

Tracking the Behavioral Trajectories of Adapting Agents
by: Leshin, Jonah, et al.
Published: (2026)

elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings
by: Osborne, Philip, et al.
Published: (2025)

Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures
by: Oskooei, Amirkia Rafiei, et al.
Published: (2025)

Test Case Features as Hyper-heuristics for Inductive Programming
by: McDaid, Edward, et al.
Published: (2024)

Redacted
by: de Lima, Bruno Rucy Carneiro Alves, et al.
Published: (2023)

Instruction and Solution Probabilities as Heuristics for Inductive Programming
by: McDaid, Edward, et al.
Published: (2025)

LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB
by: Bekmyradov, Vekil, et al.
Published: (2026)

VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
by: Lee, Christine, et al.
Published: (2025)

Can LLMs Understand Computer Networks? Towards a Virtual System Administrator
by: Donadel, Denis, et al.
Published: (2024)

Demystifying the Silence of Correctness Bugs in PyTorch Compiler
by: Li, Meiziniu, et al.
Published: (2026)

Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries
by: Li, Meiziniu, et al.
Published: (2024)

COMET: Coverage-guided Model Generation For Deep Learning Library Testing
by: Li, Meiziniu, et al.
Published: (2022)

Semantic Modeling for World-Centered Architectures
by: Mantsivoda, Andrei, et al.
Published: (2026)

Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%
by: Dillon, Drew, et al.
Published: (2026)

Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair
by: Cambronero, José, et al.
Published: (2025)

Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges
by: Muzsai, Lajos, et al.
Published: (2025)

HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
by: Muzsai, Lajos, et al.
Published: (2024)

Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
by: Vaithilingam, Priyan, et al.
Published: (2025)

Automated structural testing of LLM-based agents: methods, framework, and case studies
by: Kohl, Jens, et al.
Published: (2026)

BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks
by: Gandhi, Shubham, et al.
Published: (2024)

A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines
by: Du, Gaoyuan, et al.
Published: (2026)

An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
by: More, Riddhi, et al.
Published: (2025)

Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents
by: Khatchadourian, Raffi
Published: (2026)

Towards a Probabilistic Framework for Analyzing and Improving LLM-Enabled Software
by: Baldonado, Juan Manuel, et al.
Published: (2025)

NeuroLog: Reasoning You Can Audit -- Neuro-Symbolic Vulnerability Discovery via LLM Facts, Datalog, and SMT
by: Rawat, Sanjay
Published: (2026)

Federated Learning and AI Regulation in the European Union: Who is Responsible? -- An Interdisciplinary Analysis
by: Woisetschläger, Herbert, et al.
Published: (2024)

Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution
by: Sohail, Sarmad, et al.
Published: (2026)

CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
by: Jana, Prithwish, et al.
Published: (2023)

Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries
by: Kurtz, Andrew, et al.
Published: (2026)

LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops
by: Ravi, Ravin, et al.
Published: (2026)

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories
by: Bercovich, Ivan, et al.
Published: (2026)

Project Riley: Multimodal Multi-Agent LLM Collaboration with Emotional Reasoning and Voting
by: Ortigoso, Ana Rita, et al.
Published: (2025)

Robust Agent Compensation (RAC): Teaching AI Agents to Compensate
by: Perera, Srinath, et al.
Published: (2026)

A Grounded Memory System For Smart Personal Assistants
by: Ocker, Felix, et al.
Published: (2025)

Addressing Data Leakage in HumanEval Using Combinatorial Test Design
by: Bradbury, Jeremy S., et al.
Published: (2024)

Experience with GitHub Copilot for Developer Productivity at Zoominfo
by: Bakal, Gal, et al.
Published: (2025)