Library Catalog

Bibliographic Details
Main Author:	Joshi, Arun
Format:	Preprint
Published:	2026
Subjects:	Software Engineering; Artificial Intelligence; I.2.6
Online Access:	https://arxiv.org/abs/2603.05941

Similar Items

AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)

CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research
by: Savenkov, Vladislav
Published: (2026)

ContractBench: Can LLM Agents Preserve Observation Contracts?
by: Wang, Jicheng, et al.
Published: (2026)

SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation
by: Chen, Mu-Chi, et al.
Published: (2026)

CodeTracer: Towards Traceable Agent States
by: Li, Han, et al.
Published: (2026)

NormCode Canvas: Making LLM Agentic Workflows Development Sustainable via Case-Based Reasoning
by: Guan, Xin, et al.
Published: (2026)

Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach
by: Nguyen, Quang-Dung, et al.
Published: (2025)

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions
by: Ma, Jianan, et al.
Published: (2026)

On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
by: Hundal, Rajdeep Singh, et al.
Published: (2025)

Towards Explainable Test Case Prioritisation with Learning-to-Rank Models
by: Ramírez, Aurora, et al.
Published: (2024)

Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning
by: Puccioni, Laura, et al.
Published: (2025)

Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent
by: Xia, Bowei, et al.
Published: (2026)

AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
by: Mazaheri, Parsa, et al.
Published: (2026)

AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment
by: Gao, Yuxuan, et al.
Published: (2026)

The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking
by: Palacios, Diego Cabezas
Published: (2026)

AIRA: AI-Induced Risk Audit: A Structured Inspection Framework for AI-Generated Code
by: Parris, William M.
Published: (2026)

EyeLayer: Integrating Human Attention Patterns into LLM-Based Code Summarization
by: Zhang, Jiahao, et al.
Published: (2026)

MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
by: Tanjim, Md Mehrab, et al.
Published: (2026)

Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
by: Zhang, Linghao
Published: (2026)

Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains
by: Untila, Octavian
Published: (2026)

Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents
by: Akarlar, Gokturk Aytug
Published: (2025)

MFH: A Multi-faceted Heuristic Algorithm Selection Approach for Software Verification
by: Su, Jie, et al.
Published: (2025)

Context Engineering for Multi-Agent LLM Code Assistants Using Elicit, NotebookLM, ChatGPT, and Claude Code
by: Haseeb, Muhammad
Published: (2025)

DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
by: Jahan, Sigma, et al.
Published: (2026)

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges
by: Darshan, Parth, et al.
Published: (2026)

Dual-Process Scaffold Reasoning for Enhancing LLM Code Debugging
by: Hsieh, Po-Chung, et al.
Published: (2025)

From Domain Understanding to Design Readiness: a playbook for GenAI-supported learning in Software Engineering
by: Wlodarski, Rafal
Published: (2026)

AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve
by: Ananda, Chaitanya Mamatha, et al.
Published: (2026)

Human-in-the-Loop Benchmarking of Heterogeneous LLMs for Automated Competency Assessment in Secondary Level Mathematics
by: Bhusal, Jatin, et al.
Published: (2026)

A Self-Improving Architecture for Dynamic Safety in Large Language Models
by: Slater, Tyler
Published: (2025)

AI-Assisted Engineering Should Track the Epistemic Status and Temporal Validity of Architectural Decisions
by: Gilda, Sankalp, et al.
Published: (2026)

A measurement substrate for agentic Kubernetes operations: Methodology and a case study in retrieval-compounding falsification
by: Odmark, Joshua, et al.
Published: (2026)

Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution
by: Khatchadourian, Raffi, et al.
Published: (2025)

Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis
by: Xu, Zhihao, et al.
Published: (2025)

TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
by: Jana, Prithwish, et al.
Published: (2026)

How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
by: Ogenrwot, Daniel, et al.
Published: (2026)

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
by: Agrawal, Lakshya A, et al.
Published: (2025)

Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines
by: Soni, Aniket Abhishek, et al.
Published: (2026)

Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation
by: Hasan, Md Toufique, et al.
Published: (2025)

Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures
by: Oskooei, Amirkia Rafiei, et al.
Published: (2025)