| Main Author: | Joshi, Arun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.05941 |
Similar Items
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)
CIDR: A Large-Scale Industrial Source Code Dataset for Software Engineering Research
by: Savenkov, Vladislav
Published: (2026)
ContractBench: Can LLM Agents Preserve Observation Contracts?
by: Wang, Jicheng, et al.
Published: (2026)
SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation
by: Chen, Mu-Chi, et al.
Published: (2026)
CodeTracer: Towards Traceable Agent States
by: Li, Han, et al.
Published: (2026)
NormCode Canvas: Making LLM Agentic Workflows Development Sustainable via Case-Based Reasoning
by: Guan, Xin, et al.
Published: (2026)
Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach
by: Nguyen, Quang-Dung, et al.
Published: (2025)
Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions
by: Ma, Jianan, et al.
Published: (2026)
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
by: Hundal, Rajdeep Singh, et al.
Published: (2025)
Towards Explainable Test Case Prioritisation with Learning-to-Rank Models
by: Ramírez, Aurora, et al.
Published: (2024)
Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning
by: Puccioni, Laura, et al.
Published: (2025)
Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent
by: Xia, Bowei, et al.
Published: (2026)
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
by: Mazaheri, Parsa, et al.
Published: (2026)
AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment
by: Gao, Yuxuan, et al.
Published: (2026)
The Single-File Test: A Longitudinal Public-Interface Evaluation of First-Output LLM Web Generation with Social Reach Tracking
by: Palacios, Diego Cabezas
Published: (2026)
AIRA: AI-Induced Risk Audit: A Structured Inspection Framework for AI-Generated Code
by: Parris, William M.
Published: (2026)
EyeLayer: Integrating Human Attention Patterns into LLM-Based Code Summarization
by: Zhang, Jiahao, et al.
Published: (2026)
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization
by: Tanjim, Md Mehrab, et al.
Published: (2026)
Nurture-First Agent Development: Building Domain-Expert AI Agents Through Conversational Knowledge Crystallization
by: Zhang, Linghao
Published: (2026)
Emergent Formal Verification: How an Autonomous AI Ecosystem Independently Discovered SMT-Based Safety Across Six Domains
by: Untila, Octavian
Published: (2026)
Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents
by: Akarlar, Gokturk Aytug
Published: (2025)
MFH: A Multi-faceted Heuristic Algorithm Selection Approach for Software Verification
by: Su, Jie, et al.
Published: (2025)
Context Engineering for Multi-Agent LLM Code Assistants Using Elicit, NotebookLM, ChatGPT, and Claude Code
by: Haseeb, Muhammad
Published: (2025)
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
by: Jahan, Sigma, et al.
Published: (2026)
When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges
by: Darshan, Parth, et al.
Published: (2026)
Dual-Process Scaffold Reasoning for Enhancing LLM Code Debugging
by: Hsieh, Po-Chung, et al.
Published: (2025)
From Domain Understanding to Design Readiness: a playbook for GenAI-supported learning in Software Engineering
by: Wlodarski, Rafal
Published: (2026)
AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolve
by: Ananda, Chaitanya Mamatha, et al.
Published: (2026)
Human-in-the-Loop Benchmarking of Heterogeneous LLMs for Automated Competency Assessment in Secondary Level Mathematics
by: Bhusal, Jatin, et al.
Published: (2026)
A Self-Improving Architecture for Dynamic Safety in Large Language Models
by: Slater, Tyler
Published: (2025)
AI-Assisted Engineering Should Track the Epistemic Status and Temporal Validity of Architectural Decisions
by: Gilda, Sankalp, et al.
Published: (2026)
A measurement substrate for agentic Kubernetes operations: Methodology and a case study in retrieval-compounding falsification
by: Odmark, Joshua, et al.
Published: (2026)
Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution
by: Khatchadourian, Raffi, et al.
Published: (2025)
Rethinking LLM-Based RTL Code Optimization Via Timing Logic Metamorphosis
by: Xu, Zhihao, et al.
Published: (2025)
TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback
by: Jana, Prithwish, et al.
Published: (2026)
How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
by: Ogenrwot, Daniel, et al.
Published: (2026)
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
by: Agrawal, Lakshya A, et al.
Published: (2025)
Reinforcement Learning for Dynamic Workflow Optimization in CI/CD Pipelines
by: Soni, Aniket Abhishek, et al.
Published: (2026)
Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation
by: Hasan, Md Toufique, et al.
Published: (2025)
Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures
by: Oskooei, Amirkia Rafiei, et al.
Published: (2025)