:: Library Catalog

Bibliographic Details
Main Authors:	Ouyang, Yipeng, Huang, Xin, Liu, Bingjie, Zheng, Zhongchun, Gu, Yuhao, Zhang, Xianwei
Format:	Preprint
Published:	2026
Subjects:	Software Engineering; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.27492

Similar Items

Runtime-Structured Task Decomposition for Agentic Coding Systems
by: Asthana, Shubhi, et al.
Published: (2026)

Towards Agentic Runtime Healing
by: Sun, Zhensu, et al.
Published: (2024)

VecTrans: Enhancing Compiler Auto-Vectorization through LLM-Assisted Code Transformations
by: Zheng, Zhongchun, et al.
Published: (2025)

1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
by: Xu, Qiao, et al.
Published: (2026)

Themisto: Jupyter-Based Runtime Benchmark
by: Grotov, Konstantin, et al.
Published: (2025)

RuntimeSlicer: Towards Generalizable Unified Runtime State Representation for Failure Management
by: Zhang, Lingzhe, et al.
Published: (2026)

ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse
by: Liu, Simiao, et al.
Published: (2026)

CaveAgent: Transforming LLMs into Stateful Runtime Operators
by: Ran, Maohao, et al.
Published: (2026)

Optimizing Code Runtime Performance through Context-Aware Retrieval-Augmented Generation
by: Acharya, Manish, et al.
Published: (2025)

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
by: Srinivasan, Vasundra
Published: (2026)

FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
by: Zhou, Qixing, et al.
Published: (2026)

ToolMisuseBench: An Offline Deterministic Benchmark for Tool Misuse and Recovery in Agentic Systems
by: Sigdel, Akshey, et al.
Published: (2026)

Pragmos: A Process Agentic Modeling System
by: Hernández-Ávalos, Pedro-Aarón, et al.
Published: (2026)

MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems
by: Wang, Renzhi, et al.
Published: (2024)

Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems
by: Zhang, Weihao, et al.
Published: (2026)

SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces
by: Xu, Duling, et al.
Published: (2026)

ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
by: Yang, Jie, et al.
Published: (2026)

From Prompt to Product: A Human-Centered Benchmark of Agentic App Generation Systems
by: Ortiz, Marcos, et al.
Published: (2025)

Agentic Harness for Real-World Compilers
by: Zheng, Yingwei, et al.
Published: (2026)

A Method for the Runtime Validation of AI-based Environment Perception in Automated Driving System
by: Aslam, Iqra, et al.
Published: (2024)

AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents
by: Zhong, Hailin, et al.
Published: (2026)

AgentGuard: Runtime Verification of AI Agents
by: Koohestani, Roham
Published: (2025)

Agentic Business Process Management Systems
by: Dumas, Marlon, et al.
Published: (2026)

World of Workflows: A Benchmark for Bringing World Models to Enterprise Systems
by: Gupta, Lakshya, et al.
Published: (2026)

GitGoodBench: A Novel Benchmark For Evaluating Agentic Performance On Git
by: Lindenbauer, Tobias, et al.
Published: (2025)

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
by: Xu, Jingxuan, et al.
Published: (2025)

AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
by: Su, Jianhao, et al.
Published: (2026)

DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder
by: Zhang, Jiaran, et al.
Published: (2026)

REDO: Execution-Free Runtime Error Detection for COding Agents
by: Li, Shou, et al.
Published: (2024)

ProbGuard: Probabilistic Runtime Monitoring for LLM Agent Safety
by: Wang, Haoyu, et al.
Published: (2025)

From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level
by: Li, Jia, et al.
Published: (2026)

Agentic Software Issue Resolution with Large Language Models: A Survey
by: Jiang, Zhonghao, et al.
Published: (2025)

Quantifying the Expectation-Realisation Gap for Agentic AI Systems
by: Lobentanzer, Sebastian
Published: (2026)

DeepCode: Open Agentic Coding
by: Li, Zongwei, et al.
Published: (2025)

Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code
by: Rajput, Prateek, et al.
Published: (2026)

Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?
by: Garg, Spandan, et al.
Published: (2026)

LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities
by: Tang, Yongjian, et al.
Published: (2026)

Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems
by: Ferrando, Angelo
Published: (2025)

RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
by: Liang, Linxi, et al.
Published: (2025)

GenAI for Simulation Model in Model-Based Systems Engineering
by: Zhang, Lin, et al.
Published: (2025)