| Main Authors: | Ouyang, Yipeng, Huang, Xin, Liu, Bingjie, Zheng, Zhongchun, Gu, Yuhao, Zhang, Xianwei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.27492 |
Similar Items
Runtime-Structured Task Decomposition for Agentic Coding Systems
by: Asthana, Shubhi, et al.
Published: (2026)
Towards Agentic Runtime Healing
by: Sun, Zhensu, et al.
Published: (2024)
VecTrans: Enhancing Compiler Auto-Vectorization through LLM-Assisted Code Transformations
by: Zheng, Zhongchun, et al.
Published: (2025)
1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
by: Xu, Qiao, et al.
Published: (2026)
Themisto: Jupyter-Based Runtime Benchmark
by: Grotov, Konstantin, et al.
Published: (2025)
RuntimeSlicer: Towards Generalizable Unified Runtime State Representation for Failure Management
by: Zhang, Lingzhe, et al.
Published: (2026)
ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse
by: Liu, Simiao, et al.
Published: (2026)
CaveAgent: Transforming LLMs into Stateful Runtime Operators
by: Ran, Maohao, et al.
Published: (2026)
Optimizing Code Runtime Performance through Context-Aware Retrieval-Augmented Generation
by: Acharya, Manish, et al.
Published: (2025)
A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents
by: Srinivasan, Vasundra
Published: (2026)
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
by: Zhou, Qixing, et al.
Published: (2026)
ToolMisuseBench: An Offline Deterministic Benchmark for Tool Misuse and Recovery in Agentic Systems
by: Sigdel, Akshey, et al.
Published: (2026)
Pragmos: A Process Agentic Modeling System
by: Hernández-Ávalos, Pedro-Aarón, et al.
Published: (2026)
MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems
by: Wang, Renzhi, et al.
Published: (2024)
Loosely-Structured Software: Engineering Context, Structure, and Evolution Entropy in Runtime-Rewired Multi-Agent Systems
by: Zhang, Weihao, et al.
Published: (2026)
SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces
by: Xu, Duling, et al.
Published: (2026)
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
by: Yang, Jie, et al.
Published: (2026)
From Prompt to Product: A Human-Centered Benchmark of Agentic App Generation Systems
by: Ortiz, Marcos, et al.
Published: (2025)
Agentic Harness for Real-World Compilers
by: Zheng, Yingwei, et al.
Published: (2026)
A Method for the Runtime Validation of AI-based Environment Perception in Automated Driving System
by: Aslam, Iqra, et al.
Published: (2024)
AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents
by: Zhong, Hailin, et al.
Published: (2026)
AgentGuard: Runtime Verification of AI Agents
by: Koohestani, Roham
Published: (2025)
Agentic Business Process Management Systems
by: Dumas, Marlon, et al.
Published: (2026)
World of Workflows: A Benchmark for Bringing World Models to Enterprise Systems
by: Gupta, Lakshya, et al.
Published: (2026)
GitGoodBench: A Novel Benchmark For Evaluating Agentic Performance On Git
by: Lindenbauer, Tobias, et al.
Published: (2025)
SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
by: Xu, Jingxuan, et al.
Published: (2025)
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
by: Su, Jianhao, et al.
Published: (2026)
DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder
by: Zhang, Jiaran, et al.
Published: (2026)
REDO: Execution-Free Runtime Error Detection for COding Agents
by: Li, Shou, et al.
Published: (2024)
ProbGuard: Probabilistic Runtime Monitoring for LLM Agent Safety
by: Wang, Haoyu, et al.
Published: (2025)
From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level
by: Li, Jia, et al.
Published: (2026)
Agentic Software Issue Resolution with Large Language Models: A Survey
by: Jiang, Zhonghao, et al.
Published: (2025)
Quantifying the Expectation-Realisation Gap for Agentic AI Systems
by: Lobentanzer, Sebastian
Published: (2026)
DeepCode: Open Agentic Coding
by: Li, Zongwei, et al.
Published: (2025)
Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code
by: Rajput, Prateek, et al.
Published: (2026)
Terminus-4B: Can a Smaller Model Replace Frontier LLMs at Agentic Execution Tasks?
by: Garg, Spandan, et al.
Published: (2026)
LLM-Based Agentic Systems for Software Engineering: Challenges and Opportunities
by: Tang, Yongjian, et al.
Published: (2026)
Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems
by: Ferrando, Angelo
Published: (2025)
RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation
by: Liang, Linxi, et al.
Published: (2025)
GenAI for Simulation Model in Model-Based Systems Engineering
by: Zhang, Lin, et al.
Published: (2025)