| Main Author: | Kovács, Ádám |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.04979 |
Similar Items
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
by: Phan, Huy Nhat, et al.
Published: (2024)
GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging
by: Ni, Ziyi, et al.
Published: (2025)
Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks
by: Li, Yuangang, et al.
Published: (2026)
ToolFuzz -- Automated Agent Tool Testing
by: Milev, Ivan, et al.
Published: (2025)
Output Format Biases in the Evaluation of Large Language Models for Code Translation
by: Macedo, Marcos, et al.
Published: (2024)
Beyond Isolated Tasks: A Framework for Evaluating Coding Agents on Sequential Software Evolution
by: Shastry, KN Ajay, et al.
Published: (2026)
Architecture Without Architects: How AI Coding Agents Shape Software Architecture
by: Konrad, Phongsakon Mark, et al.
Published: (2026)
User Centric Evaluation of Code Generation Tools
by: Miah, Tanha, et al.
Published: (2024)
RA-Gen: A Controllable Code Generation Framework Using ReAct for Multi-Agent Task Execution
by: Liu, Aofan, et al.
Published: (2025)
CodeWatcher: IDE Telemetry Data Extraction Tool for Understanding Coding Interactions with LLMs
by: Basha, Manaal, et al.
Published: (2025)
CodeR: Issue Resolving with Multi-Agent and Task Graphs
by: Chen, Dong, et al.
Published: (2024)
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
by: Li, Dawei, et al.
Published: (2026)
Less is More: Towards Green Code Large Language Models via Unified Structural Pruning
by: Yang, Guang, et al.
Published: (2024)
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
by: Guo, Chengquan, et al.
Published: (2024)
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models
by: Tambon, Florian, et al.
Published: (2024)
EfficientUICoder: Efficient MLLM-based UI Code Generation via Input and Output Token Compression
by: Xiao, Jingyu, et al.
Published: (2025)
CodeGenLink: A Tool to Find the Likely Origin and License of Automatically Generated Code
by: Bifolco, Daniele, et al.
Published: (2025)
Code Review Agent Benchmark
by: Zhang, Yuntong, et al.
Published: (2026)
Applying an Agentic Coding Tool for Improving Published Algorithm Implementations
by: Suwannik, Worasait
Published: (2026)
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
by: Chen, Aili, et al.
Published: (2026)
GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks
by: Hou, Shuyang, et al.
Published: (2024)
Runtime-Structured Task Decomposition for Agentic Coding Systems
by: Asthana, Shubhi, et al.
Published: (2026)
Task Abstention for Large Language Models in Code Generation
by: Zhou, Yanke, et al.
Published: (2026)
Do Generative AI Tools Ensure Green Code? An Investigative Study
by: Sikand, Samarth, et al.
Published: (2025)
Ambiguity Resolution with Human Feedback for Code Writing Tasks
by: Nandan, Aditey, et al.
Published: (2025)
An Empirical Study of Knowledge Distillation for Code Understanding Tasks
by: Wang, Ruiqi, et al.
Published: (2025)
Automated Benchmark Generation for Repository-Level Coding Tasks
by: Vergopoulos, Konstantinos, et al.
Published: (2025)
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
by: Orlanski, Gabriel, et al.
Published: (2026)
Workflows vs Agents for Code Translation
by: Gray, Henry, et al.
Published: (2025)
Theory of Code Space: Do Code Agents Understand Software Architecture?
by: Sapunov, Grigory
Published: (2026)
Code for Machines, Not Just Humans: Quantifying AI-Friendliness with Code Health Metrics
by: Borg, Markus, et al.
Published: (2026)
SpecAgent: A Speculative Retrieval and Forecasting Agent for Code Completion
by: Ma, George, et al.
Published: (2025)
ASA: Training-Free Representation Engineering for Tool-Calling Agents
by: Wang, Youjin, et al.
Published: (2026)
MCP-Zero: Active Tool Discovery for Autonomous LLM Agents
by: Fei, Xiang, et al.
Published: (2025)
Analyzing Message-Code Inconsistency in AI Coding Agent-Authored Pull Requests
by: Gong, Jingzhi, et al.
Published: (2026)
Schema First Tool APIs for LLM Agents: A Controlled Study of Tool Misuse, Recovery, and Budgeted Performance
by: Sigdel, Akshey, et al.
Published: (2026)
Code Researcher: Deep Research Agent for Large Systems Code and Commit History
by: Singh, Ramneet, et al.
Published: (2025)
Scaling Coding Agents via Atomic Skills
by: Ma, Yingwei, et al.
Published: (2026)
DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production
by: Liang, Xiaoyun, et al.
Published: (2024)
Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code
by: Kim, Myeongsoo, et al.
Published: (2026)