| Main Authors: | Ni, Ziyi, Li, Yifan, Yang, Ning, Shen, Dou, Lv, Pin, Dong, Daxiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.15305 |
Similar Items
Tree-of-Code: A Hybrid Approach for Robust Complex Task Planning and Execution
by: Ni, Ziyi, et al.
Published: (2024)
ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development
by: Lu, Pengrui, et al.
Published: (2026)
GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging
by: Ni, Ziyi, et al.
Published: (2025)
RA-Gen: A Controllable Code Generation Framework Using ReAct for Multi-Agent Task Execution
by: Liu, Aofan, et al.
Published: (2025)
RedCode: Risky Code Execution and Generation Benchmark for Code Agents
by: Guo, Chengquan, et al.
Published: (2024)
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
by: Zheng, Tianyu, et al.
Published: (2024)
Toward Executable Repository-Level Code Generation via Environment Alignment
by: Pan, Ruwei, et al.
Published: (2026)
Argus: Resilience-Oriented Safety Assurance Framework for End-to-End ADSs
by: Wang, Dingji, et al.
Published: (2025)
CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects
by: Guo, Hanyang, et al.
Published: (2025)
Executing as You Generate: Hiding Execution Latency in LLM Code Generation
by: Sun, Zhensu, et al.
Published: (2026)
Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation
by: Khan, M Zafir Sadik, et al.
Published: (2026)
GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks
by: Hou, Shuyang, et al.
Published: (2024)
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
by: Ni, Yuansheng, et al.
Published: (2025)
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
by: Jiang, Xue, et al.
Published: (2025)
How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python?
by: Gong, Jianian, et al.
Published: (2024)
VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents
by: Guo, JunJia, et al.
Published: (2026)
Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages
by: Wu, Fan, et al.
Published: (2026)
CodeScore: Evaluating Code Generation by Learning Code Execution
by: Dong, Yihong, et al.
Published: (2023)
SWE-Hub: A Unified Production System for Scalable, Executable Software Engineering Tasks
by: Zeng, Yucheng, et al.
Published: (2026)
Integrating Symbolic Execution into the Fine-Tuning of Code-Generating LLMs
by: Sakharova, Marina, et al.
Published: (2025)
FasterPy: An LLM-based Code Execution Efficiency Optimization Framework
by: Wu, Yue, et al.
Published: (2025)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks
by: Chervyakov, Artem, et al.
Published: (2025)
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
by: Huang, Dong, et al.
Published: (2023)
Do Machines and Humans Focus on Similar Code? Exploring Explainability of Large Language Models in Code Summarization
by: Li, Jiliang, et al.
Published: (2024)
E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task
by: Liu, Jingyao, et al.
Published: (2025)
Learning to Align Human Code Preferences
by: Yin, Xin, et al.
Published: (2025)
Runtime-Structured Task Decomposition for Agentic Coding Systems
by: Asthana, Shubhi, et al.
Published: (2026)
Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development
by: Tran, Hung, et al.
Published: (2026)
Beyond Functional Correctness: Exploring Hallucinations in LLM-Generated Code
by: Liu, Fang, et al.
Published: (2024)
RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving
by: Wang, Huacan, et al.
Published: (2025)
Selection of Prompt Engineering Techniques for Code Generation through Predicting Code Complexity
by: Wang, Chung-Yu, et al.
Published: (2024)
CCISolver: End-to-End Detection and Repair of Method-Level Code-Comment Inconsistency
by: Zhong, Renyi, et al.
Published: (2025)
GenCode: A Generic Data Augmentation Framework for Boosting Deep Learning-Based Code Understanding
by: Dong, Zeming, et al.
Published: (2024)
An Execution-Verified Multi-Language Benchmark for Code Semantic Reasoning
by: Li, Yikun, et al.
Published: (2026)
Learning Adaptive Parallel Execution for Efficient Code Localization
by: Xu, Ke, et al.
Published: (2026)
Automated Benchmark Generation for Repository-Level Coding Tasks
by: Vergopoulos, Konstantinos, et al.
Published: (2025)
Task Abstention for Large Language Models in Code Generation
by: Zhou, Yanke, et al.
Published: (2026)
EvoDev: An Iterative Feature-Driven Framework for End-to-End Software Development with LLM-based Agents
by: Liu, Junwei, et al.
Published: (2025)
Exploring and Unleashing the Power of Large Language Models in Automated Code Translation
by: Yang, Zhen, et al.
Published: (2024)
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
by: Zhuo, Terry Yue, et al.
Published: (2025)