Library Catalog

Bibliographic Details
Main Authors:	Wang, Hao; Li, Rui; Sha, Lei; Zhang, Jie M.
Format:	Preprint
Published:	2026
Subjects:	Software Engineering; Computation and Language
Online Access:	https://arxiv.org/abs/2605.11922

Similar Items

CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
by: Jiang, Xue, et al.
Published: (2025)

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
by: Tian, Yuchen, et al.
Published: (2024)

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
by: Dou, Shihan, et al.
Published: (2024)

NExT: Teaching Large Language Models to Reason about Code Execution
by: Ni, Ansong, et al.
Published: (2024)

Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
by: Yu, Zhuohao, et al.
Published: (2024)

ReCode: Reinforcing Code Generation with Reasoning-Process Rewards
by: Fan, Lishui, et al.
Published: (2025)

Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
by: Wang, Junqiao, et al.
Published: (2024)

DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
by: Li, Jia, et al.
Published: (2024)

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
by: Zheng, Tianyu, et al.
Published: (2024)

PurpCode: Reasoning for Safer Code Generation
by: Liu, Jiawei, et al.
Published: (2025)

Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
by: Yang, Dayu, et al.
Published: (2025)

EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
by: Li, Jia, et al.
Published: (2024)

CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
by: Xie, Yiqing, et al.
Published: (2024)

GenX: Mastering Code and Test Generation with Execution Feedback
by: Wang, Nan, et al.
Published: (2024)

ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning
by: Tang, Lingxiao, et al.
Published: (2026)

CodeMind: Evaluating Large Language Models for Code Reasoning
by: Liu, Changshu, et al.
Published: (2024)

Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
by: Zhao, Songwen, et al.
Published: (2025)

SelfCodeAlign: Self-Alignment for Code Generation
by: Wei, Yuxiang, et al.
Published: (2024)

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
by: Zhuo, Terry Yue, et al.
Published: (2025)

CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation
by: Chen, Zaoyu, et al.
Published: (2026)

Evaluating and Achieving Controllable Code Completion in Code LLM
by: Zhang, Jiajun, et al.
Published: (2026)

CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
by: Wang, Sizhe, et al.
Published: (2025)

CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents
by: Sutawika, Lintang, et al.
Published: (2026)

Pragmatic Reasoning improves LLM Code Generation
by: Cao, Zhuchen, et al.
Published: (2025)

CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing
by: Zhu, Mingzhi, et al.
Published: (2026)

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
by: Yan, Weixiang, et al.
Published: (2023)

CoreCodeBench: Decoupling Code Intelligence via Fine-Grained Repository-Level Tasks
by: Fu, Lingyue, et al.
Published: (2025)

CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora
by: Li, Shangyu, et al.
Published: (2026)

Python Symbolic Execution with LLM-powered Code Generation
by: Wang, Wenhan, et al.
Published: (2024)

Agent-Diff: Benchmarking LLM Agents on Enterprise API Tasks via Code Execution with State-Diff-Based Evaluation
by: Pysklo, Hubert M., et al.
Published: (2026)

CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning
by: Tang, Lingxiao, et al.
Published: (2025)

Demystifying Errors in LLM Reasoning Traces: An Empirical Study of Code Execution Simulation
by: Abdollahi, Mohammad, et al.
Published: (2025)

PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
by: Peng, Yun, et al.
Published: (2024)

SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis
by: Li, Yansong, et al.
Published: (2025)

DuET: Dual Execution for Test Output Prediction with Generated Code and Pseudocode
by: Han, Hojae, et al.
Published: (2026)

MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
by: Wang, Zhengren, et al.
Published: (2025)

EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization
by: Huang, Dong, et al.
Published: (2024)

Measuring the Influence of Incorrect Code on Test Generation
by: Huang, Dong, et al.
Published: (2024)

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
by: Yang, Cheng, et al.
Published: (2024)

EffiSkill: Agent Skill Based Automated Code Efficiency Optimization
by: Wang, Zimu, et al.
Published: (2026)