| Main Authors: | Li, Yufei, Chen, Simin, Guo, Yanghong, Yang, Wei, Dong, Yue, Liu, Cong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.05939 |
Similar Items
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination
by: Chen, Simin, et al.
Published: (2025)
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models
by: Chen, Simin, et al.
Published: (2024)
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
by: Chou, Jason, et al.
Published: (2025)
Mercury: A Code Efficiency Benchmark for Code Large Language Models
by: Du, Mingzhe, et al.
Published: (2024)
LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models
by: Feng, Xiaoning, et al.
Published: (2022)
EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning
by: Huang, Dong, et al.
Published: (2024)
Bridging Code Graphs and Large Language Models for Better Code Understanding
by: Chen, Zeqi, et al.
Published: (2025)
CodeMind: Evaluating Large Language Models for Code Reasoning
by: Liu, Changshu, et al.
Published: (2024)
A Code Comprehension Benchmark for Large Language Models for Code
by: Havare, Jayant, et al.
Published: (2025)
LLMigrate: Transforming "Lazy" Large Language Models into Efficient Source Code Migrators
by: Liu, Yuchen, et al.
Published: (2025)
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
by: Guo, Jiawei, et al.
Published: (2024)
ICE-Score: Instructing Large Language Models to Evaluate Code
by: Zhuo, Terry Yue
Published: (2023)
Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models
by: Guan, Batu, et al.
Published: (2025)
CODEMENV: Benchmarking Large Language Models on Code Migration
by: Cheng, Keyuan, et al.
Published: (2025)
R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
by: Deng, Ken, et al.
Published: (2024)
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
by: Yang, Kang, et al.
Published: (2025)
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models
by: Lin, Hong Yi, et al.
Published: (2025)
InstructCoder: Instruction Tuning Large Language Models for Code Editing
by: Li, Kaixin, et al.
Published: (2023)
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study
by: Dou, Shihan, et al.
Published: (2024)
Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback
by: Wallraven, Stephan, et al.
Published: (2026)
Strengthening Programming Comprehension in Large Language Models through Code Generation
by: Ren, Xiaoning, et al.
Published: (2025)
Calibration of Large Language Models on Code Summarization
by: Virk, Yuvraj, et al.
Published: (2024)
Leveraging Print Debugging to Improve Code Generation in Large Language Models
by: Hu, Xueyu, et al.
Published: (2024)
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
by: Huang, Yue, et al.
Published: (2023)
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models
by: Zheng, Jiasheng, et al.
Published: (2024)
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models
by: Zhuo, Terry Yue, et al.
Published: (2024)
CodeMirage: Hallucinations in Code Generated by Large Language Models
by: Agarwal, Vibhor, et al.
Published: (2024)
Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
by: Huang, Yuheng, et al.
Published: (2023)
Self-Explained Keywords Empower Large Language Models for Code Generation
by: Fan, Lishui, et al.
Published: (2024)
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
by: Guo, Daya, et al.
Published: (2024)
A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks
by: Dandamudi, Rohit, et al.
Published: (2024)
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
by: Liu, Xiangyan, et al.
Published: (2024)
CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?
by: Zhao, Yuwei, et al.
Published: (2024)
CodeS: Natural Language to Code Repository via Multi-Layer Sketch
by: Zan, Daoguang, et al.
Published: (2024)
Dafny as Verification-Aware Intermediate Language for Code Generation
by: Li, Yue Chen, et al.
Published: (2025)
E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task
by: Liu, Jingyao, et al.
Published: (2025)
COBOL-Coder: Domain-Adapted Large Language Models for COBOL Code Generation and Translation
by: Dau, Anh T. V., et al.
Published: (2026)
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval
by: Wu, Jiarong, et al.
Published: (2025)
Code Readability in the Age of Large Language Models: An Industrial Case Study from Atlassian
by: Takerngsaksiri, Wannita, et al.
Published: (2025)
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
by: Huang, Dong, et al.
Published: (2024)