| Main Authors: | Liang, Shanchao, Hu, Yiran, Jiang, Nan, Tan, Lin |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.21647 |
Similar Items
WAFFLE: Finetuning Multi-Modal Models for Automated Front-End Development
by: Liang, Shanchao, et al.
Published: (2024)
TENET: Leveraging Tests Beyond Validation for Code Generation
by: Hu, Yiran, et al.
Published: (2025)
Collu-Bench: A Benchmark for Predicting Language Model Hallucinations in Code
by: Jiang, Nan, et al.
Published: (2024)
CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?
by: Zhao, Yuwei, et al.
Published: (2024)
Large Language Models for IT Automation Tasks: Are We There Yet?
by: Hassan, Md Mahadi, et al.
Published: (2025)
From Completion to Editing: Unlocking Context-Aware Code Infilling via Search-and-Replace Instruction Tuning
by: Zhang, Jiajun, et al.
Published: (2026)
SWE-QA: Can Language Models Answer Repository-level Code Questions?
by: Peng, Weihan, et al.
Published: (2025)
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models
by: Lin, Hong Yi, et al.
Published: (2025)
YABLoCo: Yet Another Benchmark for Long Context Code Generation
by: Valeev, Aidar, et al.
Published: (2025)
AutoCodeBench: Large Language Models are Automatic Code Benchmark Generators
by: Chou, Jason, et al.
Published: (2025)
CodeRAG-Bench: Can Retrieval Augment Code Generation?
by: Wang, Zora Zhiruo, et al.
Published: (2024)
Leveraging Print Debugging to Improve Code Generation in Large Language Models
by: Hu, Xueyu, et al.
Published: (2024)
ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models
by: Zheng, Jiasheng, et al.
Published: (2026)
Unified Software Engineering Agent as AI Software Engineer
by: Applis, Leonhard, et al.
Published: (2025)
InstructCoder: Instruction Tuning Large Language Models for Code Editing
by: Li, Kaixin, et al.
Published: (2023)
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
by: Shi, Yuling, et al.
Published: (2026)
Mercury: A Code Efficiency Benchmark for Code Large Language Models
by: Du, Mingzhe, et al.
Published: (2024)
LongCodeZip: Compress Long Context for Code Language Models
by: Shi, Yuling, et al.
Published: (2025)
Bridging Code Graphs and Large Language Models for Better Code Understanding
by: Chen, Zeqi, et al.
Published: (2025)
Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency
by: Huang, Baizhou, et al.
Published: (2023)
NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations
by: Chen, Junkai, et al.
Published: (2024)
Calibration of Large Language Models on Code Summarization
by: Virk, Yuvraj, et al.
Published: (2024)
Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition?
by: Li, Xiangyang, et al.
Published: (2025)
What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts
by: Yang, Chenyang, et al.
Published: (2025)
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
by: Lee, Seonghyeon, et al.
Published: (2025)
Evaluating and Achieving Controllable Code Completion in Code LLM
by: Zhang, Jiajun, et al.
Published: (2026)
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
by: Chen, Guoxin, et al.
Published: (2026)
GenX: Mastering Code and Test Generation with Execution Feedback
by: Wang, Nan, et al.
Published: (2024)
Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model
by: Lin, Yu-Chen, et al.
Published: (2023)
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval
by: Wang, Jiexin, et al.
Published: (2024)
A Survey on Large Language Models for Code Generation
by: Jiang, Juyong, et al.
Published: (2024)
Vulnerability Detection with Code Language Models: How Far Are We?
by: Ding, Yangruibo, et al.
Published: (2024)
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
by: Yang, Kang, et al.
Published: (2025)
A Survey of using Large Language Models for Generating Infrastructure as Code
by: Srivatsa, Kalahasti Ganesh, et al.
Published: (2024)
Enhanced Automated Code Vulnerability Repair using Large Language Models
by: de-Fitero-Dominguez, David, et al.
Published: (2024)
Is Your Benchmark (Still) Useful? Dynamic Benchmarking for Code Language Models
by: Guan, Batu, et al.
Published: (2025)
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
by: Jiang, Hongchao, et al.
Published: (2025)
Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers
by: Shi, Yuling, et al.
Published: (2024)
How Programming Concepts and Neurons Are Shared in Code Language Models
by: Kargaran, Amir Hossein, et al.
Published: (2025)
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
by: Liu, Xiangyan, et al.
Published: (2024)