| Main Authors: | Brahman, Danny; Mahoor, Mohammad |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.03432 |
Similar Items
TransLibEval: Demystify Large Language Models' Capability in Third-party Library-targeted Code Translation
by: Xue, Pengyu, et al.
Published: (2025)
ClassEval-T: Evaluating Large Language Models in Class-Level Code Translation
by: Xue, Pengyu, et al.
Published: (2024)
CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
by: Yu, Hao, et al.
Published: (2023)
Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation
by: Hamedi, Parisa, et al.
Published: (2025)
ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code
by: Feng, Jia, et al.
Published: (2024)
SolEval: Benchmarking Large Language Models for Repository-level Solidity Code Generation
by: Peng, Zhiyuan, et al.
Published: (2025)
SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications
by: Ma, Lezhi, et al.
Published: (2024)
CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?
by: Zhao, Yuwei, et al.
Published: (2024)
FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks
by: Dai, Dekun, et al.
Published: (2025)
AdaptEval: A Benchmark for Evaluating Large Language Models on Code Snippet Adaptation
by: Zhang, Tanghaoran, et al.
Published: (2026)
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models
by: Tambon, Florian, et al.
Published: (2024)
Measuring how changes in code readability attributes affect code quality evaluation by Large Language Models
by: Simoes, Igor Regis da Silva, et al.
Published: (2025)
Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models
by: Wang, Yan, et al.
Published: (2024)
ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
by: Kammakomati, Mehant, et al.
Published: (2024)
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval
by: Wang, Jiexin, et al.
Published: (2024)
VHDL-Eval: A Framework for Evaluating Large Language Models in VHDL Code Generation
by: Vijayaraghavan, Prashanth, et al.
Published: (2024)
Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code
by: Hussain, Aftab, et al.
Published: (2024)
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
by: Yu, Zhaojian, et al.
Published: (2024)
Evaluating Large Language Models for Code Translation: Effects of Prompt Language and Prompt Design
by: Aljagthami, Aamer, et al.
Published: (2025)
LEANCODE: Understanding Models Better for Code Simplification of Pre-trained Large Language Models
by: Wang, Yan, et al.
Published: (2025)
Emotion Classification In Software Engineering Texts: A Comparative Analysis of Pre-trained Transformers Language Models
by: Imran, Mia Mohammad
Published: (2024)
Code Membership Inference for Detecting Unauthorized Data Use in Code Pre-trained Language Models
by: Zhang, Sheng, et al.
Published: (2023)
SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis
by: Li, Yansong, et al.
Published: (2025)
Greening Large Language Models of Code
by: Shi, Jieke, et al.
Published: (2023)
Hotfixing Large Language Models for Code
by: Yang, Zhou, et al.
Published: (2024)
Ecosystem of Large Language Models for Code
by: Yang, Zhou, et al.
Published: (2024)
Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code
by: Chen, Yujia, et al.
Published: (2024)
Exploring the Potential of Large Language Models in Self-adaptive Systems
by: Li, Jialong, et al.
Published: (2024)
Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience
by: Yin, Xin, et al.
Published: (2024)
Large Language Model Unlearning for Source Code
by: Jiang, Xue, et al.
Published: (2025)
Unlearning Trojans in Large Language Models: A Comparison Between Natural Language and Source Code
by: Kazemi, Mahdi, et al.
Published: (2024)
On Trojan Signatures in Large Language Models of Code
by: Hussain, Aftab, et al.
Published: (2024)
ArkEval: Benchmarking and Evaluating Automated CodeRepair for ArkTS
by: Xie, Bang, et al.
Published: (2026)
Practical Program Repair in the Era of Large Pre-trained Language Models
by: Xia, Chunqiu Steven, et al.
Published: (2022)
Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval
by: Wu, Jiarong, et al.
Published: (2025)
Automated Harmfulness Testing for Code Large Language Models
by: Tan, Honghao, et al.
Published: (2025)
Assertion Messages with Large Language Models (LLMs) for Code
by: Aljohani, Ahmed, et al.
Published: (2025)
Can Large Language Models Generate Geospatial Code?
by: Hou, Shuyang, et al.
Published: (2024)
Large Language Models for Code Generation: The Practitioners Perspective
by: Rasheed, Zeeshan, et al.
Published: (2025)
Optimizing Large Language Model Hyperparameters for Code Generation
by: Arora, Chetan, et al.
Published: (2024)