| Main Authors: | Ahmed, Khaled, Song, Jialing, Chen, Boqi, Wei, Ou, Zheng, Bingzhou |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.00630 |
Similar Items
Accurate and Consistent Graph Model Generation from Text with Large Language Models
by: Chen, Boqi, et al.
Published: (2025)
Hierarchical Evaluation of Software Design Capabilities of Large Language Models of Code
by: Saad, Mootez, et al.
Published: (2025)
Structure- and Event-Driven Frameworks for State Machine Modeling with Large Language Models
by: Abdulkarim, Samer, et al.
Published: (2026)
On Inter-dataset Code Duplication and Data Leakage in Large Language Models
by: López, José Antonio Hernández, et al.
Published: (2024)
SHERPA: A Model-Driven Framework for Large Language Model Execution
by: Chen, Boqi, et al.
Published: (2025)
When Elo Lies: Hidden Biases in Codeforces-Based Evaluation of Large Language Models
by: Zheng, Shenyu, et al.
Published: (2026)
Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks
by: Li, Yuangang, et al.
Published: (2026)
Are Decoder-Only Large Language Models the Silver Bullet for Code Search?
by: Chen, Yuxuan, et al.
Published: (2024)
Narrowing the Complexity Gap in the Evaluation of Large Language Models
by: Chen, Yang, et al.
Published: (2026)
Evaluating and Improving Large Language Models for Competitive Program Generation
by: Wei, Minnan, et al.
Published: (2025)
On the Evaluation of Large Language Models in Unit Test Generation
by: Yang, Lin, et al.
Published: (2024)
On the Evaluation of Large Language Models in Multilingual Vulnerability Repair
by: Wang, Dong, et al.
Published: (2025)
DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models
by: Chen, Yunnong, et al.
Published: (2025)
A Survey on Evaluating Large Language Models in Code Generation Tasks
by: Chen, Liguo, et al.
Published: (2024)
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
by: Wang, Yanlin, et al.
Published: (2024)
Assertion Messages with Large Language Models (LLMs) for Code
by: Aljohani, Ahmed, et al.
Published: (2025)
FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks
by: Dai, Dekun, et al.
Published: (2025)
Can Small GenAI Language Models Rival Large Language Models in Understanding Application Behavior?
by: Meymani, Mohammad, et al.
Published: (2025)
Commit Messages in the Age of Large Language Models
by: Lopes, Cristina V., et al.
Published: (2024)
Large Language Models in Game Development: Implications for Gameplay, Playability, and Player Experience
by: Johnson, Keeryn, et al.
Published: (2026)
On the use of Large Language Models in Model-Driven Engineering
by: Di Rocco, Juri, et al.
Published: (2024)
An Empirical Study on Low-Code Programming using Traditional vs Large Language Model Support
by: Liu, Yongkun, et al.
Published: (2024)
Analyzing the Instability of Large Language Models in Automated Bug Injection and Correction
by: Er, Mehmet Bilal, et al.
Published: (2025)
Defining and Detecting the Defects of the Large Language Model-based Autonomous Agents
by: Ning, Kaiwen, et al.
Published: (2024)
VulStamp: Vulnerability Assessment using Large Language Model
by: Shen, Hao, et al.
Published: (2025)
Bidirectional Empowerment of Metamorphic Testing and Large Language Models: A Systematic Survey
by: Zheng, Zheng, et al.
Published: (2026)
Evaluating Large Language Models for Multilingual Vulnerability Detection at Dual Granularities
by: Shu, Honglin, et al.
Published: (2025)
LLM-based Satisfiability Checking of String Requirements by Consistent Data and Checker Generation
by: Chen, Boqi, et al.
Published: (2025)
Evaluating Generated Commit Messages with Large Language Models
by: Zeng, Qunhong, et al.
Published: (2025)
Debugging with Open-Source Large Language Models: An Evaluation
by: Majdoub, Yacine, et al.
Published: (2024)
Evaluating Large Language Models in Detecting Test Smells
by: Lucas, Keila, et al.
Published: (2024)
Software Testing with Large Language Models: Survey, Landscape, and Vision
by: Wang, Junjie, et al.
Published: (2023)
Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation
by: Sun, Zhihong, et al.
Published: (2025)
Calibration and Correctness of Language Models for Code
by: Spiess, Claudio, et al.
Published: (2024)
Towards an Understanding of Large Language Models in Software Engineering Tasks
by: Zheng, Zibin, et al.
Published: (2023)
A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends
by: Zheng, Zibin, et al.
Published: (2023)
PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing
by: Zhang, Yuwei, et al.
Published: (2025)
Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness
by: Wang, Wenxuan
Published: (2024)
Secret Breach Detection in Source Code with Large Language Models
by: Rahman, Md Nafiu, et al.
Published: (2025)
Class Model Generation from Requirements using Large Language Models
by: Nguyen, Jackson, et al.
Published: (2026)