| Main Authors: | Yang, Lekang; Liu, Yuetong; Zhang, Yitong; Li, Jia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.24975 |
Similar Items
Benchmarking LLMs for Unit Test Generation from Real-World Functions
by: Huang, Dong, et al.
Published: (2025)
To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
by: Zhang, Yitong, et al.
Published: (2026)
TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation
by: Liu, Steven, et al.
Published: (2026)
UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance
by: Ma, Yichuan, et al.
Published: (2025)
Dynamic Scaling of Unit Tests for Code Reward Modeling
by: Ma, Zeyao, et al.
Published: (2025)
HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning
by: Liu, Yujian, et al.
Published: (2025)
Showing LLM-Generated Code Selectively Based on Confidence of LLMs
by: Li, Jia, et al.
Published: (2024)
MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms
by: Wang, Yibo, et al.
Published: (2025)
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure
by: Yang, Zheyuan, et al.
Published: (2025)
CodeContests+: High-Quality Test Case Generation for Competitive Programming
by: Wang, Zihan, et al.
Published: (2025)
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs
by: Yang, Jialin, et al.
Published: (2025)
HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation
by: Karanjai, Rabimba, et al.
Published: (2025)
CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language
by: Cheng, Junhang, et al.
Published: (2026)
GUI Test Migration via Abstraction and Concretization
by: Zhang, Yakun, et al.
Published: (2024)
Learning to Generate Unit Tests for Automated Debugging
by: Prasad, Archiki, et al.
Published: (2025)
Machine Translation Testing via Syntactic Tree Pruning
by: Zhang, Quanjun, et al.
Published: (2024)
Measuring the Influence of Incorrect Code on Test Generation
by: Huang, Dong, et al.
Published: (2024)
Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing
by: Feng, Sidong, et al.
Published: (2025)
PackMonitor: Enabling Zero Package Hallucinations Through Decoding-Time Monitoring
by: Liu, Xiting, et al.
Published: (2026)
Bias Testing and Mitigation in Black Box LLMs using Metamorphic Relations
by: Salimian, Sina, et al.
Published: (2025)
Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems
by: Cao, Yuhan, et al.
Published: (2025)
GenX: Mastering Code and Test Generation with Execution Feedback
by: Wang, Nan, et al.
Published: (2024)
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
by: Wang, Junqiao, et al.
Published: (2024)
Solver-Independent Automated Problem Formulation via LLMs for High-Cost Simulation-Driven Design
by: Li, Yuchen, et al.
Published: (2025)
SimCT: A Simple Consistency Test Protocol in LLMs Development Lifecycle
by: Zhao, Fufangchen, et al.
Published: (2024)
Evaluating LLMs on Sequential API Call Through Automated Test Generation
by: Huang, Yuheng, et al.
Published: (2025)
Multi-Programming Language Sandbox for LLMs
by: Dou, Shihan, et al.
Published: (2024)
Measuring LLM Code Generation Stability via Structural Entropy
by: Song, Yewei, et al.
Published: (2025)
Model Editing for LLMs4Code: How Far are We?
by: Li, Xiaopeng, et al.
Published: (2024)
Evaluation of Code LLMs on Geospatial Code Generation
by: Gramacki, Piotr, et al.
Published: (2024)
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
by: Tian, Yuchen, et al.
Published: (2024)
TDD-Bench Verified: Can LLMs Generate Tests for Issues Before They Get Resolved?
by: Ahmed, Toufique, et al.
Published: (2024)
Assessing Evaluation Metrics for Neural Test Oracle Generation
by: Shin, Jiho, et al.
Published: (2023)
EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations
by: Li, Jia, et al.
Published: (2024)
CodeContests-O: Powering LLMs via Feedback-Driven Iterative Test Case Generation
by: Cai, Jianfeng, et al.
Published: (2026)
Zero-shot Bilingual App Reviews Mining with Large Language Models
by: Wei, Jialiang, et al.
Published: (2023)
HateModerate: Testing Hate Speech Detectors against Content Moderation Policies
by: Zheng, Jiangrui, et al.
Published: (2023)
FairCoder: Evaluating Social Bias of LLMs in Code Generation
by: Du, Yongkang, et al.
Published: (2025)
Software Design Pattern Model and Data Structure Algorithm Abilities on Microservices Architecture Design in High-tech Enterprises
by: Cui, Jun
Published: (2024)
Learning to Commit: Generating Organic Pull Requests via Online Repository Memory
by: Li, Mo, et al.
Published: (2026)