| Main Authors: | Broestl, Noah, Abdalla, Adel Nasser, Bale, Rajprakash, Gupta, Hersh, Struever, Max |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.00001 |
Similar Items
CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification
by: Xu, Jiacheng, et al.
Published: (2025)
Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes
by: Liu, Jingxiong, et al.
Published: (2024)
A Conceptual Framework for Ethical Evaluation of Machine Learning Systems
by: Gupta, Neha R., et al.
Published: (2024)
Agile Story-Point Estimation: Is RAG a Better Way to Go?
by: Maha, Lamyea, et al.
Published: (2026)
Codehacks: A Dataset of Adversarial Tests for Competitive Programming Problems Obtained from Codeforces
by: Hort, Max, et al.
Published: (2025)
Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations
by: Akash, Ranit Debnath, et al.
Published: (2025)
Semantic-Preserving Transformations as Mutation Operators: A Study on Their Effectiveness in Defect Detection
by: Hort, Max, et al.
Published: (2025)
MASTEST: A LLM-Based Multi-Agent System For RESTful API Tests
by: Han, Xiaoke, et al.
Published: (2025)
RACC: Representation-Aware Coverage Criteria for LLM Safety Testing
by: Wei, Zeming, et al.
Published: (2026)
scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns
by: Samsonau, Sergey V.
Published: (2026)
Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?
by: Sorokin, Lev, et al.
Published: (2024)
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair
by: Fatima, Sakina, et al.
Published: (2023)
How Robustly do LLMs Understand Execution Semantics?
by: Spiess, Claudio, et al.
Published: (2026)
When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems
by: Casey, Emma, et al.
Published: (2026)
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
by: Jacopin, Éric
Published: (2026)
Generative AI to Generate Test Data Generators
by: Baudry, Benoit, et al.
Published: (2024)
Understanding LLM-Driven Test Oracle Generation
by: Bodicoat, Adam, et al.
Published: (2026)
Code Generation by Differential Test Time Scaling
by: He, Yifeng, et al.
Published: (2026)
Semantic Voting: Execution-Grounded Consensus for LLM Code Generation
by: Jiang, Shan, et al.
Published: (2026)
CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG
by: Chen, Pengzhou, et al.
Published: (2026)
Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study
by: Giamattei, Luca, et al.
Published: (2024)
Mutation-Guided LLM-based Test Generation at Meta
by: Foster, Christopher, et al.
Published: (2025)
DeepKnowledge: Generalisation-Driven Deep Learning Testing
by: Missaoui, Sondess, et al.
Published: (2024)
A Theoretical Analysis of Test-Driven Code Generation
by: Menet, Nicolas, et al.
Published: (2026)
A Stochastic Differential Equation Framework for Multi-Objective LLM Interactions: Dynamical Systems Analysis with Code Generation Applications
by: Shukla, Shivani, et al.
Published: (2025)
Enhancing LLM-Based Test Generation by Eliminating Covered Code
by: Xu, WeiZhe, et al.
Published: (2026)
The Impact of Software Testing with Quantum Optimization Meets Machine Learning
by: Bandarupalli, Gopichand
Published: (2025)
RBT4DNN: Requirements-based Testing of Neural Networks
by: Mozumder, Nusrat Jahan, et al.
Published: (2025)
Read, Extract, Classify: A Tool for Smarter Requirements Engineering
by: Bhattacharya, Paheli, et al.
Published: (2026)
LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
by: Zhou, Zenghui, et al.
Published: (2026)
Using Quality Attribute Scenarios for ML Model Test Case Generation
by: Brower-Sinning, Rachel, et al.
Published: (2024)
VeriScale: Adversarial Test-Suite Scaling for Verifiable Code Generation
by: Bai, Yifan, et al.
Published: (2026)
SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents
by: Ding, Yifeng, et al.
Published: (2026)
MILE: A Mutation Testing Framework of In-Context Learning Systems
by: Wei, Zeming, et al.
Published: (2024)
Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach
by: Canonne, Clément L., et al.
Published: (2025)
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
by: Mündler, Niels, et al.
Published: (2024)
MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning
by: Zhu, Sicheng, et al.
Published: (2026)
AI-driven Java Performance Testing: Balancing Result Quality with Testing Time
by: Traini, Luca, et al.
Published: (2024)
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
by: Storhaug, André, et al.
Published: (2024)
A Reference Architecture of Reinforcement Learning Frameworks
by: Liu, Xiaoran, et al.
Published: (2026)