:: Library Catalog

Bibliographic Details
Main Authors:	Broestl, Noah; Abdalla, Adel Nasser; Bale, Rajprakash; Gupta, Hersh; Struever, Max
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Software Engineering
Online Access:	https://arxiv.org/abs/2510.00001

Similar Items

CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification
by: Xu, Jiacheng, et al.
Published: (2025)

Exploring the Integration of Large Language Models in Industrial Test Maintenance Processes
by: Liu, Jingxiong, et al.
Published: (2024)

A Conceptual Framework for Ethical Evaluation of Machine Learning Systems
by: Gupta, Neha R., et al.
Published: (2024)

Agile Story-Point Estimation: Is RAG a Better Way to Go?
by: Maha, Lamyea, et al.
Published: (2026)

Codehacks: A Dataset of Adversarial Tests for Competitive Programming Problems Obtained from Codeforces
by: Hort, Max, et al.
Published: (2025)

Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations
by: Akash, Ranit Debnath, et al.
Published: (2025)

Semantic-Preserving Transformations as Mutation Operators: A Study on Their Effectiveness in Defect Detection
by: Hort, Max, et al.
Published: (2025)

MASTEST: A LLM-Based Multi-Agent System For RESTful API Tests
by: Han, Xiaoke, et al.
Published: (2025)

RACC: Representation-Aware Coverage Criteria for LLM Safety Testing
by: Wei, Zeming, et al.
Published: (2026)

scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns
by: Samsonau, Sergey V.
Published: (2026)

Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?
by: Sorokin, Lev, et al.
Published: (2024)

FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair
by: Fatima, Sakina, et al.
Published: (2023)

How Robustly do LLMs Understand Execution Semantics?
by: Spiess, Claudio, et al.
Published: (2026)

When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems
by: Casey, Emma, et al.
Published: (2026)

Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
by: Jacopin, Éric
Published: (2026)

Generative AI to Generate Test Data Generators
by: Baudry, Benoit, et al.
Published: (2024)

Understanding LLM-Driven Test Oracle Generation
by: Bodicoat, Adam, et al.
Published: (2026)

Code Generation by Differential Test Time Scaling
by: He, Yifeng, et al.
Published: (2026)

Semantic Voting: Execution-Grounded Consensus for LLM Code Generation
by: Jiang, Shan, et al.
Published: (2026)

CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG
by: Chen, Pengzhou, et al.
Published: (2026)

Reinforcement Learning for Online Testing of Autonomous Driving Systems: a Replication and Extension Study
by: Giamattei, Luca, et al.
Published: (2024)

Mutation-Guided LLM-based Test Generation at Meta
by: Foster, Christopher, et al.
Published: (2025)

DeepKnowledge: Generalisation-Driven Deep Learning Testing
by: Missaoui, Sondess, et al.
Published: (2024)

A Theoretical Analysis of Test-Driven Code Generation
by: Menet, Nicolas, et al.
Published: (2026)

A Stochastic Differential Equation Framework for Multi-Objective LLM Interactions: Dynamical Systems Analysis with Code Generation Applications
by: Shukla, Shivani, et al.
Published: (2025)

Enhancing LLM-Based Test Generation by Eliminating Covered Code
by: Xu, WeiZhe, et al.
Published: (2026)

The Impact of Software Testing with Quantum Optimization Meets Machine Learning
by: Bandarupalli, Gopichand
Published: (2025)

RBT4DNN: Requirements-based Testing of Neural Networks
by: Mozumder, Nusrat Jahan, et al.
Published: (2025)

Read, Extract, Classify: A Tool for Smarter Requirements Engineering
by: Bhattacharya, Paheli, et al.
Published: (2026)

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
by: Zhou, Zenghui, et al.
Published: (2026)

Using Quality Attribute Scenarios for ML Model Test Case Generation
by: Brower-Sinning, Rachel, et al.
Published: (2024)

VeriScale: Adversarial Test-Suite Scaling for Verifiable Code Generation
by: Bai, Yifan, et al.
Published: (2026)

SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents
by: Ding, Yifeng, et al.
Published: (2026)

MILE: A Mutation Testing Framework of In-Context Learning Systems
by: Wei, Zeming, et al.
Published: (2024)

Zero-Shot Attribution for Large Language Models: A Distribution Testing Approach
by: Canonne, Clément L., et al.
Published: (2025)

SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
by: Mündler, Niels, et al.
Published: (2024)

MIST-RL: Mutation-based Incremental Suite Testing via Reinforcement Learning
by: Zhu, Sicheng, et al.
Published: (2026)

AI-driven Java Performance Testing: Balancing Result Quality with Testing Time
by: Traini, Luca, et al.
Published: (2024)

Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
by: Storhaug, André, et al.
Published: (2024)

A Reference Architecture of Reinforcement Learning Frameworks
by: Liu, Xiaoran, et al.
Published: (2026)