:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Siyuan; Yang, Zhice; Chen, Huangxun
Format:	Preprint
Published:	2025
Subjects:	Software Engineering; Artificial Intelligence; I.2.5
Online Access:	https://arxiv.org/abs/2507.23178

Similar Items

Demystifying the Silence of Correctness Bugs in PyTorch Compiler
by: Li, Meiziniu, et al.
Published: (2026)

Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries
by: Li, Meiziniu, et al.
Published: (2024)

COMET: Coverage-guided Model Generation For Deep Learning Library Testing
by: Li, Meiziniu, et al.
Published: (2022)

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement
by: Borra, Ajay Krishna, et al.
Published: (2026)

Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair
by: Cambronero, José, et al.
Published: (2025)

Automated Code Fix Suggestions for Accessibility Issues in Mobile Apps
by: Mehralian, Forough, et al.
Published: (2024)

Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation
by: Downing, Mara, et al.
Published: (2025)

AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)

On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
by: Hundal, Rajdeep Singh, et al.
Published: (2025)

Assessing Data Augmentation-Induced Bias in Training and Testing of Machine Learning Models
by: More, Riddhi, et al.
Published: (2025)

An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
by: More, Riddhi, et al.
Published: (2025)

A Systematic Approach for Assessing Large Language Models' Test Case Generation Capability
by: Chang, Hung-Fu, et al.
Published: (2025)

AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows
by: Bhardwaj, Varun Pratap
Published: (2026)

From Untestable to Testable: Metamorphic Testing in the Age of LLMs
by: Terragni, Valerio
Published: (2026)

Towards Explainable Test Case Prioritisation with Learning-to-Rank Models
by: Ramírez, Aurora, et al.
Published: (2024)

RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code
by: Chen, Jiachi, et al.
Published: (2024)

Experience with GitHub Copilot for Developer Productivity at Zoominfo
by: Bakal, Gal, et al.
Published: (2025)

Addressing Data Leakage in HumanEval Using Combinatorial Test Design
by: Bradbury, Jeremy S., et al.
Published: (2024)

Automated structural testing of LLM-based agents: methods, framework, and case studies
by: Kohl, Jens, et al.
Published: (2026)

The Explabox: Model-Agnostic Machine Learning Transparency & Analysis
by: Robeer, Marcel, et al.
Published: (2024)

LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB
by: Bekmyradov, Vekil, et al.
Published: (2026)

LLMORPH: Automated Metamorphic Testing of Large Language Models
by: Cho, Steven, et al.
Published: (2026)

Orion: Fuzzing Workflow Automation
by: Bazalii, Max, et al.
Published: (2025)

CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
by: Jana, Prithwish, et al.
Published: (2023)

RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code
by: Liu, Shunyu, et al.
Published: (2025)

Multi-Agent Code Verification via Information Theory
by: Rajan, Shreshth
Published: (2025)

RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
by: Gautam, Dhruv, et al.
Published: (2025)

DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
by: Jahan, Sigma, et al.
Published: (2026)

AcTracer: Active Testing of Large Language Model via Multi-Stage Sampling
by: Huang, Yuheng, et al.
Published: (2024)

DeepCodeProbe: Towards Understanding What Models Trained on Code Learn
by: Majdinasab, Vahid, et al.
Published: (2024)

InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment
by: Delgado-Pérez, Pedro, et al.
Published: (2024)

Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks
by: Lee, Hokyung, et al.
Published: (2024)

Implementing Knowledge Representation and Reasoning with Object Oriented Design
by: Bassiouny, Abdelrhman, et al.
Published: (2026)

GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair
by: Liu, Zhuoyao, et al.
Published: (2026)

SpecOps: A Fully Automated AI Agent Testing Framework in Real-World GUI Environments
by: Ahmed, Syed Yusuf, et al.
Published: (2026)

RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
by: Li, Kenan, et al.
Published: (2026)

CodeTracer: Towards Traceable Agent States
by: Li, Han, et al.
Published: (2026)

Towards a Probabilistic Framework for Analyzing and Improving LLM-Enabled Software
by: Baldonado, Juan Manuel, et al.
Published: (2025)

Runtime Execution Traces Guided Automated Program Repair with Multi-Agent Debate
by: Wu, Jiaqing, et al.
Published: (2026)

Monitoring Agentic Systems Before They're Reliable
by: Boston, Marisa Ferrara, et al.
Published: (2026)