| Main Authors: | Liu, Siyuan, Yang, Zhice, Chen, Huangxun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.23178 |
Similar Items
Demystifying the Silence of Correctness Bugs in PyTorch Compiler
by: Li, Meiziniu, et al.
Published: (2026)
Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries
by: Li, Meiziniu, et al.
Published: (2024)
COMET: Coverage-guided Model Generation For Deep Learning Library Testing
by: Li, Meiziniu, et al.
Published: (2022)
CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement
by: Borra, Ajay Krishna, et al.
Published: (2026)
Abstain and Validate: A Dual-LLM Policy for Reducing Noise in Agentic Program Repair
by: Cambronero, José, et al.
Published: (2025)
Automated Code Fix Suggestions for Accessibility Issues in Mobile Apps
by: Mehralian, Forough, et al.
Published: (2024)
Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation
by: Downing, Mara, et al.
Published: (2025)
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
by: Hundal, Rajdeep Singh, et al.
Published: (2025)
Assessing Data Augmentation-Induced Bias in Training and Testing of Machine Learning Models
by: More, Riddhi, et al.
Published: (2025)
An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
by: More, Riddhi, et al.
Published: (2025)
A Systematic Approach for Assessing Large Language Models' Test Case Generation Capability
by: Chang, Hung-Fu, et al.
Published: (2025)
AgentAssay: Token-Efficient Regression Testing for Non-Deterministic AI Agent Workflows
by: Bhardwaj, Varun Pratap
Published: (2026)
From Untestable to Testable: Metamorphic Testing in the Age of LLMs
by: Terragni, Valerio
Published: (2026)
Towards Explainable Test Case Prioritisation with Learning-to-Rank Models
by: Ramírez, Aurora, et al.
Published: (2024)
RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code
by: Chen, Jiachi, et al.
Published: (2024)
Experience with GitHub Copilot for Developer Productivity at Zoominfo
by: Bakal, Gal, et al.
Published: (2025)
Addressing Data Leakage in HumanEval Using Combinatorial Test Design
by: Bradbury, Jeremy S., et al.
Published: (2024)
Automated structural testing of LLM-based agents: methods, framework, and case studies
by: Kohl, Jens, et al.
Published: (2026)
The Explabox: Model-Agnostic Machine Learning Transparency & Analysis
by: Robeer, Marcel, et al.
Published: (2024)
LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB
by: Bekmyradov, Vekil, et al.
Published: (2026)
LLMORPH: Automated Metamorphic Testing of Large Language Models
by: Cho, Steven, et al.
Published: (2026)
Orion: Fuzzing Workflow Automation
by: Bazalii, Max, et al.
Published: (2025)
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
by: Jana, Prithwish, et al.
Published: (2023)
RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code
by: Liu, Shunyu, et al.
Published: (2025)
Multi-Agent Code Verification via Information Theory
by: Rajan, Shreshth
Published: (2025)
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
by: Gautam, Dhruv, et al.
Published: (2025)
DEFault++: Automated Fault Detection, Categorization, and Diagnosis for Transformer Architectures
by: Jahan, Sigma, et al.
Published: (2026)
AcTracer: Active Testing of Large Language Model via Multi-Stage Sampling
by: Huang, Yuheng, et al.
Published: (2024)
DeepCodeProbe: Towards Understanding What Models Trained on Code Learn
by: Majdinasab, Vahid, et al.
Published: (2024)
InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment
by: Delgado-Pérez, Pedro, et al.
Published: (2024)
Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks
by: Lee, Hokyung, et al.
Published: (2024)
Implementing Knowledge Representation and Reasoning with Object Oriented Design
by: Bassiouny, Abdelrhman, et al.
Published: (2026)
GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair
by: Liu, Zhuoyao, et al.
Published: (2026)
SpecOps: A Fully Automated AI Agent Testing Framework in Real-World GUI Environments
by: Ahmed, Syed Yusuf, et al.
Published: (2026)
RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
by: Li, Kenan, et al.
Published: (2026)
CodeTracer: Towards Traceable Agent States
by: Li, Han, et al.
Published: (2026)
Towards a Probabilistic Framework for Analyzing and Improving LLM-Enabled Software
by: Baldonado, Juan Manuel, et al.
Published: (2025)
Runtime Execution Traces Guided Automated Program Repair with Multi-Agent Debate
by: Wu, Jiaqing, et al.
Published: (2026)
Monitoring Agentic Systems Before They're Reliable
by: Boston, Marisa Ferrara, et al.
Published: (2026)