Library Catalog

Bibliographic Details
Main Authors:	Martinez, Matias; Franch, Xavier
Format:	Preprint
Published:	2026
Subjects:	Software Engineering
Online Access:	https://arxiv.org/abs/2602.04449

Similar Items

Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems
by: Martinez, Matias, et al.
Published: (2025)

Energy Consumption of Automated Program Repair
by: Martinez, Matias, et al.
Published: (2022)

Automated Requirements Relation Extraction
by: Motger, Quim, et al.
Published: (2024)

SWE-Bench+: Enhanced Coding Benchmark for LLMs
by: Aleithan, Reem, et al.
Published: (2024)

SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks
by: Mhatre, Sanket, et al.
Published: (2025)

SWE Context Bench: A Benchmark for Context Learning in Coding
by: Zhu, Jiayuan, et al.
Published: (2026)

Cataloguing Hugging Face Models to Software Engineering Activities: Automation and Findings
by: González, Alexandra, et al.
Published: (2025)

SEMODS: A Validated Dataset of Open-Source Software Engineering Models
by: González, Alexandra, et al.
Published: (2026)

Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation
by: Garg, Spandan, et al.
Published: (2025)

ThinkRepair: Self-Directed Automated Program Repair
by: Yin, Xin, et al.
Published: (2024)

The Impact of Program Reduction on Automated Program Repair
by: Vidziunas, Linas, et al.
Published: (2024)

HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair
by: Rabbi, Fazle, et al.
Published: (2026)

ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs
by: Kong, Jiaolong, et al.
Published: (2024)

CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows
by: Muna, Rabeya Khatun, et al.
Published: (2026)

RepairBench: Leaderboard of Frontier Models for Program Repair
by: Silva, André, et al.
Published: (2024)

SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation
by: Oliva, Gustavo A., et al.
Published: (2025)

A Tool for Automatically Cataloguing and Selecting Pre-Trained Models and Datasets for Software Engineering
by: González, Alexandra, et al.
Published: (2026)

Specification Vibing for Automated Program Repair
by: Zhu, Taohong, et al.
Published: (2026)

Does SWE-Bench-Verified Test Agent Ability or Model Memory?
by: Prathifkumar, Thanosan, et al.
Published: (2025)

Lessons Learned from Mining the Hugging Face Repository
by: Castaño, Joel, et al.
Published: (2024)

Characterizing Datasets for LLM-based Requirements Engineering: A Systematic Mapping Study
by: Motger, Quim, et al.
Published: (2025)

Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench
by: Cheshkov, Anton, et al.
Published: (2024)

SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
by: Guo, Lianghong, et al.
Published: (2025)

Innovating for Tomorrow: The Convergence of SE and Green AI
by: Cruz, Luís, et al.
Published: (2024)

What About Emotions? Guiding Fine-Grained Emotion Extraction from Mobile App Reviews
by: Motger, Quim, et al.
Published: (2025)

PathFix: Automated Program Repair with Expected Path
by: He, Xu, et al.
Published: (2025)

On The Effectiveness of Dynamic Reduction Techniques in Automated Program Repair
by: Al-Bataineh, Omar I.
Published: (2024)

Towards Practical and Useful Automated Program Repair for Debugging
by: Xin, Qi, et al.
Published: (2024)

BUGSPHP: A dataset for Automated Program Repair in PHP
by: Pramod, K. D., et al.
Published: (2024)

ASAP-Repair: API-Specific Automated Program Repair Based on API Usage Graphs
by: Nielebock, Sebastian, et al.
Published: (2024)

Unveiling Competition Dynamics in Mobile App Markets through User Reviews
by: Motger, Quim, et al.
Published: (2023)

Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models
by: Oriol, Marc, et al.
Published: (2025)

SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
by: Rashid, Muhammad Shihab, et al.
Published: (2025)

How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench
by: Sajadi, Amirali, et al.
Published: (2025)

Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis
by: Li, Fengjie, et al.
Published: (2024)

Software-Based Dialogue Systems: Survey, Taxonomy and Challenges
by: Motger, Quim, et al.
Published: (2021)

A Methodological Framework for LLM-Based Mining of Software Repositories
by: De Martino, Vincenzo, et al.
Published: (2025)

A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering
by: de Martino, Vincenzo, et al.
Published: (2024)

Automated Test Case Repair Using Language Models
by: Yaraghi, Ahmadreza Saboor, et al.
Published: (2024)

Automated Repair of C Programs Using Large Language Models
by: Farzandway, Mahdi, et al.
Published: (2025)