| Main Authors: | Martinez, Matias; Franch, Xavier |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.04449 |
Similar Items
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems
by: Martinez, Matias, et al.
Published: (2025)
Energy Consumption of Automated Program Repair
by: Martinez, Matias, et al.
Published: (2022)
Automated Requirements Relation Extraction
by: Motger, Quim, et al.
Published: (2024)
SWE-Bench+: Enhanced Coding Benchmark for LLMs
by: Aleithan, Reem, et al.
Published: (2024)
SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks
by: Mhatre, Sanket, et al.
Published: (2025)
SWE Context Bench: A Benchmark for Context Learning in Coding
by: Zhu, Jiayuan, et al.
Published: (2026)
Cataloguing Hugging Face Models to Software Engineering Activities: Automation and Findings
by: González, Alexandra, et al.
Published: (2025)
SEMODS: A Validated Dataset of Open-Source Software Engineering Models
by: González, Alexandra, et al.
Published: (2026)
Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation
by: Garg, Spandan, et al.
Published: (2025)
ThinkRepair: Self-Directed Automated Program Repair
by: Yin, Xin, et al.
Published: (2024)
The Impact of Program Reduction on Automated Program Repair
by: Vidziunas, Linas, et al.
Published: (2024)
HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair
by: Rabbi, Fazle, et al.
Published: (2026)
ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs
by: Kong, Jiaolong, et al.
Published: (2024)
CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows
by: Muna, Rabeya Khatun, et al.
Published: (2026)
RepairBench: Leaderboard of Frontier Models for Program Repair
by: Silva, André, et al.
Published: (2024)
SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation
by: Oliva, Gustavo A., et al.
Published: (2025)
A Tool for Automatically Cataloguing and Selecting Pre-Trained Models and Datasets for Software Engineering
by: González, Alexandra, et al.
Published: (2026)
Specification Vibing for Automated Program Repair
by: Zhu, Taohong, et al.
Published: (2026)
Does SWE-Bench-Verified Test Agent Ability or Model Memory?
by: Prathifkumar, Thanosan, et al.
Published: (2025)
Lessons Learned from Mining the Hugging Face Repository
by: Castaño, Joel, et al.
Published: (2024)
Characterizing Datasets for LLM-based Requirements Engineering: A Systematic Mapping Study
by: Motger, Quim, et al.
Published: (2025)
Exploring the Potential of Conversational Test Suite Based Program Repair on SWE-bench
by: Cheshkov, Anton, et al.
Published: (2024)
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks
by: Guo, Lianghong, et al.
Published: (2025)
Innovating for Tomorrow: The Convergence of SE and Green AI
by: Cruz, Luís, et al.
Published: (2024)
What About Emotions? Guiding Fine-Grained Emotion Extraction from Mobile App Reviews
by: Motger, Quim, et al.
Published: (2025)
PathFix: Automated Program Repair with Expected Path
by: He, Xu, et al.
Published: (2025)
On The Effectiveness of Dynamic Reduction Techniques in Automated Program Repair
by: Al-Bataineh, Omar I.
Published: (2024)
Towards Practical and Useful Automated Program Repair for Debugging
by: Xin, Qi, et al.
Published: (2024)
BUGSPHP: A dataset for Automated Program Repair in PHP
by: Pramod, K. D., et al.
Published: (2024)
ASAP-Repair: API-Specific Automated Program Repair Based on API Usage Graphs
by: Nielebock, Sebastian, et al.
Published: (2024)
Unveiling Competition Dynamics in Mobile App Markets through User Reviews
by: Motger, Quim, et al.
Published: (2023)
Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models
by: Oriol, Marc, et al.
Published: (2025)
SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
by: Rashid, Muhammad Shihab, et al.
Published: (2025)
How Safe Are AI-Generated Patches? A Large-scale Study on Security Risks in LLM and Agentic Automated Program Repair on SWE-bench
by: Sajadi, Amirali, et al.
Published: (2025)
Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis
by: Li, Fengjie, et al.
Published: (2024)
Software-Based Dialogue Systems: Survey, Taxonomy and Challenges
by: Motger, Quim, et al.
Published: (2021)
A Methodological Framework for LLM-Based Mining of Software Repositories
by: De Martino, Vincenzo, et al.
Published: (2025)
A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering
by: de Martino, Vincenzo, et al.
Published: (2024)
Automated Test Case Repair Using Language Models
by: Yaraghi, Ahmadreza Saboor, et al.
Published: (2024)
Automated Repair of C Programs Using Large Language Models
by: Farzandway, Mahdi, et al.
Published: (2025)