:: Library Catalog

Bibliographic Details
Main Authors:	Soni, Aditya Bharat; Ghosh, Rajat; Bhargava, Vaishnavi; Chen, Valerie; Dutta, Debojyoti
Format:	Preprint
Published:	2026
Subjects:	Software Engineering; Machine Learning
Online Access:	https://arxiv.org/abs/2601.13713

Similar Items

CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++?
by: Bhargava, Vaishnavi, et al.
Published: (2024)

RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval
by: Shah, Pratik, et al.
Published: (2025)

A Multi-Agent Framework for Stateful Inference-Time Search
by: Lalan, Arshika, et al.
Published: (2025)

CR-Bench: Evaluating the Real-World Utility of AI Code Review Agents
by: Pereira, Kristen, et al.
Published: (2026)

SWE-Refactor: A Repository-Level Benchmark for Real-World LLM-Based Code Refactoring
by: Xu, Yisen, et al.
Published: (2026)

SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories
by: Wang, Lilin, et al.
Published: (2025)

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
by: He, Xinyi, et al.
Published: (2025)

SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
by: Miserendino, Samuel, et al.
Published: (2025)

Reproduction Test Generation for Java SWE Issues
by: Ahmed, Toufique, et al.
Published: (2026)

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
by: Ma, Jeffrey Jian, et al.
Published: (2025)

Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models
by: Nimmaturi, Datta, et al.
Published: (2025)

SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories
by: Wang, Junhao, et al.
Published: (2025)

SWE-Spot: Building Small Repo-Experts with Repository-Centric Learning
by: Peng, Jinjun, et al.
Published: (2026)

Otter: Generating Tests from Issues to Validate SWE Patches
by: Ahmed, Toufique, et al.
Published: (2025)

SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution
by: Raghavendra, Mohit, et al.
Published: (2026)

SWE-Exp: Experience-Driven Software Issue Resolution
by: Chen, Silin, et al.
Published: (2025)

Resolving Java Code Repository Issues with iSWE Agent
by: Ganhotra, Jatin, et al.
Published: (2026)

Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning
by: Ghosh, Rajat, et al.
Published: (2026)

BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning
by: Zhou, Jinan, et al.
Published: (2025)

Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories
by: Hommersom, Daan, et al.
Published: (2021)

Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go
by: Pipalani, Yashshi, et al.
Published: (2025)

Exploring the Lifecycle and Maintenance Practices of Pre-Trained Models in Open-Source Software Repositories
by: Koohjani, Matin, et al.
Published: (2025)

SWE-Universe: Scale Real-World Verifiable Environments to Millions
by: Chen, Mouxiang, et al.
Published: (2026)

Investigating Test Overfitting on SWE-bench
by: Ahmed, Toufique, et al.
Published: (2025)

SWE-Next: Scalable Real-World Software Engineering Tasks for Agents
by: Liang, Jiarong, et al.
Published: (2026)

An Empirical Validation of Open Source Repository Stability Metrics
by: Adejumo, Elijah Kayode, et al.
Published: (2025)

GiveMeLabeledIssues: An Open Source Issue Recommendation System
by: Vargovich, Joseph, et al.
Published: (2023)

SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding
by: Cai, Songcheng, et al.
Published: (2026)

SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
by: Li, Han, et al.
Published: (2025)

Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing
by: Feng, Sidong, et al.
Published: (2025)

SWE-Bench+: Enhanced Coding Benchmark for LLMs
by: Aleithan, Reem, et al.
Published: (2024)

Classifying Issues in Open-source GitHub Repositories
by: Raaj, Amir Hossain, et al.
Published: (2025)

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
by: Jimenez, Carlos E., et al.
Published: (2023)

HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench
by: Wang, Yueyang, et al.
Published: (2026)

SWE-Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering
by: Zhao, Zhimin
Published: (2025)

R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents
by: Jain, Naman, et al.
Published: (2025)

Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects
by: He, Runzhi, et al.
Published: (2024)

The Product Beyond the Model -- An Empirical Study of Repositories of Open-Source ML Products
by: Nahar, Nadia, et al.
Published: (2023)

Are Autonomous Web Agents Good Testers?
by: Chevrot, Antoine, et al.
Published: (2025)

SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?
by: Han, Tingxu, et al.
Published: (2026)