| Main Authors: | Soni, Aditya Bharat; Ghosh, Rajat; Bhargava, Vaishnavi; Chen, Valerie; Dutta, Debojyoti |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.13713 |
Similar Items
CPP-UT-Bench: Can LLMs Write Complex Unit Tests in C++?
by: Bhargava, Vaishnavi, et al.
Published: (2024)
RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval
by: Shah, Pratik, et al.
Published: (2025)
A Multi-Agent Framework for Stateful Inference-Time Search
by: Lalan, Arshika, et al.
Published: (2025)
CR-Bench: Evaluating the Real-World Utility of AI Code Review Agents
by: Pereira, Kristen, et al.
Published: (2026)
SWE-Refactor: A Repository-Level Benchmark for Real-World LLM-Based Code Refactoring
by: Xu, Yisen, et al.
Published: (2026)
SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories
by: Wang, Lilin, et al.
Published: (2025)
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
by: He, Xinyi, et al.
Published: (2025)
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
by: Miserendino, Samuel, et al.
Published: (2025)
Reproduction Test Generation for Java SWE Issues
by: Ahmed, Toufique, et al.
Published: (2026)
SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
by: Ma, Jeffrey Jian, et al.
Published: (2025)
Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models
by: Nimmaturi, Datta, et al.
Published: (2025)
SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories
by: Wang, Junhao, et al.
Published: (2025)
SWE-Spot: Building Small Repo-Experts with Repository-Centric Learning
by: Peng, Jinjun, et al.
Published: (2026)
Otter: Generating Tests from Issues to Validate SWE Patches
by: Ahmed, Toufique, et al.
Published: (2025)
SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution
by: Raghavendra, Mohit, et al.
Published: (2026)
SWE-Exp: Experience-Driven Software Issue Resolution
by: Chen, Silin, et al.
Published: (2025)
Resolving Java Code Repository Issues with iSWE Agent
by: Ganhotra, Jatin, et al.
Published: (2026)
Action Shapley: A Training Data Selection Metric for World Model in Reinforcement Learning
by: Ghosh, Rajat, et al.
Published: (2026)
BAR Conjecture: the Feasibility of Inference Budget-Constrained LLM Services with Authenticity and Reasoning
by: Zhou, Jinan, et al.
Published: (2025)
Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source Repositories
by: Hommersom, Daan, et al.
Published: (2021)
Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go
by: Pipalani, Yashshi, et al.
Published: (2025)
Exploring the Lifecycle and Maintenance Practices of Pre-Trained Models in Open-Source Software Repositories
by: Koohjani, Matin, et al.
Published: (2025)
SWE-Universe: Scale Real-World Verifiable Environments to Millions
by: Chen, Mouxiang, et al.
Published: (2026)
Investigating Test Overfitting on SWE-bench
by: Ahmed, Toufique, et al.
Published: (2025)
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents
by: Liang, Jiarong, et al.
Published: (2026)
An Empirical Validation of Open Source Repository Stability Metrics
by: Adejumo, Elijah Kayode, et al.
Published: (2025)
GiveMeLabeledIssues: An Open Source Issue Recommendation System
by: Vargovich, Joseph, et al.
Published: (2023)
SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding
by: Cai, Songcheng, et al.
Published: (2026)
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
by: Li, Han, et al.
Published: (2025)
Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing
by: Feng, Sidong, et al.
Published: (2025)
SWE-Bench+: Enhanced Coding Benchmark for LLMs
by: Aleithan, Reem, et al.
Published: (2024)
Classifying Issues in Open-source GitHub Repositories
by: Raaj, Amir Hossain, et al.
Published: (2025)
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
by: Jimenez, Carlos E., et al.
Published: (2023)
HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-bench
by: Wang, Yueyang, et al.
Published: (2026)
SWE-Arena: An Interactive Platform for Evaluating Foundation Models in Software Engineering
by: Zhao, Zhimin
Published: (2025)
R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents
by: Jain, Naman, et al.
Published: (2025)
Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects
by: He, Runzhi, et al.
Published: (2024)
The Product Beyond the Model -- An Empirical Study of Repositories of Open-Source ML Products
by: Nahar, Nadia, et al.
Published: (2023)
Are Autonomous Web Agents Good Testers?
by: Chevrot, Antoine, et al.
Published: (2025)
SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?
by: Han, Tingxu, et al.
Published: (2026)