| Main Authors: | Ye, Wencheng, Liu, Yan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.03517 |
Similar Items
SWE-TRACE: Optimizing Long-Horizon SWE Agents Through Rubric Process Reward Models and Heuristic Test-Time Scaling
by: Han, Hao, et al.
Published: (2026)
SWE-Shepherd: Advancing PRMs for Reinforcing Code Agents
by: Dihan, Mahir Labib, et al.
Published: (2026)
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
by: Wang, Yuhang, et al.
Published: (2026)
GHIssuemarket: A Sandbox Environment for SWE-Agents Economic Experimentation
by: Fouad, Mohamed A., et al.
Published: (2024)
SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle
by: Guan, Hao, et al.
Published: (2026)
Does SWE-Bench-Verified Test Agent Ability or Model Memory?
by: Prathifkumar, Thanosan, et al.
Published: (2025)
From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents
by: Ludwig, Nikolai, et al.
Published: (2026)
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
by: Badertdinov, Ibragim, et al.
Published: (2026)
Training Software Engineering Agents and Verifiers with SWE-Gym
by: Pan, Jiayi, et al.
Published: (2024)
SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution
by: Raghavendra, Mohit, et al.
Published: (2026)
When Agents go Astray: Course-Correcting SWE Agents with PRMs
by: Gandhi, Shubham, et al.
Published: (2025)
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation
by: Sahoo, Priyam, et al.
Published: (2026)
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents
by: Liang, Jiarong, et al.
Published: (2026)
TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
by: He, Xinyi, et al.
Published: (2025)
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
by: Yang, Zonghan, et al.
Published: (2025)
Reproduction Test Generation for Java SWE Issues
by: Ahmed, Toufique, et al.
Published: (2026)
SWE-Bench+: Enhanced Coding Benchmark for LLMs
by: Aleithan, Reem, et al.
Published: (2024)
Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study
by: Wang, You, et al.
Published: (2025)
SWE-World: Building Software Engineering Agents in Docker-Free Environments
by: Sun, Shuang, et al.
Published: (2026)
SWE-Bench Mobile: Can Large Language Model Agents Develop Industry-Level Mobile Applications?
by: Tian, Muxin, et al.
Published: (2026)
SWE-Bench-CL: Continual Learning for Coding Agents
by: Joshi, Thomas, et al.
Published: (2025)
SWE-smith: Scaling Data for Software Engineering Agents
by: Yang, John, et al.
Published: (2025)
Saving SWE-Bench: A Benchmark Mutation Approach for Realistic Agent Evaluation
by: Garg, Spandan, et al.
Published: (2025)
SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
by: Song, Huatong, et al.
Published: (2026)
APEX-SWE
by: Kottamasu, Abhi, et al.
Published: (2026)
SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories
by: Wang, Junhao, et al.
Published: (2025)
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
by: Chen, Guoxin, et al.
Published: (2026)
SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding
by: Tan, Boyin, et al.
Published: (2026)
SWE-AGI: Benchmarking Specification-Driven Software Construction with MoonBit in the Era of Autonomous Agents
by: Zhang, Zhirui, et al.
Published: (2026)
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
by: Badertdinov, Ibragim, et al.
Published: (2025)
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
by: Deng, Xiang, et al.
Published: (2025)
Resolving Java Code Repository Issues with iSWE Agent
by: Ganhotra, Jatin, et al.
Published: (2026)
Investigating Test Overfitting on SWE-bench
by: Ahmed, Toufique, et al.
Published: (2025)
SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades
by: Lam, Man Ho, et al.
Published: (2026)
Heterogeneous Prompting and Execution Feedback for SWE Issue Test Generation and Selection
by: Ahmed, Toufique, et al.
Published: (2025)
Can Old Tests Do New Tricks for Resolving SWE Issues?
by: Chen, Yang, et al.
Published: (2025)
What's in a Benchmark? The Case of SWE-Bench in Automated Program Repair
by: Martinez, Matias, et al.
Published: (2026)
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
by: Shetty, Manish, et al.
Published: (2025)
SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints
by: Fan, Zhiyu, et al.
Published: (2025)