| Main Authors: | Han, Hao, Xie, Jin, Ma, Xuehao, Zhu, Weiquan, Zhang, Ziyao, Long, ZhiLiang, Chen, Hongkai, Ye, Qingwen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.14820 |
Similar Items
Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents
by: Huang, Jiawei, et al.
Published: (2026)
SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents
by: Ding, Yifeng, et al.
Published: (2026)
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
by: Badertdinov, Ibragim, et al.
Published: (2026)
SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
by: Deng, Xiang, et al.
Published: (2025)
Investigating Test Overfitting on SWE-bench
by: Ahmed, Toufique, et al.
Published: (2025)
Reproduction Test Generation for Java SWE Issues
by: Ahmed, Toufique, et al.
Published: (2026)
SWE-smith: Scaling Data for Software Engineering Agents
by: Yang, John, et al.
Published: (2025)
Does SWE-Bench-Verified Test Agent Ability or Model Memory?
by: Prathifkumar, Thanosan, et al.
Published: (2025)
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
by: Zeng, Guangtao, et al.
Published: (2025)
APEX-SWE
by: Kottamasu, Abhi, et al.
Published: (2026)
ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents
by: Li, Kenan, et al.
Published: (2026)
From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents
by: Ludwig, Nikolai, et al.
Published: (2026)
SWE-Shepherd: Advancing PRMs for Reinforcing Code Agents
by: Dihan, Mahir Labib, et al.
Published: (2026)
U2F: Encouraging SWE-Agent to Seize Novelty without Losing Feasibility
by: Ye, Wencheng, et al.
Published: (2025)
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
by: Wang, Yuhang, et al.
Published: (2026)
SWE-bench Goes Live!
by: Zhang, Linghao, et al.
Published: (2025)
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
by: Shetty, Manish, et al.
Published: (2025)
Training Software Engineering Agents and Verifiers with SWE-Gym
by: Pan, Jiayi, et al.
Published: (2024)
GHIssuemarket: A Sandbox Environment for SWE-Agents Economic Experimentation
by: Fouad, Mohamed A., et al.
Published: (2024)
Otter: Generating Tests from Issues to Validate SWE Patches
by: Ahmed, Toufique, et al.
Published: (2025)
Heterogeneous Prompting and Execution Feedback for SWE Issue Test Generation and Selection
by: Ahmed, Toufique, et al.
Published: (2025)
Can Old Tests Do New Tricks for Resolving SWE Issues?
by: Chen, Yang, et al.
Published: (2025)
SWE-Bench-CL: Continual Learning for Coding Agents
by: Joshi, Thomas, et al.
Published: (2025)
SWE-Universe: Scale Real-World Verifiable Environments to Millions
by: Chen, Mouxiang, et al.
Published: (2026)
When Agents go Astray: Course-Correcting SWE Agents with PRMs
by: Gandhi, Shubham, et al.
Published: (2025)
AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation
by: Sahoo, Priyam, et al.
Published: (2026)
TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)
SWE Atlas: Benchmarking Coding Agents Beyond Issue Resolution
by: Raghavendra, Mohit, et al.
Published: (2026)
SWE-Bench+: Enhanced Coding Benchmark for LLMs
by: Aleithan, Reem, et al.
Published: (2024)
UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench
by: Yu, Boxi, et al.
Published: (2025)
daVinci-Env: Open SWE Environment Synthesis at Scale
by: Fu, Dayuan, et al.
Published: (2026)
SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories
by: Wang, Junhao, et al.
Published: (2025)
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
by: Yang, Zonghan, et al.
Published: (2025)
Resolving Java Code Repository Issues with iSWE Agent
by: Ganhotra, Jatin, et al.
Published: (2026)
SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle
by: Guan, Hao, et al.
Published: (2026)
SWE-Next: Scalable Real-World Software Engineering Tasks for Agents
by: Liang, Jiarong, et al.
Published: (2026)
SWE-World: Building Software Engineering Agents in Docker-Free Environments
by: Sun, Shuang, et al.
Published: (2026)
SWE-Bench Mobile: Can Large Language Model Agents Develop Industry-Level Mobile Applications?
by: Tian, Muxin, et al.
Published: (2026)
SWE-chat: Coding Agent Interactions From Real Users in the Wild
by: Baumann, Joachim, et al.
Published: (2026)
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
by: Li, Han, et al.
Published: (2025)