| Main Authors: | Saebo, Magnus; Gibson, Spencer; Crosse, Tyler; Menon, Achyutha; Jang, Eyon; Cruz, Diogo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.03456 |
Similar Items
Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals
by: Menon, Achyutha, et al.
Published: (2026)
SWE-Spot: Building Small Repo-Experts with Repository-Centric Learning
by: Peng, Jinjun, et al.
Published: (2026)
Value-Conflict Diagnostics Reveal Widespread Alignment Faking in Language Models
by: Nair, Inderjeet, et al.
Published: (2026)
Exploring Language Model's Code Generation Ability with Auxiliary Functions
by: Lee, Seonghyeon, et al.
Published: (2024)
Chain of Grounded Objectives: Bridging Process and Goal-oriented Prompting for Code Generation
by: Yeo, Sangyeop, et al.
Published: (2025)
Eliciting Instruction-tuned Code Language Models' Capabilities to Utilize Auxiliary Function for Code Generation
by: Lee, Seonghyeon, et al.
Published: (2024)
Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search
by: Zhao, Pinhan, et al.
Published: (2025)
LocAgent: Graph-Guided LLM Agents for Code Localization
by: Chen, Zhaoling, et al.
Published: (2025)
CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents
by: Sutawika, Lintang, et al.
Published: (2026)
Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults via Multi-Turn Interaction
by: Bao, Han, et al.
Published: (2026)
Training Versatile Coding Agents in Synthetic Environments
by: Zhu, Yiqi, et al.
Published: (2025)
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
by: Lee, Seonghyeon, et al.
Published: (2025)
Effective Harness Engineering for Algorithm Discovery with Coding Agents
by: Ishibashi, Yoichi, et al.
Published: (2026)
AInsteinBench: Benchmarking Coding Agents on Scientific Repositories
by: Duston, Titouan, et al.
Published: (2025)
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
by: Li, Yinxi, et al.
Published: (2025)
Code Broker: A Multi-Agent System for Automated Code Quality Assessment
by: Attrah, Samer
Published: (2026)
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination
by: Chen, Simin, et al.
Published: (2025)
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1
by: Ma, Qianli, et al.
Published: (2025)
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks
by: Orlanski, Gabriel, et al.
Published: (2026)
Can Coding Agents Reproduce Findings in Computational Materials Science?
by: Huang, Ziyang, et al.
Published: (2026)
OmniCode: A Benchmark for Evaluating Software Engineering Agents
by: Sonwane, Atharv, et al.
Published: (2026)
CodeR: Issue Resolving with Multi-Agent and Task Graphs
by: Chen, Dong, et al.
Published: (2024)
SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents
by: Zhao, Bingchen, et al.
Published: (2026)
MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents
by: Zhu, Ming, et al.
Published: (2024)
CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing
by: Zhu, Mingzhi, et al.
Published: (2026)
SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades
by: Lam, Man Ho, et al.
Published: (2026)
CR-Bench: Evaluating the Real-World Utility of AI Code Review Agents
by: Pereira, Kristen, et al.
Published: (2026)
VisCoder2: Building Multi-Language Visualization Coding Agents
by: Ni, Yuansheng, et al.
Published: (2025)
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
by: Islam, Md. Ashraful, et al.
Published: (2025)
Evaluating LLM-Based Goal Extraction in Requirements Engineering: Prompting Strategies and Their Limitations
by: Arnaudo, Anna, et al.
Published: (2026)
Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces
by: Yu, Jiapeng, et al.
Published: (2024)
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries
by: Zhang, Xing, et al.
Published: (2026)
Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases
by: Wong, Sherman, et al.
Published: (2025)
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
by: Li, Jia, et al.
Published: (2024)
IndustryCode: A Benchmark for Industry Code Generation
by: Zeng, Puyu, et al.
Published: (2026)
Rethinking Code Refinement: Learning to Judge Code Efficiency
by: Seo, Minju, et al.
Published: (2024)
LLM Agents Improve Semantic Code Search
by: Jain, Sarthak, et al.
Published: (2024)
From I/O to Code with Discovery Agent
by: Dong, Yihong, et al.
Published: (2026)
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
by: Zheng, Tianyu, et al.
Published: (2024)
CodeMirage: Hallucinations in Code Generated by Large Language Models
by: Agarwal, Vibhor, et al.
Published: (2024)