| Main Authors: | Li, Han, Zhu, Letian, Zhang, Bohan, Feng, Rili, Wang, Jiaming, Pan, Yue, Barr, Earl T., Sarro, Federica, Chu, Zhaoyang, Ye, He |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.05892 |
Similar Items
RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code
by: Liu, Shunyu, et al.
Published: (2025)
When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
by: Weng, Haojun, et al.
Published: (2026)
AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)
CIFE: Code Instruction-Following Evaluation
by: Gunnu, Sravani, et al.
Published: (2025)
elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings
by: Osborne, Philip, et al.
Published: (2025)
Active Context Compression: Autonomous Memory Management in LLM Agents
by: Verma, Nikhil
Published: (2026)
Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study
by: Baltaji, Razan, et al.
Published: (2023)
CodeTracer: Towards Traceable Agent States
by: Li, Han, et al.
Published: (2026)
Pareto-Optimized Open-Source LLMs for Healthcare via Context Retrieval
by: Bayarri-Planas, Jordi, et al.
Published: (2024)
VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
by: Lee, Christine, et al.
Published: (2025)
Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
by: Iscan, Mehmet
Published: (2026)
GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair
by: Liu, Zhuoyao, et al.
Published: (2026)
BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks
by: Gandhi, Shubham, et al.
Published: (2024)
ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges
by: Zhou, Yue, et al.
Published: (2025)
RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
by: Li, Kenan, et al.
Published: (2026)
ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents
by: Li, Kenan, et al.
Published: (2026)
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
by: Jana, Prithwish, et al.
Published: (2023)
SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair
by: Dinu, Ion George, et al.
Published: (2026)
Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture
by: Iscan, Mehmet
Published: (2026)
Addressing Data Leakage in HumanEval Using Combinatorial Test Design
by: Bradbury, Jeremy S., et al.
Published: (2024)
Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
by: Xu, Qiyuan, et al.
Published: (2026)
AgentModernize: Preserving Business Logic in Legacy Modernization with Multi-Agent LLMs and Behavioral Specification Graphs
by: Ahmed, Sheikh Nazib, et al.
Published: (2026)
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
by: Wang, Yihao, et al.
Published: (2026)
LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB
by: Bekmyradov, Vekil, et al.
Published: (2026)
VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs
by: Daneshvar, Seyed Shayan, et al.
Published: (2024)
LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters
by: van der Meer, Virgill, et al.
Published: (2026)
From Scientific Texts to Verifiable Code: Automating the Process with Transformers
by: Wang, Changjie, et al.
Published: (2025)
Merge-Bench: Resolve Merge Conflicts with Large Language Models
by: Schesch, Benedikt, et al.
Published: (2026)
Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling
by: Kashyap, Ankit
Published: (2025)
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
by: Zhu, Yuxuan, et al.
Published: (2025)
Can AI Assist in Olympiad Coding
by: Ren, Samuel
Published: (2025)
Toward Architecture-Aware Evaluation Metrics for LLM Agents
by: Souza, Débora, et al.
Published: (2026)
WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
by: Liu, Bingnan, et al.
Published: (2026)
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
by: Huang, Yunpeng, et al.
Published: (2023)
NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles
by: Jia, Xiao
Published: (2026)
Convex Optimization for Alignment and Preference Learning on a Single GPU
by: Feng, Miria, et al.
Published: (2026)
Evaluating the efficacy of LLM Safety Solutions : The Palit Benchmark Dataset
by: Palit, Sayon, et al.
Published: (2025)
ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks
by: Tanguturi, Samuel Sameer
Published: (2026)
Semantic Modeling for World-Centered Architectures
by: Mantsivoda, Andrei, et al.
Published: (2026)
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation
by: Du, Bangde, et al.
Published: (2025)