:: Library Catalog

Bibliographic Details
Main Authors:	Li, Han; Zhu, Letian; Zhang, Bohan; Feng, Rili; Wang, Jiaming; Pan, Yue; Barr, Earl T.; Sarro, Federica; Chu, Zhaoyang; Ye, He
Format:	Preprint
Published:	2026
Subjects:	Machine Learning D.2.5; I.2.2; I.2.7
Online Access:	https://arxiv.org/abs/2602.05892

Similar Items

RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code
by: Liu, Shunyu, et al.
Published: (2025)

When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
by: Weng, Haojun, et al.
Published: (2026)

AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair
by: Hu, Yuelin, et al.
Published: (2026)

CIFE: Code Instruction-Following Evaluation
by: Gunnu, Sravani, et al.
Published: (2025)

elsciRL: Integrating Language Solutions into Reinforcement Learning Problem Settings
by: Osborne, Philip, et al.
Published: (2025)

Active Context Compression: Autonomous Memory Management in LLM Agents
by: Verma, Nikhil
Published: (2026)

Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study
by: Baltaji, Razan, et al.
Published: (2023)

CodeTracer: Towards Traceable Agent States
by: Li, Han, et al.
Published: (2026)

Pareto-Optimized Open-Source LLMs for Healthcare via Context Retrieval
by: Bayarri-Planas, Jordi, et al.
Published: (2024)

VeriPlan: Integrating Formal Verification and LLMs into End-User Planning
by: Lee, Christine, et al.
Published: (2025)

Learning When to Remember: Risk-Sensitive Contextual Bandits for Abstention-Aware Memory Retrieval in LLM-Based Coding Agents
by: Iscan, Mehmet
Published: (2026)

GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair
by: Liu, Zhuoyao, et al.
Published: (2026)

BudgetMLAgent: A Cost-Effective LLM Multi-Agent system for Automating Machine Learning Tasks
by: Gandhi, Shubham, et al.
Published: (2024)

ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges
by: Zhou, Yue, et al.
Published: (2025)

RepoLaunch: Automating Build&Test Pipeline of Code Repositories on ANY Language and ANY Platform
by: Li, Kenan, et al.
Published: (2026)

ORACLE-SWE: Quantifying the Contribution of Oracle Information Signals on SWE Agents
by: Li, Kenan, et al.
Published: (2026)

CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
by: Jana, Prithwish, et al.
Published: (2023)

SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair
by: Dinu, Ion George, et al.
Published: (2026)

Feedback-Normalized Developer Memory for Reinforcement-Learning Coding Agents: A Safety-Gated MCP Architecture
by: Iscan, Mehmet
Published: (2026)

Addressing Data Leakage in HumanEval Using Combinatorial Test Design
by: Bradbury, Jeremy S., et al.
Published: (2024)

Neural Theorem Proving for Verification Conditions: A Real-World Benchmark
by: Xu, Qiyuan, et al.
Published: (2026)

AgentModernize: Preserving Business Logic in Legacy Modernization with Multi-Agent LLMs and Behavioral Specification Graphs
by: Ahmed, Sheikh Nazib, et al.
Published: (2026)

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
by: Wang, Yihao, et al.
Published: (2026)

LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB
by: Bekmyradov, Vekil, et al.
Published: (2026)

VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs
by: Daneshvar, Seyed Shayan, et al.
Published: (2024)

LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters
by: van der Meer, Virgill, et al.
Published: (2026)

From Scientific Texts to Verifiable Code: Automating the Process with Transformers
by: Wang, Changjie, et al.
Published: (2025)

Merge-Bench: Resolve Merge Conflicts with Large Language Models
by: Schesch, Benedikt, et al.
Published: (2026)

Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling
by: Kashyap, Ankit
Published: (2025)

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
by: Zhu, Yuxuan, et al.
Published: (2025)

Can AI Assist in Olympiad Coding
by: Ren, Samuel
Published: (2025)

Toward Architecture-Aware Evaluation Metrics for LLM Agents
by: Souza, Débora, et al.
Published: (2026)

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
by: Liu, Bingnan, et al.
Published: (2026)

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
by: Huang, Yunpeng, et al.
Published: (2023)

NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles
by: Jia, Xiao
Published: (2026)

Convex Optimization for Alignment and Preference Learning on a Single GPU
by: Feng, Miria, et al.
Published: (2026)

Evaluating the efficacy of LLM Safety Solutions : The Palit Benchmark Dataset
by: Palit, Sayon, et al.
Published: (2025)

ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks
by: Tanguturi, Samuel Sameer
Published: (2026)

Semantic Modeling for World-Centered Architectures
by: Mantsivoda, Andrei, et al.
Published: (2026)

TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation
by: Du, Bangde, et al.
Published: (2025)