Library Catalog

Bibliographic Details
Main Authors:	Zou, Qingyun; Cui, Jiahao; Chen, Nuo; He, Bingsheng; Wong, Weng-Fai
Format:	Preprint
Published:	2026
Subjects:	Programming Languages; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.03708

Similar Items

Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench
by: Zou, Qingyun, et al.
Published: (2026)

HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning
by: Zou, Qingyun, et al.
Published: (2026)

AInsteinBench: Benchmarking Coding Agents on Scientific Repositories
by: Duston, Titouan, et al.
Published: (2025)

JudgeLRM: Large Reasoning Models as a Judge
by: Chen, Nuo, et al.
Published: (2025)

HLStrans: Dataset for C-to-HLS Hardware Code Synthesis
by: Zou, Qingyun, et al.
Published: (2025)

RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation
by: Wang, Yanli, et al.
Published: (2024)

Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation
by: Chen, Nuo, et al.
Published: (2026)

Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration
by: Chen, Nuo, et al.
Published: (2025)

TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
by: Dong, Honghua, et al.
Published: (2025)

Towards Repository-Level Program Verification with Large Language Models
by: Zhong, Si Cheng, et al.
Published: (2025)

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
by: Tang, Xiangru, et al.
Published: (2023)

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
by: Paul, Indraneil, et al.
Published: (2024)

Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
by: Chen, Le, et al.
Published: (2025)

Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation
by: Khan, M Zafir Sadik, et al.
Published: (2026)

VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
by: Zeng, Lingfei, et al.
Published: (2025)

Analysis of AdvFusion: Adapter-based Multilingual Learning for Code Large Language Models
by: Esmaeili, Amirreza, et al.
Published: (2025)

CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
by: Wang, Peiding, et al.
Published: (2025)

A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks
by: Dandamudi, Rohit, et al.
Published: (2024)

OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding
by: Ding, Deming, et al.
Published: (2026)

Insights from the Usage of the Ansible Lightspeed Code Completion Service
by: Sahoo, Priyam, et al.
Published: (2024)

Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
by: Councilman, Aaron, et al.
Published: (2025)

CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization
by: Zhao, Yang, et al.
Published: (2024)

EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
by: Li, Jia, et al.
Published: (2024)

FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
by: Ravi, Nikil, et al.
Published: (2026)

Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects
by: Begolli, Igli, et al.
Published: (2025)

Chain of Execution Supervision Promotes General Reasoning in Large Language Models
by: Chen, Nuo, et al.
Published: (2025)

Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference
by: Chen, Nuo, et al.
Published: (2025)

How Do Humans Write Code? Large Models Do It the Same Way Too
by: Li, Long, et al.
Published: (2024)

Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
by: Wang, Rui, et al.
Published: (2025)

JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
by: Cao, Jialun, et al.
Published: (2024)

SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
by: Liu, Chaoqun, et al.
Published: (2025)

When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
by: Weng, Haojun, et al.
Published: (2026)

VisCoder2: Building Multi-Language Visualization Coding Agents
by: Ni, Yuansheng, et al.
Published: (2025)

LILO: Learning Interpretable Libraries by Compressing and Documenting Code
by: Grand, Gabriel, et al.
Published: (2023)

CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
by: Le, Hung, et al.
Published: (2023)

CodeS: Natural Language to Code Repository via Multi-Layer Sketch
by: Zan, Daoguang, et al.
Published: (2024)

QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code
by: Fang, Hainan, et al.
Published: (2025)

SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion
by: Chen, Xiaohan, et al.
Published: (2025)

CodeMind: Evaluating Large Language Models for Code Reasoning
by: Liu, Changshu, et al.
Published: (2024)

Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?
by: Liang, Qingyuan, et al.
Published: (2025)