| Main Authors: | Zou, Qingyun, Cui, Jiahao, Chen, Nuo, He, Bingsheng, Wong, Weng-Fai |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.03708 |
Similar Items
Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench
by: Zou, Qingyun, et al.
Published: (2026)
HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning
by: Zou, Qingyun, et al.
Published: (2026)
AInsteinBench: Benchmarking Coding Agents on Scientific Repositories
by: Duston, Titouan, et al.
Published: (2025)
JudgeLRM: Large Reasoning Models as a Judge
by: Chen, Nuo, et al.
Published: (2025)
HLStrans: Dataset for C-to-HLS Hardware Code Synthesis
by: Zou, Qingyun, et al.
Published: (2025)
RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation
by: Wang, Yanli, et al.
Published: (2024)
Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation
by: Chen, Nuo, et al.
Published: (2026)
Beyond Brainstorming: What Drives High-Quality Scientific Ideas? Lessons from Multi-Agent Collaboration
by: Chen, Nuo, et al.
Published: (2025)
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
by: Dong, Honghua, et al.
Published: (2025)
Towards Repository-Level Program Verification with Large Language Models
by: Zhong, Si Cheng, et al.
Published: (2025)
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
by: Tang, Xiangru, et al.
Published: (2023)
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
by: Paul, Indraneil, et al.
Published: (2024)
Beyond Code Pairs: Dialogue-Based Data Generation for LLM Code Translation
by: Chen, Le, et al.
Published: (2025)
Bench4HLS: End-to-End Evaluation of LLMs in High-Level Synthesis Code Generation
by: Khan, M Zafir Sadik, et al.
Published: (2026)
VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code
by: Zeng, Lingfei, et al.
Published: (2025)
Analysis of AdvFusion: Adapter-based Multilingual Learning for Code Large Language Models
by: Esmaeili, Amirreza, et al.
Published: (2025)
CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
by: Wang, Peiding, et al.
Published: (2025)
A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks
by: Dandamudi, Rohit, et al.
Published: (2024)
OctoBench: Benchmarking Scaffold-Aware Instruction Following in Repository-Grounded Agentic Coding
by: Ding, Deming, et al.
Published: (2026)
Insights from the Usage of the Ansible Lightspeed Code Completion Service
by: Sahoo, Priyam, et al.
Published: (2024)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts
by: Councilman, Aaron, et al.
Published: (2025)
CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization
by: Zhao, Yang, et al.
Published: (2024)
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
by: Li, Jia, et al.
Published: (2024)
FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
by: Ravi, Nikil, et al.
Published: (2026)
Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects
by: Begolli, Igli, et al.
Published: (2025)
Chain of Execution Supervision Promotes General Reasoning in Large Language Models
by: Chen, Nuo, et al.
Published: (2025)
Position: The Current AI Conference Model is Unsustainable! Diagnosing the Crisis of Centralized AI Conference
by: Chen, Nuo, et al.
Published: (2025)
How Do Humans Write Code? Large Models Do It the Same Way Too
by: Li, Long, et al.
Published: (2024)
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models
by: Wang, Rui, et al.
Published: (2025)
JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models
by: Cao, Jialun, et al.
Published: (2024)
SeaExam and SeaBench: Benchmarking LLMs with Local Multilingual Questions in Southeast Asia
by: Liu, Chaoqun, et al.
Published: (2025)
When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
by: Weng, Haojun, et al.
Published: (2026)
VisCoder2: Building Multi-Language Visualization Coding Agents
by: Ni, Yuansheng, et al.
Published: (2025)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code
by: Grand, Gabriel, et al.
Published: (2023)
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
by: Le, Hung, et al.
Published: (2023)
CodeS: Natural Language to Code Repository via Multi-Layer Sketch
by: Zan, Daoguang, et al.
Published: (2024)
QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code
by: Fang, Hainan, et al.
Published: (2025)
SaraCoder: Orchestrating Semantic and Structural Cues for Resource-Optimized Repository-Level Code Completion
by: Chen, Xiaohan, et al.
Published: (2025)
CodeMind: Evaluating Large Language Models for Code Reasoning
by: Liu, Changshu, et al.
Published: (2024)
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?
by: Liang, Qingyuan, et al.
Published: (2025)