| Main Authors: | Fei, Tianxiang; Chen, Cheng; Pan, Yue; Zheng, Mao; Song, Mingyang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.14914 |
Similar Items
HardMTBench: Stress-Testing Chinese-English Translation on Knowledge-Intensive Domains
by: Li, Zheng, et al.
Published: (2026)
Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study
by: Song, Mingyang, et al.
Published: (2024)
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
by: Song, Mingyang, et al.
Published: (2025)
FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models
by: Song, Mingyang, et al.
Published: (2025)
Counting-Stars: A Multi-evidence, Position-aware, and Scalable Benchmark for Evaluating Long-Context Large Language Models
by: Song, Mingyang, et al.
Published: (2024)
Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions
by: Song, Mingyang, et al.
Published: (2026)
A Survey of Query Optimization in Large Language Models
by: Song, Mingyang, et al.
Published: (2024)
A Survey of On-Policy Distillation for Large Language Models
by: Song, Mingyang, et al.
Published: (2026)
PRISM: Probability Reallocation with In-Span Masking for Knowledge-Sensitive Alignment
by: Xu, Chenning, et al.
Published: (2026)
Beyond the Illusion of Consensus: From Surface Heuristics to Knowledge-Grounded Evaluation in LLM-as-a-Judge
by: Song, Mingyang, et al.
Published: (2026)
GRP: Goal-Reversed Prompting for Zero-Shot Evaluation with LLMs
by: Song, Mingyang, et al.
Published: (2025)
CoAct-1: Computer-using Multi-Agent System with Coding Actions
by: Song, Linxin, et al.
Published: (2025)
Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning
by: Wang, Jingjing, et al.
Published: (2026)
Executable Code Actions Elicit Better LLM Agents
by: Wang, Xingyao, et al.
Published: (2024)
Code as Agent Harness
by: Ning, Xuying, et al.
Published: (2026)
PodBench: A Comprehensive Benchmark for Instruction-Aware Audio-Oriented Podcast Script Generation
by: Xu, Chenning, et al.
Published: (2026)
MiMoTable: A Multi-scale Spreadsheet Benchmark with Meta Operations for Table Reasoning
by: Li, Zheng, et al.
Published: (2024)
TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment
by: Li, Zheng, et al.
Published: (2025)
Coding Agents are Effective Long-Context Processors
by: Cao, Weili, et al.
Published: (2026)
HY-MT1.5 Technical Report
by: Zheng, Mao, et al.
Published: (2025)
VisCoder2: Building Multi-Language Visualization Coding Agents
by: Ni, Yuansheng, et al.
Published: (2025)
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
by: Wang, Yuhang, et al.
Published: (2026)
CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants
by: Noah, Amit Finkman, et al.
Published: (2024)
Understanding and Mitigating Errors of LLM-Generated RTL Code
by: Zhang, Jiazheng, et al.
Published: (2025)
Towards Multi-Agent Reasoning Systems for Collaborative Expertise Delegation: An Exploratory Design Study
by: Xu, Baixuan, et al.
Published: (2025)
DNAZEN: Enhanced Gene Sequence Representations via Mixed Granularities of Coding Units
by: Mao, Lei, et al.
Published: (2025)
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
by: Zhang, Ziyao, et al.
Published: (2024)
Towards an Understanding of Context Utilization in Code Intelligence
by: Wang, Yanlin, et al.
Published: (2025)
Mitigating Multilingual Hallucination in Large Vision-Language Models
by: Qu, Xiaoye, et al.
Published: (2024)
SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
by: Deng, Boyi, et al.
Published: (2025)
Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback
by: Bi, Zhangqian, et al.
Published: (2024)
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
by: Chen, Guoxin, et al.
Published: (2026)
From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence
by: Yang, Jian, et al.
Published: (2025)
Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models
by: He, Chengyang, et al.
Published: (2025)
Verbal Process Supervision Elicits Better Coding Agents
by: Chen, Hao-Yuan, et al.
Published: (2025)
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code
by: Guan, Batu, et al.
Published: (2024)
CATArena: Evaluating Evolutionary Capabilities of Code Agents via Iterative Tournaments
by: Fu, Lingyue, et al.
Published: (2025)
Mitigating GenAI-powered Evidence Pollution for Out-of-Context Multimodal Misinformation Detection
by: Yan, Zehong, et al.
Published: (2025)
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
by: Zhang, Lei, et al.
Published: (2024)
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
by: Tian, Yuchen, et al.
Published: (2024)