| Main Authors: | Zhang, Xuanwang, Song, Yunze, Wang, Yidong, Tang, Shuyun, Li, Xinfeng, Zeng, Zhengran, Wu, Zhen, Ye, Wei, Xu, Wenyuan, Zhang, Yue, Dai, Xinyu, Zhang, Shikun, Wen, Qingsong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.11381 |
Similar Items
CoderUJB: An Executable and Unified Java Benchmark for Practical Programming Scenarios
by: Zeng, Zhengran, et al.
Published: (2024)
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
by: Yu, Zhuohao, et al.
Published: (2024)
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
by: Wang, Yidong, et al.
Published: (2025)
Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
by: Yu, Zhuohao, et al.
Published: (2024)
WebNavigator: Global Web Navigation via Interaction Graph Retrieval
by: Zhang, Xuanwang, et al.
Published: (2026)
Recognize Your Orchestrator: An Entropy Dynamics Perspective for LLM Multi-Agent Systems
by: Zhu, Junze, et al.
Published: (2026)
AutoSurvey: Large Language Models Can Automatically Write Surveys
by: Wang, Yidong, et al.
Published: (2024)
Benchmarking and Studying the LLM-based Agent System in End-to-End Software Development
by: Zeng, Zhengran, et al.
Published: (2025)
An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models
by: Xing, Chengli, et al.
Published: (2026)
Seed&Steer: Guiding Large Language Models with Compilable Prefix and Branch Signals for Unit Test Generation
by: Zhou, Shuaiyu, et al.
Published: (2025)
CodeShell Technical Report
by: Xie, Rui, et al.
Published: (2024)
SAEMark: Steering Personalized Multilingual LLM Watermarks with Sparse Autoencoders
by: Yu, Zhuohao, et al.
Published: (2025)
EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations
by: Zhou, Xinyun, et al.
Published: (2025)
Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning
by: Li, Bo, et al.
Published: (2026)
GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair
by: Liu, Zhuoyao, et al.
Published: (2026)
Benchmarking and Studying the LLM-based Code Review
by: Zeng, Zhengran, et al.
Published: (2025)
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
by: Wang, Yidong, et al.
Published: (2023)
ISC4DGF: Enhancing Directed Grey-box Fuzzing with LLM-Driven Initial Seed Corpus Generation
by: Xu, Yijiang, et al.
Published: (2024)
Enhancing In-Context Learning via Implicit Demonstration Augmentation
by: Zhou, Xiaoling, et al.
Published: (2024)
A Survey on Evaluating Large Language Models in Code Generation Tasks
by: Chen, Liguo, et al.
Published: (2024)
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
by: Yu, Zhuohao, et al.
Published: (2024)
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
by: Jin, Jiajie, et al.
Published: (2024)
UAR-NVC: A Unified AutoRegressive Framework for Memory-Efficient Neural Video Compression
by: Wang, Jia, et al.
Published: (2025)
RewardAnything: Generalizable Principle-Following Reward Models
by: Yu, Zhuohao, et al.
Published: (2025)
Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models
by: Zhang, Yuzhe, et al.
Published: (2024)
MCCE: A Framework for Multi-LLM Collaborative Co-Evolution
by: Ran, Nian, et al.
Published: (2025)
CultureSynth: A Hierarchical Taxonomy-Guided and Retrieval-Augmented Framework for Cultural Question-Answer Synthesis
by: Zhang, Xinyu, et al.
Published: (2025)
Assessing Climate Extremes Indices Over Global Drylands Under Real World Warming Beyond 1.5°C: Spatial Distribution and Temporal Trends
by: Xinyu Ma, et al.
Published: (2025)
Boosting Model Resilience via Implicit Adversarial Data Augmentation
by: Zhou, Xiaoling, et al.
Published: (2024)
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
by: Zhang, Chenghao, et al.
Published: (2025)
Why Retrieval-Augmented Generation Fails: A Graph Perspective
by: Guo, Kai, et al.
Published: (2026)
Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework
by: Jia, Hongrui, et al.
Published: (2025)
Predicting Satisfied User and Machine Ratio for Compressed Images: A Unified Approach
by: Zhang, Qi, et al.
Published: (2024)
Pref-GUIDE Dataset
by: Ji, Zhengran
Published: (2025)
CREW: Facilitating Human-AI Teaming Research
by: Zhang, Lingyu, et al.
Published: (2024)
Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG
by: Li, Bo, et al.
Published: (2025)
Releasing the Parameter Latency of Neural Representation for High-Efficiency Video Compression
by: Zhang, Gai, et al.
Published: (2024)
Corrections Meet Explanations: A Unified Framework for Explainable Grammatical Error Correction
by: Ye, Jingheng, et al.
Published: (2025)
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
by: Yu, Yue, et al.
Published: (2024)
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
by: Mou, Yutao, et al.
Published: (2024)