| Main authors: | Dunivin, Zackary Okun; Noori, Mobina; Frey, Seth; Atkinson, Curtis |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2601.09905 |
Similar items
Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks
by: Dunivin, Zackary Okun
Published: (2024)
Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation
by: Fu, Lingyue, et al.
Published: (2025)
NaviQAte: Functionality-Guided Web Application Navigation
by: Shahbandeh, Mobina, et al.
Published: (2024)
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
by: Huang, Shiting, et al.
Published: (2025)
BanglaForge: LLM Collaboration with Self-Refinement for Bangla Code Generation
by: Dihan, Mahir Labib, et al.
Published: (2025)
LLM-as-a-Judge for Reference-less Automatic Code Validation and Refinement for Natural Language to Bash in IT Automation
by: Vo, Ngoc Phuoc An, et al.
Published: (2025)
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
by: Li, Jia, et al.
Published: (2024)
SEW: Self-Evolving Agentic Workflows for Automated Code Generation
by: Liu, Siwei, et al.
Published: (2025)
ProbeLLM: Automating Principled Diagnosis of LLM Failures
by: Huang, Yue, et al.
Published: (2026)
Evaluating and Achieving Controllable Code Completion in Code LLM
by: Zhang, Jiajun, et al.
Published: (2026)
Code Fingerprints: Disentangled Attribution of LLM-Generated Code
by: Guo, Jiaxun, et al.
Published: (2026)
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
by: Peng, Yun, et al.
Published: (2024)
From Critique to Clarity: A Pathway to Faithful and Personalized Code Explanations with Large Language Models
by: Xu, Zexing, et al.
Published: (2024)
Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking
by: Li, Zhuohao, et al.
Published: (2025)
Improving Code Localization with Repository Memory
by: Wang, Boshi, et al.
Published: (2025)
Comparing Developer and LLM Biases in Code Evaluation
by: Mittal, Aditya, et al.
Published: (2026)
EffiSkill: Agent Skill Based Automated Code Efficiency Optimization
by: Wang, Zimu, et al.
Published: (2026)
Enhanced Automated Code Vulnerability Repair using Large Language Models
by: de-Fitero-Dominguez, David, et al.
Published: (2024)
CYCLE: Learning to Self-Refine the Code Generation
by: Ding, Yangruibo, et al.
Published: (2024)
Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation
by: Moon, Jiwon, et al.
Published: (2025)
LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning
by: Dihan, Mahir Labib, et al.
Published: (2026)
GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
by: Wu, Jie JW, et al.
Published: (2025)
SelfCodeAlign: Self-Alignment for Code Generation
by: Wei, Yuxiang, et al.
Published: (2024)
Measuring LLM Code Generation Stability via Structural Entropy
by: Song, Yewei, et al.
Published: (2025)
Showing LLM-Generated Code Selectively Based on Confidence of LLMs
by: Li, Jia, et al.
Published: (2024)
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
by: Wu, Jason, et al.
Published: (2024)
LLM Agents Improve Semantic Code Search
by: Jain, Sarthak, et al.
Published: (2024)
SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
by: Wang, Yuhang, et al.
Published: (2026)
Leveraging Print Debugging to Improve Code Generation in Large Language Models
by: Hu, Xueyu, et al.
Published: (2024)
ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation
by: Liu, Kaiyuan, et al.
Published: (2025)
MATCH: Task-Driven Code Evaluation through Contrastive Learning
by: Ghoummaid, Marah, et al.
Published: (2025)
EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization
by: Huang, Dong, et al.
Published: (2024)
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration
by: Ma, Yingwei, et al.
Published: (2024)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
by: Dou, Shihan, et al.
Published: (2024)
LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes
by: Dearing, Matthew T., et al.
Published: (2024)
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
by: Zhang, Chenchen, et al.
Published: (2025)
Generating Equivalent Representations of Code By A Self-Reflection Approach
by: Li, Jia, et al.
Published: (2024)
Improving Small Language Models for Code Generation with Reinforcement Learning from Verification Feedback
by: Skopin, Egor, et al.
Published: (2026)
CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells
by: Naik, Atharva, et al.
Published: (2024)
Recommender systems, stigmergy, and the tyranny of popularity
by: Dunivin, Zackary Okun, et al.
Published: (2025)