:: Library Catalog

Bibliographic Details
Main Authors:	Dunivin, Zackary Okun; Noori, Mobina; Frey, Seth; Atkinson, Curtis
Format:	Preprint
Published:	2026
Subjects:	Software Engineering; Computation and Language
Online Access:	https://arxiv.org/abs/2601.09905

Similar Items

Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic Tasks
by: Dunivin, Zackary Okun
Published: (2024)

Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation
by: Fu, Lingyue, et al.
Published: (2025)

NaviQAte: Functionality-Guided Web Application Navigation
by: Shahbandeh, Mobina, et al.
Published: (2024)

CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
by: Huang, Shiting, et al.
Published: (2025)

BanglaForge: LLM Collaboration with Self-Refinement for Bangla Code Generation
by: Dihan, Mahir Labib, et al.
Published: (2025)

LLM-as-a-Judge for Reference-less Automatic Code Validation and Refinement for Natural Language to Bash in IT Automation
by: Vo, Ngoc Phuoc An, et al.
Published: (2025)

DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
by: Li, Jia, et al.
Published: (2024)

SEW: Self-Evolving Agentic Workflows for Automated Code Generation
by: Liu, Siwei, et al.
Published: (2025)

ProbeLLM: Automating Principled Diagnosis of LLM Failures
by: Huang, Yue, et al.
Published: (2026)

Evaluating and Achieving Controllable Code Completion in Code LLM
by: Zhang, Jiajun, et al.
Published: (2026)

Code Fingerprints: Disentangled Attribution of LLM-Generated Code
by: Guo, Jiaxun, et al.
Published: (2026)

PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
by: Peng, Yun, et al.
Published: (2024)

From Critique to Clarity: A Pathway to Faithful and Personalized Code Explanations with Large Language Models
by: Xu, Zexing, et al.
Published: (2024)

Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking
by: Li, Zhuohao, et al.
Published: (2025)

Improving Code Localization with Repository Memory
by: Wang, Boshi, et al.
Published: (2025)

Comparing Developer and LLM Biases in Code Evaluation
by: Mittal, Aditya, et al.
Published: (2026)

EffiSkill: Agent Skill Based Automated Code Efficiency Optimization
by: Wang, Zimu, et al.
Published: (2026)

Enhanced Automated Code Vulnerability Repair using Large Language Models
by: de-Fitero-Dominguez, David, et al.
Published: (2024)

CYCLE: Learning to Self-Refine the Code Generation
by: Ding, Yangruibo, et al.
Published: (2024)

Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation
by: Moon, Jiwon, et al.
Published: (2025)

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning
by: Dihan, Mahir Labib, et al.
Published: (2026)

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
by: Wu, Jie JW, et al.
Published: (2025)

SelfCodeAlign: Self-Alignment for Code Generation
by: Wei, Yuxiang, et al.
Published: (2024)

Measuring LLM Code Generation Stability via Structural Entropy
by: Song, Yewei, et al.
Published: (2025)

Showing LLM-Generated Code Selectively Based on Confidence of LLMs
by: Li, Jia, et al.
Published: (2024)

UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
by: Wu, Jason, et al.
Published: (2024)

LLM Agents Improve Semantic Code Search
by: Jain, Sarthak, et al.
Published: (2024)

SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
by: Wang, Yuhang, et al.
Published: (2026)

Leveraging Print Debugging to Improve Code Generation in Large Language Models
by: Hu, Xueyu, et al.
Published: (2024)

ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation
by: Liu, Kaiyuan, et al.
Published: (2025)

MATCH: Task-Driven Code Evaluation through Contrastive Learning
by: Ghoummaid, Marah, et al.
Published: (2025)

EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization
by: Huang, Dong, et al.
Published: (2024)

Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration
by: Ma, Yingwei, et al.
Published: (2024)

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
by: Dou, Shihan, et al.
Published: (2024)

LASSI: An LLM-based Automated Self-Correcting Pipeline for Translating Parallel Scientific Codes
by: Dearing, Matthew T., et al.
Published: (2024)

ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
by: Zhang, Chenchen, et al.
Published: (2025)

Generating Equivalent Representations of Code By A Self-Reflection Approach
by: Li, Jia, et al.
Published: (2024)

Improving Small Language Models for Code Generation with Reinforcement Learning from Verification Feedback
by: Skopin, Egor, et al.
Published: (2026)

CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells
by: Naik, Atharva, et al.
Published: (2024)

Recommender systems, stigmergy, and the tyranny of popularity
by: Dunivin, Zackary Okun, et al.
Published: (2025)