| Main Authors: | Sahoo, Devanshu, Majhi, Vasudev, Neekhra, Arjun, Sinha, Yash, Mandal, Murari, Kumar, Dhruv |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.10415 |
Similar Items
The Compliance Paradox: Semantic-Instruction Decoupling in Automated Academic Code Evaluation
by: Sahoo, Devanshu, et al.
Published: (2026)
When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection
by: Sahoo, Devanshu, et al.
Published: (2025)
Your Build Scripts Stink: The State of Code Smells in Build Scripts
by: Tamanna, Mahzabin, et al.
Published: (2025)
Smoke and Mirrors: Jailbreaking LLM-based Code Generation via Implicit Malicious Prompts
by: Ouyang, Sheng, et al.
Published: (2025)
Fuzzing with Agents? Generators Are All You Need
by: Vikram, Vasudev, et al.
Published: (2026)
The Hidden Cost of Readability: How Code Formatting Silently Consumes Your LLM Budget
by: Pan, Dangfeng, et al.
Published: (2025)
A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How
by: Wang, Chaozheng, et al.
Published: (2024)
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval
by: Wang, Jiexin, et al.
Published: (2024)
Leveraging Large Language Models to Improve REST API Testing
by: Kim, Myeongsoo, et al.
Published: (2023)
Grounded AI for Code Review: Resource-Efficient Large-Model Serving in Enterprise Pipelines
by: Mandal, Sayan, et al.
Published: (2025)
On the Freshness of Pinned Dependencies in Maven
by: Vikram, Vasudev, et al.
Published: (2025)
Modeling and Recovering Hierarchical Structural Architectures of ROS 2 Systems from Code and Launch Configurations using LLM-based Agents
by: Benchat, Mohamed, et al.
Published: (2026)
Context-Aware CodeLLM Eviction for AI-assisted Coding
by: Thangarajah, Kishanthan, et al.
Published: (2025)
CR-Bench: Evaluating the Real-World Utility of AI Code Review Agents
by: Pereira, Kristen, et al.
Published: (2026)
Rubric Is All You Need: Enhancing LLM-based Code Evaluation With Question-Specific Rubrics
by: Pathak, Aditya, et al.
Published: (2025)
Evaluating Large Language Models for Functional and Maintainable Code in Industrial Settings: A Case Study at ASML
by: Mundhra, Yash, et al.
Published: (2025)
CodeArena: A Collective Evaluation Platform for LLM Code Generation
by: Du, Mingzhe, et al.
Published: (2025)
Intention is All You Need: Refining Your Code from Your Intention
by: Guo, Qi, et al.
Published: (2025)
LLM-as-a-Judge for Human-AI Co-Creation: A Reliability-Aware Evaluation Framework for Coding
by: Amin, Md Faizul Ibne, et al.
Published: (2026)
Can Old Tests Do New Tricks for Resolving SWE Issues?
by: Chen, Yang, et al.
Published: (2025)
WIP: Leveraging LLMs for Enforcing Design Principles in Student Code: Analysis of Prompting Strategies and RAG
by: Kolhatkar, Dhruv, et al.
Published: (2025)
These Aren't the Reviews You're Looking For How Humans Review AI-Generated Pull Requests
by: Duma, Kacper, et al.
Published: (2026)
Enhancing LLM Code Generation: A Systematic Evaluation of Multi-Agent Collaboration and Runtime Debugging for Improved Accuracy, Reliability, and Latency
by: Ashrafi, Nazmus, et al.
Published: (2025)
Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing
by: Dhruv, Akash, et al.
Published: (2024)
Can Large Language Models Write Good Property-Based Tests?
by: Vikram, Vasudev, et al.
Published: (2023)
Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools
by: Bhatia, Shreya, et al.
Published: (2023)
Adding New Capability in Existing Scientific Application with LLM Assistance
by: Dubey, Anshu, et al.
Published: (2025)
"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors
by: Liu, Yue, et al.
Published: (2025)
Comment Traps: How Defective Commented-out Code Augment Defects in AI-Assisted Code Generation
by: Huang, Yuan, et al.
Published: (2025)
"I Would Have Written My Code Differently'': Beginners Struggle to Understand LLM-Generated Code
by: Zi, Yangtian, et al.
Published: (2025)
Evaluating LLM-Generated Code: A Benchmark and Developer Study
by: Szych, Joanna, et al.
Published: (2026)
Copilot Arena: A Platform for Code LLM Evaluation in the Wild
by: Chi, Wayne, et al.
Published: (2025)
Evaluating Efficiency and Novelty of LLM-Generated Code for Graph Analysis
by: Nia, Atieh Barati, et al.
Published: (2025)
TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation
by: Gong, Zhihao, et al.
Published: (2026)
TRACE: Evaluating Execution Efficiency of LLM-Based Code Translation
by: Gong, Zhihao, et al.
Published: (2025)
A Survey of Code Review Benchmarks and Evaluation Practices in Pre-LLM and LLM Era
by: Khan, Taufiqul Islam, et al.
Published: (2026)
Gendered Prompting and LLM Code Review: How Gender Cues in the Prompt Shape Code Quality and Evaluation
by: Janzen, Lynn, et al.
Published: (2026)
How to Compare the Security of Code Written by Humans to LLM-generated Code
by: Balebako, Rebecca, et al.
Published: (2026)
Code Roulette: How Prompt Variability Affects LLM Code Generation
by: Paleyes, Andrei, et al.
Published: (2025)
Code Review Automation Via Multi-task Federated LLM -- An Empirical Study
by: Kumar, Jahnavi, et al.
Published: (2024)