| Main Author: | Ravishankara, Mayank |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.13232 |
Similar Items
CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
by: Ravishankara, Mayank
Published: (2026)
Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
by: Galimzyanov, Timur, et al.
Published: (2024)
Cost-Efficient Prompt Engineering for Unsupervised Entity Resolution
by: Nananukul, Navapat, et al.
Published: (2023)
LLMs: A Game-Changer for Software Engineers?
by: Haque, Md Asraful
Published: (2024)
GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
by: Jiang, Shufan, et al.
Published: (2026)
Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering
by: Wang, Ruiqi, et al.
Published: (2025)
Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs
by: Ngassom, Sylvain Kouemo, et al.
Published: (2024)
Towards Comprehensive Benchmarking Infrastructure for LLMs In Software Engineering
by: Rodriguez-Cardenas, Daniel, et al.
Published: (2026)
Analysis of LLMs vs Human Experts in Requirements Engineering
by: Hymel, Cory, et al.
Published: (2025)
Evaluating LLMs for Visualization Tasks
by: Khan, Saadiq Rauf, et al.
Published: (2025)
LLMs for Engineering: Teaching Models to Design High Powered Rockets
by: Simonds, Toby
Published: (2025)
Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives
by: Khati, Dipin, et al.
Published: (2025)
Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages
by: Wu, Fan, et al.
Published: (2026)
Lifecycle-Aware code generation: Leveraging Software Engineering Phases in LLMs
by: Xing, Xing, et al.
Published: (2025)
Get on the Train or be Left on the Station: Using LLMs for Software Engineering Research
by: Trinkenreich, Bianca, et al.
Published: (2025)
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities
by: Wang, Hanbin, et al.
Published: (2025)
Evaluating the Energy-Efficiency of the Code Generated by LLMs
by: Islam, Md Arman, et al.
Published: (2025)
Evaluating the Generalizability of LLMs in Automated Program Repair
by: Li, Fengjie, et al.
Published: (2025)
From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering
by: Shah, Syed Tauhid Ullah, et al.
Published: (2025)
Holistic Evaluation of State-of-the-Art LLMs for Code Generation
by: Zhang, Le, et al.
Published: (2025)
Using LLMs in Software Requirements Specifications: An Empirical Evaluation
by: Krishna, Madhava, et al.
Published: (2024)
Read, Extract, Classify: A Tool for Smarter Requirements Engineering
by: Bhattacharya, Paheli, et al.
Published: (2026)
Repository Intelligence Graph: Deterministic Architectural Map for LLM Code Assistants
by: Cherny-Shahar, Tsvi, et al.
Published: (2026)
TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance
by: Bruches, Elena, et al.
Published: (2026)
Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering
by: Li, Jingyue, et al.
Published: (2026)
A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation
by: Xianpeng, et al.
Published: (2026)
Benchmark Dataset Generation and Evaluation for Excel Formula Repair with LLMs
by: Singha, Ananya, et al.
Published: (2025)
FullStack Bench: Evaluating LLMs as Full Stack Coders
by: Bytedance-Seed-Foundation-Code-Team, et al.
Published: (2024)
Automated Validation of LLM-based Evaluators for Software Engineering Artifacts
by: Fandina, Ora Nova, et al.
Published: (2025)
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
by: Pan, Chenkai, et al.
Published: (2026)
FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation
by: Zhu, Hongda, et al.
Published: (2025)
Detecting and Correcting Hallucinations in LLM-Generated Code via Deterministic AST Analysis
by: Khati, Dipin, et al.
Published: (2026)
An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains
by: Yan, Zihe, et al.
Published: (2025)
Adaptive Hierarchical Evaluation of LLMs and SAST tools for CWE Prediction in Python
by: Adnan, Muntasir, et al.
Published: (2026)
AI-Assisted Requirements Engineering: An Empirical Evaluation Relative to Expert Judgment
by: Levy, Oz, et al.
Published: (2026)
Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects
by: Nunes, Henrique, et al.
Published: (2025)
Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs
by: Wang, Shufan, et al.
Published: (2025)
Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs
by: Li, Ziyu, et al.
Published: (2024)
TREAT: A Code LLMs Trustworthiness / Reliability Evaluation and Testing Framework
by: Gao, Shuzheng, et al.
Published: (2025)
ToolMisuseBench: An Offline Deterministic Benchmark for Tool Misuse and Recovery in Agentic Systems
by: Sigdel, Akshey, et al.
Published: (2026)