:: Library Catalog

Bibliographic Details
Main Author:	Ravishankara, Mayank
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Software Engineering
Online Access:	https://arxiv.org/abs/2602.13232

Similar Items

CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
by: Ravishankara, Mayank
Published: (2026)

Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
by: Galimzyanov, Timur, et al.
Published: (2024)

Cost-Efficient Prompt Engineering for Unsupervised Entity Resolution
by: Nananukul, Navapat, et al.
Published: (2023)

LLMs: A Game-Changer for Software Engineers?
by: Haque, Md Asraful
Published: (2024)

GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
by: Jiang, Shufan, et al.
Published: (2026)

Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering
by: Wang, Ruiqi, et al.
Published: (2025)

Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs
by: Ngassom, Sylvain Kouemo, et al.
Published: (2024)

Towards Comprehensive Benchmarking Infrastructure for LLMs In Software Engineering
by: Rodriguez-Cardenas, Daniel, et al.
Published: (2026)

Analysis of LLMs vs Human Experts in Requirements Engineering
by: Hymel, Cory, et al.
Published: (2025)

Evaluating LLMs for Visualization Tasks
by: Khan, Saadiq Rauf, et al.
Published: (2025)

LLMs for Engineering: Teaching Models to Design High Powered Rockets
by: Simonds, Toby
Published: (2025)

Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives
by: Khati, Dipin, et al.
Published: (2025)

Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages
by: Wu, Fan, et al.
Published: (2026)

Lifecycle-Aware code generation: Leveraging Software Engineering Phases in LLMs
by: Xing, Xing, et al.
Published: (2025)

Get on the Train or be Left on the Station: Using LLMs for Software Engineering Research
by: Trinkenreich, Bianca, et al.
Published: (2025)

Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities
by: Wang, Hanbin, et al.
Published: (2025)

Evaluating the Energy-Efficiency of the Code Generated by LLMs
by: Islam, Md Arman, et al.
Published: (2025)

Evaluating the Generalizability of LLMs in Automated Program Repair
by: Li, Fengjie, et al.
Published: (2025)

From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering
by: Shah, Syed Tauhid Ullah, et al.
Published: (2025)

Holistic Evaluation of State-of-the-Art LLMs for Code Generation
by: Zhang, Le, et al.
Published: (2025)

Using LLMs in Software Requirements Specifications: An Empirical Evaluation
by: Krishna, Madhava, et al.
Published: (2024)

Read, Extract, Classify: A Tool for Smarter Requirements Engineering
by: Bhattacharya, Paheli, et al.
Published: (2026)

Repository Intelligence Graph: Deterministic Architectural Map for LLM Code Assistants
by: Cherny-Shahar, Tsvi, et al.
Published: (2026)

TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance
by: Bruches, Elena, et al.
Published: (2026)

Reproducible, Explainable, and Effective Evaluations of Agentic AI for Software Engineering
by: Li, Jingyue, et al.
Published: (2026)

A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation
by: Xianpeng, et al.
Published: (2026)

Benchmark Dataset Generation and Evaluation for Excel Formula Repair with LLMs
by: Singha, Ananya, et al.
Published: (2025)

FullStack Bench: Evaluating LLMs as Full Stack Coders
by: Bytedance-Seed-Foundation-Code-Team, et al.
Published: (2024)

Automated Validation of LLM-based Evaluators for Software Engineering Artifacts
by: Fandina, Ora Nova, et al.
Published: (2025)

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
by: Pan, Chenkai, et al.
Published: (2026)

FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation
by: Zhu, Hongda, et al.
Published: (2025)

Detecting and Correcting Hallucinations in LLM-Generated Code via Deterministic AST Analysis
by: Khati, Dipin, et al.
Published: (2026)

An LLM-based Quantitative Framework for Evaluating High-Stealthy Backdoor Risks in OSS Supply Chains
by: Yan, Zihe, et al.
Published: (2025)

Adaptive Hierarchical Evaluation of LLMs and SAST tools for CWE Prediction in Python
by: Adnan, Muntasir, et al.
Published: (2026)

AI-Assisted Requirements Engineering: An Empirical Evaluation Relative to Expert Judgment
by: Levy, Oz, et al.
Published: (2026)

Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects
by: Nunes, Henrique, et al.
Published: (2025)

Open the Oyster: Empirical Evaluation and Improvement of Code Reasoning Confidence in LLMs
by: Wang, Shufan, et al.
Published: (2025)

Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs
by: Li, Ziyu, et al.
Published: (2024)

TREAT: A Code LLMs Trustworthiness / Reliability Evaluation and Testing Framework
by: Gao, Shuzheng, et al.
Published: (2025)

ToolMisuseBench: An Offline Deterministic Benchmark for Tool Misuse and Recovery in Agentic Systems
by: Sigdel, Akshey, et al.
Published: (2026)