:: Library Catalog

Bibliographic Details
Main Authors:	Xing, Jun; Bhatia, Mayur; Phulwani, Sahil; Suresh, Darshan; Matta, Rafik
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Software Engineering
Online Access:	https://arxiv.org/abs/2502.00226

Similar Items

GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
by: Wu, Jie JW, et al.
Published: (2025)

An abstraction for solving multi-domain problems using finite element methods
by: Sagiyama, Koki, et al.
Published: (2025)

Social Media Reactions to Open Source Promotions: AI-Powered GitHub Projects on Hacker News
by: Meakpaiboonwattana, Prachnachai, et al.
Published: (2025)

MathDuels: Evaluating LLMs as Problem Posers and Solvers
by: Xu, Zhiqiu, et al.
Published: (2026)

ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants
by: Xu, Xiangzhe, et al.
Published: (2025)

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
by: Peng, Jiaren, et al.
Published: (2026)

CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
by: Shi, Jingwei, et al.
Published: (2026)

Using a Feedback Loop for LLM-based Infrastructure as Code Generation
by: Palavalli, Mayur Amarnath, et al.
Published: (2024)

Holistic Evaluation of State-of-the-Art LLMs for Code Generation
by: Zhang, Le, et al.
Published: (2025)

MCeT: Behavioral Model Correctness Evaluation using Large Language Models
by: Ahmed, Khaled, et al.
Published: (2025)

Program Structure Aware Precondition Generation
by: Dinella, Elizabeth, et al.
Published: (2023)

SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization
by: Reddy, Revanth Gangi, et al.
Published: (2025)

Conditional Execution of Transpiler Passes Based on Per-Script Feature Detection
by: Bhatia, Rishipal Singh
Published: (2026)

SweRank: Software Issue Localization with Code Ranking
by: Reddy, Revanth Gangi, et al.
Published: (2025)

A novel instance‐based method for cross‐project just‐in‐time defect prediction
by: Xiaoyan Zhu, et al.
Published: (2024)

Component Matching Approach in Linking Business and Application Architecture
by: Kamath, Suresh
Published: (2024)

Component Matching as a Graph Matching Problem
by: Kamath, Suresh
Published: (2024)

IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
by: Li, Ziyang, et al.
Published: (2024)

Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
by: Deshpande, Darshan, et al.
Published: (2026)

Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities
by: Khare, Avishree, et al.
Published: (2023)

Statistical Confidence in Functional Correctness: An Approach for AI Product Functional Correctness Evaluation
by: Albertini, Wallace, et al.
Published: (2026)

Harnessing Large Language Models for Seed Generation in Greybox Fuzzing
by: Shi, Wenxuan, et al.
Published: (2024)

ScalerEval: Automated and Consistent Evaluation Testbed for Auto-scalers in Microservices
by: Xie, Shuaiyu, et al.
Published: (2025)

Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks
by: Hu, Xing, et al.
Published: (2025)

Assessing Coherency and Consistency of Code Execution Reasoning by Large Language Models
by: Liu, Changshu, et al.
Published: (2025)

Calibration and Correctness of Language Models for Code
by: Spiess, Claudio, et al.
Published: (2024)

Where's the Bug? Attention Probing for Scalable Fault Localization
by: Stein, Adam, et al.
Published: (2025)

Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness
by: Wang, Wenxuan
Published: (2024)

QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities
by: Wang, Claire, et al.
Published: (2025)

EvaluateXAI: A Framework to Evaluate the Reliability and Consistency of Rule-based XAI Techniques for Software Analytics Tasks
by: Awal, Md Abdul, et al.
Published: (2024)

The Road of Adaptive AI for Precision in Cybersecurity
by: Garg, Sahil
Published: (2025)

An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
by: Bhatia, Aaditya, et al.
Published: (2023)

Lightweight Model Editing for LLMs to Correct Deprecated API Recommendations
by: Lin, Guancheng, et al.
Published: (2025)

Launch-Day Diffusion: Tracking Hacker News Impact on GitHub Stars for AI Tools
by: Kraishan, Obada
Published: (2025)

TorchQL: A Programming Framework for Integrity Constraints in Machine Learning
by: Naik, Aaditya, et al.
Published: (2023)

Nirjas: An open source framework for extracting metadata from the source code
by: Bhardwaj, Ayush, et al.
Published: (2024)

A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation
by: Xianpeng, et al.
Published: (2026)

Contract2Plan: Verified Contract-Grounded Retrieval-Augmented Optimization for BOM-Aware Procurement and Multi-Echelon Inventory Planning
by: Agarwal, Sahil
Published: (2026)

ConAIR: Consistency-Augmented Iterative Interaction Framework to Enhance the Reliability of Code Generation
by: Dong, Jinhao, et al.
Published: (2024)

Evaluating Robustness of Large Language Models in Enterprise Applications: Benchmarks for Perturbation Consistency Across Formats and Languages
by: Bogavelli, Tara, et al.
Published: (2026)