| Main Authors: | Xing, Jun, Bhatia, Mayur, Phulwani, Sahil, Suresh, Darshan, Matta, Rafik |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.00226 |
Similar Items
GrowthHacker: Automated Off-Policy Evaluation Optimization Using Code-Modifying LLM Agents
by: Wu, Jie JW, et al.
Published: (2025)
An abstraction for solving multi-domain problems using finite element methods
by: Sagiyama, Koki, et al.
Published: (2025)
Social Media Reactions to Open Source Promotions: AI-Powered GitHub Projects on Hacker News
by: Meakpaiboonwattana, Prachnachai, et al.
Published: (2025)
MathDuels: Evaluating LLMs as Problem Posers and Solvers
by: Xu, Zhiqiu, et al.
Published: (2026)
ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants
by: Xu, Xiangzhe, et al.
Published: (2025)
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
by: Peng, Jiaren, et al.
Published: (2026)
CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
by: Shi, Jingwei, et al.
Published: (2026)
Using a Feedback Loop for LLM-based Infrastructure as Code Generation
by: Palavalli, Mayur Amarnath, et al.
Published: (2024)
Holistic Evaluation of State-of-the-Art LLMs for Code Generation
by: Zhang, Le, et al.
Published: (2025)
MCeT: Behavioral Model Correctness Evaluation using Large Language Models
by: Ahmed, Khaled, et al.
Published: (2025)
Program Structure Aware Precondition Generation
by: Dinella, Elizabeth, et al.
Published: (2023)
SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization
by: Reddy, Revanth Gangi, et al.
Published: (2025)
Conditional Execution of Transpiler Passes Based on Per-Script Feature Detection
by: Bhatia, Rishipal Singh
Published: (2026)
SweRank: Software Issue Localization with Code Ranking
by: Reddy, Revanth Gangi, et al.
Published: (2025)
A novel instance‐based method for cross‐project just‐in‐time defect prediction
by: Xiaoyan Zhu, et al.
Published: (2024)
Component Matching Approach in Linking Business and Application Architecture
by: Kamath, Suresh
Published: (2024)
Component Matching as a Graph Matching Problem
by: Kamath, Suresh
Published: (2024)
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
by: Li, Ziyang, et al.
Published: (2024)
Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
by: Deshpande, Darshan, et al.
Published: (2026)
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities
by: Khare, Avishree, et al.
Published: (2023)
Statistical Confidence in Functional Correctness: An Approach for AI Product Functional Correctness Evaluation
by: Albertini, Wallace, et al.
Published: (2026)
Harnessing Large Language Models for Seed Generation in Greybox Fuzzing
by: Shi, Wenxuan, et al.
Published: (2024)
ScalerEval: Automated and Consistent Evaluation Testbed for Auto-scalers in Microservices
by: Xie, Shuaiyu, et al.
Published: (2025)
Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks
by: Hu, Xing, et al.
Published: (2025)
Assessing Coherency and Consistency of Code Execution Reasoning by Large Language Models
by: Liu, Changshu, et al.
Published: (2025)
Calibration and Correctness of Language Models for Code
by: Spiess, Claudio, et al.
Published: (2024)
Where's the Bug? Attention Probing for Scalable Fault Localization
by: Stein, Adam, et al.
Published: (2025)
Testing and Evaluation of Large Language Models: Correctness, Non-Toxicity, and Fairness
by: Wang, Wenxuan
Published: (2024)
QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities
by: Wang, Claire, et al.
Published: (2025)
EvaluateXAI: A Framework to Evaluate the Reliability and Consistency of Rule-based XAI Techniques for Software Analytics Tasks
by: Awal, Md Abdul, et al.
Published: (2024)
The Road of Adaptive AI for Precision in Cybersecurity
by: Garg, Sahil
Published: (2025)
An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software
by: Bhatia, Aaditya, et al.
Published: (2023)
Lightweight Model Editing for LLMs to Correct Deprecated API Recommendations
by: Lin, Guancheng, et al.
Published: (2025)
Launch-Day Diffusion: Tracking Hacker News Impact on GitHub Stars for AI Tools
by: Kraishan, Obada
Published: (2025)
TorchQL: A Programming Framework for Integrity Constraints in Machine Learning
by: Naik, Aaditya, et al.
Published: (2023)
Nirjas: An open source framework for extracting metadata from the source code
by: Bhardwaj, Ayush, et al.
Published: (2024)
A Time-Consistent Benchmark for Repository-Level Software Engineering Evaluation
by: Xianpeng, et al.
Published: (2026)
Contract2Plan: Verified Contract-Grounded Retrieval-Augmented Optimization for BOM-Aware Procurement and Multi-Echelon Inventory Planning
by: Agarwal, Sahil
Published: (2026)
ConAIR: Consistency-Augmented Iterative Interaction Framework to Enhance the Reliability of Code Generation
by: Dong, Jinhao, et al.
Published: (2024)
Evaluating Robustness of Large Language Models in Enterprise Applications: Benchmarks for Perturbation Consistency Across Formats and Languages
by: Bogavelli, Tara, et al.
Published: (2026)