:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Qingni; Fan, Yue; Wang, Xin Eric
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Software Engineering
Online Access:	https://arxiv.org/abs/2602.02419

Similar Items

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
by: Han, Qijun, et al.
Published: (2026)

You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation
by: Bian, Yutong, et al.
Published: (2025)

Building Trust in the Skies: A Knowledge-Grounded LLM-based Framework for Aviation Safety
by: Iyengar, Anirudh, et al.
Published: (2026)

Localized Calibrated Uncertainty in Code Language Models
by: Gros, David, et al.
Published: (2025)

Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?
by: Qian, Yi, et al.
Published: (2026)

BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
by: Li, Yuanhao, et al.
Published: (2026)

From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation
by: Yang, Guang, et al.
Published: (2026)

Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction
by: Lie, Knut-Andreas, et al.
Published: (2026)

AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration
by: Wang, Ruiqi, et al.
Published: (2025)

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems
by: Wang, Su, et al.
Published: (2026)

Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
by: Ganguly, Debargha, et al.
Published: (2025)

Precision in Practice: Knowledge Guided Code Summarizing Grounded in Industrial Expectations
by: Li, Jintai, et al.
Published: (2026)

GAN-enhanced Simulation-driven DNN Testing in Absence of Ground Truth
by: Attaoui, Mohammed, et al.
Published: (2025)

Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement
by: Hayashi, Hiroaki, et al.
Published: (2025)

DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery
by: Liu, Tianyu, et al.
Published: (2026)

Model Provenance via Model DNA
by: Mu, Xin, et al.
Published: (2023)

Towards Structured, State-Aware, and Execution-Grounded Reasoning for Software Engineering Agents
by: Tse-Hsun, et al.
Published: (2026)

Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution
by: Fendley, Neil, et al.
Published: (2026)

Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding
by: Di, Yifeng, et al.
Published: (2025)

AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering
by: Kumar, Rajesh, et al.
Published: (2026)

REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
by: Agrawal, Yuvraj
Published: (2026)

InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation
by: Chen, Qiaosheng, et al.
Published: (2025)

GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair
by: Xiao, Yinhao, et al.
Published: (2026)

GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing
by: Chen, Xiaoyi, et al.
Published: (2026)

DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents
by: Hong, Sirui, et al.
Published: (2026)

Spec Kit Agents: Context-Grounded Agentic Workflows
by: Taghavi, Pardis, et al.
Published: (2026)

Fragmented Layer Grouping in GUI Designs Through Graph Learning Based on Multimodal Information
by: Chen, Yunnong, et al.
Published: (2024)

AndroidControl-Curated: Revealing the True Potential of GUI Agents through Benchmark Purification
by: Leung, Ho Fai, et al.
Published: (2025)

Trust Calibration in IDEs: Paving the Way for Widespread Adoption of AI Refactoring
by: Borg, Markus
Published: (2024)

When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference
by: Sun, Zhensu, et al.
Published: (2024)

Workflow for Safe-AI
by: Veljanovska, Suzana, et al.
Published: (2025)

Semantic Voting: Execution-Grounded Consensus for LLM Code Generation
by: Jiang, Shan, et al.
Published: (2026)

Large Language Models in Code Co-generation for Safe Autonomous Vehicles
by: Nouri, Ali, et al.
Published: (2025)

InfraMind: A Novel Exploration-based GUI Agentic Framework for Mission-critical Industrial Management
by: Lin, Liangtao, et al.
Published: (2025)

Does In-IDE Calibration of Large Language Models work at Scale?
by: Koohestani, Roham, et al.
Published: (2025)

Beyond Trusting Trust: Multi-Model Validation for Robust Code Generation
by: McDanel, Bradley
Published: (2025)

LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
by: Zhou, Zenghui, et al.
Published: (2026)

When Fuzzing Meets LLMs: Challenges and Opportunities
by: Jiang, Yu, et al.
Published: (2024)

Agentic AI Software Engineers: Programming with Trust
by: Roychoudhury, Abhik, et al.
Published: (2025)

Chain of Grounded Objectives: Bridging Process and Goal-oriented Prompting for Code Generation
by: Yeo, Sangyeop, et al.
Published: (2025)