| Main Authors: | Wang, Qingni, Fan, Yue, Wang, Xin Eric |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.02419 |
Similar Items
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
by: Han, Qijun, et al.
Published: (2026)
You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation
by: Bian, Yutong, et al.
Published: (2025)
Building Trust in the Skies: A Knowledge-Grounded LLM-based Framework for Aviation Safety
by: Iyengar, Anirudh, et al.
Published: (2026)
Localized Calibrated Uncertainty in Code Language Models
by: Gros, David, et al.
Published: (2025)
Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?
by: Qian, Yi, et al.
Published: (2026)
BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
by: Li, Yuanhao, et al.
Published: (2026)
From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation
by: Yang, Guang, et al.
Published: (2026)
Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction
by: Lie, Knut-Andreas, et al.
Published: (2026)
AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration
by: Wang, Ruiqi, et al.
Published: (2025)
When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems
by: Wang, Su, et al.
Published: (2026)
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
by: Ganguly, Debargha, et al.
Published: (2025)
Precision in Practice: Knowledge Guided Code Summarizing Grounded in Industrial Expectations
by: Li, Jintai, et al.
Published: (2026)
GAN-enhanced Simulation-driven DNN Testing in Absence of Ground Truth
by: Attaoui, Mohammed, et al.
Published: (2025)
Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement
by: Hayashi, Hiroaki, et al.
Published: (2025)
DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery
by: Liu, Tianyu, et al.
Published: (2026)
Model Provenance via Model DNA
by: Mu, Xin, et al.
Published: (2023)
Towards Structured, State-Aware, and Execution-Grounded Reasoning for Software Engineering Agents
by: Tse-Hsun, et al.
Published: (2026)
Comment and Control: Hijacking Agentic Workflows via Context-Grounded Evolution
by: Fendley, Neil, et al.
Published: (2026)
Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding
by: Di, Yifeng, et al.
Published: (2025)
AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering
by: Kumar, Rajesh, et al.
Published: (2026)
REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
by: Agrawal, Yuvraj
Published: (2026)
InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation
by: Chen, Qiaosheng, et al.
Published: (2025)
GeoContra: From Fluent GIS Code to Verifiable Spatial Analysis with Geography-Grounded Repair
by: Xiao, Yinhao, et al.
Published: (2026)
GUITestScape: Towards Open-set Evaluation on Exploratory GUI Testing
by: Chen, Xiaoyi, et al.
Published: (2026)
DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents
by: Hong, Sirui, et al.
Published: (2026)
Spec Kit Agents: Context-Grounded Agentic Workflows
by: Taghavi, Pardis, et al.
Published: (2026)
Fragmented Layer Grouping in GUI Designs Through Graph Learning Based on Multimodal Information
by: Chen, Yunnong, et al.
Published: (2024)
AndroidControl-Curated: Revealing the True Potential of GUI Agents through Benchmark Purification
by: Leung, Ho Fai, et al.
Published: (2025)
Trust Calibration in IDEs: Paving the Way for Widespread Adoption of AI Refactoring
by: Borg, Markus
Published: (2024)
When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference
by: Sun, Zhensu, et al.
Published: (2024)
Workflow for Safe-AI
by: Veljanovska, Suzana, et al.
Published: (2025)
Semantic Voting: Execution-Grounded Consensus for LLM Code Generation
by: Jiang, Shan, et al.
Published: (2026)
Large Language Models in Code Co-generation for Safe Autonomous Vehicles
by: Nouri, Ali, et al.
Published: (2025)
InfraMind: A Novel Exploration-based GUI Agentic Framework for Mission-critical Industrial Management
by: Lin, Liangtao, et al.
Published: (2025)
Does In-IDE Calibration of Large Language Models work at Scale?
by: Koohestani, Roham, et al.
Published: (2025)
Beyond Trusting Trust: Multi-Model Validation for Robust Code Generation
by: McDanel, Bradley
Published: (2025)
LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs
by: Zhou, Zenghui, et al.
Published: (2026)
When Fuzzing Meets LLMs: Challenges and Opportunities
by: Jiang, Yu, et al.
Published: (2024)
Agentic AI Software Engineers: Programming with Trust
by: Roychoudhury, Abhik, et al.
Published: (2025)
Chain of Grounded Objectives: Bridging Process and Goal-oriented Prompting for Code Generation
by: Yeo, Sangyeop, et al.
Published: (2025)