:: Library Catalog

Bibliographic Details
Main Authors:	Barke, Shraddha; Goyal, Arnav; Khare, Alind; Singh, Avaljot; Nath, Suman; Bansal, Chetan
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.02475

Similar Items

AgentRx: A Benchmark Study of LLM Agents for Multimodal Clinical Prediction Tasks
by: Jorf, Baraa Al, et al.
Published: (2026)

Willful Disobedience: Automatically Detecting Failures in Agentic Traces
by: Sharma, Reshabh K, et al.
Published: (2026)

Skim: Speculative Execution for Fast and Efficient Web Agents
by: Wong, Mike, et al.
Published: (2026)

Generative Caching for Structurally Similar Prompts and Responses
by: Chakraborty, Sarthak, et al.
Published: (2025)

Building AI Agents for Autonomous Clouds: Challenges and Design Principles
by: Shetty, Manish, et al.
Published: (2024)

ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving
by: Qiu, Haoran, et al.
Published: (2025)

Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems
by: Jaiswal, Shashwat, et al.
Published: (2025)

Debugging the Debuggers: Failure-Anchored Structured Recovery for Software Engineering Agents
by: Zhao, Chenyu, et al.
Published: (2026)

WAREX: Web Agent Reliability Evaluation on Existing Benchmarks
by: Kara, Su, et al.
Published: (2025)

WebXSkill: Skill Learning for Autonomous Web Agents
by: Wang, Zhaoyang, et al.
Published: (2026)

Harnessing AI Agents to Advance Research on Refugee Child Mental Health
by: Shrivastava, Aditya, et al.
Published: (2025)

TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories
by: Nian, Junjie, et al.
Published: (2026)

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation
by: Wang, Xinyu, et al.
Published: (2026)

PromptPex: Automatic Test Generation for Language Model Prompts
by: Sharma, Reshabh K, et al.
Published: (2025)

Diagnosing Failure Modes of Shared-State Collaboration in Resource-Constrained Visual Agents
by: Zhou, Yunpeng
Published: (2026)

KLAS: Using Similarity to Stitch Neural Networks for Improved Accuracy-Efficiency Tradeoffs
by: Sanyal, Debopam, et al.
Published: (2026)

HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis
by: Barke, Shraddha, et al.
Published: (2024)

Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents
by: Yu, Xiao, et al.
Published: (2025)

Dyna-Mind: Learning to Simulate from Experience for Better AI Agents
by: Yu, Xiao, et al.
Published: (2025)

AgentBound: Securing Execution Boundaries of AI Agents
by: Bühler, Christoph, et al.
Published: (2025)

RuleFlow: Generating Reusable Program Optimizations with LLMs
by: Singh, Avaljot, et al.
Published: (2026)

Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories
by: Majgaonkar, Oorja, et al.
Published: (2025)

AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse
by: Bansal, Aayam, et al.
Published: (2026)

Holistic Evaluation and Failure Diagnosis of AI Agents
by: Madvil, Netta, et al.
Published: (2026)

PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement
by: Zhang, Tuo, et al.
Published: (2026)

SynthAgent: Adapting Web Agents with Synthetic Supervision
by: Wang, Zhaoyang, et al.
Published: (2025)

OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories
by: Liu, Yibing, et al.
Published: (2026)

Cocoa: Co-Planning and Co-Execution with AI Agents
by: Feng, K. J. Kevin, et al.
Published: (2024)

AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
by: Chen, Yinfang, et al.
Published: (2025)

Modular Jets for Supervised Pipelines: Diagnosing Mirage vs Identifiability
by: Sanyal, Suman
Published: (2025)

Enforcing Cybersecurity Constraints for LLM-driven Robot Agents for Online Transactions
by: Shah, Shraddha Pradipbhai, et al.
Published: (2025)

XAI for Coding Agent Failures: Transforming Raw Execution Traces into Actionable Insights
by: Joshi, Arun
Published: (2026)

BEAP-Agent: Backtrackable Execution and Adaptive Planning for GUI Agents
by: Lu, Ziyu, et al.
Published: (2026)

ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory
by: Zhong, Hongbin, et al.
Published: (2026)

An AI Agent Execution Environment to Safeguard User Data
by: Stanley, Robert, et al.
Published: (2026)

Agent Lifecycle Toolkit (ALTK): Reusable Middleware Components for Robust AI Agents
by: Wright, Zidane, et al.
Published: (2026)

Executable Agentic Memory for GUI Agent
by: Qin, Zerui, et al.
Published: (2026)

TimeClaw: A Time-Series AI Agent with Exploratory Execution Learning
by: Liu, Hangchen, et al.
Published: (2026)

MAD-PINN: A Decentralized Physics-Informed Machine Learning Framework for Safe and Optimal Multi-Agent Control
by: Tayal, Manan, et al.
Published: (2025)

POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems
by: Varela, Iñaki Dellibarda, et al.
Published: (2026)