Library Catalog

Bibliographic Details
Main Authors:	Cihon, Peter; Stein, Merlin; Bansal, Gagan; Manning, Sam; Xu, Kevin
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.15212

Similar Items

Societal Capacity Assessment Framework: Measuring Resilience to Inform Advanced AI Risk Management
by: Gandhi, Milan, et al.
Published: (2025)

Trends in Frontier AI Model Count: A Forecast to 2028
by: Kumar, Iyngkarran, et al.
Published: (2025)

Configurable multi-agent framework for scalable and realistic testing of LLM-based agents
by: Wang, Sai, et al.
Published: (2025)

Towards Human-level Dexterity via Robot Learning
by: Khandate, Gagan
Published: (2025)

Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions
by: Vasconcelos, Helena, et al.
Published: (2023)

The Role of Governments in Increasing Interconnected Post-Deployment Monitoring of AI
by: Stein, Merlin, et al.
Published: (2024)

Towards provable probabilistic safety for scalable embodied AI systems
by: He, Linxuan, et al.
Published: (2025)

Interactive Debugging and Steering of Multi-Agent AI Systems
by: Epperson, Will, et al.
Published: (2025)

The case for delegated AI autonomy for Human AI teaming in healthcare
by: Jia, Yan, et al.
Published: (2025)

Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents
by: Zhang, Enhao, et al.
Published: (2025)

AutoHarness: improving LLM agents by automatically synthesizing a code harness
by: Lou, Xinghua, et al.
Published: (2026)

Generalization in medical AI: a perspective on developing scalable models
by: Zvuloni, Eran, et al.
Published: (2023)

Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction
by: Sorokoletova, Olga, et al.
Published: (2025)

Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study
by: Bansal, Kaushal
Published: (2026)

Advancing Ocean State Estimation with efficient and scalable AI
by: Xiang, Yanfei, et al.
Published: (2025)

Towards Measuring Goal-Directedness in AI Systems
by: Xu, Dylan, et al.
Published: (2024)

AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse
by: Bansal, Aayam, et al.
Published: (2026)

In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach
by: Halder, Pallock, et al.
Published: (2026)

Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education
by: Sahu, Vikrant, et al.
Published: (2025)

QuantAgents: Towards Multi-agent Financial System via Simulated Trading
by: Li, Xiangyu, et al.
Published: (2025)

CASET: Complexity Analysis using Simple Execution Traces for CS* submissions
by: Mehta, Aaryen, et al.
Published: (2024)

Log analysis is necessary for credible evaluation of AI agents
by: Kirgis, Peter, et al.
Published: (2026)

A vision-based autonomous UAV inspection framework for unknown tunnel construction sites with dynamic obstacles
by: Xu, Zhefan, et al.
Published: (2023)

AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
by: Chen, Yinfang, et al.
Published: (2025)

A new approach for encoding code and assisting code understanding
by: Fan, Mengdan, et al.
Published: (2024)

Towards a Standard, Enterprise-Relevant Agentic AI Benchmark: Lessons from 5.5 billion tokens' worth of agentic AI evaluations
by: Roig, JV
Published: (2025)

Aligning LLM agents with human learning and adjustment behavior: a dual agent approach
by: Liu, Tianming, et al.
Published: (2025)

AI co-mathematician: Accelerating mathematicians with agentic AI
by: Zheng, Daniel, et al.
Published: (2026)

Emotional Analysis of Fashion Trends Using Social Media and AI: Sentiment Analysis on Twitter for Fashion Trend Forecasting
by: Bansal, Aayam, et al.
Published: (2025)

Agent psychometrics: Task-level performance prediction in agentic coding benchmarks
by: Ge, Chris, et al.
Published: (2026)

Towards a Science Exocortex
by: Yager, Kevin G.
Published: (2024)

Challenges in Human-Agent Communication
by: Bansal, Gagan, et al.
Published: (2024)

How are AI agents used? Evidence from 177,000 MCP tools
by: Stein, Merlin
Published: (2026)

Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
by: Bhatia, Gagan, et al.
Published: (2025)

Measuring What AI Systems Might Do: Towards A Measurement Science in AI
by: Voudouris, Konstantinos, et al.
Published: (2026)

Adaptive routing protocols for determining optimal paths in AI multi-agent systems: a priority- and learning-enhanced approach
by: Panayotov, Theodor, et al.
Published: (2025)

Automated QoR improvement in OpenROAD with coding agents
by: Ghose, Amur, et al.
Published: (2026)

Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
by: Kumarage, Tharindu, et al.
Published: (2025)

Towards Full-scene Domain Generalization in Multi-agent Collaborative Bird's Eye View Segmentation for Connected and Autonomous Driving
by: Hu, Senkang, et al.
Published: (2023)

Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations
by: Schäfer, Pascal, et al.
Published: (2026)