| Main Authors: | Cihon, Peter, Stein, Merlin, Bansal, Gagan, Manning, Sam, Xu, Kevin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.15212 |
Similar Items
Societal Capacity Assessment Framework: Measuring Resilience to Inform Advanced AI Risk Management
by: Gandhi, Milan, et al.
Published: (2025)
Trends in Frontier AI Model Count: A Forecast to 2028
by: Kumar, Iyngkarran, et al.
Published: (2025)
Configurable multi-agent framework for scalable and realistic testing of llm-based agents
by: Wang, Sai, et al.
Published: (2025)
Towards Human-level Dexterity via Robot Learning
by: Khandate, Gagan
Published: (2025)
Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions
by: Vasconcelos, Helena, et al.
Published: (2023)
The Role of Governments in Increasing Interconnected Post-Deployment Monitoring of AI
by: Stein, Merlin, et al.
Published: (2024)
Towards provable probabilistic safety for scalable embodied AI systems
by: He, Linxuan, et al.
Published: (2025)
Interactive Debugging and Steering of Multi-Agent AI Systems
by: Epperson, Will, et al.
Published: (2025)
The case for delegated AI autonomy for Human AI teaming in healthcare
by: Jia, Yan, et al.
Published: (2025)
Optimizing Sequential Multi-Step Tasks with Parallel LLM Agents
by: Zhang, Enhao, et al.
Published: (2025)
AutoHarness: improving LLM agents by automatically synthesizing a code harness
by: Lou, Xinghua, et al.
Published: (2026)
Generalization in medical AI: a perspective on developing scalable models
by: Zvuloni, Eran, et al.
Published: (2023)
Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction
by: Sorokoletova, Olga, et al.
Published: (2025)
Philosophical Dispositions as Behavioral Constraints for AI-Assisted Code Review: An Empirical Study
by: Bansal, Kaushal
Published: (2026)
Advancing Ocean State Estimation with efficient and scalable AI
by: Xiang, Yanfei, et al.
Published: (2025)
Towards Measuring Goal-Directedness in AI Systems
by: Xu, Dylan, et al.
Published: (2024)
AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse
by: Bansal, Aayam, et al.
Published: (2026)
In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach
by: Halder, Pallock, et al.
Published: (2026)
Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education
by: Sahu, Vikrant, et al.
Published: (2025)
QuantAgents: Towards Multi-agent Financial System via Simulated Trading
by: Li, Xiangyu, et al.
Published: (2025)
CASET: Complexity Analysis using Simple Execution Traces for CS* submissions
by: Mehta, Aaryen, et al.
Published: (2024)
Log analysis is necessary for credible evaluation of AI agents
by: Kirgis, Peter, et al.
Published: (2026)
A vision-based autonomous UAV inspection framework for unknown tunnel construction sites with dynamic obstacles
by: Xu, Zhefan, et al.
Published: (2023)
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
by: Chen, Yinfang, et al.
Published: (2025)
A new approach for encoding code and assisting code understanding
by: Fan, Mengdan, et al.
Published: (2024)
Towards a Standard, Enterprise-Relevant Agentic AI Benchmark: Lessons from 5.5 billion tokens' worth of agentic AI evaluations
by: Roig, JV
Published: (2025)
Aligning LLM agents with human learning and adjustment behavior: a dual agent approach
by: Liu, Tianming, et al.
Published: (2025)
AI co-mathematician: Accelerating mathematicians with agentic AI
by: Zheng, Daniel, et al.
Published: (2026)
Emotional Analysis of Fashion Trends Using Social Media and AI: Sentiment Analysis on Twitter for Fashion Trend Forecasting
by: Bansal, Aayam, et al.
Published: (2025)
Agent psychometrics: Task-level performance prediction in agentic coding benchmarks
by: Ge, Chris, et al.
Published: (2026)
Towards a Science Exocortex
by: Yager, Kevin G.
Published: (2024)
Challenges in Human-Agent Communication
by: Bansal, Gagan, et al.
Published: (2024)
How are AI agents used? Evidence from 177,000 MCP tools
by: Stein, Merlin
Published: (2026)
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
by: Bhatia, Gagan, et al.
Published: (2025)
Measuring What AI Systems Might Do: Towards A Measurement Science in AI
by: Voudouris, Konstantinos, et al.
Published: (2026)
Adaptive routing protocols for determining optimal paths in AI multi-agent systems: a priority- and learning-enhanced approach
by: Panayotov, Theodor, et al.
Published: (2025)
Automated QoR improvement in OpenROAD with coding agents
by: Ghose, Amur, et al.
Published: (2026)
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
by: Kumarage, Tharindu, et al.
Published: (2025)
Towards Full-scene Domain Generalization in Multi-agent Collaborative Bird's Eye View Segmentation for Connected and Autonomous Driving
by: Hu, Senkang, et al.
Published: (2023)
Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations
by: Schäfer, Pascal, et al.
Published: (2026)