:: Library Catalog

Bibliographic Details
Main Authors:	Holmes, Matthew; Lacerda, Thiago; Schwartz, Reva
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.06811

Similar Items

Real-World AI Evaluation: How FRAME Generates Systematic Evidence to Resolve the Decision-Maker's Dilemma
by: Schwartz, Reva, et al.
Published: (2026)

CIRCLE: A Framework for Evaluating AI from a Real-World Lens
by: Schwartz, Reva, et al.
Published: (2026)

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects
by: Schwartz, Reva, et al.
Published: (2025)

Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts
by: Kryshtal, Andrii
Published: (2026)

Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts
by: Rezaei, Mosab, et al.
Published: (2025)

Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone
by: Vishwarupe, Varad, et al.
Published: (2026)

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols
by: Griffin, Charlie, et al.
Published: (2024)

Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications
by: Tang, Xinye, et al.
Published: (2025)

12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation
by: Ersoz, Ahmet Bahaddin
Published: (2026)

Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification
by: Wilson, Sarah, et al.
Published: (2026)

Internal Deployment Gaps in AI Regulation
by: Kwon, Joe, et al.
Published: (2026)

Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement
by: Gallego, Víctor
Published: (2025)

Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol
by: Srinivasan, Vasundra
Published: (2026)

Enhancing Multi-Agent Communication through Attention Steering with Context Relevance
by: Zhang, Hongxiang, et al.
Published: (2026)

A Field Guide to Deploying AI Agents in Clinical Practice
by: Gallifant, Jack, et al.
Published: (2025)

Solving Context Window Overflow in AI Agents
by: Labate, Anton Bulle, et al.
Published: (2025)

DAO-AI: Evaluating Collective Decision-Making through Agentic AI in Decentralized Governance
by: Capponi, Agostino, et al.
Published: (2025)

RAN Cortex: Memory-Augmented Intelligence for Context-Aware Decision-Making in AI-Native Networks
by: Barros, Sebastian
Published: (2025)

Monitoring Deployed AI Systems in Health Care
by: Keyes, Timothy, et al.
Published: (2025)

Relevance-driven Decision Making for Safer and More Efficient Human Robot Collaboration
by: Zhang, Xiaotong, et al.
Published: (2024)

Responsible Evaluation of AI for Mental Health
by: Arnaout, Hiba, et al.
Published: (2026)

PATHWAYS: Evaluating Investigation and Context Discovery in AI Web Agents
by: Arman, Shifat E., et al.
Published: (2026)

Safety Must Precede the Deployment of Open-Ended AI
by: Sheth, Ivaxi, et al.
Published: (2025)

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning
by: Linze, Chen, et al.
Published: (2026)

Interactive AI Alignment: Specification, Process, and Evaluation Alignment
by: Terry, Michael, et al.
Published: (2023)

Can AI Make Energy Retrofit Decisions? An Evaluation of Large Language Models
by: Shu, Lei, et al.
Published: (2025)

Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
by: Kennedy, Wm. Matthew, et al.
Published: (2025)

The Ethics of AI in Education
by: Porayska-Pomsta, Kaska, et al.
Published: (2024)

CATCODER: Repository-Level Code Generation with Relevant Code and Type Context
by: Pan, Zhiyuan, et al.
Published: (2024)

Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine
by: Ke, Yu He, et al.
Published: (2024)

AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents
by: Chen, Jiaxiang, et al.
Published: (2025)

CRANE: Causal Relevance Analysis of Language-Specific Neurons in Multilingual Large Language Models
by: Le, Yifan, et al.
Published: (2026)

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
by: Liu, Sheng, et al.
Published: (2023)

The Deployment Gap in AI Media Detection: Platform-Aware and Visually Constrained Adversarial Evaluation
by: Budhkar, Aishwarya, et al.
Published: (2026)

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs
by: Tian, Runchu, et al.
Published: (2024)

Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
by: Bertsch, Amanda, et al.
Published: (2025)

Contextual Moral Value Alignment Through Context-Based Aggregation
by: Dognin, Pierre, et al.
Published: (2024)

XChoice: Explainable Evaluation of AI-Human Alignment in LLM-based Constrained Choice Decision Making
by: Qi, Weihong, et al.
Published: (2026)

Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents
by: Meng, M.
Published: (2026)

Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation
by: Arrieta, Aitor, et al.
Published: (2025)