Staff View: Measuring AI agent autonomy: Towards a scalable approach with code inspection :: Library Catalog

Bibliographic Details
Main Authors:	Cihon, Peter; Stein, Merlin; Bansal, Gagan; Manning, Sam; Xu, Kevin
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.15212

_version_	1866916624351625216
author	Cihon, Peter; Stein, Merlin; Bansal, Gagan; Manning, Sam; Xu, Kevin
author_facet	Cihon, Peter; Stein, Merlin; Bansal, Gagan; Manning, Sam; Xu, Kevin
contents	AI agents are AI systems that can achieve complex goals autonomously. Assessing the level of agent autonomy is crucial for understanding both their potential benefits and risks. Current assessments of autonomy often focus on specific risks and rely on run-time evaluations -- observations of agent actions during operation. We introduce a code-based assessment of autonomy that eliminates the need to run an AI agent to perform specific tasks, thereby reducing the costs and risks associated with run-time evaluations. Using this code-based framework, the orchestration code used to run an AI agent can be scored according to a taxonomy that assesses attributes of autonomy: impact and oversight. We demonstrate this approach with the AutoGen framework and select applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_15212
institution	arXiv
publishDate	2025
record_format	arxiv
title	Measuring AI agent autonomy: Towards a scalable approach with code inspection
topic	Artificial Intelligence
url	https://arxiv.org/abs/2502.15212

Similar Items