Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autor principal:	Dhrif, Hassen
Formato:	Preprint
Publicado:	2025
Materias:	Artificial Intelligence
Acceso en línea:	https://arxiv.org/abs/2509.23006
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866916973312475136
author	Dhrif, Hassen
author_facet	Dhrif, Hassen
contents	Agentic AI represents a paradigm shift in enhancing the capabilities of generative AI models. While these systems demonstrate immense potential and power, current evaluation techniques primarily focus on assessing their efficacy in identifying appropriate agents, tools, and parameters. However, a critical gap exists in evaluating the alignment between an Agentic AI system's tasks and its overarching goals. This paper introduces the Creative Adversarial Testing (CAT) framework, a novel approach designed to capture and analyze the complex relationship between Agentic AI tasks and the system's intended objectives. We validate the CAT framework through extensive simulation using synthetic interaction data modeled after Alexa+ audio services, a sophisticated Agentic AI system that shapes the user experience for millions of users globally. This synthetic data approach enables comprehensive testing of edge cases and failure modes while protecting user privacy. Our results demonstrate that the CAT framework provides unprecedented insights into goal-task alignment, enabling more effective optimization and development of Agentic AI systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_23006
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Creative Adversarial Testing (CAT): A Novel Framework for Evaluating Goal-Oriented Agentic AI Systems Dhrif, Hassen Artificial Intelligence Agentic AI represents a paradigm shift in enhancing the capabilities of generative AI models. While these systems demonstrate immense potential and power, current evaluation techniques primarily focus on assessing their efficacy in identifying appropriate agents, tools, and parameters. However, a critical gap exists in evaluating the alignment between an Agentic AI system's tasks and its overarching goals. This paper introduces the Creative Adversarial Testing (CAT) framework, a novel approach designed to capture and analyze the complex relationship between Agentic AI tasks and the system's intended objectives. We validate the CAT framework through extensive simulation using synthetic interaction data modeled after Alexa+ audio services, a sophisticated Agentic AI system that shapes the user experience for millions of users globally. This synthetic data approach enables comprehensive testing of edge cases and failure modes while protecting user privacy. Our results demonstrate that the CAT framework provides unprecedented insights into goal-task alignment, enabling more effective optimization and development of Agentic AI systems.
title	Creative Adversarial Testing (CAT): A Novel Framework for Evaluating Goal-Oriented Agentic AI Systems
topic	Artificial Intelligence
url	https://arxiv.org/abs/2509.23006

Ejemplares similares