Bibliographic Details
Main Authors: Furmakiewicz, Michal; Liu, Chang; Taylor, Angus; Venger, Ilya
Format: Preprint
Published: 2024
Subjects: Human-Computer Interaction; Artificial Intelligence
Online Access: https://arxiv.org/abs/2407.09512
author Furmakiewicz, Michal
Liu, Chang
Taylor, Angus
Venger, Ilya
contents Building a successful AI copilot requires a systematic approach. This paper is divided into two sections, covering the design and the evaluation of a copilot, respectively. A case study of Microsoft's development of copilot templates for the retail domain illustrates the role and importance of each aspect. The first section explores the key technical components of a copilot's architecture: the LLM, plugins for knowledge retrieval and actions, orchestration, system prompts, and responsible AI guardrails. The second section discusses testing and evaluation as a principled way to promote desired outcomes and manage unintended consequences when using AI in a business context. We discuss how to measure and improve a copilot's quality and safety through the lens of an end-to-end human-AI decision loop framework. By providing insights into the anatomy of a copilot and the critical aspects of testing and evaluation, this paper provides concrete evidence that good design and evaluation practices are essential for building effective, human-centered AI assistants.
format Preprint
id arxiv_https___arxiv_org_abs_2407_09512
institution arXiv
publishDate 2024
record_format arxiv
title Design and evaluation of AI copilots -- case studies of retail copilot templates
topic Human-Computer Interaction
Artificial Intelligence
url https://arxiv.org/abs/2407.09512