Staff View: TEL'M: Test and Evaluation of Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Cybenko, George; Ackerman, Joshua; Lintilhac, Paul
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2404.10200

_version_	1866911842029273088
author	Cybenko, George; Ackerman, Joshua; Lintilhac, Paul
author_facet	Cybenko, George; Ackerman, Joshua; Lintilhac, Paul
contents	Language Models have demonstrated remarkable capabilities on some tasks while failing dramatically on others. The situation has generated considerable interest in understanding and comparing the capabilities of various Language Models (LMs), but those efforts have been largely ad hoc, with results that are often little more than anecdotal. This is in stark contrast with the testing and evaluation processes used in healthcare, radar signal processing, and other defense areas. In this paper, we describe Test and Evaluation of Language Models (TEL'M) as a principled approach for assessing the value of current and future LMs, focused on high-value commercial, government, and national security applications. We believe that this methodology could be applied to other Artificial Intelligence (AI) technologies as part of the larger goal of "industrializing" AI.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_10200
institution	arXiv
publishDate	2024
record_format	arxiv
title	TEL'M: Test and Evaluation of Language Models
topic	Artificial Intelligence
url	https://arxiv.org/abs/2404.10200
