Staff View: Evaluation of human-model prediction difference on the Internet Scale of Data :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Weitang; Li, Ying Wai; Li, Yuelei; Wang, Zihan; You, Yi-Zhuang; Shang, Jingbo
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2312.03291

_version_	1866913574683672576
author	Liu, Weitang Li, Ying Wai Li, Yuelei Wang, Zihan You, Yi-Zhuang Shang, Jingbo
author_facet	Liu, Weitang Li, Ying Wai Li, Yuelei Wang, Zihan You, Yi-Zhuang Shang, Jingbo
contents	Evaluating models on datasets often fails to capture their behavior when faced with unexpected and diverse types of inputs. It would be beneficial if we could evaluate the difference between human annotation and model prediction for an internet-scale number of inputs, or more generally, for an input space whose enumeration is computationally impractical. Traditional model evaluation methods rely on precision and recall (PR) as metrics, which are typically estimated by comparing human annotations with model predictions on a specific dataset. This is feasible because enumerating thousands of test inputs is manageable. However, estimating PR across a large input space is challenging because enumeration becomes computationally infeasible. We propose OmniInput, a novel approach to evaluate and compare neural networks (NNs) by the PR of an input space. OmniInput differs from previous work in that its estimated PR reflects the differences between human annotation and model prediction over the input space, which is usually too large to be enumerated. We empirically validate our method within an enumerable input space, and our experiments demonstrate that OmniInput can effectively estimate and compare precision and recall for (large) language models within a broad input space that is not enumerable.
format	Preprint
id	arxiv_https___arxiv_org_abs_2312_03291
institution	arXiv
publishDate	2023
record_format	arxiv
spellingShingle	Evaluation of human-model prediction difference on the Internet Scale of Data Liu, Weitang Li, Ying Wai Li, Yuelei Wang, Zihan You, Yi-Zhuang Shang, Jingbo Machine Learning Artificial Intelligence Evaluating models on datasets often fails to capture their behavior when faced with unexpected and diverse types of inputs. It would be beneficial if we could evaluate the difference between human annotation and model prediction for an internet number of inputs, or more generally, for an input space that enumeration is computationally impractical. Traditional model evaluation methods rely on precision and recall (PR) as metrics, which are typically estimated by comparing human annotations with model predictions on a specific dataset. This is feasible because enumerating thousands of test inputs is manageable. However, estimating PR across a large input space is challenging because enumeration becomes computationally infeasible. We propose OmniInput, a novel approach to evaluate and compare NNs by the PR of an input space. OmniInput is distinctive from previous works as its estimated PR reflects the estimation of the differences between human annotation and model prediction in the input space which is usually too huge to be enumerated. We empirically validate our method within an enumerable input space, and our experiments demonstrate that OmniInput can effectively estimate and compare precision and recall for (large) language models within a broad input space that is not enumerable.
title	Evaluation of human-model prediction difference on the Internet Scale of Data
topic	Machine Learning Artificial Intelligence
url	https://arxiv.org/abs/2312.03291

Similar Items