Bibliographic Details
Main Authors:	Resnick, Paul; Kong, Yuqing; Schoenebeck, Grant; Weninger, Tim
Format:	Preprint
Published:	2021
Subjects:	Machine Learning; Human-Computer Interaction; Multiagent Systems
Online Access:	https://arxiv.org/abs/2106.01254
Table of Contents:

In many classification tasks, there is no definitive ground truth, only human judgments that may disagree. We address two challenges that arise in such settings: (1) how to use human raters to score classifiers, and (2) how to use them for comparison benchmarks. For the first, the common practice is to score classifiers against the majority vote of an evaluation panel of several human raters. We argue that this is not justified when either of two properties fails: objectivity or equanimity. Instead, under a utility model appropriate for such settings, scoring against one rater at a time and averaging the scores across raters is a more principled approach. For the second, we introduce the concept of rater equivalence: the smallest number of human raters whose combined judgment matches the classifier's performance. We provide a provably optimal algorithm for combining benchmark panel labels, and demonstrate the framework through case studies.
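The difference between the two scoring practices described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the use of simple 0/1 agreement as the scoring function are all assumptions made for illustration, since the abstract does not specify the utility model.

```python
from collections import Counter
from statistics import mean


def majority_label(labels):
    """Most common label in a panel. Ties go to the label seen first
    (an arbitrary choice made for this sketch)."""
    return Counter(labels).most_common(1)[0][0]


def score_against_majority(predictions, ratings):
    """Common practice: fraction of items where the classifier's
    prediction equals the panel's majority vote."""
    return mean(
        1.0 if pred == majority_label(panel) else 0.0
        for pred, panel in zip(predictions, ratings)
    )


def score_one_rater_at_a_time(predictions, ratings):
    """Alternative: score the classifier against each rater separately,
    then average those scores across raters."""
    n_raters = len(ratings[0])
    per_rater_scores = [
        mean(
            1.0 if pred == panel[j] else 0.0
            for pred, panel in zip(predictions, ratings)
        )
        for j in range(n_raters)
    ]
    return mean(per_rater_scores)


# Hypothetical toy data: 4 items, each labeled by the same 3 raters.
ratings = [
    ["a", "a", "b"],
    ["a", "b", "b"],
    ["b", "b", "b"],
    ["a", "a", "a"],
]
predictions = ["a", "b", "b", "b"]

print(score_against_majority(predictions, ratings))               # 0.75
print(round(score_one_rater_at_a_time(predictions, ratings), 3))  # 0.583
```

On this toy data the majority-vote score is higher because it ignores the dissenting raters on the first two items, while the per-rater average counts each disagreement.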

Similar Items