Staff View: Recall, Robustness, and Lexicographic Evaluation :: Library Catalog

Bibliographic Details
Main Authors:	Diaz, Fernando; Ekstrand, Michael D.; Mitra, Bhaskar
Format:	Preprint
Published:	2023
Subjects:	Information Retrieval
Online Access:	https://arxiv.org/abs/2302.11370

_version_	1866915040905396224
author	Diaz, Fernando; Ekstrand, Michael D.; Mitra, Bhaskar
author_facet	Diaz, Fernando; Ekstrand, Michael D.; Mitra, Bhaskar
contents	Although originally developed to evaluate sets of items, recall is often used to evaluate rankings of items, including those produced by recommender, retrieval, and other machine learning systems. The application of recall without a formal evaluative motivation has led to criticism of recall as a vague or inappropriate measure. In light of this debate, we reflect on the measurement of recall in rankings from a formal perspective. Our analysis is composed of three tenets: recall, robustness, and lexicographic evaluation. First, we formally define 'recall-orientation' as the sensitivity of a metric to a user interested in finding every relevant item. Second, we analyze recall-orientation from the perspective of robustness with respect to possible content consumers and providers, connecting recall to recent conversations about fair ranking. Finally, we extend this conceptual and theoretical treatment of recall by developing a practical preference-based evaluation method based on lexicographic comparison. Through extensive empirical analysis across three recommendation tasks and 17 information retrieval tasks, we establish that our new evaluation method, lexirecall, has convergent validity (i.e., it is correlated with existing recall metrics) and exhibits substantially higher sensitivity in terms of discriminative power and stability in the presence of missing labels. Our conceptual, theoretical, and empirical analysis substantially deepens our understanding of recall and motivates its adoption through connections to robustness and fairness.
format	Preprint
id	arxiv_https___arxiv_org_abs_2302_11370
institution	arXiv
publishDate	2023
record_format	arxiv
title	Recall, Robustness, and Lexicographic Evaluation
topic	Information Retrieval
url	https://arxiv.org/abs/2302.11370