Staff View: Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation :: Library Catalog


Bibliographic Details
Main Authors:	Fisch, Adam; Maynez, Joshua; Hofer, R. Alex; Dhingra, Bhuwan; Globerson, Amir; Cohen, William W.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.04291

_version_	1866912142726266880
author	Fisch, Adam; Maynez, Joshua; Hofer, R. Alex; Dhingra, Bhuwan; Globerson, Amir; Cohen, William W.
author_facet	Fisch, Adam; Maynez, Joshua; Hofer, R. Alex; Dhingra, Bhuwan; Globerson, Amir; Cohen, William W.
contents	Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate, but potentially biased, automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI) and show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm, based on stratified sampling, for computing provably valid confidence intervals for population parameters (such as averages). In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve over unstratified PPI in cases where the performance of the autorater varies across different conditional distributions of the target data.
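The abstract describes PPI as correcting an autorater-based estimate with a small human-labeled sample, and StratPPI as doing this within strata. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses the basic PPI mean estimator with a normal-approximation (CLT) interval, assumes the population share of each stratum is known, and omits any further tuning the paper may apply. All function names (ppi_mean_ci, stratified_ppi_mean_ci, neyman_allocation) are hypothetical. The allocation helper is classical Neyman allocation from survey sampling, swapped in to illustrate "sample allocation"; the paper's own allocation rule may differ.

```python
# Hypothetical helpers: a sketch of PPI and stratified PPI for a mean,
# not the StratPPI reference code.
import math
from statistics import NormalDist, fmean, variance


def _ppi_stratum(y_labeled, f_labeled, f_unlabeled):
    """PPI point estimate and estimated variance for one (sub)population.

    y_labeled:   human labels on n examples (n >= 2)
    f_labeled:   autorater scores on the same n examples
    f_unlabeled: autorater scores on N unlabeled examples (N >= 2)
    """
    # Rectifier: the autorater's error, observable only where humans labeled.
    rectifier = [y - f for y, f in zip(y_labeled, f_labeled)]
    # Autorater mean on the large set, corrected by its mean error.
    theta = fmean(f_unlabeled) + fmean(rectifier)
    # The two sample means come from independent samples, so variances add.
    var = variance(f_unlabeled) / len(f_unlabeled) + variance(rectifier) / len(rectifier)
    return theta, var


def _normal_ci(theta, var, alpha):
    # Two-sided interval from the normal approximation.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(var)
    return theta - half, theta + half


def ppi_mean_ci(y_labeled, f_labeled, f_unlabeled, alpha=0.05):
    """Unstratified PPI confidence interval for a mean."""
    theta, var = _ppi_stratum(y_labeled, f_labeled, f_unlabeled)
    return _normal_ci(theta, var, alpha)


def stratified_ppi_mean_ci(strata, weights, alpha=0.05):
    """Stratified PPI confidence interval for a mean.

    strata:  list of (y_labeled, f_labeled, f_unlabeled) tuples, one per stratum
    weights: population share of each stratum (assumed known, summing to 1)
    """
    theta, var = 0.0, 0.0
    for (y_l, f_l, f_u), w in zip(strata, weights):
        theta_k, var_k = _ppi_stratum(y_l, f_l, f_u)
        theta += w * theta_k  # weighted combination of per-stratum estimates
        var += w * w * var_k  # strata are sampled independently
    return _normal_ci(theta, var, alpha)


def neyman_allocation(budget, weights, sigmas):
    """Split a human-labeling budget across strata in proportion to w_k * sigma_k.

    sigmas: per-stratum std. dev. of the rectifier (e.g. from a pilot sample).
    Returns real-valued sizes; round them before sampling.
    """
    scores = [w * s for w, s in zip(weights, sigmas)]
    total = sum(scores)
    return [budget * s / total for s in scores]
```

A design note, under the same assumptions: with proportional allocation (n_k proportional to w_k), the stratified variance drops the between-stratum part of the rectifier's variance, which is one way to see why stratifying pays off when the autorater's error differs across strata. Neyman allocation minimizes the labeled-data term sum_k w_k^2 sigma_k^2 / n_k for a fixed total budget; it ignores the unlabeled term, which is small when every stratum has many autorater-labeled examples.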
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_04291
institution	arXiv
publishDate	2024
record_format	arxiv
title	Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
topic	Machine Learning
url	https://arxiv.org/abs/2406.04291

Similar Items