Bibliographic Details
Main Authors: Hu, Kangping; Mussmann, Stephen
Format: Preprint
Published: 2025
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2510.09877
_version_ 1866918490223411200
author Hu, Kangping
Mussmann, Stephen
contents Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners without a clear choice of which to use. Bayesian active learning offers principled objectives with explainable intuition, including Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and Bayesian Active Learning by Disagreement (BALD). A key challenge for such methods is scaling to large batch sizes, which leads either to computational challenges (BatchBALD) or to dramatic performance drops (top-$B$ selection). Here, using a particular formulation of Bayesian Decision Theory, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally on several datasets that ParBaLS EPIG gives superior performance for a fixed budget when using Bayesian Logistic Regression on embeddings from large pre-trained models. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.
format Preprint
id arxiv_https___arxiv_org_abs_2510_09877
institution arXiv
publishDate 2025
record_format arxiv
title Batch Bayesian Active Learning with Partial Batch Label Sampling
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2510.09877