Staff View: Increasing LLM response trustworthiness using voting ensembles :: Library Catalog

Bibliographic Details
Main Authors:	Nair-Kanneganti, Aparna; Chan, Trevor J.; Goldfinger, Shir; Mackay, Emily; Anthony, Brian; Pouch, Alison
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.04048

_version_	1866914075860008960
author	Nair-Kanneganti, Aparna; Chan, Trevor J.; Goldfinger, Shir; Mackay, Emily; Anthony, Brian; Pouch, Alison
author_facet	Nair-Kanneganti, Aparna; Chan, Trevor J.; Goldfinger, Shir; Mackay, Emily; Anthony, Brian; Pouch, Alison
contents	Despite huge advances, LLMs still lack convenient and reliable methods to quantify the uncertainty in their responses, making them difficult to trust in high-stakes applications. One of the simplest approaches to eliciting more accurate answers is to select the mode of many responses, a technique known as ensembling. In this work, we expand on typical ensembling approaches by looking at ensembles with a variable voting threshold. We introduce a theoretical framework for question answering and show that, by permitting ensembles to "abstain" from providing an answer when the dominant response falls short of the threshold, it is possible to dramatically increase the trustworthiness of the remaining answers. From this framework, we derive theoretical results as well as report experimental results on two problem domains: arithmetic problem solving and clinical-note question-answering. In both domains, we observe that large gains in answer trustworthiness can be achieved using highly restrictive voting ensembles, while incurring relatively modest reductions in response yield and accuracy. Due to this quality, voting ensembles may be particularly useful in applications - such as healthcare and data annotation - that require a high degree of certainty but which may not require that every question receive an automated answer.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_04048
institution	arXiv
publishDate	2025
record_format	arxiv
title	Increasing LLM response trustworthiness using voting ensembles
topic	Artificial Intelligence
url	https://arxiv.org/abs/2510.04048
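
Illustrative note (not part of the catalog record): the abstract describes an ensemble that returns the modal response only when it reaches a voting threshold and otherwise abstains. The sketch below is one minimal reading of that idea, not the authors' implementation. The function names, the use of a vote share in (0, 1] as the threshold, and the definitions of "yield" (fraction of questions answered) and "precision" (fraction of answered questions that are correct) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote_with_abstention(
    responses: Sequence[Hashable], threshold: float
) -> Optional[Hashable]:
    """Return the most common response if its vote share meets `threshold`.

    Returns None (abstain) when the dominant response falls short of the
    threshold or when there are no responses. `threshold` is assumed to be
    a fraction of votes in (0, 1]; this convention is hypothetical.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    if not responses:
        return None
    answer, count = Counter(responses).most_common(1)[0]
    share = count / len(responses)
    return answer if share >= threshold else None


def summarize(
    ensembles: Sequence[Sequence[Hashable]],
    truths: Sequence[Hashable],
    threshold: float,
) -> Tuple[float, float]:
    """Return (yield, precision) for a batch of questions.

    yield     = fraction of questions that received an answer (assumed definition)
    precision = fraction of answered questions that match the truth
                (assumed definition; NaN if every question was abstained)
    """
    decisions: List[Optional[Hashable]] = [
        vote_with_abstention(r, threshold) for r in ensembles
    ]
    answered = [(d, t) for d, t in zip(decisions, truths) if d is not None]
    answer_yield = len(answered) / len(decisions) if decisions else float("nan")
    precision = (
        sum(d == t for d, t in answered) / len(answered)
        if answered
        else float("nan")
    )
    return answer_yield, precision


# Made-up toy data: three questions, four sampled responses each.
toy_ensembles = [["4", "4", "4", "5"], ["7", "8", "9", "7"], ["2", "2", "2", "2"]]
toy_truths = ["4", "8", "2"]

loose = summarize(toy_ensembles, toy_truths, threshold=0.5)
strict = summarize(toy_ensembles, toy_truths, threshold=0.75)
```

On this made-up toy data, raising the threshold from 0.5 to 0.75 makes the ensemble abstain on the second question, whose dominant response holds only half of the votes and happens to be wrong. The answered fraction drops while the remaining answers are all correct, which is the kind of trade-off the abstract describes.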

Similar Items