Staff View: Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference

Bibliographic Details
Main Authors:	Chen, Bo-Wei; Chen, Chung-Chi; Yen, An-Zi
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.22090

_version_	1866918356782678016
author	Chen, Bo-Wei; Chen, Chung-Chi; Yen, An-Zi
author_facet	Chen, Bo-Wei; Chen, Chung-Chi; Yen, An-Zi
contents	Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-driven strategy that dynamically selects the most suitable model based on confidence estimates. By assessing a model's confidence in handling the task and response accuracy, tasks that are likely to be solved correctly are retained, while more uncertain or complex cases are delegated to a larger model, ensuring reliability while minimizing computation. Specifically, we evaluate a model's likelihood of knowing the correct answer and the probability that its response is accurate. Experiments on the Massive Multitask Language Understanding (MMLU) benchmark show that our approach achieves accuracy comparable to the largest model while reducing computational costs by 20% to 40%. When applied to GPT-4o API calls, it reduces token usage by approximately 60%, further improving cost efficiency. These findings indicate the potential of confidence-based model selection to enhance real-world LLM deployment, particularly in resource-constrained settings such as edge devices and commercial API applications.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_22090
institution	arXiv
publishDate	2026
record_format	arxiv
title	Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference
topic	Computation and Language
url	https://arxiv.org/abs/2602.22090
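
Illustrative note on the abstract: the routing idea summarized in the contents field (keep a query on the smaller model when it is confident, otherwise delegate to a larger one) might be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation. The names `Estimate`, `route`, `p_knows`, `p_correct`, and both threshold values are hypothetical, and the record does not say how the two confidence scores are actually computed.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Estimate:
    """Hypothetical output of the smaller model for one question."""
    answer: str
    p_knows: float    # assumed: estimated likelihood the model knows the answer
    p_correct: float  # assumed: estimated probability this response is accurate


def route(
    question: str,
    small: Callable[[str], Estimate],
    large: Callable[[str], str],
    min_knows: float = 0.7,    # hypothetical threshold, not from the paper
    min_correct: float = 0.7,  # hypothetical threshold, not from the paper
) -> Tuple[str, str]:
    """Return (answer, model_used) for a two-model cascade.

    The small model's answer is retained only when both confidence
    scores clear their thresholds; otherwise the question is
    delegated to the large model.
    """
    est = small(question)
    if est.p_knows >= min_knows and est.p_correct >= min_correct:
        return est.answer, "small"
    return large(question), "large"
```

In this sketch the larger model is called only for low-confidence questions, which is where any cost saving would come from; the thresholds control the trade-off between accuracy and how often the larger model is used.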

Similar Items