Bibliographic Details
Main Authors: Chen, Bo-Wei; Chen, Chung-Chi; Yen, An-Zi
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2602.22090
_version_ 1866918356782678016
author Chen, Bo-Wei
Chen, Chung-Chi
Yen, An-Zi
contents Large Language Models (LLMs) have revolutionized inference across diverse natural language tasks, with larger models performing better but at higher computational costs. We propose a confidence-driven strategy that dynamically selects the most suitable model based on confidence estimates. By assessing a model's confidence in handling the task and response accuracy, tasks that are likely to be solved correctly are retained, while more uncertain or complex cases are delegated to a larger model, ensuring reliability while minimizing computation. Specifically, we evaluate a model's likelihood of knowing the correct answer and the probability that its response is accurate. Experiments on the Massive Multitask Language Understanding (MMLU) benchmark show that our approach achieves accuracy comparable to the largest model while reducing computational costs by 20% to 40%. When applied to GPT-4o API calls, it reduces token usage by approximately 60%, further improving cost efficiency. These findings indicate the potential of confidence-based model selection to enhance real-world LLM deployment, particularly in resource-constrained settings such as edge devices and commercial API applications.
format Preprint
id arxiv_https___arxiv_org_abs_2602_22090
institution arXiv
publishDate 2026
record_format arxiv
title Confidence-Driven Multi-Scale Model Selection for Cost-Efficient Inference
topic Computation and Language
url https://arxiv.org/abs/2602.22090
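
The abstract describes escalating a task from a smaller model to a larger one when confidence is low. The following is a minimal sketch of that idea, not the authors' implementation: the `Model` class, the `route` function, both threshold values, the cost figures, and the toy stub models are all hypothetical and chosen only for illustration. It also assumes that the "knows the answer" check is free, which a real system would need to account for.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical wrapper around one model in a small-to-large cascade."""
    name: str
    cost: float                                  # illustrative cost of one answer() call
    knows: Callable[[str], float]                # estimated P(model knows the answer)
    answer: Callable[[str], Tuple[str, float]]   # (response, estimated P(response is correct))


def route(question: str, models: List[Model],
          knows_threshold: float = 0.5,
          answer_threshold: float = 0.7) -> Tuple[str, str, float]:
    """Try models from smallest to largest; return (response, model name, total cost).

    Thresholds are assumed values, not taken from the paper.
    The knows() check is treated as free in this sketch.
    """
    spent = 0.0
    for i, model in enumerate(models):
        is_last = i == len(models) - 1
        # Skip a smaller model that does not expect to know the answer.
        if not is_last and model.knows(question) < knows_threshold:
            continue
        response, confidence = model.answer(question)
        spent += model.cost
        # Keep the response if it looks reliable, or if no larger model remains.
        if is_last or confidence >= answer_threshold:
            return response, model.name, spent
    raise ValueError("models must not be empty")


# Toy stand-ins for real models (purely illustrative).
small = Model("small", 1.0,
              knows=lambda q: 0.9 if "2+2" in q else 0.2,
              answer=lambda q: ("4", 0.95))
large = Model("large", 10.0,
              knows=lambda q: 1.0,
              answer=lambda q: ("large-answer", 0.99))

if __name__ == "__main__":
    print(route("What is 2+2?", [small, large]))
    print(route("Prove the theorem.", [small, large]))
```

In this sketch the easy question stays on the small model, while the second question is delegated to the large model without paying for a small-model response. How the two confidence estimates are actually computed is left to the paper itself.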