Bibliographic Details
Main Authors: Cao, Qi; Zhang, Shuhao; Zhou, Ruizhe; Zhang, Ruiyi; Qin, Peijia; Xie, Pengtao
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2601.22323
_version_ 1866908834226765824
author Cao, Qi
Zhang, Shuhao
Zhou, Ruizhe
Zhang, Ruiyi
Qin, Peijia
Xie, Pengtao
contents Model routing chooses which language model to use for each query. By sending easy queries to cheaper models and hard queries to stronger ones, it can significantly reduce inference cost while maintaining high accuracy. However, most existing routers treat this as a fixed choice among a small set of models, which makes them hard to adapt to new models or changing budget constraints. In this paper, we propose SCOPE (Scalable and Controllable Outcome Performance Estimator), a routing framework that goes beyond model selection by predicting each candidate model's cost and performance. Trained with reinforcement learning, SCOPE makes reasoning-based predictions by retrieving how models behave on similar problems, rather than relying on fixed model names, which enables it to work with new, unseen models. Moreover, by explicitly predicting how accurate and how expensive a model will be, it turns routing into a dynamic decision problem, allowing users to easily control the trade-off between accuracy and cost. Experiments show that SCOPE is more than just a cost-saving tool. It flexibly adapts to user needs: it can boost accuracy by up to 25.7% when performance is the priority, or cut costs by up to 95.1% when efficiency matters most.
format Preprint
id arxiv_https___arxiv_org_abs_2601_22323
institution arXiv
publishDate 2026
record_format arxiv
title Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning
topic Machine Learning
url https://arxiv.org/abs/2601.22323
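
The abstract describes routing as a decision made over per-model predictions of accuracy and cost, with a user-controlled trade-off between the two. The following is a minimal sketch of that idea only, not the authors' method: the names `Prediction`, `route`, and `alpha`, the linear utility, and the example numbers are all assumptions made for illustration. The predictions are hard-coded here, whereas the paper obtains them from a trained estimator.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-model estimate for a single query."""
    model: str
    p_correct: float  # predicted probability of a correct answer, in [0, 1]
    cost: float       # predicted cost of the call (any consistent unit)


def route(predictions: Sequence[Prediction], alpha: float) -> Prediction:
    """Pick the candidate with the highest utility.

    alpha = 1.0 considers predicted accuracy only; alpha = 0.0 considers
    predicted cost only. The linear utility is an assumption of this sketch.
    """
    if not predictions:
        raise ValueError("at least one candidate prediction is required")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Normalize cost by the most expensive candidate so both terms are in [0, 1].
    max_cost = max(p.cost for p in predictions)

    def utility(p: Prediction) -> float:
        norm_cost = p.cost / max_cost if max_cost > 0 else 0.0
        return alpha * p.p_correct - (1.0 - alpha) * norm_cost

    return max(predictions, key=utility)


# Made-up estimates for three hypothetical models on one query.
candidates = [
    Prediction("small", p_correct=0.60, cost=0.001),
    Prediction("medium", p_correct=0.80, cost=0.010),
    Prediction("large", p_correct=0.90, cost=0.100),
]

accuracy_first = route(candidates, alpha=1.0)
balanced = route(candidates, alpha=0.5)
cost_first = route(candidates, alpha=0.0)
```

Because the decision depends only on the predicted values and not on a fixed list of model names, a new model can be added to `candidates` without changing `route`, which mirrors the adaptability the abstract claims in a very simplified form.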