Bibliographic Details
Main Authors: Yan, Cheng; Zhang, Wuyang; Ning, Zhiyuan; Xu, Fan; Tao, Ziyang; Zhang, Lu; Yin, Bing; Zhang, Yanyong
Format: Preprint
Published: 2026
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.06220
author Yan, Cheng
Zhang, Wuyang
Ning, Zhiyuan
Xu, Fan
Tao, Ziyang
Zhang, Lu
Yin, Bing
Zhang, Yanyong
contents The rapid proliferation of Large Language Models (LLMs) has produced a fragmented and inefficient ecosystem marked by "model lock-in", in which seamlessly integrating new models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining to integrate new models, which limits their scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is built on a universal latent space: a model-agnostic representation of query difficulty that decouples the characterization of a query from the profiling of a model. This enables zero-shot onboarding of new models without full-scale retraining. ZeroRouter combines a context-aware predictor that maps queries into this universal space with a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines, delivering higher accuracy at lower cost and latency.
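The abstract describes separating query difficulty from model profiling so that a new model can be onboarded without retraining the query side. The sketch below illustrates that idea under stated assumptions; it is not ZeroRouter's method. It swaps in a one-dimensional Rasch-style (item response theory) model for the unspecified universal latent space, treats each query's difficulty as already produced by some predictor, and replaces the dual-mode optimizer with a single weighted utility. The names `ModelProfile`, `predict_success`, `fit_ability`, and `route`, along with the model names, weights, costs, latencies, and calibration data, are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: a 1-D Rasch-style latent space is ASSUMED here.
# Names, weights, and numbers are illustrative, not ZeroRouter's published method.


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ModelProfile:
    name: str
    ability: float  # the model's position in the shared latent space
    cost: float     # illustrative cost per query
    latency: float  # illustrative latency per query, in seconds


def predict_success(ability: float, difficulty: float) -> float:
    """Probability of a correct answer; rises as ability exceeds query difficulty."""
    return sigmoid(ability - difficulty)


def fit_ability(difficulties: list[float], outcomes: list[int],
                lr: float = 0.5, steps: int = 2000) -> float:
    """Onboard a new model by estimating only its ability from a calibration set.

    Query difficulties stay fixed, so nothing on the query side is retrained.
    Gradient ascent on the Bernoulli log-likelihood, whose gradient with
    respect to ability is sum(y - p).
    """
    ability = 0.0
    n = len(outcomes)
    for _ in range(steps):
        grad = sum(y - predict_success(ability, d)
                   for d, y in zip(difficulties, outcomes))
        ability += lr * grad / n
    return ability


def route(difficulty: float, models: list[ModelProfile],
          cost_weight: float = 0.1, latency_weight: float = 0.05) -> ModelProfile:
    """Pick the model with the best weighted accuracy/cost/latency trade-off."""
    def utility(m: ModelProfile) -> float:
        return (predict_success(m.ability, difficulty)
                - cost_weight * m.cost
                - latency_weight * m.latency)
    return max(models, key=utility)


pool = [
    ModelProfile("small-model", ability=0.0, cost=0.1, latency=0.5),
    ModelProfile("large-model", ability=2.0, cost=2.0, latency=2.0),
]
# Onboard a hypothetical new model from five calibration queries (1 = pass, 0 = fail).
new_ability = fit_ability([-1.0, 0.0, 0.5, 1.0, 2.0], [1, 1, 1, 0, 0])
pool.append(ModelProfile("new-model", ability=new_ability, cost=0.5, latency=1.0))

print(route(difficulty=-3.0, models=pool).name)  # small-model
print(route(difficulty=0.0, models=pool).name)   # new-model
print(route(difficulty=1.5, models=pool).name)   # large-model
```

In this simplified form, onboarding touches only `fit_ability`, which estimates a single number for the new model while query difficulties stay fixed; that loosely mirrors the decoupling the abstract describes. If a calibration set is all passes or all failures, the maximum-likelihood estimate diverges, so a practical version would need a prior or regularizer.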
format Preprint
id arxiv_https___arxiv_org_abs_2601_06220
institution arXiv
publishDate 2026
record_format arxiv
title Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2601.06220