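| Main authors: | Yan, Cheng; Zhang, Wuyang; Ning, Zhiyuan; Xu, Fan; Tao, Ziyang; Zhang, Lu; Yin, Bing; Zhang, Yanyong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online access: | https://arxiv.org/abs/2601.06220 |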
| Field | Value |
|---|---|
| author | Yan, Cheng; Zhang, Wuyang; Ning, Zhiyuan; Xu, Fan; Tao, Ziyang; Zhang, Lu; Yin, Bing; Zhang, Yanyong |
| contents | The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" where seamlessly integrating novel models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining, hindering scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is founded on a universal latent space, a model-agnostic representation of query difficulty that fundamentally decouples the characterization of a query from the profiling of a model. This allows for zero-shot onboarding of new models without full-scale retraining. ZeroRouter features a context-aware predictor that maps queries to this universal space and a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines, delivering higher accuracy at lower cost and latency. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2601_06220 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space |
| topic | Machine Learning; Artificial Intelligence |
| url | https://arxiv.org/abs/2601.06220 |
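The abstract describes routing in which a query's difficulty is characterized independently of any model, so that a new model can be onboarded without retraining the query-side predictor. The Python sketch below illustrates that decoupling under loudly stated assumptions; it is not ZeroRouter's method. It substitutes a one-dimensional item-response-theory (IRT) style model: each query is reduced to a scalar difficulty (standing in for the universal latent space), each model to a scalar ability, and P(correct) = sigmoid(ability - difficulty). The names `ModelProfile`, `fit_ability` and `route`, the weights `lam_cost` and `lam_latency`, and all numbers are hypothetical. The single weighted-sum score is likewise an assumption and does not reproduce the paper's dual-mode optimizer.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ModelProfile:
    """Hypothetical model profile on the shared (assumed scalar) difficulty axis."""
    name: str
    ability: float   # position on the same axis as query difficulty
    cost: float      # assumed cost per query (e.g. USD)
    latency: float   # assumed latency per query (seconds)


def fit_ability(difficulties, outcomes, steps=200, lr=0.5):
    """Estimate a new model's ability from a small calibration set.

    Maximizes the Bernoulli log-likelihood of P(correct) = sigmoid(ability - d)
    by gradient ascent. The difficulties d are treated as fixed outputs of a
    frozen, model-agnostic predictor, so nothing on the query side is retrained.
    """
    ability = 0.0
    n = len(difficulties)
    for _ in range(steps):
        # Gradient of the log-likelihood w.r.t. ability is sum(y - p).
        grad = sum(y - sigmoid(ability - d) for d, y in zip(difficulties, outcomes))
        ability += lr * grad / n
    return ability


def route(difficulty, models, lam_cost=1.0, lam_latency=0.1):
    """Return the model with the best assumed accuracy/cost/latency trade-off.

    Score (an illustrative assumption):
    predicted accuracy - lam_cost * cost - lam_latency * latency.
    """
    def score(m):
        return (sigmoid(m.ability - difficulty)
                - lam_cost * m.cost
                - lam_latency * m.latency)
    return max(models, key=score)


# Existing pool with already-known profiles (illustrative numbers).
pool = [
    ModelProfile("small", ability=0.0, cost=0.001, latency=0.5),
    ModelProfile("large", ability=3.0, cost=0.02, latency=2.0),
]

# Onboarding: fit only the newcomer's ability on five labeled calibration queries.
calib_difficulty = [-1.0, 0.0, 1.0, 2.0, 3.0]
calib_correct = [1, 1, 1, 0, 0]
new_ability = fit_ability(calib_difficulty, calib_correct)
pool.append(ModelProfile("new", ability=new_ability, cost=0.005, latency=1.0))

for d in (-4.0, -1.0, 2.5):
    print(d, route(d, pool).name)  # -4.0 -> small, -1.0 -> new, 2.5 -> large
```

Only the newcomer's ability is fitted; the existing profiles and the (assumed) difficulty predictor are left untouched, mirroring in miniature what the abstract calls zero-shot onboarding. A real system would presumably use a learned, richer representation and measured cost and latency figures.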