| Main Authors: | Liu, Xunzhuo, Chen, Huamin, Lu, Samzong, Ovadia, Yossi, Wen, Guohong, Wu, Hao, Tan, Zhengda, Zhang, Jintao, Zedan, Senan, Kerido, Yehudit, Weiss, Liav, Zhang, Haichen, Yu, Bishen, Balum, Asaad, Limoy, Noa, Samara, Abdallah, Fan, Baofa, Salisbury, Brent, Cook, Ryan, Wang, Zhijie, Pan, Qiping, Khan, Rehan, Goswami, Avishek, Zhang, Houston H., Wang, Shuyi, Tang, Ziang, Han, Fang, Hassan, Zohaib, Zheng, Jianqiao, Changrani, Avinash |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.04444 |
Similar Items
98$\times$ Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router
by: Liu, Xunzhuo, et al.
Published: (2026)
When to Reason: Semantic Router for vLLM
by: Wang, Chen, et al.
Published: (2025)
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)
Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems
by: Liu, Xunzhuo, et al.
Published: (2026)
Adaptive Vision-Language Model Routing for Computer Use Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
by: Liu, Xunzhuo, et al.
Published: (2026)
Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL
by: Liu, Xunzhuo, et al.
Published: (2026)
Recognition of Abnormal Events in Surveillance Videos using Weakly Supervised Dual-Encoder Models
by: Tsfaty, Noam, et al.
Published: (2025)
From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification
by: Chen, Huamin, et al.
Published: (2026)
Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
vLLM Hook v0: A Plug-in for Programming Model Internals on vLLM
by: Ko, Ching-Yun, et al.
Published: (2026)
The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
by: Chen, Huamin, et al.
Published: (2026)
Category-Aware Semantic Caching for Heterogeneous LLM Workloads
by: Wang, Chen, et al.
Published: (2025)
FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism
by: Chen, Huamin, et al.
Published: (2026)
inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
Counting words without strictly increasing subwords of fixed length
by: Sekhon, Senan
Published: (2025)
A Necessary and Sufficient Condition for Uniqueness of Euclidean Division
by: Sekhon, Senan
Published: (2026)
LA EXCEPCIÓN EN EL DERECHO. DISCUSIÓN DEL ESTADO DE EXCEPCIÓN EN LA TEORÍA JURÍDICO POLÍTICA
by: Marcela Chahuán Zedán
Published: (2013)
Central Limit Theorem for Mutation Systems
by: Koram, Liav, et al.
Published: (2026)
EL CONSUMO EN TIEMPOS DE CRISIS: UNA APROXIMACIÓN SOCIOLÓGICA A LA DISTRIBUCIÓN DEL GASTO EN ESPAÑA
by: Gaspar Brändle Señán
Published: (2010)
Endogenous knowledge and ethnobotanical importance of Treculia africana Decne. ssp. var. africana in southern Benin (West Africa)
by: Pascaline Sènan, Davoudou
Published: (2024)
Consumo y cambio social en España: evolución en el equipamiento doméstico (1983-2005)
by: Gaspar Brändle Señán
Published: (2007)
Causal Mediation in Natural Experiments
by: Hogan-Hennessy, Senan
Published: (2025)
Context-Aware Knowledge Distillation with Adaptive Weighting for Image Classification
by: Li, Zhengda
Published: (2025)
The Benard-Conway invariant of two-component links
by: Liu, Zedan, et al.
Published: (2024)
How Does Tenure Status Impact Library Usage: A Study of LaGuardia Community College
by: Ovadia, Steven
Published: (2009)
Working without a Crystal Ball: Predicting Web Trends for Web Services Librarians
by: Ovadia, Steven
Published: (2008)
Exploring the Potential of Twitter as a Research Tool
by: Ovadia, Steven
Published: (2009)
Quora.com: Another Place for Users to Ask Questions
by: Ovadia, Steven
Published: (2011)
The Viability of Google Wave as an Online Collaboration Tool
by: Ovadia, Steven
Published: (2010)
The Role of Big Data in the Social Sciences
by: Ovadia, Steven
Published: (2013)
Chemical Composition Regulation for Tuning the Luminescent Properties of Organic–Inorganic Metal Halides
by: Zhizhuan Zhang, et al.
Published: (2025)
Brandt's vole (Lasiopodomys brandtii) affects the dominant position of three gramineous species by altering defense traits and interspecific competition
by: Yanjin Xie, et al.
Published: (2024)
Training a Distributed Acoustic Sensing Traffic Monitoring Network With Video Inputs
by: Cohen, Khen, et al.
Published: (2024)
Injury screening for young competitive female gymnasts: A 2‐year follow‐up
by: Nili Steinberg, et al.
Published: (2026)
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
by: Lu, Yifan, et al.
Published: (2025)
Mixture of Routers
by: Zhang, Jia-Chen, et al.
Published: (2025)