| Main Authors: | Liu, Xunzhuo, Wu, Hao, Chen, Huamin, He, Bowei, Liu, Xue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.18174 |
Similar Items
Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification
by: Chen, Huamin, et al.
Published: (2026)
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)
98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router
by: Liu, Xunzhuo, et al.
Published: (2026)
Category-Aware Semantic Caching for Heterogeneous LLM Workloads
by: Wang, Chen, et al.
Published: (2025)
Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems
by: Liu, Xunzhuo, et al.
Published: (2026)
Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
When to Reason: Semantic Router for vLLM
by: Wang, Chen, et al.
Published: (2025)
Adaptive Vision-Language Model Routing for Computer Use Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
by: Chen, Huamin, et al.
Published: (2026)
FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism
by: Chen, Huamin, et al.
Published: (2026)
inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
by: Liu, Xunzhuo, et al.
Published: (2026)
Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
by: Lu, Yifan, et al.
Published: (2025)
Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption
by: Chen, Yankai, et al.
Published: (2026)
Punctuation and Predicates in Language Models
by: Chauhan, Sonakshi, et al.
Published: (2025)
Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations
by: Askin, Baris, et al.
Published: (2026)
DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton
by: Sun, Yiyou, et al.
Published: (2024)
Routers in Vision Mixture of Experts: An Empirical Study
by: Liu, Tianlin, et al.
Published: (2024)
Eagle: Efficient Training-Free Router for Multi-LLM Inference
by: Zhao, Zesen, et al.
Published: (2024)
A Free Probabilistic Framework for Analyzing the Transformer-based Language Models
by: Das, Swagatam
Published: (2025)
Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization
by: Yang, Yonghan, et al.
Published: (2026)
RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
by: Chen, Shuhao, et al.
Published: (2024)
Semantic Probabilistic Control of Language Models
by: Ahmed, Kareem, et al.
Published: (2025)
SinkRouter: Sink-Aware Routing for Efficient Long-Context Decoding in Large Language and Multimodal Models
by: Liu, Junnan, et al.
Published: (2026)
Free-Rider and Conflict Aware Collaboration Formation for Cross-Silo Federated Learning
by: Chen, Mengmeng, et al.
Published: (2024)
Learning Context-Conditioned Predicate Semantics via Prototype Feedback
by: Jung, NamGyu, et al.
Published: (2026)
Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models
by: Tian, Bowei, et al.
Published: (2025)
MathDSL: A Domain-Specific Language for Concise Mathematical Solutions Via Program Synthesis
by: Anupam, Sagnik, et al.
Published: (2024)
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
by: Liang, Yichao, et al.
Published: (2024)
DSL: Understanding and Improving Softmax Recommender Systems with Competition-Aware Scaling
by: Sahyouni, Bucher, et al.
Published: (2026)
Using (Not-so) Large Language Models to Generate Simulation Models in a Formal DSL: A Study on Reaction Networks
by: Kreikemeyer, Justin N., et al.
Published: (2025)
Meta-Router: Bridging Gold-standard and Preference-based Evaluations in Large Language Model Routing
by: Zhang, Yichi, et al.
Published: (2025)
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)
Rankability-enhanced Revenue Uplift Modeling Framework for Online Marketing
by: He, Bowei, et al.
Published: (2024)
Offline Imitation Learning with Variational Counterfactual Reasoning
by: He, Bowei, et al.
Published: (2023)
ICL-Router: In-Context Learned Model Representations for LLM Routing
by: Wang, Chenxu, et al.
Published: (2025)
VL-RouterBench: A Benchmark for Vision-Language Model Routing
by: Huang, Zhehao, et al.
Published: (2025)