:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Xunzhuo; Wu, Hao; Chen, Huamin; He, Bowei; Liu, Xue
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.18174

Similar Items

Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
by: Chen, Huamin, et al.
Published: (2026)

From Inference Routing to Agent Orchestration: Declarative Policy Compilation with Cross-Layer Verification
by: Chen, Huamin, et al.
Published: (2026)

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)

98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router
by: Liu, Xunzhuo, et al.
Published: (2026)

Category-Aware Semantic Caching for Heterogeneous LLM Workloads
by: Wang, Chen, et al.
Published: (2025)

Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems
by: Liu, Xunzhuo, et al.
Published: (2026)

Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference
by: Chen, Huamin, et al.
Published: (2026)

When to Reason: Semantic Router for vLLM
by: Wang, Chen, et al.
Published: (2025)

Adaptive Vision-Language Model Routing for Computer Use Agents
by: Liu, Xunzhuo, et al.
Published: (2026)

The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
by: Chen, Huamin, et al.
Published: (2026)

FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism
by: Chen, Huamin, et al.
Published: (2026)

inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference
by: Chen, Huamin, et al.
Published: (2026)

Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
by: Liu, Xunzhuo, et al.
Published: (2026)

Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
by: Liu, Xunzhuo, et al.
Published: (2026)

Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents
by: Liu, Xunzhuo, et al.
Published: (2026)

RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers
by: Lu, Yifan, et al.
Published: (2025)

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption
by: Chen, Yankai, et al.
Published: (2026)

Punctuation and Predicates in Language Models
by: Chauhan, Sonakshi, et al.
Published: (2025)

Federate the Router: Learning Language Model Routers with Sparse and Decentralized Evaluations
by: Askin, Baris, et al.
Published: (2026)

DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton
by: Sun, Yiyou, et al.
Published: (2024)

Routers in Vision Mixture of Experts: An Empirical Study
by: Liu, Tianlin, et al.
Published: (2024)

Eagle: Efficient Training-Free Router for Multi-LLM Inference
by: Zhao, Zesen, et al.
Published: (2024)

A Free Probabilistic Framework for Analyzing the Transformer-based Language Models
by: Das, Swagatam
Published: (2025)

Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization
by: Yang, Yonghan, et al.
Published: (2026)

RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
by: Chen, Shuhao, et al.
Published: (2024)

Semantic Probabilistic Control of Language Models
by: Ahmed, Kareem, et al.
Published: (2025)

SinkRouter: Sink-Aware Routing for Efficient Long-Context Decoding in Large Language and Multimodal Models
by: Liu, Junnan, et al.
Published: (2026)

Free-Rider and Conflict Aware Collaboration Formation for Cross-Silo Federated Learning
by: Chen, Mengmeng, et al.
Published: (2024)

Learning Context-Conditioned Predicate Semantics via Prototype Feedback
by: Jung, NamGyu, et al.
Published: (2026)

Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models
by: Tian, Bowei, et al.
Published: (2025)

MathDSL: A Domain-Specific Language for Concise Mathematical Solutions Via Program Synthesis
by: Anupam, Sagnik, et al.
Published: (2024)

VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
by: Liang, Yichao, et al.
Published: (2024)

DSL: Understanding and Improving Softmax Recommender Systems with Competition-Aware Scaling
by: Sahyouni, Bucher, et al.
Published: (2026)

Using (Not-so) Large Language Models to Generate Simulation Models in a Formal DSL: A Study on Reaction Networks
by: Kreikemeyer, Justin N., et al.
Published: (2025)

Meta-Router: Bridging Gold-standard and Preference-based Evaluations in Large Language Model Routing
by: Zhang, Yichi, et al.
Published: (2025)

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)

Rankability-enhanced Revenue Uplift Modeling Framework for Online Marketing
by: He, Bowei, et al.
Published: (2024)

Offline Imitation Learning with Variational Counterfactual Reasoning
by: He, Bowei, et al.
Published: (2023)

ICL-Router: In-Context Learned Model Representations for LLM Routing
by: Wang, Chenxu, et al.
Published: (2025)

VL-RouterBench: A Benchmark for Vision-Language Model Routing
by: Huang, Zhehao, et al.
Published: (2025)