:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Huamin; Liu, Xunzhuo; He, Bowei; Liu, Xue
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.27299

Similar Items

Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL
by: Liu, Xunzhuo, et al.
Published: (2026)

Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
by: Chen, Huamin, et al.
Published: (2026)

Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference
by: Chen, Huamin, et al.
Published: (2026)

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)

FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism
by: Chen, Huamin, et al.
Published: (2026)

Adaptive Vision-Language Model Routing for Computer Use Agents
by: Liu, Xunzhuo, et al.
Published: (2026)

Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems
by: Liu, Xunzhuo, et al.
Published: (2026)

Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
by: Liu, Xunzhuo, et al.
Published: (2026)

The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
by: Chen, Huamin, et al.
Published: (2026)

Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
by: Liu, Xunzhuo, et al.
Published: (2026)

inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference
by: Chen, Huamin, et al.
Published: (2026)

98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router
by: Liu, Xunzhuo, et al.
Published: (2026)

Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents
by: Liu, Xunzhuo, et al.
Published: (2026)

Category-Aware Semantic Caching for Heterogeneous LLM Workloads
by: Wang, Chen, et al.
Published: (2025)

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption
by: Chen, Yankai, et al.
Published: (2026)

From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents
by: Wang, Yuan, et al.
Published: (2025)

Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR
by: Min, Zijun, et al.
Published: (2026)

InferF: Declarative Factorization of AI/ML Inferences over Joins
by: Chowdhury, Kanchan, et al.
Published: (2025)

RoboNeuron: A Middle-Layer Infrastructure for Agent-Driven Orchestration in Embodied AI
by: Guan, Weifan, et al.
Published: (2025)

Group-in-Group Policy Optimization for LLM Agent Training
by: Feng, Lang, et al.
Published: (2025)

DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
by: Yang, Ning, et al.
Published: (2025)

Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters
by: Yao, Guanzi, et al.
Published: (2025)

From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos
by: Wang, Xun, et al.
Published: (2025)

Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization
by: Yang, Yonghan, et al.
Published: (2026)

IterIS: Iterative Inference-Solving Alignment for LoRA Merging
by: Chen, Hongxu, et al.
Published: (2024)

Offline Imitation Learning with Variational Counterfactual Reasoning
by: He, Bowei, et al.
Published: (2023)

Orchestration Framework for Financial Agents: From Algorithmic Trading to Agentic Trading
by: Li, Jifeng, et al.
Published: (2025)

Learning to Orchestrate Agents under Uncertainty
by: Oliver, Mary Chriselda Antony, et al.
Published: (2026)

HLL: Can Agents Cross Humanity's Last Line of Verification?
by: Song, Xinhao, et al.
Published: (2026)

AutoBencher: Towards Declarative Benchmark Construction
by: Li, Xiang Lisa, et al.
Published: (2024)

Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
by: Liu, Weidong, et al.
Published: (2023)

Towards Sharper Risk Bounds for Minimax Problems
by: Zhu, Bowei, et al.
Published: (2024)

Statistical Inference for Responsiveness Verification
by: Cheon, Seung Hyun, et al.
Published: (2025)

A Multi-Agent, Policy-Gradient approach to Network Routing
by: Tao, Nigel, et al.
Published: (2025)

Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
by: Bao, Rui, et al.
Published: (2025)

Learning to Orchestrate Agents in Natural Language with the Conductor
by: Nielsen, Stefan, et al.
Published: (2025)

Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents
by: He, Pengfei, et al.
Published: (2026)

Towards Generalizable Neural Solvers for Vehicle Routing Problems via Ensemble with Transferrable Local Policy
by: Gao, Chengrui, et al.
Published: (2023)

LayerScope: Predictive Cross-Layer Scheduling for Efficient Multi-Batch MoE Inference on Legacy Servers
by: Yu, Enda, et al.
Published: (2025)

Neural Paging: Learning Context Management Policies for Turing-Complete Agents
by: Chen, Liang, et al.
Published: (2026)