| Main Authors: | Chen, Huamin, Liu, Xunzhuo, He, Bowei, Liu, Xue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.27299 |
Similar Items
Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL
by: Liu, Xunzhuo, et al.
Published: (2026)
Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)
FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism
by: Chen, Huamin, et al.
Published: (2026)
Adaptive Vision-Language Model Routing for Computer Use Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems
by: Liu, Xunzhuo, et al.
Published: (2026)
Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency
by: Chen, Huamin, et al.
Published: (2026)
Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
by: Liu, Xunzhuo, et al.
Published: (2026)
inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference
by: Chen, Huamin, et al.
Published: (2026)
98× Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router
by: Liu, Xunzhuo, et al.
Published: (2026)
Visual Confused Deputy: Exploiting and Defending Perception Failures in Computer-Using Agents
by: Liu, Xunzhuo, et al.
Published: (2026)
Category-Aware Semantic Caching for Heterogeneous LLM Workloads
by: Wang, Chen, et al.
Published: (2025)
Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption
by: Chen, Yankai, et al.
Published: (2026)
From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents
by: Wang, Yuan, et al.
Published: (2025)
Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR
by: Min, Zijun, et al.
Published: (2026)
InferF: Declarative Factorization of AI/ML Inferences over Joins
by: Chowdhury, Kanchan, et al.
Published: (2025)
RoboNeuron: A Middle-Layer Infrastructure for Agent-Driven Orchestration in Embodied AI
by: Guan, Weifan, et al.
Published: (2025)
Group-in-Group Policy Optimization for LLM Agent Training
by: Feng, Lang, et al.
Published: (2025)
DASH: Input-Aware Dynamic Layer Skipping for Efficient LLM Inference with Markov Decision Policies
by: Yang, Ning, et al.
Published: (2025)
Multi-Agent Reinforcement Learning for Adaptive Resource Orchestration in Cloud-Native Clusters
by: Yao, Guanzi, et al.
Published: (2025)
From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos
by: Wang, Xun, et al.
Published: (2025)
Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization
by: Yang, Yonghan, et al.
Published: (2026)
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
by: Chen, Hongxu, et al.
Published: (2024)
Offline Imitation Learning with Variational Counterfactual Reasoning
by: He, Bowei, et al.
Published: (2023)
Orchestration Framework for Financial Agents: From Algorithmic Trading to Agentic Trading
by: Li, Jifeng, et al.
Published: (2025)
Learning to Orchestrate Agents under Uncertainty
by: Oliver, Mary Chriselda Antony, et al.
Published: (2026)
HLL: Can Agents Cross Humanity's Last Line of Verification?
by: Song, Xinhao, et al.
Published: (2026)
AutoBencher: Towards Declarative Benchmark Construction
by: Li, Xiang Lisa, et al.
Published: (2024)
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
by: Liu, Weidong, et al.
Published: (2023)
Towards Sharper Risk Bounds for Minimax Problems
by: Zhu, Bowei, et al.
Published: (2024)
Statistical Inference for Responsiveness Verification
by: Cheon, Seung Hyun, et al.
Published: (2025)
A Multi-Agent, Policy-Gradient approach to Network Routing
by: Tao, Nigel, et al.
Published: (2025)
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
by: Bao, Rui, et al.
Published: (2025)
Learning to Orchestrate Agents in Natural Language with the Conductor
by: Nielsen, Stefan, et al.
Published: (2025)
Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents
by: He, Pengfei, et al.
Published: (2026)
Towards Generalizable Neural Solvers for Vehicle Routing Problems via Ensemble with Transferrable Local Policy
by: Gao, Chengrui, et al.
Published: (2023)
LayerScope: Predictive Cross-Layer Scheduling for Efficient Multi-Batch MoE Inference on Legacy Servers
by: Yu, Enda, et al.
Published: (2025)
Neural Paging: Learning Context Management Policies for Turing-Complete Agents
by: Chen, Liang, et al.
Published: (2026)