| Main Authors: | Shahout, Rana, Tirmazi, Hayder, Yu, Minlan, Mitzenmacher, Michael |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.13605 |
Similar Items
Queueing, Predictions, and LLMs: Challenges and Open Problems
by: Mitzenmacher, Michael, et al.
Published: (2025)
From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing
by: Shahout, Rana, et al.
Published: (2025)
PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
by: Hankendi, Can, et al.
Published: (2026)
Fast Inference for Augmented Large Language Models
by: Shahout, Rana, et al.
Published: (2024)
Intra-request branch orchestration for efficient LLM reasoning
by: Jiang, Weifan, et al.
Published: (2025)
Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams
by: Shahout, Rana, et al.
Published: (2024)
SkipPredict: When to Invest in Predictions for Scheduling
by: Shahout, Rana, et al.
Published: (2024)
Whistledown: Combining User-Level Privacy with Conversational Coherence in LLMs
by: McMurray, Chelsea, et al.
Published: (2025)
Don't Stop Me Now: Embedding Based Scheduling for LLMs
by: Shahout, Rana, et al.
Published: (2024)
Learning-Augmented Frequency Estimation in Sliding Windows
by: Shahout, Rana, et al.
Published: (2024)
DRCY: Agentic Hardware Design Reviews
by: Dumont, Kyle, et al.
Published: (2026)
Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
by: Brown, Katrina, et al.
Published: (2026)
All Proof of Work But No Proof of Play
by: Tirmazi, Hayder
Published: (2025)
Random Number Generation from Pulsars
by: Tirmazi, Hayder
Published: (2025)
LSM Trees in Adversarial Environments
by: Tirmazi, Hayder
Published: (2025)
Adversarially Robust Bloom Filters: Privacy, Reductions, and Open Problems
by: Tirmazi, Hayder
Published: (2025)
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
by: Li, Minghao, et al.
Published: (2023)
An LLM-based Agentic Framework for Accessible Network Control
by: Lin, Samuel, et al.
Published: (2025)
Computational Complexity of Game Boy Games
by: Tirmazi, Hayder, et al.
Published: (2024)
MIRIX: Multi-Agent Memory System for LLM-Based Agents
by: Wang, Yu, et al.
Published: (2025)
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
by: Kamahori, Keisuke, et al.
Published: (2026)
Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
by: Feng, Weiqi, et al.
Published: (2024)
GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems
by: Rana, Ashish, et al.
Published: (2024)
Parallel Context Compaction for Long-Horizon LLM Agent Serving
by: Cim, Musa, et al.
Published: (2026)
Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving
by: Huan, Chengying, et al.
Published: (2025)
PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems
by: Bhatti, Amit Singh, et al.
Published: (2026)
ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching
by: Xiang, Xingyu, et al.
Published: (2025)
Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling
by: Rana, Annu, et al.
Published: (2025)
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents
by: Huang, Jen-tse, et al.
Published: (2024)
Federated Learning Clients Clustering with Adaptation to Data Drifts
by: Li, Minghao, et al.
Published: (2024)
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
by: Jiang, Xuanlin, et al.
Published: (2024)
Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
by: Khan, Rana Muhammad Shahroz, et al.
Published: (2025)
Insight Agents: An LLM-Based Multi-Agent System for Data Insights
by: Bai, Jincheng, et al.
Published: (2026)
TinyServe: Query-Aware Cache Selection for Efficient LLM Serving
by: Liu, Dong, et al.
Published: (2025)
Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity
by: Yang, Yingxuan, et al.
Published: (2026)
Adversary Resilient Learned Bloom Filters
by: Almashaqbeh, Ghada, et al.
Published: (2024)
GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving
by: Jayakody, Shakya, et al.
Published: (2026)
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
by: Luo, Michael, et al.
Published: (2025)
Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web
by: Nie, Xiaohang, et al.
Published: (2026)
LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
by: Li, Haoyang, et al.
Published: (2025)