Library Catalog

Bibliographic Details
Main Authors:	Shahout, Rana; Tirmazi, Hayder; Yu, Minlan; Mitzenmacher, Michael
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.13605

Similar Items

Queueing, Predictions, and LLMs: Challenges and Open Problems
by: Mitzenmacher, Michael, et al.
Published: (2025)

From Score Distributions to Balance: Plug-and-Play Mixture-of-Experts Routing
by: Shahout, Rana, et al.
Published: (2025)

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
by: Hankendi, Can, et al.
Published: (2026)

Fast Inference for Augmented Large Language Models
by: Shahout, Rana, et al.
Published: (2024)

Intra-request branch orchestration for efficient LLM reasoning
by: Jiang, Weifan, et al.
Published: (2025)

Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams
by: Shahout, Rana, et al.
Published: (2024)

SkipPredict: When to Invest in Predictions for Scheduling
by: Shahout, Rana, et al.
Published: (2024)

Whistledown: Combining User-Level Privacy with Conversational Coherence in LLMs
by: McMurray, Chelsea, et al.
Published: (2025)

Don't Stop Me Now: Embedding Based Scheduling for LLMs
by: Shahout, Rana, et al.
Published: (2024)

Learning-Augmented Frequency Estimation in Sliding Windows
by: Shahout, Rana, et al.
Published: (2024)

DRCY: Agentic Hardware Design Reviews
by: Dumont, Kyle, et al.
Published: (2026)

Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
by: Brown, Katrina, et al.
Published: (2026)

All Proof of Work But No Proof of Play
by: Tirmazi, Hayder
Published: (2025)

Random Number Generation from Pulsars
by: Tirmazi, Hayder
Published: (2025)

LSM Trees in Adversarial Environments
by: Tirmazi, Hayder
Published: (2025)

Adversarially Robust Bloom Filters: Privacy, Reductions, and Open Problems
by: Tirmazi, Hayder
Published: (2025)

THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
by: Li, Minghao, et al.
Published: (2023)

An LLM-based Agentic Framework for Accessible Network Control
by: Lin, Samuel, et al.
Published: (2025)

Computational Complexity of Game Boy Games
by: Tirmazi, Hayder, et al.
Published: (2024)

MIRIX: Multi-Agent Memory System for LLM-Based Agents
by: Wang, Yu, et al.
Published: (2025)

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
by: Kamahori, Keisuke, et al.
Published: (2026)

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation
by: Feng, Weiqi, et al.
Published: (2024)

GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems
by: Rana, Ashish, et al.
Published: (2024)

Parallel Context Compaction for Long-Horizon LLM Agent Serving
by: Cim, Musa, et al.
Published: (2026)

Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving
by: Huan, Chengying, et al.
Published: (2025)

PROTEUS: SLA-Aware Routing via Lagrangian RL for Multi-LLM Serving Systems
by: Bhatti, Amit Singh, et al.
Published: (2026)

ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching
by: Xiang, Xingyu, et al.
Published: (2025)

Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling
by: Rana, Annu, et al.
Published: (2025)

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents
by: Huang, Jen-tse, et al.
Published: (2024)

Federated Learning Clients Clustering with Adaptation to Data Drifts
by: Li, Minghao, et al.
Published: (2024)

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
by: Jiang, Xuanlin, et al.
Published: (2024)

Agents Under Siege: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
by: Khan, Rana Muhammad Shahroz, et al.
Published: (2025)

Insight Agents: An LLM-Based Multi-Agent System for Data Insights
by: Bai, Jincheng, et al.
Published: (2026)

TinyServe: Query-Aware Cache Selection for Efficient LLM Serving
by: Liu, Dong, et al.
Published: (2025)

Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity
by: Yang, Yingxuan, et al.
Published: (2026)

Adversary Resilient Learned Bloom Filters
by: Almashaqbeh, Ghada, et al.
Published: (2024)

GhostServe: A Lightweight Checkpointing System in the Shadow for Fault-Tolerant LLM Serving
by: Jayakody, Shakya, et al.
Published: (2026)

Autellix: An Efficient Serving Engine for LLM Agents as General Programs
by: Luo, Michael, et al.
Published: (2025)

Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web
by: Nie, Xiaohang, et al.
Published: (2026)

LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
by: Li, Haoyang, et al.
Published: (2025)