:: Library Catalog

Bibliographic Details
Main Authors:	Ren, Feng; Qin, Ruoyu; Ma, Teng; Cai, Shangming; Liu, Zheng; Lei, Chao; Zhu, Dejiang; Yang, Ke; Li, Zheming; Cui, Jialei; Huang, Weixiao; Zhao, Yikai; Zhang, Yineng; Wu, Hao; Gao, Xiang; Fu, Yuhao; Jiang, Jinlei; Wu, Yongwei; Zhang, Mingxing
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2604.00368

Similar Items

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
by: Qin, Ruoyu, et al.
Published: (2024)

Efficient Heterogeneous Large Language Model Decoding with Model-Attention Disaggregation
by: Chen, Shaoyuan, et al.
Published: (2024)

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
by: Qin, Ruoyu, et al.
Published: (2025)

Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving
by: Li, Zongze, et al.
Published: (2026)

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
by: Qin, Ruoyu, et al.
Published: (2026)

Efficient Graph-Based Approximate Nearest Neighbor Search Achieving: Low Latency Without Throughput Loss
by: Luo, Jingjia, et al.
Published: (2025)

HydraInfer: Hybrid Disaggregated Scheduling for Multimodal Large Language Model Serving
by: Dong, Xianzhe, et al.
Published: (2025)

Physical parameter regression from black hole images via a multiscale adaptive neural network
by: Wei, Jialei, et al.
Published: (2025)

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
by: Ye, Zihao, et al.
Published: (2025)

DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving
by: Ruan, Chaoyi, et al.
Published: (2025)

P/D-Serve: Serving Disaggregated Large Language Model at Scale
by: Jin, Yibo, et al.
Published: (2024)

BestServe: Serving Strategies with Optimal Goodput in Collocation and Disaggregation Architectures
by: Hu, Xiannan, et al.
Published: (2025)

TrEnv-X: Transparently Share Serverless Execution Environments Across Different Functions and Nodes
by: Huang, Jialiang, et al.
Published: (2025)

SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM
by: Tian, Yuhao, et al.
Published: (2025)

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
by: Cheng, Ke, et al.
Published: (2024)

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
by: Zhong, Yinmin, et al.
Published: (2024)

Trinity: Disaggregating Vector Search from Prefill-Decode Disaggregation in LLM Serving
by: Liu, Yi, et al.
Published: (2025)

ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving
by: Qiu, Haoran, et al.
Published: (2025)

End Khovanov homology and exotic Lagrangian planes
by: Teng, Yikai
Published: (2025)

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
by: Zhu, Ruidong, et al.
Published: (2025)

Efficiently Serving Large Multimodal Models Using EPD Disaggregation
by: Singh, Gursimran, et al.
Published: (2024)

BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2025)

BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2026)

Uncertainty Quantification and Flow Dynamics in Rotating Detonation Engines
by: Kumar, Vinay, et al.
Published: (2025)

semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage
by: Hong, Ke, et al.
Published: (2025)

GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions
by: Shi, Tianyao, et al.
Published: (2024)

MCN-CL: Multimodal Cross-Attention Network and Contrastive Learning for Multimodal Emotion Recognition
by: Li, Feng, et al.
Published: (2025)

MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
by: Hu, Cunchen, et al.
Published: (2024)

StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving
by: Kumar, Satyam, et al.
Published: (2026)

TENT5/FAM46: An Enigmatic Family of Secretory Tuners
by: Lacidogna, Daniel, et al.
Published: (2025)

The Effects of Circadian Rhythms and Exercise Preconditioning on Cardiac Troponin T Levels Following Graded Exercise
by: Nie, Jinlei, et al.
Published: (2025)

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving
by: Wu, Hanjiang, et al.
Published: (2026)

Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
by: Sun, Xun, et al.
Published: (2026)

EPD-Serve: A Flexible Multimodal EPD Disaggregation Inference Serving System On Ascend
by: Bai, Fan, et al.
Published: (2026)

Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture
by: Wu, Yu, et al.
Published: (2025)

TokenScale: Timely and Accurate Autoscaling for Disaggregated LLM Serving with Token Velocity
by: Lai, Ruiqi, et al.
Published: (2025)

Observations on Tent-Using in the Carolliine Bat Rhinophylla pumilio in Southeastern Brazil
by: Zortéa, Marlon
Published: (1995)

Observation, Not Prediction: Conversation-Level Disaggregated Scheduling for Agentic Serving
by: Ding, Jianru, et al.
Published: (2026)

Efficient Multi-round LLM Inference over Disaggregated Serving
by: He, Wenhao, et al.
Published: (2026)

Recursive Offloading for LLM Serving in Multi-tier Networks
by: Wu, Zhiyuan, et al.
Published: (2025)