:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Jun; Yao, Yunxiang; Kuang, Wenwei; Mao, Runze; Sun, Zhenhao; Tao, Zhuang; Zhang, Ziyang; Li, Dengyu; Chen, Jiajun; Wang, Zhili; Cui, Kai; Cai, Congzhi; Lan, Longwen; Zhang, Ken
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2511.22481

Similar Items

Prompt-Aware Scheduling for Low-Latency LLM Serving
by: Tao, Yiheng, et al.
Published: (2025)

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
by: Agrawal, Amey, et al.
Published: (2024)

SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding
by: Wang, Zhenglin, et al.
Published: (2024)

SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
by: Zhuang, Jinming, et al.
Published: (2024)

ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput
by: Kim, Junsoo, et al.
Published: (2025)

Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
by: Dai, Yinwei, et al.
Published: (2023)

CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing
by: Yuan, Yitao, et al.
Published: (2025)

Causal Walk: Debiasing Multi-Hop Fact Verification with Front-Door Adjustment
by: Zhang, Congzhi, et al.
Published: (2024)

Vortex: Hosting ML Inference and Knowledge Retrieval Services With Tight Latency and Throughput Requirements
by: Yang, Yuting, et al.
Published: (2025)

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
by: Miao, Xupeng, et al.
Published: (2023)

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
by: Cheng, Ke, et al.
Published: (2024)

A hybrid reconstruction of piece-wise smooth functions from non-uniform Fourier data
by: Song, Guohui, et al.
Published: (2026)

Accelerating Parallel Diffusion Model Serving with Residual Compression
by: Luo, Jiajun, et al.
Published: (2025)

Two-body interaction induced phase transitions and intermediate phases in nonreciprocal non-Hermitian quasicrystals
by: Zhang, Yalun, et al.
Published: (2024)

Development and clinical application of a high‐performance medical static computed tomography system
by: Ding, Haining, et al.
Published: (2026)

Performance Analysis of uRLLC in scalable Cell-free Radio Access Network System
by: Zhang, Ziyang, et al.
Published: (2024)

An End‐to‐End Pillar Feature Based Neural Network Improved by Attention Modules for Object Detection of Autonomous Vehicles
by: Zhang, Bin, et al.
Published: (2025)

GRF-based Predictive Flocking Control with Dynamic Pattern Formation
by: Yu, Chenghao, et al.
Published: (2024)

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
by: Ye, Zihao, et al.
Published: (2025)

Quantum geometry and geometric entanglement entropy of one-dimensional Floquet topological matter
by: Zhou, Longwen
Published: (2024)

Entanglement phase transitions in non-Hermitian Floquet systems
by: Zhou, Longwen
Published: (2023)

Entanglement phase transitions in non-Hermitian Kitaev chains
by: Zhou, Longwen
Published: (2024)

Topology and edge modes surviving criticality in non-Hermitian Floquet systems
by: Zhou, Longwen
Published: (2026)

Non-Abelian generalization of non-Hermitian quasicrystal: PT-symmetry breaking, localization, entanglement and topological transitions
by: Zhou, Longwen
Published: (2023)

Entanglement phase transitions in non-Hermitian quasicrystals
by: Zhou, Longwen
Published: (2023)

Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model
by: Wang, Tianyi, et al.
Published: (2026)

Incomplete Data Multi-Source Static Computed Tomography Reconstruction with Diffusion Priors and Implicit Neural Representation
by: Shen, Ziju, et al.
Published: (2025)

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
by: Cheng, Xize, et al.
Published: (2024)

SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
by: Fu, Yunxiang, et al.
Published: (2024)

Dumbo-NG: Fast Asynchronous BFT Consensus with Throughput-Oblivious Latency
by: Gao, Yingzi, et al.
Published: (2022)

Size conditions and spectral conditions for generalized factor-critical (bicritical) graphs and $k$-$d$-critical graphs
by: Zhang, Zhenhao, et al.
Published: (2026)

MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation
by: Wang, Weihang, et al.
Published: (2025)

Learning Efficient Flocking Control based on Gibbs Random Fields
by: Zhang, Dengyu, et al.
Published: (2025)

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?
by: Liang, Susan, et al.
Published: (2026)

Towards Pareto Optimal Throughput in Small Language Model Serving
by: Recasens, Pol G., et al.
Published: (2024)

Taming Latency-Memory Trade-Off in MoE-Based LLM Serving via Fine-Grained Expert Offloading
by: Yu, Hanfei, et al.
Published: (2025)

Falcon: Advancing Asynchronous BFT Consensus for Lower Latency and Enhanced Throughput
by: Dai, Xiaohai, et al.
Published: (2025)

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
by: Zhu, Ruidong, et al.
Published: (2025)

Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment
by: Zhang, Congzhi, et al.
Published: (2024)

StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving
by: Kumar, Satyam, et al.
Published: (2026)