| Main Authors: | Wang, Jun, Yao, Yunxiang, Kuang, Wenwei, Mao, Runze, Sun, Zhenhao, Tao, Zhuang, Zhang, Ziyang, Li, Dengyu, Chen, Jiajun, Wang, Zhili, Cui, Kai, Cai, Congzhi, Lan, Longwen, Zhang, Ken |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.22481 |
Similar Items
Prompt-Aware Scheduling for Low-Latency LLM Serving
by: Tao, Yiheng, et al.
Published: (2025)
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
by: Agrawal, Amey, et al.
Published: (2024)
SEED: Accelerating Reasoning Tree Construction via Scheduled Speculative Decoding
by: Wang, Zhenglin, et al.
Published: (2024)
SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration
by: Zhuang, Jinming, et al.
Published: (2024)
ADOR: A Design Exploration Framework for LLM Serving with Enhanced Latency and Throughput
by: Kim, Junsoo, et al.
Published: (2025)
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
by: Dai, Yinwei, et al.
Published: (2023)
CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing
by: Yuan, Yitao, et al.
Published: (2025)
Causal Walk: Debiasing Multi-Hop Fact Verification with Front-Door Adjustment
by: Zhang, Congzhi, et al.
Published: (2024)
Vortex: Hosting ML Inference and Knowledge Retrieval Services With Tight Latency and Throughput Requirements
by: Yang, Yuting, et al.
Published: (2025)
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
by: Miao, Xupeng, et al.
Published: (2023)
Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
by: Cheng, Ke, et al.
Published: (2024)
A hybrid reconstruction of piece-wise smooth functions from non-uniform Fourier data
by: Song, Guohui, et al.
Published: (2026)
Accelerating Parallel Diffusion Model Serving with Residual Compression
by: Luo, Jiajun, et al.
Published: (2025)
Two-body interaction induced phase transitions and intermediate phases in nonreciprocal non-Hermitian quasicrystals
by: Zhang, Yalun, et al.
Published: (2024)
Development and clinical application of a high‐performance medical static computed tomography system
by: Haining Ding, et al.
Published: (2026)
Performance Analysis of uRLLC in scalable Cell-free Radio Access Network System
by: Zhang, Ziyang, et al.
Published: (2024)
An End‐to‐End Pillar Feature Based Neural Network Improved by Attention Modules for Object Detection of Autonomous Vehicles
by: Bin Zhang, et al.
Published: (2025)
GRF-based Predictive Flocking Control with Dynamic Pattern Formation
by: Yu, Chenghao, et al.
Published: (2024)
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving
by: Ye, Zihao, et al.
Published: (2025)
Quantum geometry and geometric entanglement entropy of one-dimensional Floquet topological matter
by: Zhou, Longwen
Published: (2024)
Entanglement phase transitions in non-Hermitian Floquet systems
by: Zhou, Longwen
Published: (2023)
Entanglement phase transitions in non-Hermitian Kitaev chains
by: Zhou, Longwen
Published: (2024)
Topology and edge modes surviving criticality in non-Hermitian Floquet systems
by: Zhou, Longwen
Published: (2026)
Non-Abelian generalization of non-Hermitian quasicrystal: PT-symmetry breaking, localization, entanglement and topological transitions
by: Zhou, Longwen
Published: (2023)
Entanglement phase transitions in non-Hermitian quasicrystals
by: Zhou, Longwen
Published: (2023)
Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model
by: Wang, Tianyi, et al.
Published: (2026)
Incomplete Data Multi-Source Static Computed Tomography Reconstruction with Diffusion Priors and Implicit Neural Representation
by: Shen, Ziju, et al.
Published: (2025)
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
by: Cheng, Xize, et al.
Published: (2024)
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
by: Fu, Yunxiang, et al.
Published: (2024)
Dumbo-NG: Fast Asynchronous BFT Consensus with Throughput-Oblivious Latency
by: Gao, Yingzi, et al.
Published: (2022)
Size conditions and spectral conditions for generalized factor-critical (bicritical) graphs and $k$-$d$-critical graphs
by: Zhang, Zhenhao, et al.
Published: (2026)
MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation
by: Wang, Weihang, et al.
Published: (2025)
Learning Efficient Flocking Control based on Gibbs Random Fields
by: Zhang, Dengyu, et al.
Published: (2025)
Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?
by: Liang, Susan, et al.
Published: (2026)
Towards Pareto Optimal Throughput in Small Language Model Serving
by: Recasens, Pol G., et al.
Published: (2024)
Taming Latency-Memory Trade-Off in MoE-Based LLM Serving via Fine-Grained Expert Offloading
by: Yu, Hanfei, et al.
Published: (2025)
Falcon: Advancing Asynchronous BFT Consensus for Lower Latency and Enhanced Throughput
by: Dai, Xiaohai, et al.
Published: (2025)
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
by: Zhu, Ruidong, et al.
Published: (2025)
Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment
by: Zhang, Congzhi, et al.
Published: (2024)
StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving
by: Kumar, Satyam, et al.
Published: (2026)