:: Library Catalog

Bibliographic details
Main authors:	Tian, Jian; Li, Shuailong; Cao, Yang; Cui, Wenbo; Zhu, Minghan; Wu, Wenkang; Zhang, Jianming; Wang, Yanpeng; Xiao, Zhiwen; Hou, Zhenyu; Shen, Dou
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Machine Learning
Online access:	https://arxiv.org/abs/2512.16134

Related records

Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching
by: Pang, Bowen, et al.
Published: (2025)

CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
by: Chen, Qiaoling, et al.
Published: (2026)

Joint Optimization of Offloading, Batching and DVFS for Multiuser Co-Inference
by: Xu, Yaodan, et al.
Published: (2025)

BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
by: Zheng, Zhen, et al.
Published: (2024)

HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
by: Chen, Jiabin, et al.
Published: (2024)

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
by: Cheng, Ke, et al.
Published: (2024)

Batch-Schedule-Execute: On Optimizing Concurrent Deterministic Scheduling for Blockchains (Extended Version)
by: Hay, Yaron, et al.
Published: (2024)

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
by: Xu, Tairan, et al.
Published: (2025)

ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments
by: Zhou, Ao, et al.
Published: (2025)

Multi-Bin Batching for Increasing LLM Inference Throughput
by: Guldogan, Ozgur, et al.
Published: (2024)

On the Efficiency of Dynamic Transaction Scheduling in Blockchain Sharding
by: Adhikari, Ramesh, et al.
Published: (2025)

FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)

AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster
by: Li, Siyuan, et al.
Published: (2024)

Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
by: Hidayetoglu, Mert, et al.
Published: (2025)

Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling
by: Li, Yan, et al.
Published: (2025)

TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference
by: Zhang, Hongbin, et al.
Published: (2025)

SLO-Aware Scheduling for Large Language Model Inferences
by: Huang, Jinqi, et al.
Published: (2025)

Towards Energy Efficient Co-Scheduling in HPC
by: Zheng, Zhong, et al.
Published: (2026)

A HPC Co-Scheduler with Reinforcement Learning
by: Souza, Abel, et al.
Published: (2024)

COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training
by: Sakip, Akhmed, et al.
Published: (2026)

Round-optimal $n$-Block Broadcast Schedules in Logarithmic Time
by: Träff, Jesper Larsson
Published: (2023)

Arrow: Adaptive Scheduling Mechanisms for Disaggregated LLM Inference Architecture
by: Wu, Yu, et al.
Published: (2025)

Argus: Token Aware Distributed LLM Inference Optimization
by: Wu, Panlong, et al.
Published: (2025)

Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)

Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
by: Chen, Lequn, et al.
Published: (2023)

Taming Request Imbalance: SLO-Aware Scheduling for Disaggregated LLM Inference
by: Wang, Qipeng
Published: (2026)

SLICE: SLO-Driven Scheduling for LLM Inference on Edge Computing Devices
by: Chow, Will
Published: (2025)

Research on fault diagnosis and root cause analysis based on full stack observability
by: Hou, Jian
Published: (2025)

Scaling Up Throughput-oriented LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management
by: Phung, Thanh Son, et al.
Published: (2025)

Constraint Programming Models For Serial Batch Scheduling With Minimum Batch Size
by: Huertas, Jorge A., et al.
Published: (2025)

AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices
by: Lin, Zheng, et al.
Published: (2024)

Adaptive Heuristics for Scheduling DNN Inferencing on Edge and Cloud for Personalized UAV Fleets
by: Raj, Suman, et al.
Published: (2024)

SneakPeek: Data-Aware Model Selection and Scheduling for Inference Serving on the Edge
by: Wolfrath, Joel, et al.
Published: (2025)

DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs
by: Babaei, Amir Fakhim, et al.
Published: (2025)

Orchestrating Joint Offloading and Scheduling for Low-Latency Edge SLAM
by: Zhang, Yao, et al.
Published: (2025)

CoCoI: Distributed Coded Inference System for Straggler Mitigation
by: Liu, Xing, et al.
Published: (2025)

Designing Co-operation in Systems of Hierarchical, Multi-objective Schedulers for Stream Processing
by: Dangwal, Animesh, et al.
Published: (2025)

AdaBridge: Dynamic Data and Computation Reuse for Efficient Multi-task DNN Co-evolution in Edge Systems
by: Wang, Lehao, et al.
Published: (2024)

KV Cache Compression for Inference Efficiency in LLMs: A Review
by: Liu, Yanyu, et al.
Published: (2025)

Night-Window Batching versus Carbon-Aware Scheduling for Clinical AI GPU Workloads
by: Doshi, Nishi, et al.
Published: (2026)