:: Library Catalog

Bibliographic details
Main authors:	Zheng, Zhen; Ji, Xin; Fang, Taosong; Zhou, Fanghao; Liu, Chuanjie; Peng, Gang
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Machine Learning
Online access:	https://arxiv.org/abs/2412.03594

Similar items

Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching
by: Pang, Bowen, et al.
Published: (2025)

FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)

CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
by: Chen, Qiaoling, et al.
Published: (2026)

Multi-Bin Batching for Increasing LLM Inference Throughput
by: Guldogan, Ozgur, et al.
Published: (2024)

Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference
by: Tian, Jian, et al.
Published: (2025)

HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
by: Chen, Jiabin, et al.
Published: (2024)

AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System
by: Bai, Fengyao, et al.
Published: (2026)

AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster
by: Li, Siyuan, et al.
Published: (2024)

Joint Optimization of Offloading, Batching and DVFS for Multiuser Co-Inference
by: Xu, Yaodan, et al.
Published: (2025)

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)

PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference
by: Ning, Rui, et al.
Published: (2026)

BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving
by: Zheng, Wanyi, et al.
Published: (2025)

Batch Query Processing and Optimization for Agentic Workflows
by: Shen, Junyi, et al.
Published: (2025)

Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning
by: Chen, Hongyao, et al.
Published: (2025)

GPU-Accelerated Batch-Dynamic Subgraph Matching
by: Qiu, Linshan, et al.
Published: (2024)

Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs
by: Ekelund, Jonah, et al.
Published: (2025)

Tangram: High-resolution Video Analytics on Serverless Platform with SLO-aware Batching
by: Peng, Haosong, et al.
Published: (2024)

Collaborative Batch Size Optimization for Federated Learning
by: Geimer, Arno, et al.
Published: (2025)

BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems
by: Duan, Tao, et al.
Published: (2025)

Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
by: Cheng, Ke, et al.
Published: (2024)

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
by: Xu, Tairan, et al.
Published: (2025)

Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
by: Chen, Lequn, et al.
Published: (2023)

SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference
by: Zhao, Yihao, et al.
Published: (2025)

Are Your Epochs Too Epic? Batch Free Can Be Harmful
by: Kim, Daewoo, et al.
Published: (2024)

Batched DGEMMs for scientific codes running on long vector architectures
by: Banchelli, Fabio, et al.
Published: (2025)

Batch Denoising for AIGC Service Provisioning in Wireless Edge Networks
by: Xu, Jinghang, et al.
Published: (2025)

COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training
by: Sakip, Akhmed, et al.
Published: (2026)

Ark: Offchain Transaction Batching in Bitcoin
by: Keer, Pim, et al.
Published: (2026)

Constraint Programming Models For Serial Batch Scheduling With Minimum Batch Size
by: Huertas, Jorge A., et al.
Published: (2025)

Schedule-Level Shared-Prefix Reuse for LLM RL Training
by: Li, Pengbo, et al.
Published: (2026)

On Using Large-Batches in Federated Learning
by: Tyagi, Sahil
Published: (2025)

A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs
by: Kolker-Hicks, Elliot, et al.
Published: (2024)

Herring: Parallel Batch-Order-Fairness on DAG-based Blockchain Consensus
by: Putnik, Marko, et al.
Published: (2026)

Batch-Schedule-Execute: On Optimizing Concurrent Deterministic Scheduling for Blockchains (Extended Version)
by: Hay, Yaron, et al.
Published: (2024)

SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services
by: Xu, Yaodan, et al.
Published: (2025)

Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
by: Barrak, Amine, et al.
Published: (2025)

Optimal Batch Allocation for Wireless Federated Learning
by: Song, Jaeyoung, et al.
Published: (2024)

Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches
by: Fang, Shaoke, et al.
Published: (2026)

On Optimal Batch Size in Coded Computing
by: Saha, Swapnil, et al.
Published: (2025)

Argus: Token Aware Distributed LLM Inference Optimization
by: Wu, Panlong, et al.
Published: (2025)