| Main authors: | Zheng, Zhen; Ji, Xin; Fang, Taosong; Zhou, Fanghao; Liu, Chuanjie; Peng, Gang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2412.03594 |
Similar items
Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching
by: Pang, Bowen, et al.
Published: (2025)
FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)
CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
by: Chen, Qiaoling, et al.
Published: (2026)
Multi-Bin Batching for Increasing LLM Inference Throughput
by: Guldogan, Ozgur, et al.
Published: (2024)
Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference
by: Tian, Jian, et al.
Published: (2025)
HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
by: Chen, Jiabin, et al.
Published: (2024)
AlignedServe: Orchestrating Prefix-aware Batching to Build a High-throughput and Computing-efficient LLM Serving System
by: Bai, Fengyao, et al.
Published: (2026)
AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster
by: Li, Siyuan, et al.
Published: (2024)
Joint Optimization of Offloading, Batching and DVFS for Multiuser Co-Inference
by: Xu, Yaodan, et al.
Published: (2025)
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)
PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference
by: Ning, Rui, et al.
Published: (2026)
BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving
by: Zheng, Wanyi, et al.
Published: (2025)
Batch Query Processing and Optimization for Agentic Workflows
by: Shen, Junyi, et al.
Published: (2025)
Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning
by: Chen, Hongyao, et al.
Published: (2025)
GPU-Accelerated Batch-Dynamic Subgraph Matching
by: Qiu, Linshan, et al.
Published: (2024)
Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs
by: Ekelund, Jonah, et al.
Published: (2025)
Tangram: High-resolution Video Analytics on Serverless Platform with SLO-aware Batching
by: Peng, Haosong, et al.
Published: (2024)
Collaborative Batch Size Optimization for Federated Learning
by: Geimer, Arno, et al.
Published: (2025)
BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems
by: Duan, Tao, et al.
Published: (2025)
Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction
by: Cheng, Ke, et al.
Published: (2024)
MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
by: Xu, Tairan, et al.
Published: (2025)
Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
by: Chen, Lequn, et al.
Published: (2023)
SeaLLM: Service-Aware and Latency-Optimized Resource Sharing for Large Language Model Inference
by: Zhao, Yihao, et al.
Published: (2025)
Are Your Epochs Too Epic? Batch Free Can Be Harmful
by: Kim, Daewoo, et al.
Published: (2024)
Batched DGEMMs for scientific codes running on long vector architectures
by: Banchelli, Fabio, et al.
Published: (2025)
Batch Denoising for AIGC Service Provisioning in Wireless Edge Networks
by: Xu, Jinghang, et al.
Published: (2025)
COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training
by: Sakip, Akhmed, et al.
Published: (2026)
Ark: Offchain Transaction Batching in Bitcoin
by: Keer, Pim, et al.
Published: (2026)
Constraint Programming Models For Serial Batch Scheduling With Minimum Batch Size
by: Huertas, Jorge A., et al.
Published: (2025)
Schedule-Level Shared-Prefix Reuse for LLM RL Training
by: Li, Pengbo, et al.
Published: (2026)
On Using Large-Batches in Federated Learning
by: Tyagi, Sahil
Published: (2025)
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs
by: Kolker-Hicks, Elliot, et al.
Published: (2024)
Herring: Parallel Batch-Order-Fairness on DAG-based Blockchain Consensus
by: Putnik, Marko, et al.
Published: (2026)
Batch-Schedule-Execute: On Optimizing Concurrent Deterministic Scheduling for Blockchains (Extended Version)
by: Hay, Yaron, et al.
Published: (2024)
SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services
by: Xu, Yaodan, et al.
Published: (2025)
Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
by: Barrak, Amine, et al.
Published: (2025)
Optimal Batch Allocation for Wireless Federated Learning
by: Song, Jaeyoung, et al.
Published: (2024)
Not All Tokens Are Worth Caching: Learning Semantic-Aware Eviction for LLM Prefix Caches
by: Fang, Shaoke, et al.
Published: (2026)
On Optimal Batch Size in Coded Computing
by: Saha, Swapnil, et al.
Published: (2025)
Argus: Token Aware Distributed LLM Inference Optimization
by: Wu, Panlong, et al.
Published: (2025)