:: Library Catalog

Bibliographic details
Main authors:	Guldogan, Ozgur; Kunde, Jackson; Lee, Kangwook; Pedarsani, Ramtin
Material type:	Preprint
Published:	2024
Subjects:	Computation and Language; Distributed, Parallel, and Cluster Computing; Machine Learning; Systems and Control
Links:	https://arxiv.org/abs/2412.04504

Similar works

Quantized Decentralized Stochastic Learning over Directed Graphs
by: Taheri, Hossein, et al.
Published: (2020)

BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
by: Zheng, Zhen, et al.
Published: (2024)

SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services
by: Xu, Yaodan, et al.
Published: (2025)

Robust Decentralized Learning with Local Updates and Gradient Tracking
by: Ghiasvand, Sajjad, et al.
Published: (2024)

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
by: Xu, Tairan, et al.
Published: (2025)

JITServe: SLO-aware LLM Serving with Imprecise Request Information
by: Zhang, Wei, et al.
Published: (2025)

Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference
by: Tian, Jian, et al.
Published: (2025)

A Multi-Level Approach for Class Imbalance Problem in Federated Learning for Remote Industry 4.0 Applications
by: Hussain, Razin Farhan, et al.
Published: (2024)

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
by: Shukla, Shikhar
Published: (2026)

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
by: Agrawal, Amey, et al.
Published: (2024)

Technical Report on Reinforcement Learning Control on the Lucas-Nülle Inverted Pendulum
by: Schenke, Maximilian, et al.
Published: (2024)

Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning
by: Vercellino, Roberto, et al.
Published: (2026)

Prediction of Permissioned Blockchain Performance for Resource Scaling Configurations
by: Jung, Seungwoo, et al.
Published: (2025)

LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services
by: Łazuka, Małgorzata, et al.
Published: (2024)

PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
by: Patel, Ishan, et al.
Published: (2026)

AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster
by: Li, Siyuan, et al.
Published: (2024)

FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees
by: Oliaro, Gabriele, et al.
Published: (2024)

PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference
by: Ning, Rui, et al.
Published: (2026)

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
by: Butler, Branden, et al.
Published: (2024)

Multi-Source to Multi-Target Decentralized Federated Domain Adaptation
by: Wang, Su, et al.
Published: (2023)

OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
by: Ye, Rui, et al.
Published: (2024)

Towards Building the Federated GPT: Federated Instruction Tuning
by: Zhang, Jianyi, et al.
Published: (2023)

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems
by: Bamidele, Emmanuel
Published: (2026)

Equilibrium in the Computing Continuum through Active Inference
by: Sedlak, Boris, et al.
Published: (2023)

ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs
by: Parikh, Dhruv, et al.
Published: (2025)

Decentralized Online Learning for Random Inverse Problems Over Graphs
by: Zhang, Xiwei, et al.
Published: (2023)

Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques
by: Behera, Adarsh Prasad, et al.
Published: (2025)

AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration
by: McDanel, Bradley
Published: (2024)

Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)

ISO: Overlap of Computation and Communication within Seqenence For LLM Inference
by: Xiao, Bin, et al.
Published: (2024)

SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs
by: Lee, Jin, et al.
Published: (2026)

Token Management in Multi-Tenant AI Inference Platforms
by: Cunningham, William J.
Published: (2026)

Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching
by: Pang, Bowen, et al.
Published: (2025)

LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference
by: Du, Yin, et al.
Published: (2026)

Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
by: Doostmohammadian, Mohammadreza, et al.
Published: (2025)

A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion
by: Hanzely, Filip, et al.
Published: (2019)

Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
by: Barrak, Amine, et al.
Published: (2025)

Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure
by: Zhang, Chaoyun, et al.
Published: (2024)

CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
by: Chen, Qiaoling, et al.
Published: (2026)