| Main authors: | Guldogan, Ozgur; Kunde, Jackson; Lee, Kangwook; Pedarsani, Ramtin |
|---|---|
| Material type: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Links: | https://arxiv.org/abs/2412.04504 |
Similar works
Quantized Decentralized Stochastic Learning over Directed Graphs
by: Taheri, Hossein, et al.
Published: (2020)
BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching
by: Zheng, Zhen, et al.
Published: (2024)
SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services
by: Xu, Yaodan, et al.
Published: (2025)
Robust Decentralized Learning with Local Updates and Gradient Tracking
by: Ghiasvand, Sajjad, et al.
Published: (2024)
MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
by: Xu, Tairan, et al.
Published: (2025)
JITServe: SLO-aware LLM Serving with Imprecise Request Information
by: Zhang, Wei, et al.
Published: (2025)
Staggered Batch Scheduling: Co-optimizing Time-to-First-Token and Throughput for High-Efficiency LLM Inference
by: Tian, Jian, et al.
Published: (2025)
A Multi-Level Approach for Class Imbalance Problem in Federated Learning for Remote Industry 4.0 Applications
by: Hussain, Razin Farhan, et al.
Published: (2024)
SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
by: Shukla, Shikhar
Published: (2026)
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference
by: Recasens, Pol G., et al.
Published: (2025)
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
by: Agrawal, Amey, et al.
Published: (2024)
Technical Report on Reinforcement Learning Control on the Lucas-Nülle Inverted Pendulum
by: Schenke, Maximilian, et al.
Published: (2024)
Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning
by: Vercellino, Roberto, et al.
Published: (2026)
Prediction of Permissioned Blockchain Performance for Resource Scaling Configurations
by: Jung, Seungwoo, et al.
Published: (2025)
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services
by: Łazuka, Małgorzata, et al.
Published: (2024)
PolyKV: A Shared Asymmetrically-Compressed KV Cache Pool for Multi-Agent LLM Inference
by: Patel, Ishan, et al.
Published: (2026)
AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster
by: Li, Siyuan, et al.
Published: (2024)
FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees
by: Oliaro, Gabriele, et al.
Published: (2024)
PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference
by: Ning, Rui, et al.
Published: (2026)
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
by: Butler, Branden, et al.
Published: (2024)
Multi-Source to Multi-Target Decentralized Federated Domain Adaptation
by: Wang, Su, et al.
Published: (2023)
OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
by: Ye, Rui, et al.
Published: (2024)
Towards Building the Federated GPT: Federated Instruction Tuning
by: Zhang, Jianyi, et al.
Published: (2023)
AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems
by: Bamidele, Emmanuel
Published: (2026)
Equilibrium in the Computing Continuum through Active Inference
by: Sedlak, Boris, et al.
Published: (2023)
ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs
by: Parikh, Dhruv, et al.
Published: (2025)
Decentralized Online Learning for Random Inverse Problems Over Graphs
by: Zhang, Xiwei, et al.
Published: (2023)
Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques
by: Behera, Adarsh Prasad, et al.
Published: (2025)
AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration
by: McDanel, Bradley
Published: (2024)
Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)
ISO: Overlap of Computation and Communication within Seqenence For LLM Inference
by: Xiao, Bin, et al.
Published: (2024)
SPARe: Stacked Parallelism with Adaptive Reordering for Fault-Tolerant LLM Pretraining Systems with 100k+ GPUs
by: Lee, Jin, et al.
Published: (2026)
Token Management in Multi-Tenant AI Inference Platforms
by: Cunningham, William J.
Published: (2026)
Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching
by: Pang, Bowen, et al.
Published: (2025)
LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference
by: Du, Yin, et al.
Published: (2026)
Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
by: Doostmohammadian, Mohammadreza, et al.
Published: (2025)
A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion
by: Hanzely, Filip, et al.
Published: (2019)
Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions
by: Barrak, Amine, et al.
Published: (2025)
Deoxys: A Causal Inference Engine for Unhealthy Node Mitigation in Large-scale Cloud Infrastructure
by: Zhang, Chaoyun, et al.
Published: (2024)
CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control
by: Chen, Qiaoling, et al.
Published: (2026)