Library Catalog

Bibliographic Details
Main Authors:	Akewar, Mayur, Madireddy, Sandeep, Luo, Dongsheng, Bhimani, Janki
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.10246

Similar Items

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
by: Luo, Xinhao, et al.
Published: (2025)

Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling
by: Da, Wei, et al.
Published: (2025)

Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling
by: Jadhav, Prachi, et al.
Published: (2025)

Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention
by: Liao, Mengqi, et al.
Published: (2026)

B-PASTE: Beam-Aware Pattern-Guided Speculative Execution for Resource-Constrained LLM Agents
by: Song, Yanfei
Published: (2026)

Idiosyncrasies of Programmable Caching Engines
by: Peixoto, José, et al.
Published: (2026)

Accelerated Digital Twin Learning for Edge AI: A Comparison of FPGA and Mobile GPU
by: Xu, Bin, et al.
Published: (2025)

FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference
by: Liu, Xing, et al.
Published: (2025)

Simplifying Root Cause Analysis in Kubernetes with StateGraph and LLM
by: Xiang, Yong, et al.
Published: (2025)

EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs
by: Kubwimana, Benjamin, et al.
Published: (2025)

SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction
by: Zhang, Wuyang, et al.
Published: (2025)

Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs
by: Chen, Aodong, et al.
Published: (2023)

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)

Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware
by: Khalil, Alex, et al.
Published: (2025)

Transforming Future Data Center Operations and Management via Physical AI
by: Cao, Zhiwei, et al.
Published: (2025)

xLLM Technical Report
by: Liu, Tongxuan, et al.
Published: (2025)

Elastic On-Device LLM Service
by: Yin, Wangsong, et al.
Published: (2024)

SparOA: Sparse and Operator-aware Hybrid Scheduling for Edge DNN Inference
by: Zhang, Ziyang, et al.
Published: (2025)

A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM
by: Xi, Shaoke, et al.
Published: (2026)

AI Benchmarks and Datasets for LLM Evaluation
by: Ivanov, Todor, et al.
Published: (2024)

Learning Provably Correct Distributed Protocols Without Human Knowledge
by: Hui, Yujie, et al.
Published: (2026)

PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants
by: Yu, Mingkun, et al.
Published: (2025)

MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning
by: Liaw, Yong-Cheng, et al.
Published: (2025)

A Planet Scale Spatial-Temporal Knowledge Graph Based On OpenStreetMap And H3 Grid
by: Böckling, Martin, et al.
Published: (2024)

Revisiting Parameter Server in LLM Post-Training
by: Wan, Xinyi, et al.
Published: (2026)

Accelerating LLM Inference with Precomputed Query Storage
by: Park, Jay H., et al.
Published: (2025)

High-Throughput LLM inference on Heterogeneous Clusters
by: Xiong, Yi, et al.
Published: (2025)

Byzantine-Robust Decentralized Coordination of LLM Agents
by: Jo, Yongrae, et al.
Published: (2025)

Tutoring LLM into a Better CUDA Optimizer
by: Brabec, Matyáš, et al.
Published: (2025)

A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search
by: Futuhi, Ehsan, et al.
Published: (2025)

A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning
by: Ogunsina, Kolawole E., et al.
Published: (2025)

LLM Inference Serving: Survey of Recent Advances and Opportunities
by: Li, Baolin, et al.
Published: (2024)

Decentralized AI: Permissionless LLM Inference on POKT Network
by: Olshansky, Daniel, et al.
Published: (2024)

ECCENTRIC: Edge-Cloud Collaboration Framework for Distributed Inference Using Knowledge Adaptation
by: Kamani, Mohammad Mahdi, et al.
Published: (2025)

PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization
by: Lei, Kelun, et al.
Published: (2025)

Privacy-Preserving Federated Learning: Integrating Zero-Knowledge Proofs in Scalable Distributed Architectures
by: Gupta, Divya
Published: (2026)

LAPS: A Length-Aware-Prefill LLM Serving System
by: She, Jianshu, et al.
Published: (2026)

Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines
by: Wagenländer, Marcel, et al.
Published: (2026)

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
by: Hankendi, Can, et al.
Published: (2026)

FairBatching: Fairness-Aware Batch Formation for LLM Inference
by: Lyu, Hongtao, et al.
Published: (2025)