Library Catalog

Bibliographic Details
Main Author:	Gallego, Víctor
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2605.09708

Similar Items

Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search
by: Nichols, Daniel, et al.
Published: (2026)

Liger Kernel: Efficient Triton Kernels for LLM Training
by: Hsu, Pin-Lun, et al.
Published: (2024)

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra
by: Ochiai, Yoichi
Published: (2026)

LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
by: Hu, Huanqi, et al.
Published: (2025)

MatKV: Trading Compute for Flash Storage in LLM Inference
by: Shin, Kun-Woo, et al.
Published: (2025)

Native LLM and MLLM Inference at Scale on Apple Silicon
by: Barrios, Wayner
Published: (2026)

MoEless: Efficient MoE LLM Serving via Serverless Computing
by: Yu, Hanfei, et al.
Published: (2026)

Fine-Tuning GPT-5 for GPU Kernel Generation
by: Tehrani, Ali, et al.
Published: (2026)

GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs
by: Chu, Ruifan, et al.
Published: (2025)

Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision
by: Hudson, Nathaniel, et al.
Published: (2024)

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey
by: Liu, Jing, et al.
Published: (2025)

Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications
by: Liu, Youyuan, et al.
Published: (2024)

Canvas: End-to-End Kernel Architecture Search in Neural Networks
by: Zhao, Chenggang, et al.
Published: (2023)

Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems
by: Jaiswal, Shashwat, et al.
Published: (2025)

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
by: Tang, Zhenheng, et al.
Published: (2024)

LLM-42: Enabling Determinism in LLM Inference with Verified Speculation
by: Gond, Raja, et al.
Published: (2026)

HtFLlib: A Comprehensive Heterogeneous Federated Learning Library and Benchmark
by: Zhang, Jianqing, et al.
Published: (2025)

TokenPowerBench: Benchmarking the Power Consumption of LLM Inference
by: Niu, Chenxu, et al.
Published: (2025)

Federated Neural Architecture Search with Model-Agnostic Meta Learning
by: Huang, Xinyuan, et al.
Published: (2025)

CoLLM: Continuous Adaptation for SLO-Aware LLM Serving on Shared GPU Clusters
by: Huang, Shaoyuan, et al.
Published: (2026)

A Survey of Resource-efficient LLM and Multimodal Foundation Models
by: Xu, Mengwei, et al.
Published: (2024)

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
by: Zhao, Juntao, et al.
Published: (2024)

VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving
by: Yu, Jiahuan, et al.
Published: (2025)

Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
by: Liu, Zhihong, et al.
Published: (2024)

Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference
by: Deshmukh, Dhruv, et al.
Published: (2025)

Combining Cloud and Mobile Computing for Machine Learning
by: Xu, Ruiqi, et al.
Published: (2024)

Lodestar: An Online-Learning LLM Inference Router
by: Lim, Gangmuk, et al.
Published: (2026)

Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data
by: Salmeron, Jose L., et al.
Published: (2024)

Niyama: Breaking the Silos of LLM Inference Serving
by: Goel, Kanishk, et al.
Published: (2025)

On Evaluating Performance of LLM Inference Serving Systems
by: Agrawal, Amey, et al.
Published: (2025)

Robust LLM Training Infrastructure at ByteDance
by: Wan, Borui, et al.
Published: (2025)

Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
by: Jimenez-Gutierrez, Daniel M., et al.
Published: (2026)

Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks
by: Deng, Xiumei, et al.
Published: (2025)

Bayesian Federated Model Compression for Communication and Computation Efficiency
by: Xia, Chengyu, et al.
Published: (2024)

A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation
by: Li, Xiaocan, et al.
Published: (2025)

Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
by: Feng, Yicheng, et al.
Published: (2026)

Efficient Serving of LLM Applications with Probabilistic Demand Modeling
by: Liu, Yifei, et al.
Published: (2025)

Frontier: Simulating the Next Generation of LLM Inference Systems
by: Feng, Yicheng, et al.
Published: (2025)

RL in the Wild: Characterizing RLVR Training in LLM Deployment
by: Zhou, Jiecheng, et al.
Published: (2025)

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts
by: Zhang, Shulai, et al.
Published: (2025)