| Main Author: | Gallego, Víctor |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.09708 |
Similar Items
Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search
by: Nichols, Daniel, et al.
Published: (2026)
Liger Kernel: Efficient Triton Kernels for LLM Training
by: Hsu, Pin-Lun, et al.
Published: (2024)
Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra
by: Ochiai, Yoichi
Published: (2026)
LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
by: Hu, Huanqi, et al.
Published: (2025)
MatKV: Trading Compute for Flash Storage in LLM Inference
by: Shin, Kun-Woo, et al.
Published: (2025)
Native LLM and MLLM Inference at Scale on Apple Silicon
by: Barrios, Wayner
Published: (2026)
MoEless: Efficient MoE LLM Serving via Serverless Computing
by: Yu, Hanfei, et al.
Published: (2026)
Fine-Tuning GPT-5 for GPU Kernel Generation
by: Tehrani, Ali, et al.
Published: (2026)
GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs
by: Chu, Ruifan, et al.
Published: (2025)
Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision
by: Hudson, Nathaniel, et al.
Published: (2024)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey
by: Liu, Jing, et al.
Published: (2025)
Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications
by: Liu, Youyuan, et al.
Published: (2024)
Canvas: End-to-End Kernel Architecture Search in Neural Networks
by: Zhao, Chenggang, et al.
Published: (2023)
Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems
by: Jaiswal, Shashwat, et al.
Published: (2025)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression
by: Tang, Zhenheng, et al.
Published: (2024)
LLM-42: Enabling Determinism in LLM Inference with Verified Speculation
by: Gond, Raja, et al.
Published: (2026)
HtFLlib: A Comprehensive Heterogeneous Federated Learning Library and Benchmark
by: Zhang, Jianqing, et al.
Published: (2025)
TokenPowerBench: Benchmarking the Power Consumption of LLM Inference
by: Niu, Chenxu, et al.
Published: (2025)
Federated Neural Architecture Search with Model-Agnostic Meta Learning
by: Huang, Xinyuan, et al.
Published: (2025)
CoLLM: Continuous Adaptation for SLO-Aware LLM Serving on Shared GPU Clusters
by: Huang, Shaoyuan, et al.
Published: (2026)
A Survey of Resource-efficient LLM and Multimodal Foundation Models
by: Xu, Mengwei, et al.
Published: (2024)
LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
by: Zhao, Juntao, et al.
Published: (2024)
VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving
by: Yu, Jiahuan, et al.
Published: (2025)
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
by: Liu, Zhihong, et al.
Published: (2024)
Kascade: A Practical Sparse Attention Method for Long-Context LLM Inference
by: Deshmukh, Dhruv, et al.
Published: (2025)
Combining Cloud and Mobile Computing for Machine Learning
by: Xu, Ruiqi, et al.
Published: (2024)
Lodestar: An Online-Learning LLM Inference Router
by: Lim, Gangmuk, et al.
Published: (2026)
Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data
by: Salmeron, Jose L., et al.
Published: (2024)
Niyama : Breaking the Silos of LLM Inference Serving
by: Goel, Kanishk, et al.
Published: (2025)
On Evaluating Performance of LLM Inference Serving Systems
by: Agrawal, Amey, et al.
Published: (2025)
Robust LLM Training Infrastructure at ByteDance
by: Wan, Borui, et al.
Published: (2025)
Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
by: Jimenez-Gutierrez, Daniel M., et al.
Published: (2026)
Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks
by: Deng, Xiumei, et al.
Published: (2025)
Bayesian Federated Model Compression for Communication and Computation Efficiency
by: Xia, Chengyu, et al.
Published: (2024)
A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation
by: Li, Xiaocan, et al.
Published: (2025)
Frontier: Towards Comprehensive and Accurate LLM Inference Simulation
by: Feng, Yicheng, et al.
Published: (2026)
Efficient Serving of LLM Applications with Probabilistic Demand Modeling
by: Liu, Yifei, et al.
Published: (2025)
Frontier: Simulating the Next Generation of LLM Inference Systems
by: Feng, Yicheng, et al.
Published: (2025)
RL in the Wild: Characterizing RLVR Training in LLM Deployment
by: Zhou, Jiecheng, et al.
Published: (2025)
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts
by: Zhang, Shulai, et al.
Published: (2025)