:: Library Catalog

Bibliographic Details
Main Author:	Jiménez, Arturo Urías
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.11614

Similar Items

MSCCL++: Rethinking GPU Communication Abstractions for AI Inference
by: Hwang, Changho, et al.
Published: (2025)

Accelerated Digital Twin Learning for Edge AI: A Comparison of FPGA and Mobile GPU
by: Xu, Bin, et al.
Published: (2025)

GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems
by: VG, Jithin, et al.
Published: (2025)

UCCL-Zip: Lossless Compression Supercharged GPU Communication
by: Ma, Shuang, et al.
Published: (2026)

Power- and Fragmentation-aware Online Scheduling for GPU Datacenters
by: Lettich, Francesco, et al.
Published: (2024)

SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization
by: Tschand, Arya, et al.
Published: (2025)

Towards Scalable GPU-Accelerated SNN Training via Temporal Fusion
by: Li, Yanchen, et al.
Published: (2024)

Accelerating Large Language Model Training with Hybrid GPU-based Compression
by: Xu, Lang, et al.
Published: (2024)

An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
by: Yang, Ruijia, et al.
Published: (2026)

Speeding up Local Optimization in Vehicle Routing with Tensor-based GPU Acceleration
by: Lei, Zhenyu, et al.
Published: (2025)

Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis
by: Shi, Jiabo, et al.
Published: (2025)

Reducing Fragmentation and Starvation in GPU Clusters through Dynamic Multi-Objective Scheduling
by: Mamirov, Akhmadillo
Published: (2025)

Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU
by: Spoczynski, Marcin, et al.
Published: (2026)

Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure
by: Guo, Yongjian, et al.
Published: (2026)

PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants
by: Yu, Mingkun, et al.
Published: (2025)

FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
by: Zhao, Bingzhe, et al.
Published: (2025)

Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors
by: Chen, Siyuan, et al.
Published: (2024)

Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures
by: Argerich, Mauricio Fadel, et al.
Published: (2026)

A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems
by: Wu, Qi, et al.
Published: (2026)

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale
by: Zhao, Dan, et al.
Published: (2024)

Debunking the CUDA Myth Towards GPU-based AI Systems
by: Lee, Yunjae, et al.
Published: (2024)

A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search
by: Futuhi, Ehsan, et al.
Published: (2025)

ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum
by: Stanisic, Andrija, et al.
Published: (2025)

GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems
by: Wawdhane, Sourish, et al.
Published: (2026)

Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines
by: Kishnani, Jatin, et al.
Published: (2026)

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
by: Singh, Siddharth, et al.
Published: (2025)

Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS
by: Pennati, Luca, et al.
Published: (2026)

Beyond the Buzz: A Pragmatic Take on Inference Disaggregation
by: Mitra, Tiyasa, et al.
Published: (2025)

GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs
by: Chu, Ruifan, et al.
Published: (2025)

GPU Memory Prediction for Multimodal Model Training
by: Jeong, Jinwoo, et al.
Published: (2025)

Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
by: Jimenez-Gutierrez, Daniel M., et al.
Published: (2026)

AI Benchmarks and Datasets for LLM Evaluation
by: Ivanov, Todor, et al.
Published: (2024)

LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems
by: Li, Yufei, et al.
Published: (2025)

Towards Resource-Efficient Compound AI Systems
by: Chaudhry, Gohar Irfan, et al.
Published: (2025)

AI Factories: It's time to rethink the Cloud-HPC divide
by: Lopez, Pedro Garcia, et al.
Published: (2025)

The AI_INFN Platform: Artificial Intelligence Development in the Cloud
by: Anderlini, Lucio, et al.
Published: (2025)

Designing Datacenter Power Delivery Hierarchies for the AI Era
by: Wilkins, Grant, et al.
Published: (2026)

The infrastructure powering IBM's Gen AI model development
by: Gershon, Talia, et al.
Published: (2024)

Adaptation of AI-accelerated CFD Simulations to the IPU platform
by: Rosciszewski, P., et al.
Published: (2026)

Decentralized AI: Permissionless LLM Inference on POKT Network
by: Olshansky, Daniel, et al.
Published: (2024)