| Main Author: | Jiménez, Arturo Urías |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.11614 |
Similar Items
MSCCL++: Rethinking GPU Communication Abstractions for AI Inference
by: Hwang, Changho, et al.
Published: (2025)
Accelerated Digital Twin Learning for Edge AI: A Comparison of FPGA and Mobile GPU
by: Xu, Bin, et al.
Published: (2025)
GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems
by: VG, Jithin, et al.
Published: (2025)
UCCL-Zip: Lossless Compression Supercharged GPU Communication
by: Ma, Shuang, et al.
Published: (2026)
Power- and Fragmentation-aware Online Scheduling for GPU Datacenters
by: Lettich, Francesco, et al.
Published: (2024)
SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization
by: Tschand, Arya, et al.
Published: (2025)
Towards Scalable GPU-Accelerated SNN Training via Temporal Fusion
by: Li, Yanchen, et al.
Published: (2024)
Accelerating Large Language Model Training with Hybrid GPU-based Compression
by: Xu, Lang, et al.
Published: (2024)
An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
by: Yang, Ruijia, et al.
Published: (2026)
Speeding up Local Optimization in Vehicle Routing with Tensor-based GPU Acceleration
by: Lei, Zhenyu, et al.
Published: (2025)
Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis
by: Shi, Jiabo, et al.
Published: (2025)
Reducing Fragmentation and Starvation in GPU Clusters through Dynamic Multi-Objective Scheduling
by: Mamirov, Akhmadillo
Published: (2025)
Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU
by: Spoczynski, Marcin, et al.
Published: (2026)
Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure
by: Guo, Yongjian, et al.
Published: (2026)
PolyKAN: Efficient Fused GPU Operators for Polynomial Kolmogorov-Arnold Network Variants
by: Yu, Mingkun, et al.
Published: (2025)
FairKV: Balancing Per-Head KV Cache for Fast Multi-GPU Inference
by: Zhao, Bingzhe, et al.
Published: (2025)
Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors
by: Chen, Siyuan, et al.
Published: (2024)
Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures
by: Argerich, Mauricio Fadel, et al.
Published: (2026)
A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems
by: Wu, Qi, et al.
Published: (2026)
Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale
by: Zhao, Dan, et al.
Published: (2024)
Debunking the CUDA Myth Towards GPU-based AI Systems
by: Lee, Yunjae, et al.
Published: (2024)
A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search
by: Futuhi, Ehsan, et al.
Published: (2025)
ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum
by: Stanisic, Andrija, et al.
Published: (2025)
GEM: GPU-Variability-Aware Expert to GPU Mapping for MoE Systems
by: Wawdhane, Sourish, et al.
Published: (2026)
Fine-Tuning and Serving Gemma 4 31B on Google Cloud TPU: A Technical Comparison with GPU Baselines
by: Kishnani, Jatin, et al.
Published: (2026)
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
by: Singh, Siddharth, et al.
Published: (2025)
Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS
by: Pennati, Luca, et al.
Published: (2026)
Beyond the Buzz: A Pragmatic Take on Inference Disaggregation
by: Mitra, Tiyasa, et al.
Published: (2025)
GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs
by: Chu, Ruifan, et al.
Published: (2025)
GPU Memory Prediction for Multimodal Model Training
by: Jeong, Jinwoo, et al.
Published: (2025)
Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
by: Jimenez-Gutierrez, Daniel M., et al.
Published: (2026)
AI Benchmarks and Datasets for LLM Evaluation
by: Ivanov, Todor, et al.
Published: (2024)
LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems
by: Li, Yufei, et al.
Published: (2025)
Towards Resource-Efficient Compound AI Systems
by: Chaudhry, Gohar Irfan, et al.
Published: (2025)
AI Factories: It's time to rethink the Cloud-HPC divide
by: Lopez, Pedro Garcia, et al.
Published: (2025)
The AI_INFN Platform: Artificial Intelligence Development in the Cloud
by: Anderlini, Lucio, et al.
Published: (2025)
Designing Datacenter Power Delivery Hierarchies for the AI Era
by: Wilkins, Grant, et al.
Published: (2026)
The infrastructure powering IBM's Gen AI model development
by: Gershon, Talia, et al.
Published: (2024)
Adaptation of AI-accelerated CFD Simulations to the IPU platform
by: Rosciszewski, P., et al.
Published: (2026)
Decentralized AI: Permissionless LLM Inference on POKT Network
by: Olshansky, Daniel, et al.
Published: (2024)