| Main authors: | Matsumura, Kazuaki, De Gonzalo, Simon Garcia, Peña, Antonio J. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online access: | https://arxiv.org/abs/2306.13002 |
Similar items
The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries
by: Amoros, Oscar, et al.
Published: (2025)
Syncopate: Efficient Multi-GPU AI Kernels via Automatic Chunk-Centric Compute-Communication Overlap
by: Qiang, Xinwei, et al.
Published: (2026)
QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation
by: Zhu, Xinguo, et al.
Published: (2025)
ZEUS: An Efficient GPU Optimization Method Integrating PSO, BFGS, and Automatic Differentiation
by: Soos, Dominik, et al.
Published: (2026)
GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography
by: Liberati, Tiziana, et al.
Published: (2026)
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
by: Jangda, Abhinav, et al.
Published: (2023)
Accelerating Particle-in-Cell Monte Carlo Simulations with MPI, OpenMP/OpenACC and Asynchronous Multi-GPU Programming
by: Williams, Jeremy J., et al.
Published: (2024)
SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization
by: Tschand, Arya, et al.
Published: (2025)
Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
by: Qiao, Tong, et al.
Published: (2025)
GPU acceleration of non-equilibrium Green's function calculation using OpenACC and CUDA FORTRAN
by: Yin, Jia, et al.
Published: (2025)
Xe-Forge: Multi-Stage LLM-Powered Kernel Optimization for Intel GPU
by: Spoczynski, Marcin, et al.
Published: (2026)
KEET: Explaining Performance of GPU Kernels Using LLM Agents
by: Davis, Joshua H., et al.
Published: (2026)
On Similarity of Computational Kernels in our Codes and Proxies
by: McKinsey, Michael, et al.
Published: (2026)
Optimizing Bloom Filters for Modern GPU Architectures
by: Jünger, Daniel, et al.
Published: (2025)
Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization
by: Nichols, Daniel, et al.
Published: (2025)
NCCLZ: Compression-Enabled GPU Collectives with Decoupled Quantization and Entropy Coding
by: Wang, Jiamin, et al.
Published: (2026)
DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference
by: Lin, Shouxu, et al.
Published: (2026)
A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation
by: He, Jinghai, et al.
Published: (2024)
Zorse: Optimizing LLM Training Efficiency on Heterogeneous GPU Clusters
by: Guo, Runsheng Benson, et al.
Published: (2025)
PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization
by: Lei, Kelun, et al.
Published: (2025)
FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification
by: Wu, Wenqing
Published: (2023)
A Multi-Objective Framework for Optimizing GPU-Enabled VM Placement in Cloud Data Centers with Multi-Instance GPU Technology
by: Siavashi, Ahmad, et al.
Published: (2025)
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)
From Sequential to Parallel: Reformulating Dynamic Programming as GPU Kernels for Large-Scale Stochastic Combinatorial Optimization
by: Zhao, Jingyi, et al.
Published: (2026)
Accelerating the Particle-In-Cell code ECsim with OpenACC
by: Boella, Elisabetta, et al.
Published: (2026)
FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing
by: Wu, Hao, et al.
Published: (2024)
Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration
by: Li, Zhonggen, et al.
Published: (2025)
KernelFoundry: Hardware-aware evolutionary GPU kernel optimization
by: Wiedemann, Nina, et al.
Published: (2026)
GPU Sharing with Triples Mode
by: Byun, Chansup, et al.
Published: (2024)
Optimizing Allreduce Operations for Modern Heterogeneous Architectures with Multiple Processes per GPU
by: Adams, Michael, et al.
Published: (2025)
Heimdall++: Optimizing GPU Utilization and Pipeline Parallelism for Efficient Single-Pulse Detection
by: Xia, Bingzheng, et al.
Published: (2025)
SIMT/GPU Data Race Verification using ISCC and Intermediary Code Representations: A Case Study
by: Osterhout, Andrew, et al.
Published: (2025)
ParallelKittens: Systematic and Practical Simplification of Multi-GPU AI Kernels
by: Sul, Stuart H., et al.
Published: (2025)
Optimizing the Variant Calling Pipeline Execution on Human Genomes Using GPU-Enabled Machines
by: Kumar, Ajay, et al.
Published: (2025)
FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing
by: Agarwal, Aarush, et al.
Published: (2025)
MERBIT: A GPU-Based SpMV Method for Iterative Workloads
by: Zhang, Qi, et al.
Published: (2026)
GPU-Based Parallel Computing Methods for Medical Photoacoustic Image Reconstruction
by: Yi, Xinyao, et al.
Published: (2024)
A GPU Accelerated Temporal Window-Based Random Walk Sampler
by: Salehin, Md Ashfaq, et al.
Published: (2026)
Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration
by: Wang, Tianyu, et al.
Published: (2024)
Resource Optimization with MPI Process Malleability for Dynamic Workloads in HPC Clusters
by: Iserte, Sergio, et al.
Published: (2025)