| Main Authors: | Singh, Abhinav; Kraatz, Landfried; Yaskovets, Serhii; Incardona, Pietro; Sbalzarini, Ivo F. |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2309.05331 |
Similar Items
Proven Distributed Memory Parallelization of Particle Methods
by: Pahlke, Johannes, et al.
Published: (2024)
Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization
by: Nichols, Daniel, et al.
Published: (2025)
Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver
by: Olenik, Gregor, et al.
Published: (2025)
GoldbachGPU: An Open Source GPU-Accelerated Framework for Verification of Goldbach's Conjecture
by: Llorente-Saguer, Isaac
Published: (2026)
Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
by: Wang, Hansheng, et al.
Published: (2025)
Toward Portable GPU Performance: Julia Recursive Implementation of TRMM and TRSM
by: Carrica, Vicki, et al.
Published: (2025)
Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
by: Li, Yifan, et al.
Published: (2026)
Implementing Multi-GPU Scientific Computing Miniapps Across Performance Portable Frameworks
by: Villalobos, Johansell, et al.
Published: (2025)
Communication-Avoiding SpGEMM via Trident Partitioning on Hierarchical GPU Interconnects
by: Bellavita, Julian, et al.
Published: (2026)
HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages
by: Chaturvedi, Aman, et al.
Published: (2024)
Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
by: Ringoot, Evelyne, et al.
Published: (2025)
Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators
by: Fridman, Yehonatan, et al.
Published: (2024)
Xabclib:A Fully Auto-tuned Sparse Iterative Solver
by: Katagiri, Takahiro, et al.
Published: (2024)
Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions
by: Ataie, Ehsan, et al.
Published: (2026)
TorchGWAS : GPU-accelerated GWAS for thousands of quantitative phenotypes
by: Zhao, Xingzhong, et al.
Published: (2026)
On the energy efficiency of sparse matrix computations on multi-GPU clusters
by: Bernaschi, Massimo, et al.
Published: (2025)
LLOR: Automated Repair of OpenMP Programs
by: Bora, Utpal, et al.
Published: (2024)
Multi-GPU Acceleration of PALABOS Fluid Solver using C++ Standard Parallelism
by: Latt, Jonas, et al.
Published: (2025)
Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs
by: Ringoot, Evelyne, et al.
Published: (2025)
Model-guided Fuzzing of Distributed Systems
by: Gulcan, Ege Berkay, et al.
Published: (2024)
CARISMA: CAR-Integrated Service Mesh Architecture
by: Klein, Kevin, et al.
Published: (2024)
Addressing Reproducibility Challenges in HPC with Continuous Integration
by: Hayot-Sasson, Valérie, et al.
Published: (2025)
GPU Implementations for Midsize Integer Addition and Multiplication
by: Oancea, Cosmin E., et al.
Published: (2024)
Parallel Sparse and Data-Sparse Factorization-based Linear Solvers
by: Li, Xiaoye Sherry, et al.
Published: (2026)
Efficiently Reproducing Distributed Workflows in Notebook-based Systems
by: Azaz, Talha, et al.
Published: (2026)
TraceMesh: Scalable and Streaming Sampling for Distributed Traces
by: Chen, Zhuangbin, et al.
Published: (2024)
Multi-Grained Specifications for Distributed System Model Checking and Verification
by: Ouyang, Lingzhi, et al.
Published: (2024)
Configurable Runtime Orchestration for Dynamic Data Retrieval in Distributed Systems
by: Kandiraju, Abhiram
Published: (2026)
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
by: Zhang, Lei, et al.
Published: (2026)
A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC
by: Sandås, Petter, et al.
Published: (2026)
LLM-HPC++: Evaluating LLM-Generated Modern C++ and MPI+OpenMP Codes for Scalable Mandelbrot Set Computation
by: Diehl, Patrick, et al.
Published: (2025)
SPUMA: a minimally invasive approach to the GPU porting of OPENFOAM
by: Bnà, Simone, et al.
Published: (2025)
A Lightweight Hybrid Publish/Subscribe Event Fabric for IPC and Modular Distributed Systems
by: Gkoulis, Dimitris
Published: (2026)
GPU Accelerated Newton for Taylor Series Solutions of Polynomial Homotopies in Multiple Double Precision
by: Verschelde, Jan
Published: (2023)
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks
by: Henning, Sören, et al.
Published: (2024)
SGPRS: Seamless GPU Partitioning Real-Time Scheduler for Periodic Deep Learning Workloads
by: Babaei, Amir Fakhim, et al.
Published: (2024)
Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores
by: Tu, Jiqun, et al.
Published: (2026)
GPU-Accelerated Distributed QAOA on Large-scale HPC Ecosystems
by: Xu, Zhihao, et al.
Published: (2025)
SOLANET: Distributed Neighbor Graph Construction on GPU-Accelerated Systems
by: Iwabuchi, Keita, et al.
Published: (2026)
Performance-Aligned LLMs for Generating Fast Code
by: Nichols, Daniel, et al.
Published: (2024)