:: Library Catalog

Bibliographic Details
Main Authors:	Singh, Abhinav, Kraatz, Landfried, Yaskovets, Serhii, Incardona, Pietro, Sbalzarini, Ivo F.
Format:	Preprint
Published:	2023
Subjects:	Mathematical Software; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2309.05331

Similar Items

Proven Distributed Memory Parallelization of Particle Methods
by: Pahlke, Johannes, et al.
Published: (2024)

Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization
by: Nichols, Daniel, et al.
Published: (2025)

Investigating Matrix Repartitioning to Address the Over- and Undersubscription Challenge for a GPU-based CFD Solver
by: Olenik, Gregor, et al.
Published: (2025)

GoldbachGPU: An Open Source GPU-Accelerated Framework for Verification of Goldbach's Conjecture
by: Llorente-Saguer, Isaac
Published: (2026)

Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
by: Wang, Hansheng, et al.
Published: (2025)

Toward Portable GPU Performance: Julia Recursive Implementation of TRMM and TRSM
by: Carrica, Vicki, et al.
Published: (2025)

Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
by: Li, Yifan, et al.
Published: (2026)

Implementing Multi-GPU Scientific Computing Miniapps Across Performance Portable Frameworks
by: Villalobos, Johansell, et al.
Published: (2025)

Communication-Avoiding SpGEMM via Trident Partitioning on Hierarchical GPU Interconnects
by: Bellavita, Julian, et al.
Published: (2026)

HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages
by: Chaturvedi, Aman, et al.
Published: (2024)

Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
by: Ringoot, Evelyne, et al.
Published: (2025)

Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators
by: Fridman, Yehonatan, et al.
Published: (2024)

Xabclib: A Fully Auto-tuned Sparse Iterative Solver
by: Katagiri, Takahiro, et al.
Published: (2024)

Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions
by: Ataie, Ehsan, et al.
Published: (2026)

TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes
by: Zhao, Xingzhong, et al.
Published: (2026)

On the energy efficiency of sparse matrix computations on multi-GPU clusters
by: Bernaschi, Massimo, et al.
Published: (2025)

LLOR: Automated Repair of OpenMP Programs
by: Bora, Utpal, et al.
Published: (2024)

Multi-GPU Acceleration of PALABOS Fluid Solver using C++ Standard Parallelism
by: Latt, Jonas, et al.
Published: (2025)

Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs
by: Ringoot, Evelyne, et al.
Published: (2025)

Model-guided Fuzzing of Distributed Systems
by: Gulcan, Ege Berkay, et al.
Published: (2024)

CARISMA: CAR-Integrated Service Mesh Architecture
by: Klein, Kevin, et al.
Published: (2024)

Addressing Reproducibility Challenges in HPC with Continuous Integration
by: Hayot-Sasson, Valérie, et al.
Published: (2025)

GPU Implementations for Midsize Integer Addition and Multiplication
by: Oancea, Cosmin E., et al.
Published: (2024)

Parallel Sparse and Data-Sparse Factorization-based Linear Solvers
by: Li, Xiaoye Sherry, et al.
Published: (2026)

Efficiently Reproducing Distributed Workflows in Notebook-based Systems
by: Azaz, Talha, et al.
Published: (2026)

TraceMesh: Scalable and Streaming Sampling for Distributed Traces
by: Chen, Zhuangbin, et al.
Published: (2024)

Multi-Grained Specifications for Distributed System Model Checking and Verification
by: Ouyang, Lingzhi, et al.
Published: (2024)

Configurable Runtime Orchestration for Dynamic Data Retrieval in Distributed Systems
by: Kandiraju, Abhiram
Published: (2026)

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
by: Zhang, Lei, et al.
Published: (2026)

A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC
by: Sandås, Petter, et al.
Published: (2026)

LLM-HPC++: Evaluating LLM-Generated Modern C++ and MPI+OpenMP Codes for Scalable Mandelbrot Set Computation
by: Diehl, Patrick, et al.
Published: (2025)

SPUMA: a minimally invasive approach to the GPU porting of OPENFOAM
by: Bnà, Simone, et al.
Published: (2025)

A Lightweight Hybrid Publish/Subscribe Event Fabric for IPC and Modular Distributed Systems
by: Gkoulis, Dimitris
Published: (2026)

GPU Accelerated Newton for Taylor Series Solutions of Polynomial Homotopies in Multiple Double Precision
by: Verschelde, Jan
Published: (2023)

ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks
by: Henning, Sören, et al.
Published: (2024)

SGPRS: Seamless GPU Partitioning Real-Time Scheduler for Periodic Deep Learning Workloads
by: Babaei, Amir Fakhim, et al.
Published: (2024)

Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores
by: Tu, Jiqun, et al.
Published: (2026)

GPU-Accelerated Distributed QAOA on Large-scale HPC Ecosystems
by: Xu, Zhihao, et al.
Published: (2025)

SOLANET: Distributed Neighbor Graph Construction on GPU-Accelerated Systems
by: Iwabuchi, Keita, et al.
Published: (2026)

Performance-Aligned LLMs for Generating Fast Code
by: Nichols, Daniel, et al.
Published: (2024)