| Main Authors: | Kang, Ji-Hoon, Ryu, Hoon |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.13615 |
Similar Items
Hybrid quantum programming with PennyLane Lightning on HPC platforms
by: Asadi, Ali, et al.
Published: (2024)
Automated MPI-X code generation for scalable finite-difference solvers
by: Bisbas, George, et al.
Published: (2023)
Performance measurements of modern Fortran MPI applications with Score-P
by: Corbin, Gregor
Published: (2025)
Enabling MPI communication within Numba/LLVM JIT-compiled Python code using numba-mpi v1.0
by: Derlatka, Kacper, et al.
Published: (2024)
On the energy efficiency of sparse matrix computations on multi-GPU clusters
by: Bernaschi, Massimo, et al.
Published: (2025)
SYCL compute kernels for ExaHyPE
by: Loi, Chung Ming, et al.
Published: (2023)
A shared compilation stack for distributed-memory parallelism in stencil DSLs
by: Bisbas, George, et al.
Published: (2024)
Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package
by: Rogowski, Marcin, et al.
Published: (2023)
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
by: Zhu, Honglin, et al.
Published: (2026)
Integrating Odeint Time Stepping into OpenFPM for Distributed and GPU Accelerated Numerical Solvers
by: Singh, Abhinav, et al.
Published: (2023)
Enabling mixed-precision in spectral element codes
by: Chen, Yanxiang, et al.
Published: (2025)
Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
by: Li, Yifan, et al.
Published: (2026)
High-Performance Star-M SVD for Big Data Compression
by: Hussain, Md Taufique, et al.
Published: (2026)
Robustness and Accuracy in Pipelined Bi-Conjugate Gradient Stabilized Method: A Comparative Study
by: Havdiak, Mykhailo, et al.
Published: (2024)
Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
by: Wang, Hansheng, et al.
Published: (2025)
Toward Portable GPU Performance: Julia Recursive Implementation of TRMM and TRSM
by: Carrica, Vicki, et al.
Published: (2025)
Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
by: Ringoot, Evelyne, et al.
Published: (2025)
Implementing Multi-GPU Scientific Computing Miniapps Across Performance Portable Frameworks
by: Villalobos, Johansell, et al.
Published: (2025)
Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs
by: Ringoot, Evelyne, et al.
Published: (2025)
Communication-Avoiding SpGEMM via Trident Partitioning on Hierarchical GPU Interconnects
by: Bellavita, Julian, et al.
Published: (2026)
Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations
by: Ham, David A., et al.
Published: (2024)
On the Challenges of Energy-Efficiency Analysis in HPC Systems: Evaluating Synthetic Benchmarks and Gromacs
by: Machado, Rafael Ravedutti Lucio, et al.
Published: (2025)
A new open source framework for multiscale modeling of fibrous materials on heterogeneous supercomputers
by: Merson, Jacob, et al.
Published: (2023)
Enabling mixed-precision with the help of tools: A Nekbone case study
by: Chen, Yanxiang, et al.
Published: (2024)
TTK is Getting MPI-Ready
by: Guillou, Eve Le, et al.
Published: (2023)
Parallel Sparse and Data-Sparse Factorization-based Linear Solvers
by: Li, Xiaoye Sherry, et al.
Published: (2026)
Fast multiplication by two's complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation
by: Stocks, Mark
Published: (2023)
Xabclib:A Fully Auto-tuned Sparse Iterative Solver
by: Katagiri, Takahiro, et al.
Published: (2024)
NApy: Efficient Statistics in Python for Large-Scale Heterogeneous Data with Enhanced Support for Missing Data
by: Woller, Fabian, et al.
Published: (2025)
A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices
by: Katagiri, Takahiro, et al.
Published: (2024)
Beating vDSP: A 138 GFLOPS Radix-8 Stockham FFT on Apple Silicon via Two-Tier Register-Threadgroup Memory Decomposition
by: Bergach, Mohamed Amine
Published: (2026)
Black-Scholes Option Pricing on Intel CPUs and GPUs: Implementation on SYCL and Optimization Techniques
by: Panova, Elena, et al.
Published: (2022)
Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores
by: Tu, Jiqun, et al.
Published: (2026)
MPI Implementation Profiling for Better Application Performance
by: Shipley, Riley, et al.
Published: (2024)
MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR
by: Karchi, Jad El, et al.
Published: (2024)
pyGinkgo: A Sparse Linear Algebra Operator Framework for Python
by: Tuteja, Keshvi, et al.
Published: (2025)
Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement
by: Li, Junjie
Published: (2024)
LLM-HPC++: Evaluating LLM-Generated Modern C++ and MPI+OpenMP Codes for Scalable Mandelbrot Set Computation
by: Diehl, Patrick, et al.
Published: (2025)
Performance Evaluation of General Purpose Large Language Models for Basic Linear Algebra Subprograms Code Generation
by: Mukunoki, Daichi, et al.
Published: (2025)
GPU Implementations for Midsize Integer Addition and Multiplication
by: Oancea, Cosmin E., et al.
Published: (2024)