| Main Authors: | Sai, Ryuichi, Hamon, Francois P., Mellor-Crummey, John, Araya-Polo, Mauricio |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.03452 |
Similar Items
A Portable Framework for Accelerating Stencil Computations on Modern Node Architectures
by: Sai, Ryuichi, et al.
Published: (2023)
Giga-scale Kernel Matrix Vector Multiplication on GPU
by: Hu, Robert, et al.
Published: (2022)
The Software Landscape for the Density Matrix Renormalization Group
by: Sehlstedt, Per, et al.
Published: (2025)
Towards a Higher Roofline for Matrix-Vector Multiplication in Matrix-Free HOSFEM
by: Cao, Zijian, et al.
Published: (2025)
LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
by: Xia, Yuning, et al.
Published: (2026)
Recent Extensions of the ZKCM Library for Parallel and Accurate MPS Simulation of Quantum Circuits
by: SaiToh, Akira
Published: (2024)
Rapid Variable Resolution Particle Initialization for Complex Geometries
by: Villodi, Navaneet, et al.
Published: (2025)
Methods for Few-View CT Image Reconstruction
by: Champley, Kyle M., et al.
Published: (2024)
trainsum -- A Python package for quantics tensor trains
by: Haubenwallner, Paul, et al.
Published: (2026)
Implementation of McMurchie-Davidson algorithm for Gaussian AO integrals suited for SIMD processors
by: Asadchev, Andrey, et al.
Published: (2025)
Memory-Efficient Recursive Evaluation of 3-Center Gaussian Integrals
by: Asadchev, Andrey, et al.
Published: (2022)
Welding R and C++: A Tale of Two Programming Languages
by: Sepulveda, Mauricio Vargas
Published: (2024)
Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
by: Li, Yifan, et al.
Published: (2026)
LeanBET: Formally-verified surface area calculations in Lean
by: Ugwuanyi, Ejike D., et al.
Published: (2026)
A Performance Portable Matrix Free Dense MTTKRP in GenTen
by: Kosmacher, Gabriel, et al.
Published: (2025)
OpenACC offloading of the MFC compressible multiphase flow solver on AMD and NVIDIA GPUs
by: Wilfong, Benjamin, et al.
Published: (2024)
GenML: A Python Library to Generate the Mittag-Leffler Correlated Noise
by: Qu, Xiang, et al.
Published: (2024)
Hyper-reduction methods for accelerating nonlinear finite element simulations: open source implementation and reproducible benchmarks
by: Larsson, Axel, et al.
Published: (2026)
Multi-GPU fast Fourier transforms in MATLAB (for large-scale phase-field crystal simulations)
by: Punke, Maik, et al.
Published: (2026)
f4ncgb: High Performance Gröbner Basis Computations in Free Algebras
by: Heisinger, Maximilian, et al.
Published: (2025)
KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation
by: Batley, Reza T., et al.
Published: (2025)
Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
by: Ringoot, Evelyne, et al.
Published: (2025)
A Constraint-based Mathematical Modeling Library in Prolog with Answer Constraint Semantics
by: Fages, François
Published: (2024)
Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations
by: Ham, David A., et al.
Published: (2024)
Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores
by: Tu, Jiqun, et al.
Published: (2026)
FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
by: Zhu, Honglin, et al.
Published: (2026)
Deriving Algorithms for Triangular Tridiagonalization a Skew-Symmetric Matrix
by: van de Geijn, Robert, et al.
Published: (2023)
Sphractal: Estimating the Fractal Dimension of Surfaces Computed from Precise Atomic Coordinates via Box-Counting Algorithm
by: Ting, Jonathan Yik Chang, et al.
Published: (2024)
SeQuant Framework for Symbolic and Numerical Tensor Algebra. I. Core Capabilities
by: Gaudel, Bimal, et al.
Published: (2025)
Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement
by: Li, Junjie
Published: (2024)
GeoWarp: An automatically differentiable and GPU-accelerated implicit MPM framework for geomechanics based on NVIDIA Warp
by: Zhao, Yidong, et al.
Published: (2025)
Large-Scale Simulations of Turbulent Flows using Lattice Boltzmann Methods on Heterogeneous High Performance Computers
by: Kummerländer, Adrian, et al.
Published: (2025)
Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package
by: Rogowski, Marcin, et al.
Published: (2023)
A Practical GPU-Enhanced Matrix-Free Primal-Dual Method for Large-Scale Conic Programs
by: Lin, Zhenwei, et al.
Published: (2025)
Odd but Error-Free FastTwoSum: More General Conditions for FastTwoSum as an Error-Free Transformation for Faithful Rounding Modes
by: Park, Sehyeok, et al.
Published: (2026)
Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
by: Wang, Hansheng, et al.
Published: (2025)
Maestro: Intelligent Execution for Quantum Circuit Simulation
by: Bertomeu, Oriol, et al.
Published: (2025)
Hiperwalk: Simulation of Quantum Walks with Heterogeneous High-Performance Computing
by: Motta, Paulo, et al.
Published: (2024)
Raising the Performance of the Tinker-HP Molecular Modeling Package [Article v1.0]
by: Jolly, Luc-Henri, et al.
Published: (2019)
Harnessing Batched BLAS/LAPACK Kernels on GPUs for Parallel Solutions of Block Tridiagonal Systems
by: Jin, David, et al.
Published: (2025)