:: Library Catalog

Bibliographic Details
Main Authors:	Sai, Ryuichi; Hamon, Francois P.; Mellor-Crummey, John; Araya-Polo, Mauricio
Format:	Preprint
Published:	2024
Subjects:	Mathematical Software; Computational Physics
Online Access:	https://arxiv.org/abs/2408.03452

Similar Items

A Portable Framework for Accelerating Stencil Computations on Modern Node Architectures
by: Sai, Ryuichi, et al.
Published: (2023)

Giga-scale Kernel Matrix Vector Multiplication on GPU
by: Hu, Robert, et al.
Published: (2022)

The Software Landscape for the Density Matrix Renormalization Group
by: Sehlstedt, Per, et al.
Published: (2025)

Towards a Higher Roofline for Matrix-Vector Multiplication in Matrix-Free HOSFEM
by: Cao, Zijian, et al.
Published: (2025)

LEO: Tracing GPU Stall Root Causes via Cross-Vendor Backward Slicing
by: Xia, Yuning, et al.
Published: (2026)

Recent Extensions of the ZKCM Library for Parallel and Accurate MPS Simulation of Quantum Circuits
by: SaiToh, Akira
Published: (2024)

Rapid Variable Resolution Particle Initialization for Complex Geometries
by: Villodi, Navaneet, et al.
Published: (2025)

Methods for Few-View CT Image Reconstruction
by: Champley, Kyle M., et al.
Published: (2024)

trainsum -- A Python package for quantics tensor trains
by: Haubenwallner, Paul, et al.
Published: (2026)

Implementation of McMurchie-Davidson algorithm for Gaussian AO integrals suited for SIMD processors
by: Asadchev, Andrey, et al.
Published: (2025)

Memory-Efficient Recursive Evaluation of 3-Center Gaussian Integrals
by: Asadchev, Andrey, et al.
Published: (2022)

Welding R and C++: A Tale of Two Programming Languages
by: Sepulveda, Mauricio Vargas
Published: (2024)

Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
by: Li, Yifan, et al.
Published: (2026)

LeanBET: Formally-verified surface area calculations in Lean
by: Ugwuanyi, Ejike D., et al.
Published: (2026)

A Performance Portable Matrix Free Dense MTTKRP in GenTen
by: Kosmacher, Gabriel, et al.
Published: (2025)

OpenACC offloading of the MFC compressible multiphase flow solver on AMD and NVIDIA GPUs
by: Wilfong, Benjamin, et al.
Published: (2024)

GenML: A Python Library to Generate the Mittag-Leffler Correlated Noise
by: Qu, Xiang, et al.
Published: (2024)

Hyper-reduction methods for accelerating nonlinear finite element simulations: open source implementation and reproducible benchmarks
by: Larsson, Axel, et al.
Published: (2026)

Multi-GPU fast Fourier transforms in MATLAB (for large-scale phase-field crystal simulations)
by: Punke, Maik, et al.
Published: (2026)

f4ncgb: High Performance Gröbner Basis Computations in Free Algebras
by: Heisinger, Maximilian, et al.
Published: (2025)

KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation
by: Batley, Reza T., et al.
Published: (2025)

Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
by: Ringoot, Evelyne, et al.
Published: (2025)

A Constraint-based Mathematical Modeling Library in Prolog with Answer Constraint Semantics
by: Fages, François
Published: (2024)

Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations
by: Ham, David A., et al.
Published: (2024)

Accelerating High-Order Finite Element Simulations at Extreme Scale with FP64 Tensor Cores
by: Tu, Jiqun, et al.
Published: (2026)

FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication
by: Zhu, Honglin, et al.
Published: (2026)

Deriving Algorithms for Triangular Tridiagonalization a Skew-Symmetric Matrix
by: van de Geijn, Robert, et al.
Published: (2023)

Sphractal: Estimating the Fractal Dimension of Surfaces Computed from Precise Atomic Coordinates via Box-Counting Algorithm
by: Ting, Jonathan Yik Chang, et al.
Published: (2024)

SeQuant Framework for Symbolic and Numerical Tensor Algebra. I. Core Capabilities
by: Gaudel, Bimal, et al.
Published: (2025)

Performant Automatic BLAS Offloading on Unified Memory Architecture with OpenMP First-Touch Style Data Movement
by: Li, Junjie
Published: (2024)

GeoWarp: An automatically differentiable and GPU-accelerated implicit MPM framework for geomechanics based on NVIDIA Warp
by: Zhao, Yidong, et al.
Published: (2025)

Large-Scale Simulations of Turbulent Flows using Lattice Boltzmann Methods on Heterogeneous High Performance Computers
by: Kummerländer, Adrian, et al.
Published: (2025)

Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package
by: Rogowski, Marcin, et al.
Published: (2023)

A Practical GPU-Enhanced Matrix-Free Primal-Dual Method for Large-Scale Conic Programs
by: Lin, Zhenwei, et al.
Published: (2025)

Odd but Error-Free FastTwoSum: More General Conditions for FastTwoSum as an Error-Free Transformation for Faithful Rounding Modes
by: Park, Sehyeok, et al.
Published: (2026)

Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
by: Wang, Hansheng, et al.
Published: (2025)

Maestro: Intelligent Execution for Quantum Circuit Simulation
by: Bertomeu, Oriol, et al.
Published: (2025)

Hiperwalk: Simulation of Quantum Walks with Heterogeneous High-Performance Computing
by: Motta, Paulo, et al.
Published: (2024)

Raising the Performance of the Tinker-HP Molecular Modeling Package [Article v1.0]
by: Jolly, Luc-Henri, et al.
Published: (2019)

Harnessing Batched BLAS/LAPACK Kernels on GPUs for Parallel Solutions of Block Tridiagonal Systems
by: Jin, David, et al.
Published: (2025)