:: Library Catalog

Bibliographic Details
Main Authors:	Pacheco, Daniel; Sousa, Leonel; Ilic, Aleksandar
Format:	Preprint
Published:	2026
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2605.29728

Similar Items

Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2025)

Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2024)

AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
by: Wijeratne, Sasindu, et al.
Published: (2025)

CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis
by: Morgado, José, et al.
Published: (2026)

TrioSeq: A Novel Approach to Accelerate Triplet Sequence Alignment on GPUs
by: Graça, Miguel, et al.
Published: (2026)

Predictive Performance of Photonic SRAM-based In-Memory Computing for Tensor Decomposition
by: Wijeratne, Sasindu, et al.
Published: (2025)

Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation
by: Laukemann, Jan, et al.
Published: (2024)

cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
by: Li, Zixuan, et al.
Published: (2024)

Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
by: Li, Shiju, et al.
Published: (2025)

Federated Learning Using Coupled Tensor Train Decomposition
by: Zhang, Xiangtao, et al.
Published: (2024)

GPU Accelerated Sparse Cholesky Factorization
by: Karsavuran, M. Ozan, et al.
Published: (2024)

Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning
by: Liu, Zhibang, et al.
Published: (2025)

Run-time application migration using checkpoint/restore in userspace
by: Tošić, Aleksandar
Published: (2023)

AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
by: Liu, Jie, et al.
Published: (2026)

Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
by: Wang, Zhigang, et al.
Published: (2024)

Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores
by: Schieffer, Gabin, et al.
Published: (2024)

Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)

Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
by: Ranawaka, Isuru, et al.
Published: (2024)

SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
by: Gu, Qiqi, et al.
Published: (2025)

ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition
by: Helal, Ahmed E., et al.
Published: (2025)

Extracting the Potential of Emerging Hardware Accelerators for Symmetric Eigenvalue Decomposition
by: Wang, Hansheng, et al.
Published: (2024)

PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)

Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)

Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)

PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)

Can Tensor Cores Benefit Memory-Bound Kernels? (No!)
by: Zhang, Lingqi, et al.
Published: (2025)

PilotANN: Memory-Bounded GPU Acceleration for Vector Search
by: Gui, Yuntao, et al.
Published: (2025)

Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra
by: Bellavita, Julian, et al.
Published: (2025)

MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs
by: Wolfson-Pou, Jordi, et al.
Published: (2025)

PULSE: Accelerating Distributed Pointer-Traversals on Disaggregated Memory (Extended Version)
by: Tang, Yupeng, et al.
Published: (2023)

Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference
by: He, Zifan, et al.
Published: (2026)

DRAMatic Speedup: Accelerating HE Operations on a Processing-in-Memory System
by: Klinger, Niklas, et al.
Published: (2026)

AQUA: Network-Accelerated Memory Offloading for LLMs in Scale-Up GPU Domains
by: Kumar, Abhishek Vijaya, et al.
Published: (2024)

LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)

HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
by: Li, Zhonggen, et al.
Published: (2024)

Fused3S: Fast Sparse Attention on Tensor Cores
by: Li, Zitong, et al.
Published: (2025)

GPU-Accelerated Modified Bessel Function of the Second Kind for Gaussian Processes
by: Geng, Zipei, et al.
Published: (2025)

SkimROOT: Accelerating LHC Data Filtering with Near-Storage Processing
by: Batsoyol, Narangerelt, et al.
Published: (2025)

Logarithmic-Time Geodesically Convex Decomposition in Programmable Matter
by: Hillebrandt, Henning, et al.
Published: (2026)

Justin: Hybrid CPU/Memory Elastic Scaling for Distributed Stream Processing
by: Schmitz, Donatien, et al.
Published: (2025)