| Main Authors: | Pacheco, Daniel, Sousa, Leonel, Ilic, Aleksandar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.29728 |
Similar Items
Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2025)
Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2024)
AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
by: Wijeratne, Sasindu, et al.
Published: (2025)
CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis
by: Morgado, José, et al.
Published: (2026)
TrioSeq: A Novel Approach to Accelerate Triplet Sequence Alignment on GPUs
by: Graça, Miguel, et al.
Published: (2026)
Predictive Performance of Photonic SRAM-based In-Memory Computing for Tensor Decomposition
by: Wijeratne, Sasindu, et al.
Published: (2025)
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation
by: Laukemann, Jan, et al.
Published: (2024)
cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
by: Li, Zixuan, et al.
Published: (2024)
Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
by: Li, Shiju, et al.
Published: (2025)
Federated Learning Using Coupled Tensor Train Decomposition
by: Zhang, Xiangtao, et al.
Published: (2024)
GPU Accelerated Sparse Cholesky Factorization
by: Karsavuran, M. Ozan, et al.
Published: (2024)
Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning
by: Liu, Zhibang, et al.
Published: (2025)
Run-time application migration using checkpoint/restore in userspace
by: Tošić, Aleksandar
Published: (2023)
AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
by: Liu, Jie, et al.
Published: (2026)
Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
by: Wang, Zhigang, et al.
Published: (2024)
Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores
by: Schieffer, Gabin, et al.
Published: (2024)
Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
by: Ranawaka, Isuru, et al.
Published: (2024)
SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
by: GU, Qiqi, et al.
Published: (2025)
ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition
by: Helal, Ahmed E., et al.
Published: (2025)
Extracting the Potential of Emerging Hardware Accelerators for Symmetric Eigenvalue Decomposition
by: Wang, Hansheng, et al.
Published: (2024)
PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)
Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)
Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)
PRISM: Probabilistic Runtime Insights and Scalable Performance Modeling for Large-Scale Distributed Training
by: Golden, Alicia, et al.
Published: (2025)
Can Tensor Cores Benefit Memory-Bound Kernels? (No!)
by: Zhang, Lingqi, et al.
Published: (2025)
PilotANN: Memory-Bounded GPU Acceleration for Vector Search
by: Gui, Yuntao, et al.
Published: (2025)
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra
by: Bellavita, Julian, et al.
Published: (2025)
MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs
by: Wolfson-Pou, Jordi, et al.
Published: (2025)
PULSE: Accelerating Distributed Pointer-Traversals on Disaggregated Memory (Extended Version)
by: Tang, Yupeng, et al.
Published: (2023)
Understand and Accelerate Memory Processing Pipeline for Large Language Model Inference
by: He, Zifan, et al.
Published: (2026)
DRAMatic Speedup: Accelerating HE Operations on a Processing-in-Memory System
by: Klinger, Niklas, et al.
Published: (2026)
AQUA: Network-Accelerated Memory Offloading for LLMs in Scale-Up GPU Domains
by: Kumar, Abhishek Vijaya, et al.
Published: (2024)
LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)
HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
by: Li, Zhonggen, et al.
Published: (2024)
Fused3S: Fast Sparse Attention on Tensor Cores
by: Li, Zitong, et al.
Published: (2025)
GPU-Accelerated Modified Bessel Function of the Second Kind for Gaussian Processes
by: Geng, Zipei, et al.
Published: (2025)
SkimROOT: Accelerating LHC Data Filtering with Near-Storage Processing
by: Batsoyol, Narangerelt, et al.
Published: (2025)
Logarithmic-Time Geodesically Convex Decomposition in Programmable Matter
by: Hillebrandt, Henning, et al.
Published: (2026)
Justin: Hybrid CPU/Memory Elastic Scaling for Distributed Stream Processing
by: Schmitz, Donatien, et al.
Published: (2025)