Library Catalog

Bibliographic Details
Main Authors:	Ni, Xiaobing, Ge, Mengke, Ruan, Jiaheng, Chen, Song, Kang, Yi
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2412.11021

Similar Items

AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference
by: Liu, Di, et al.
Published: (2026)

Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism
by: Qing, Yuhao, et al.
Published: (2025)

AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
by: Liu, Jie, et al.
Published: (2026)

Enhancing ASIC Technology Mapping via Parallel Supergate Computing
by: Cai, Ye, et al.
Published: (2024)

Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
by: Chen, Fahao, et al.
Published: (2024)

Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
by: Ranawaka, Isuru, et al.
Published: (2024)

SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
by: Zhou, Qihui, et al.
Published: (2025)

Mapping Gemma3 onto an Edge Dataflow Architecture
by: Du, Shouyu, et al.
Published: (2026)

RL over Commodity Networks: Overcoming the Bandwidth Barrier with Lossless Sparse Deltas
by: Ruan, Chaoyi, et al.
Published: (2026)

Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)

SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
by: Abubaker, Nabil, et al.
Published: (2024)

GPU Accelerated Sparse Cholesky Factorization
by: Karsavuran, M. Ozan, et al.
Published: (2024)

Is Sparse Matrix Reordering Effective for Sparse Matrix-Vector Multiplication?
by: Asudeh, Omid, et al.
Published: (2025)

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention
by: Wang, Haoxuan, et al.
Published: (2026)

PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices
by: Yang, Xiang, et al.
Published: (2022)

Wireless MapReduce Arrays for Coded Distributed Computing
by: Peter, Elizabath, et al.
Published: (2024)

Improving Locality in Sparse and Dense Matrix Multiplications
by: Dezfuli, Mohammad Mahdi Salehi, et al.
Published: (2024)

Cooperative Inference with Interleaved Operator Partitioning for CNNs
by: Liu, Zhibang, et al.
Published: (2024)

Sparse Checkpointing for Fast and Reliable MoE Training
by: Gandhi, Swapnil, et al.
Published: (2024)

Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2025)

RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
by: Brock, Benjamin, et al.
Published: (2023)

A More Scalable Sparse Dynamic Data Exchange
by: Geyko, Andrew, et al.
Published: (2023)

Transformer-Based Sparse CSI Estimation for Non-Stationary Channels
by: Mohsin, Muhammad Ahmed, et al.
Published: (2025)

PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration
by: Pacheco, Daniel, et al.
Published: (2026)

Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3
by: Shah, Milan, et al.
Published: (2026)

Selection of Supervised Learning-based Sparse Matrix Reordering Algorithms
by: Tang, Tao, et al.
Published: (2025)

Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication
by: Qian, Matthew, et al.
Published: (2026)

LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures
by: Lei, Kelun, et al.
Published: (2025)

Opt-GPTQ: An Optimized GPTQ Combining Sparse Attention and Quantization Techniques
by: Kong, Jie, et al.
Published: (2025)

Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
by: Li, Shiju, et al.
Published: (2025)

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient
by: Deng, Xiaoge, et al.
Published: (2021)

Belief Propagation Converges to Gaussian Distributions in Sparsely-Connected Factor Graphs
by: Yates, Tom, et al.
Published: (2026)

A Structure-Aware Irregular Blocking Method for Sparse LU Factorization
by: Hu, Zhen, et al.
Published: (2025)

SpArch: Efficient Architecture for Sparse Matrix Multiplication
by: Zhang, Zhekai, et al.
Published: (2020)

AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
by: Park, Seongyeon, et al.
Published: (2024)

AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
by: Wijeratne, Sasindu, et al.
Published: (2025)

FlashSketch: Sketch-Kernel Co-Design for Fast Sparse Sketching on GPUs
by: Dwaraknath, Rajat Vadiraj, et al.
Published: (2026)

SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
by: Gu, Qiqi, et al.
Published: (2025)

Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra
by: Bellavita, Julian, et al.
Published: (2025)

MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs
by: Wolfson-Pou, Jordi, et al.
Published: (2025)