| Main Authors: | Ni, Xiaobing, Ge, Mengke, Ruan, Jiaheng, Chen, Song, Kang, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.11021 |
Similar Items
AB-Sparse: Sparse Attention with Adaptive Block Size for Accurate and Efficient Long-Context Inference
by: Liu, Di, et al.
Published: (2026)
Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism
by: Qing, Yuhao, et al.
Published: (2025)
AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
by: Liu, Jie, et al.
Published: (2026)
Enhancing ASIC Technology Mapping via Parallel Supergate Computing
by: Cai, Ye, et al.
Published: (2024)
Communication-Efficient Sparsely-Activated Model Training via Sequence Migration and Token Condensation
by: Chen, Fahao, et al.
Published: (2024)
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
by: Ranawaka, Isuru, et al.
Published: (2024)
SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
by: Zhou, Qihui, et al.
Published: (2025)
Mapping Gemma3 onto an Edge Dataflow Architecture
by: Du, Shouyu, et al.
Published: (2026)
RL over Commodity Networks: Overcoming the Bandwidth Barrier with Lossless Sparse Deltas
by: Ruan, Chaoyi, et al.
Published: (2026)
Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)
SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
by: Abubaker, Nabil, et al.
Published: (2024)
GPU Accelerated Sparse Cholesky Factorization
by: Karsavuran, M. Ozan, et al.
Published: (2024)
Is Sparse Matrix Reordering Effective for Sparse Matrix-Vector Multiplication?
by: Asudeh, Omid, et al.
Published: (2025)
HieraSparse: Hierarchical Semi-Structured Sparse KV Attention
by: Wang, Haoxuan, et al.
Published: (2026)
PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices
by: Yang, Xiang, et al.
Published: (2022)
Wireless MapReduce Arrays for Coded Distributed Computing
by: Peter, Elizabath, et al.
Published: (2024)
Improving Locality in Sparse and Dense Matrix Multiplications
by: Dezfuli, Mohammad Mahdi Salehi, et al.
Published: (2024)
Cooperative Inference with Interleaved Operator Partitioning for CNNs
by: Liu, Zhibang, et al.
Published: (2024)
Sparse Checkpointing for Fast and Reliable MoE Training
by: Gandhi, Swapnil, et al.
Published: (2024)
Accelerating Sparse MTTKRP for Small Tensor Decomposition on GPU
by: Wijeratne, Sasindu, et al.
Published: (2025)
RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
by: Brock, Benjamin, et al.
Published: (2023)
A More Scalable Sparse Dynamic Data Exchange
by: Geyko, Andrew, et al.
Published: (2023)
Transformer-Based Sparse CSI Estimation for Non-Stationary Channels
by: Mohsin, Muhammad Ahmed, et al.
Published: (2025)
PRISM: Processing-In-Memory Sparse MTTKRP for Tensor Decomposition Acceleration
by: Pacheco, Daniel, et al.
Published: (2026)
Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3
by: Shah, Milan, et al.
Published: (2026)
Selection of Supervised Learning-based Sparse Matrix Reordering Algorithms
by: Tang, Tao, et al.
Published: (2025)
Sparsity-Aware Roofline Models for Sparse Matrix-Matrix Multiplication
by: Qian, Matthew, et al.
Published: (2026)
LOw-cOst yet High-Performant Sparse Matrix-Matrix Multiplication on Arm SME Architectures
by: Lei, Kelun, et al.
Published: (2025)
Opt-GPTQ: An Optimized GPTQ Combining Sparse Attention and Quantization Techniques
by: Kong, Jie, et al.
Published: (2025)
Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
by: Li, Shiju, et al.
Published: (2025)
Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient
by: Deng, Xiaoge, et al.
Published: (2021)
Belief Propagation Converges to Gaussian Distributions in Sparsely-Connected Factor Graphs
by: Yates, Tom, et al.
Published: (2026)
A Structure-Aware Irregular Blocking Method for Sparse LU Factorization
by: Hu, Zhen, et al.
Published: (2025)
SpArch: Efficient Architecture for Sparse Matrix Multiplication
by: Zhang, Zhekai, et al.
Published: (2020)
AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
by: Park, Seongyeon, et al.
Published: (2024)
AMPED: Accelerating MTTKRP for Billion-Scale Sparse Tensor Decomposition on Multiple GPUs
by: Wijeratne, Sasindu, et al.
Published: (2025)
FlashSketch: Sketch-Kernel Co-Design for Fast Sparse Sketching on GPUs
by: Dwaraknath, Rajat Vadiraj, et al.
Published: (2026)
SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
by: GU, Qiqi, et al.
Published: (2025)
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra
by: Bellavita, Julian, et al.
Published: (2025)
MAGNUS: Generating Data Locality to Accelerate Sparse Matrix-Matrix Multiplication on CPUs
by: Wolfson-Pou, Jordi, et al.
Published: (2025)