:: Library Catalog

Bibliographic Details
Main Authors:	Pandey, Santosh; Wang, Zhibin; Zhong, Sheng; Tian, Chen; Zheng, Bolong; Li, Xiaoye; Li, Lingda; Hoisie, Adolfy; Ding, Caiwen; Li, Dong; Liu, Hang
Format:	Preprint
Published:	2021
Subjects:	Distributed, Parallel, and Cluster Computing; Social and Information Networks
Online Access:	https://arxiv.org/abs/2103.08053

Similar Items

RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs
by: Xie, Xi, et al.
Published: (2024)

Managing Multi Instance GPUs for High Throughput and Energy Savings
by: Saraha, Abhijeet, et al.
Published: (2025)

FPTC: A Fast Parallel Transform-based Codec for Efficient Asymmetric Signal Compression
by: Mechels, Ben, et al.
Published: (2026)

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs
by: Peng, Hongwu, et al.
Published: (2023)

Agent-Based Triangle Counting: Unlocking Truss Decomposition, Triangle Centrality, and Local Clustering Coefficient
by: Chand, Prabhat Kumar, et al.
Published: (2024)

cuSZ-$i$: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation
by: Liu, Jinyang, et al.
Published: (2023)

Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)

Adaptive Cache Management for Complex Storage Systems Using CNN-LSTM-Based Spatiotemporal Prediction
by: Wang, Xiaoye, et al.
Published: (2024)

Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs
by: Wang, Pinhuan, et al.
Published: (2025)

ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs
by: Zhang, Lixing, et al.
Published: (2026)

How to Rent GPUs on a Budget
by: Li, Zhouzi, et al.
Published: (2024)

Scalable Graph Indexing using GPUs for Approximate Nearest Neighbor Search
by: Li, Zhonggen, et al.
Published: (2025)

RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs
by: Li, Aiying, et al.
Published: (2026)

HP-MDR: High-performance and Portable Data Refactoring and Progressive Retrieval with Advanced GPUs
by: Li, Yanliang, et al.
Published: (2025)

Opara: Exploiting Operator Parallelism for Expediting DNN Inference on GPUs
by: Chen, Aodong, et al.
Published: (2023)

APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
by: Fan, Jiakun, et al.
Published: (2025)

Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
by: Li, Shiju, et al.
Published: (2025)

Distributed Triangle Enumeration in Hypergraphs
by: Adamson, Duncan, et al.
Published: (2026)

BOA Constrictor: Squeezing Performance out of GPUs in the Cloud via Budget-Optimal Allocation
by: Li, Zhouzi, et al.
Published: (2026)

Improved Massively Parallel Triangle Counting in $O(1)$ Rounds
by: Liu, Quanquan C., et al.
Published: (2024)

AI Surrogate Model for Distributed Computing Workloads
by: Park, David K., et al.
Published: (2024)

Alternative Mixed Integer Linear Programming Optimization for Joint Job Scheduling and Data Allocation in Grid Computing
by: Feng, Shengyu, et al.
Published: (2025)

An Adaptive Distributed Stencil Abstraction for GPUs
by: Bhosale, Aditya, et al.
Published: (2025)

Accelerating Maximal Biclique Enumeration on GPUs
by: Hsieh, Chou-Ying, et al.
Published: (2024)

Parallelizing Maximal Clique Enumeration on GPUs
by: Almasri, Mohammad, et al.
Published: (2022)

Optimizing sDTW for AMD GPUs
by: Latta-Lin, Daniel, et al.
Published: (2024)

Accelerating Biclique Counting on GPU
by: Qiu, Linshan, et al.
Published: (2024)

Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding
by: Wang, Zhibin, et al.
Published: (2025)

TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs
by: Wu, Shixun, et al.
Published: (2024)

STAR: Decode-Phase Rescheduling for LLM Inference
by: Wang, Zhibin, et al.
Published: (2025)

FlashMP: Fast Discrete Transform-Based Solver for Preconditioning Maxwell's Equations on GPUs
by: Zhang, Haoyuan, et al.
Published: (2025)

Serving Compound Inference Systems on Datacenter GPUs
by: Devata, Sriram, et al.
Published: (2026)

Fast Kronecker Matrix-Matrix Multiplication on GPUs
by: Jangda, Abhinav, et al.
Published: (2024)

Optimal Workload Placement on Multi-Instance GPUs
by: Turkkan, Bekir, et al.
Published: (2024)

ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL
by: Gao, Wei, et al.
Published: (2026)

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
by: Chang, Li-Wen, et al.
Published: (2024)

Dynamic Scheduling Strategies for Resource Optimization in Computing Environments
by: Wang, Xiaoye
Published: (2024)

Straggler Tolerant and Resilient DL Training on Homogeneous GPUs
by: Zhang, Zeyu, et al.
Published: (2025)

RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
by: Brock, Benjamin, et al.
Published: (2023)

Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs
by: Plesner, Andreas, et al.
Published: (2024)