Library Catalog

Bibliographic Details
Main Authors:	Jacobson, John, Burtscher, Martin, Gopalakrishnan, Ganesh
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2401.04701

Similar Items

SIMT/GPU Data Race Verification using ISCC and Intermediary Code Representations: A Case Study
by: Osterhout, Andrew, et al.
Published: (2025)

Fast Topology-Aware Lossy Data Compression with Full Preservation of Critical Points and Local Order
by: Fallin, Alex, et al.
Published: (2026)

Lessons Learned on the Path to Guaranteeing the Error Bound in Lossy Quantizers
by: Fallin, Alex, et al.
Published: (2024)

A GPU accelerated mixed-precision Smoothed Particle Hydrodynamics framework with cell-based relative coordinates
by: Mao, Zirui, et al.
Published: (2023)

HiCR, an Abstract Model for Distributed Heterogeneous Programming
by: Martin, Sergio Miguel, et al.
Published: (2025)

Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems
by: Knorr, Fabian, et al.
Published: (2025)

HMTRace: Hardware-Assisted Memory-Tagging based Dynamic Data Race Detection
by: Shastri, Jaidev, et al.
Published: (2024)

Data Race Satisfiability on Array Elements
by: Shim, Junhyung, et al.
Published: (2025)

FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing
by: Agarwal, Aarush, et al.
Published: (2025)

Understanding GPU Resource Interference One Level Deeper
by: Elvinger, Paul, et al.
Published: (2025)

Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)

VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU
by: He, Zijian, et al.
Published: (2026)

TopoSZp: Lightweight Topology-Aware Error-controlled Compression for Scientific Data
by: Agarwal, Tripti, et al.
Published: (2026)

Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures
by: Shan, Baodi, et al.
Published: (2024)

cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
by: Li, Zixuan, et al.
Published: (2024)

TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU
by: Wu, Shixun, et al.
Published: (2024)

FastTrack: GPU-Accelerated Tracking for Visual SLAM
by: Khabiri, Kimia, et al.
Published: (2025)

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
by: Zhang, Shiwei, et al.
Published: (2024)

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
by: Yi, Xinyao
Published: (2024)

GPU Programming for AI Workflow Development on AWS SageMaker: An Instructional Approach
by: Srinivasan, Sriram, et al.
Published: (2025)

AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
by: Park, Seongyeon, et al.
Published: (2024)

Taking GPU Programming Models to Task for Performance Portability
by: Davis, Joshua H., et al.
Published: (2024)

HoSZp: An Efficient Homomorphic Error-bounded Lossy Compressor for Scientific Data
by: Agarwal, Tripti, et al.
Published: (2024)

HiCCL: A Hierarchical Collective Communication Library
by: Hidayetoglu, Mert, et al.
Published: (2024)

Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration
by: Wang, Tianyu, et al.
Published: (2024)

Cppless: Single-Source and High-Performance Serverless Programming in C++
by: Copik, Marcin, et al.
Published: (2024)

HiCoCS: High Concurrency Cross-Sharding on Permissioned Blockchains
by: Yang, Lingxiao, et al.
Published: (2025)

What Operations can be Performed Directly on Compressed Arrays, and with What Error?
by: Agarwal, Tripti, et al.
Published: (2024)

Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
by: Curless, Brian, et al.
Published: (2025)

Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis
by: Shi, Jiabo, et al.
Published: (2025)

Accelerating Biclique Counting on GPU
by: Qiu, Linshan, et al.
Published: (2024)

GPU Sharing with Triples Mode
by: Byun, Chansup, et al.
Published: (2024)

ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments
by: Lee, Munkyu, et al.
Published: (2024)

Accelerating Intra-Node GPU-to-GPU Communication Through Multi-Path Transfers with CUDA Graphs
by: Sojoodi, Amirhossein, et al.
Published: (2026)

GPU Accelerated Sparse Cholesky Factorization
by: Karsavuran, M. Ozan, et al.
Published: (2024)

Heat: Satellite's meat is GPU's poison
by: Yuan, Zhehu, et al.
Published: (2024)

DuaLip-GPU Technical Report
by: Dexter, Gregory, et al.
Published: (2026)

Incidence Constraints in Hypergraph Partitioning on GPU
by: Ronzani, Marco, et al.
Published: (2026)

Predictable LLM Serving on GPU Clusters
by: Darzi, Erfan, et al.
Published: (2025)

CheckMate: LLM-Powered Approximate Intermittent Computing
by: Sayyid-Ali, Abdur-Rahman Ibrahim, et al.
Published: (2024)